IPTSS: Intelligent Preprocessing and Multi-Representation Analysis for Social Media Text Summarization with Clustering-Based Enhancement
A. Ghanem, Fahd; C. Padma, M.; R. Naji, Wadeea
The Indonesian Journal of Computer Science, Vol. 15 No. 1 (2026)
Publisher: AI Society & STMIK Indonesia

DOI: 10.33022/ijcs.v15i1.5086

Abstract

Social media platforms generate massive volumes of noisy, informal short texts, which creates significant challenges for automatic text summarization. This paper presents IPTSS (Intelligent Preprocessing and Transformation System for Social Media Summarization), a unified framework that integrates intelligent preprocessing, multi-representation text modeling, and clustering-based extractive summarization into a single end-to-end pipeline. IPTSS has three components: a four-stage intelligent preprocessing pipeline covering redundancy elimination, platform-noise removal, out-of-vocabulary normalization, and linguistic standardization; a multi-representation analysis layer spanning statistical, distributional, and transformer-based models; and a hybrid TF-IDF–weighted BERT representation that fuses corpus-specific lexical importance with contextual semantic information. Summarization is performed through clustering-based representative selection with redundancy control to ensure topical diversity and coverage. Extensive experiments on large-scale datasets collected from X (formerly Twitter) across the Monkeypox, COVID-19 Vaccine, and Climate Change domains show that preprocessing alone yields a 25.8% improvement in ROUGE-1, while moving from Bag-of-Words to Sentence-BERT representations produces a 38.4% gain. The proposed hybrid representation improves performance by a further 7.0% over the best single-representation baseline and achieves the highest scores across all ROUGE metrics. The optimal configuration (Fuzzy C-Means + IPTSS Hybrid) reaches ROUGE-1 = 0.528, outperforming state-of-the-art statistical, graph-based, crisis-specific, neural, and optimization-based methods. Cross-dataset validation confirms strong generalizability, with low performance variance (CV ≈ 2.5%) across heterogeneous domains without dataset-specific tuning. These results indicate that effective social media summarization is driven primarily by preprocessing quality and hybrid representation design rather than by algorithmic complexity alone, establishing IPTSS as a robust, scalable, and generalizable framework for large-scale extractive summarization of social media text.
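
The pipeline outlined in the abstract (preprocess, represent, cluster, select representatives with redundancy control) can be illustrated with a small sketch. This is not the authors' implementation. It is a minimal standard-library approximation under several assumptions: plain TF-IDF vectors stand in for the hybrid TF-IDF–weighted BERT representation, a single-pass leader clustering stands in for Fuzzy C-Means, and the function names (`preprocess`, `tfidf_vectors`, `cosine`, `summarize`), the thresholds, and the sample posts are all hypothetical.

```python
import math
import re
from collections import Counter


def preprocess(posts):
    """Strip platform noise, standardize text, and drop exact duplicates."""
    seen, cleaned = set(), []
    for post in posts:
        text = re.sub(r"https?://\S+|@\w+|\bRT\b", " ", post)  # URLs, mentions, RT marker
        text = re.sub(r"[^a-z0-9\s]", " ", text.lower())        # lowercase, drop punctuation
        text = " ".join(text.split())                           # collapse whitespace
        if text and text not in seen:                           # redundancy elimination
            seen.add(text)
            cleaned.append(text)
    return cleaned


def tfidf_vectors(docs):
    """Sparse TF-IDF vectors (dict: term -> weight) with smoothed IDF."""
    tokenized = [d.split() for d in docs]
    df = Counter(t for toks in tokenized for t in set(toks))
    n = len(docs)
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({
            t: (c / len(toks)) * (math.log((1 + n) / (1 + df[t])) + 1)
            for t, c in tf.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def summarize(posts, k=2, cluster_sim=0.5, redundancy_sim=0.8):
    """Pick up to k cluster representatives, skipping near-duplicate picks."""
    docs = preprocess(posts)
    vecs = tfidf_vectors(docs)

    # Leader clustering (assumption: a simple stand-in for Fuzzy C-Means).
    clusters = []  # each cluster is a list of doc indices; index 0 is the leader
    for i, v in enumerate(vecs):
        for members in clusters:
            if cosine(v, vecs[members[0]]) >= cluster_sim:
                members.append(i)
                break
        else:
            clusters.append([i])

    chosen = []
    for members in sorted(clusters, key=len, reverse=True):
        # Representative = member most similar to the rest of its cluster.
        rep = max(
            members,
            key=lambda i: sum(cosine(vecs[i], vecs[j]) for j in members if j != i),
        )
        # Redundancy control: skip a representative too close to an earlier pick.
        if all(cosine(vecs[rep], vecs[s]) < redundancy_sim for s in chosen):
            chosen.append(rep)
        if len(chosen) == k:
            break
    return [docs[i] for i in chosen]


# Hypothetical sample posts, for illustration only.
posts = [
    "RT @who: Mpox vaccine rollout begins in Lagos https://t.co/abc",
    "Mpox vaccine rollout begins in Lagos",
    "mpox vaccine rollout begins in lagos today",
    "Climate change is melting arctic ice faster #climate",
    "Arctic ice is melting faster due to climate change",
]
for line in summarize(posts, k=2):
    print(line)
```

The sketch keeps the same ordering of stages as the abstract, so each stage can be swapped independently, for example by replacing `tfidf_vectors` with dense sentence embeddings while leaving the selection step unchanged.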