G-Tech : Jurnal Teknologi Terapan
Vol 9 No 4 (2025): G-Tech, Vol. 9 No. 4 October 2025

Impact of Preprocessing on Indonesian Extractive Summarization Using LexRank, TextRank, DivRank, and Cosine Similarity

Setiawan, Andri
Abidin, Zainal
Imamudin, Mochamad



Article Info

Publication Date
30 Oct 2025

Abstract

Extractive text summarization is a fundamental approach to tackling information overload, yet its quality depends heavily on the preprocessing stage. Despite this crucial role, there is no consensus on the optimal preprocessing scenario for Indonesian, a language with a complex morphological structure. This study addresses that gap by systematically analyzing the impact of seven preprocessing scenarios on four summarization methods: three graph-based methods (LexRank, TextRank, DivRank) and one topic-relevance method (Cosine Similarity against the title). Experiments on a corpus of 3,000 Indonesian news articles, evaluated with ROUGE metrics, yield two key findings. First, the Cosine Similarity method significantly outperforms all graph-based methods, achieving the highest F1 scores on ROUGE-1 (0.5073), ROUGE-2 (0.4018), and ROUGE-L (0.4574), which underscores the important role of the title in news texts. Second, a comprehensive preprocessing scenario comprising Case Folding, Punctuation Removal, Tokenization, Normalization, Negation Handling, Stopword Removal, and Stemming proves the most effective at improving the performance of all algorithms. These findings provide empirical evidence and a practical recommendation: combining a title-relevance approach with proper text normalization is the most effective strategy for optimizing extractive summarization of Indonesian text.
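To make the title-relevance idea concrete, the following is a minimal sketch, not the authors' implementation, of scoring each sentence by its cosine similarity to the title and keeping the top-k sentences, plus a simplified ROUGE-1 F1 for comparing a summary with a reference. Several assumptions apply: the stopword list is a tiny hypothetical sample, sentences are represented as raw term-frequency vectors (the study's weighting scheme is not stated here), and the Normalization, Negation Handling, and Stemming steps from the abstract are omitted for brevity.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny Indonesian stopword sample (assumption);
# a real pipeline would use a complete stopword list.
STOPWORDS = {"yang", "di", "dan", "ke", "dari", "ini", "itu", "untuk", "pada", "dengan"}


def preprocess(text):
    """Case folding, punctuation removal, whitespace tokenization, stopword removal.
    Normalization, negation handling, and stemming are omitted in this sketch."""
    text = text.lower()                   # case folding
    text = re.sub(r"[^\w\s]", " ", text)  # punctuation removal
    tokens = text.split()                 # tokenization
    return [t for t in tokens if t not in STOPWORDS]


def cosine(a, b):
    """Cosine similarity between two bag-of-words term-frequency vectors."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def summarize_by_title(title, sentences, k=2):
    """Select the k sentences most similar to the title, kept in document order."""
    title_tokens = preprocess(title)
    scores = [cosine(preprocess(s), title_tokens) for s in sentences]
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


def rouge1_f1(candidate, reference):
    """Simplified ROUGE-1 F1 over pre-tokenized inputs (clipped unigram overlap)."""
    c, r = Counter(candidate), Counter(reference)
    overlap = sum((c & r).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(c.values())
    recall = overlap / sum(r.values())
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    title = "Banjir melanda Jakarta"
    sentences = [
        "Banjir besar melanda Jakarta pada Senin pagi.",
        "Warga mengungsi ke tempat yang lebih aman.",
        "Pemerintah Jakarta menyiapkan posko banjir.",
    ]
    print(summarize_by_title(title, sentences, k=2))
```

With this toy input the first and third sentences share title words ("banjir", "melanda", "jakarta") and are selected, while the second shares none and scores 0. Ranking against a single fixed query (the title) rather than by sentence centrality in a similarity graph is what distinguishes this approach from LexRank, TextRank, and DivRank.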

Copyright © 2025






Journal Info

Abbrev

g-tech

Publisher

Subject

Computer Science & IT; Decision Sciences, Operations Research & Management; Energy; Engineering

Description

Jurnal G-Tech aims to publish original research results and reviews of research results on technology and its applications within the scope of engineering, including mechanical engineering, electrical engineering, informatics engineering, information systems, agrotechnology, ...