Extractive text summarization is a fundamental approach to tackling information overload, yet its quality depends heavily on the pre-processing stage. Despite this crucial role, there is no consensus on the optimal pre-processing scenario for the Indonesian language, which has a complex morphological structure. This study aims to fill that research gap by systematically analyzing the impact of seven pre-processing scenarios on four summarization methods: three graph-based methods (LexRank, TextRank, DivRank) and one topic-relevance method (Cosine Similarity against the title). Using a corpus of 3,000 Indonesian news articles and ROUGE evaluation metrics, the results yield two key findings. First, the Cosine Similarity method significantly outperforms all graph-based methods, achieving the highest F1-Measure scores on ROUGE-1 (0.5073), ROUGE-2 (0.4018), and ROUGE-L (0.4574), underscoring the important role of the title in news texts. Second, a comprehensive pre-processing scenario comprising Case Folding, Punctuation Removal, Tokenization, Normalization, Negation Handling, Stopword Removal, and Stemming proves the most effective at improving the performance of all algorithms. These findings provide empirical evidence and a practical recommendation: combining a title-relevance approach with proper text normalization is the most effective strategy for optimizing extractive text summarization in Indonesian.
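The title-relevance approach described above can be sketched as follows. This is a minimal illustration, not the paper's exact implementation: each sentence is scored by the cosine similarity of its term-frequency vector against the title's, after a simplified pre-processing pipeline (case folding, punctuation removal, tokenization, stopword removal). The stopword list and example text are hypothetical samples, and Indonesian stemming (e.g. via a library such as Sastrawi) is omitted for brevity.

```python
import math
import re

# Tiny illustrative sample of Indonesian stopwords (hypothetical subset)
STOPWORDS = {"di", "ke", "dan", "yang", "untuk", "pada", "dengan"}

def preprocess(text):
    # Case folding, punctuation removal, tokenization, stopword removal
    text = text.lower()
    text = re.sub(r"[^\w\s]", " ", text)
    return [t for t in text.split() if t not in STOPWORDS]

def tf_vector(tokens):
    # Raw term-frequency vector as a dict
    vec = {}
    for t in tokens:
        vec[t] = vec.get(t, 0) + 1
    return vec

def cosine(a, b):
    # Cosine similarity between two sparse term-frequency vectors
    dot = sum(a.get(k, 0) * b.get(k, 0) for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def summarize(title, sentences, k=1):
    # Rank sentences by similarity to the title; keep the top k,
    # restoring original document order in the output summary
    tvec = tf_vector(preprocess(title))
    scored = [(cosine(tf_vector(preprocess(s)), tvec), i, s)
              for i, s in enumerate(sentences)]
    top = sorted(scored, reverse=True)[:k]
    return [s for _, _, s in sorted(top, key=lambda x: x[1])]
```

A graph-based method such as LexRank would instead build a sentence-similarity graph from these same cosine scores and rank sentences by centrality; the title-relevance variant shown here replaces that global ranking with a single comparison against the title.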
Copyright © 2025