The Indonesian Journal of Computer Science
Vol. 14 No. 5 (2025): The Indonesian Journal of Computer Science

Enhancing Arabic Extractive Summarization with TF-IDF-Weighted AraBERT Sentence Embeddings and Semantic Clustering

R. Naji, Wadeea
Suresha
Fahd A. Ghanem



Article Info

Publish Date
12 Oct 2025

Abstract

The growing volume of textual content across digital platforms, including social media, news, and education, has made it difficult for users to extract useful information efficiently. Automatic Text Summarization (ATS) has therefore become an essential tool for distilling large amounts of information while preserving the core ideas. Progress in Arabic ATS remains limited by the scarcity of annotated datasets, the lack of Arabic-specific NLP tools, and the high computational cost of large language models (LLMs). In addition, traditional methods often fail to capture sentence-level semantics, which limits summary quality. To address these limitations, we propose a scalable, unsupervised framework that uses TF-IDF-weighted AraBERT embeddings to generate rich sentence representations. To capture document structure, sentences are grouped using k-means clustering. From each cluster, we select the most representative sentences by centroid similarity and then apply Maximal Marginal Relevance (MMR) as a post-processing redundancy filter to eliminate sentences that are too similar. Experimental evaluation on the EASC dataset shows that the weighted AraBERT model outperforms traditional embedding techniques such as FastText and unweighted AraBERT, achieving significant improvements across multiple ROUGE metrics.
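
The pipeline described in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation. Several things are assumptions made for illustration: the function names (`tfidf_weights`, `weighted_vector`, `kmeans`, `mmr_select`, `summarize`), the smoothed IDF formula, the farthest-first k-means initialisation, the MMR trade-off `lam`, and the `per_cluster` parameter. Most importantly, a static token-to-vector lookup (`token_vecs`) stands in for AraBERT's contextual token embeddings, so the sketch shows only the weighting, clustering, and selection logic.

```python
import math
from collections import Counter

def tfidf_weights(sentences):
    """TF-IDF weight per token type, treating each sentence as a 'document'."""
    n = len(sentences)
    df = Counter(tok for sent in sentences for tok in set(sent))
    out = []
    for sent in sentences:
        tf = Counter(sent)
        # Smoothed IDF: one common variant, assumed here, not taken from the paper.
        out.append({t: (c / len(sent)) * (math.log((1 + n) / (1 + df[t])) + 1.0)
                    for t, c in tf.items()})
    return out

def weighted_vector(weights, token_vecs):
    """Weighted average of token vectors, normalised by the total weight."""
    total = sum(weights.values())
    dim = len(next(iter(token_vecs.values())))
    return [sum(w * token_vecs[t][i] for t, w in weights.items()) / total
            for i in range(dim)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def sqdist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def kmeans(vectors, k, iters=20):
    """Plain k-means with a deterministic farthest-first initialisation."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        far = max(vectors, key=lambda v: min(sqdist(v, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(vectors)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sqdist(v, centroids[c]))
                  for v in vectors]
        for c in range(k):
            members = [v for v, l in zip(vectors, labels) if l == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

def mmr_select(candidates, vectors, relevance, k, lam=0.7):
    """Greedy MMR: trade relevance against similarity to already chosen items."""
    selected, pool = [], list(candidates)
    while pool and len(selected) < k:
        def score(i):
            redundancy = max((cosine(vectors[i], vectors[j]) for j in selected),
                             default=0.0)
            return lam * relevance[i] - (1 - lam) * redundancy
        best = max(pool, key=score)
        selected.append(best)
        pool.remove(best)
    return selected

def summarize(sentences, token_vecs, k=2, summary_len=2, per_cluster=2, lam=0.7):
    """Return indices of selected sentences, in document order."""
    vecs = [weighted_vector(w, token_vecs) for w in tfidf_weights(sentences)]
    labels, centroids = kmeans(vecs, k)
    relevance = [cosine(v, centroids[l]) for v, l in zip(vecs, labels)]
    candidates = []
    for c in range(k):
        members = [i for i, l in enumerate(labels) if l == c]
        members.sort(key=lambda i: relevance[i], reverse=True)
        candidates.extend(members[:per_cluster])  # most centroid-like sentences
    return sorted(mmr_select(candidates, vecs, relevance, summary_len, lam))
```

Here `sentences` is a list of non-empty token lists and `token_vecs` maps each token to a vector of fixed length. In a real system the vectors would come from a pretrained encoder, and each token occurrence would carry its own contextual vector rather than a shared one.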

Copyrights © 2025






Journal Info

Abbrev

ijcs

Subject

Computer Science & IT; Electrical & Electronics Engineering; Engineering

Description

The Indonesian Journal of Computer Science (IJCS) is a bimonthly peer-reviewed journal published by AI Society and STMIK Indonesia. IJCS editions will be published at the end of February, April, June, August, October and December. The scope of IJCS includes general computer science, information ...