Journal of Information Systems and Informatics
Vol 7 No 4 (2025): December

Semantic-Enhanced News Clustering Using TF-IDF and WordNet with K-Means

Hidayat, Mohammad Yusuf (Unknown)
Yaqin, Muhammad Ainul (Unknown)
Abidin, Zainal (Unknown)



Article Info

Publish Date
16 Dec 2025

Abstract

Clustering news articles is an unsupervised learning task in which models operate on unlabeled or, at most, partially annotated data. K-Means remains one of the most commonly applied clustering algorithms because of its efficiency and simplicity, and TF-IDF is a widely used method for building document feature matrices through statistical term weighting. TF-IDF, however, cannot represent contextual meaning, so semantically related news articles that use different wording often fail to form coherent clusters. The baseline experiment illustrates this limitation: TF-IDF alone obtained a silhouette score of 0.011 at its optimal cluster configuration (k = 5). To overcome this limitation, this study introduces semantic enrichment with WordNet, applied to keywords extracted through TF-IDF, to improve the similarity representation. The method was evaluated on 1,000 documents sampled from 21,495 filtered records, and the elbow method was used to determine the optimal number of clusters. At its optimal k of 3, the proposed method achieved a silhouette score of 0.505, far above the baseline TF-IDF representation despite using fewer clusters. These results indicate that incorporating semantic information can enhance statistical text representations and produce more contextually coherent news clusters. To manage computational cost, the model applies a first-POS strategy, in which only the first synset selected according to each word's POS tag is considered. This reduces processing complexity, but it may limit the model's ability to fully capture polysemy.
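The effect of semantic enrichment on TF-IDF similarity can be illustrated with a minimal sketch. It is not the authors' implementation. The `TOY_CONCEPTS` table, its concept identifiers, and the three example documents are hypothetical stand-ins for a first-synset WordNet lookup (which could, for instance, be done with NLTK's `wordnet.synsets(word, pos=...)` and taking the first result). The smoothed IDF formula is also an assumption, since the abstract does not state which weighting variant was used.

```python
import math
from collections import Counter

# Hypothetical stand-in for a first-synset WordNet lookup: words that share
# a meaning map to the same concept id. These ids are invented for the demo.
TOY_CONCEPTS = {
    "car": "C_VEHICLE", "automobile": "C_VEHICLE",
    "price": "C_PRICE", "cost": "C_PRICE",
    "rise": "C_INCREASE", "climb": "C_INCREASE",
}

def enrich(tokens):
    """Replace each token with its concept id; unknown tokens stay as-is."""
    return [TOY_CONCEPTS.get(t, t) for t in tokens]

def tfidf(docs):
    """Return one sparse TF-IDF vector (a dict) per tokenized document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            # term frequency times a smoothed inverse document frequency
            term: (c / len(doc)) * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, c in counts.items()
        })
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Hypothetical, already tokenized documents: the first two describe the same
# event with different words, the third is unrelated.
docs = [
    ["car", "price", "rise"],
    ["automobile", "cost", "climb"],
    ["election", "vote", "result"],
]

raw = tfidf(docs)
enriched = tfidf([enrich(d) for d in docs])

raw_sim = cosine(raw[0], raw[1])                  # no shared surface terms
enriched_sim = cosine(enriched[0], enriched[1])   # shared concepts
unrelated_sim = cosine(enriched[0], enriched[2])  # still no overlap

print(f"raw TF-IDF similarity:      {raw_sim:.3f}")
print(f"enriched TF-IDF similarity: {enriched_sim:.3f}")
print(f"unrelated pair (enriched):  {unrelated_sim:.3f}")
```

With plain TF-IDF the two related documents share no terms, so their vectors are orthogonal. After the toy enrichment they map to the same concepts and become maximally similar, while the unrelated document stays apart. A clustering step such as K-Means would then operate on the enriched vectors.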

Copyright © 2025






Journal Info

Abbrev

isi

Subject

Computer Science & IT

Description

Journal-ISI is a scientific journal that publishes original ideas and research on the latest technological developments in information systems, information technology, informatics engineering, computer science, and industrial engineering ...