Garuda - Garba Rujukan Digital

Brilliance: Research of Artificial Intelligence

Vol. 4 No. 2 (2024): Brilliance: Research of Artificial Intelligence, Article Research November 2024

Surianto, Dewi Fatmarani (Unknown)
Surianto, Dewi Fatmawati (Unknown)

Publish Date: 07 Mar 2025

Clustering is a fundamental unsupervised learning technique for grouping data with similar characteristics. In text clustering, however, the effectiveness of the K-Means algorithm depends heavily on proper feature extraction. This study proposes an enhanced feature extraction approach that integrates Term Frequency-Inverse Document Frequency (TF-IDF) and Latent Dirichlet Allocation (LDA) to improve clustering performance on journal article datasets. The dataset consists of 427 journal article abstracts collected from Google Scholar. Preprocessing comprises tokenization, stopword removal, and TF-IDF vectorization, followed by topic extraction using LDA; the extracted topics serve as input features for the K-Means clustering algorithm. The optimal number of clusters is determined using the Silhouette Score, with the best result obtained at k=9, achieving a score of 0.6806. The results demonstrate that the combination of TF-IDF and LDA produces more informative text representations, significantly enhancing clustering quality. Practical implications include improved accuracy in academic document clustering, with applications in journal recommendation systems, digital library indexing, and research trend analysis. This study contributes to text mining and data science by proposing a systematic preprocessing framework for document clustering. Future research could explore its application to full-text articles, hierarchical clustering, or deep learning-based models to further improve clustering performance.
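
The silhouette-based choice of k described in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the 2-D points in `TOY_FEATURES` are hypothetical stand-ins for the LDA topic features, the deterministic farthest-point initialization is an assumption made so the run is reproducible, and all function names are illustrative.

```python
import math

# Hypothetical toy features: three well-separated groups standing in for
# per-document LDA topic vectors (the study itself uses 427 abstracts).
TOY_FEATURES = [
    (0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2),
    (5.0, 5.0), (5.2, 5.1), (5.1, 5.3), (5.3, 5.2),
    (0.0, 9.0), (0.2, 9.1), (0.1, 9.3), (0.3, 9.2),
]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def farthest_point_init(points, k):
    """Deterministic seeding (an assumption of this sketch): start from the
    first point, then repeatedly add the point farthest from its nearest center."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(euclidean(p, c) for c in centers)))
    return centers


def kmeans(points, k, iters=50):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    centers = farthest_point_init(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda c: euclidean(p, centers[c])) for p in points]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(d) / len(members) for d in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def silhouette_score(points, labels):
    """Mean of s = (b - a) / max(a, b), where a is the mean distance to the
    point's own cluster and b the mean distance to the nearest other cluster."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # singleton clusters contribute 0 by convention
        a = sum(euclidean(p, q) for q in own) / (len(own) - 1)
        b = min(sum(euclidean(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != lab)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def best_k_by_silhouette(points, k_values):
    """Cluster once per candidate k and keep the k with the highest score."""
    scores = {k: silhouette_score(points, kmeans(points, k)) for k in k_values}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    best_k, scores = best_k_by_silhouette(TOY_FEATURES, range(2, 6))
    print(best_k, scores)
```

In the study, the same selection loop would run over the LDA-derived features rather than toy points. A library implementation of K-Means and the silhouette score would normally replace these hand-written helpers.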

Journal Info

Brilliance: Research of Artificial Intelligence

Abbrev: brilliance

Publisher: Information Technology and Science

Subject: Decision Sciences, Operations Research & Management; Mathematics; Other

Description

Brilliance: Research of Artificial Intelligence is a scientific journal. Brilliance is published twice in one year, namely in February, May and November. Brilliance aims to promote research in the field of Informatics Engineering, with a focus on publishing quality papers about the latest ...

Enhancing K-Means Clustering for Journal Articles using TF-IDF and LDA Feature Extraction
