Clustering is a fundamental unsupervised learning technique for grouping data with similar characteristics. In text clustering, the effectiveness of the K-Means algorithm depends heavily on how features are extracted from the documents. This study proposes an enhanced feature extraction approach that integrates Term Frequency-Inverse Document Frequency (TF-IDF) and Latent Dirichlet Allocation (LDA) to improve clustering performance on journal article datasets. The dataset consists of 427 journal article abstracts collected from Google Scholar. Preprocessing consists of tokenization, stopword removal, and TF-IDF vectorization, followed by topic extraction using LDA; the extracted topics serve as the input features for the K-Means clustering algorithm. The number of clusters is selected using the Silhouette Score, and the best result, a score of 0.6806, is obtained at k=9. The results indicate that combining TF-IDF and LDA produces more informative text representations and improves clustering quality. In practice, this supports more reliable clustering of academic documents, with applications in journal recommendation systems, digital library indexing, and research trend analysis. The study contributes to text mining and data science by proposing a systematic preprocessing framework for document clustering. Future research could apply the approach to full-text articles, or explore hierarchical clustering and deep learning-based models to further improve clustering performance.
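
The pipeline described above (TF-IDF vectorization, LDA topic extraction, K-Means on the topic features, and selection of k by Silhouette Score) can be sketched as follows. This is a minimal illustration that assumes scikit-learn; the abstract does not name the library or the settings used, so the function name `cluster_abstracts`, the number of topics, the candidate range for k, the English stopword list, and the random seed are all assumptions and not the study's actual configuration.

```python
from sklearn.cluster import KMeans
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import silhouette_score


def cluster_abstracts(abstracts, n_topics=10, k_values=range(2, 11), seed=0):
    """Return (best_k, best_score, labels), or None if no candidate k is usable.

    n_topics, k_values, and seed are illustrative defaults (assumptions),
    not values reported in the study.
    """
    # Tokenization, stopword removal, and TF-IDF weighting in one step.
    tfidf = TfidfVectorizer(stop_words="english").fit_transform(abstracts)

    # LDA maps each abstract to a vector of topic proportions.
    # scikit-learn's LDA accepts any non-negative matrix, including TF-IDF.
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=seed)
    topics = lda.fit_transform(tfidf)

    best = None
    for k in k_values:
        # The silhouette score is only defined for 2 <= k <= n_samples - 1.
        if not 2 <= k < len(abstracts):
            continue
        model = KMeans(n_clusters=k, n_init=10, random_state=seed)
        labels = model.fit_predict(topics)
        # Skip degenerate runs where every point landed in one cluster.
        if len(set(labels)) < 2:
            continue
        score = silhouette_score(topics, labels)
        # Keep the k with the highest silhouette score seen so far.
        if best is None or score > best[1]:
            best = (k, score, labels)
    return best
```

Calling `cluster_abstracts(list_of_abstract_strings)` returns the selected k, its silhouette score, and one cluster label per abstract. Clustering the low-dimensional topic proportions, not the sparse TF-IDF matrix, is the design choice the abstract describes; the scores obtained depend on the data and on the assumed settings above.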
                        
                        
                        
                        
                            
Copyright © 2024