Pratama, Farriel Arrianta Akbar
Unknown Affiliation

Published: 1 document
Articles

Pratama, Farriel Arrianta Akbar; Arief, Muhammad Eka Nur; Nastiti, Vinna Rahmayanti Setyaning
JUITA: Jurnal Informatika, Vol. 14, Issue 1, March 2026
Publisher : Department of Informatics Engineering, Universitas Muhammadiyah Purwokerto

Abstract

The exponential growth of scientific literature makes it impractical to identify thematic trends manually, which calls for automated analysis methods. This study compares three topic modeling pipelines to determine which one maximizes the coherence of topics extracted from scientific research. The pipelines were implemented and evaluated on a corpus of 20,972 scientific article abstracts: a custom pipeline combining SBERT, UMAP, and HDBSCAN; a second configuration using RoBERTa, PCA, and KMeans; and a third using the integrated BERTopic model. Performance was benchmarked quantitatively with the C_v coherence score. The integrated BERTopic model achieved the highest score, 0.7012, exceeding the custom SBERT-based pipeline (0.6079) and the RoBERTa-based pipeline (0.4756). The findings indicate that an integrated, purpose-built model such as BERTopic produces more coherent and interpretable thematic structures from scientific text than the two modular pipelines. This research provides empirical guidance for researchers, showing that integrated models can offer a more robust solution for large-scale literature analysis than modular pipeline designs.
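
The abstract ranks the pipelines by C_v coherence. As a rough illustration of what a coherence score measures, the sketch below computes a simpler, related quantity: the mean normalized pointwise mutual information (NPMI) over pairs of a topic's top words, with probabilities estimated from document-level co-occurrence. This is not the C_v measure used in the study (C_v builds on NPMI but estimates probabilities over sliding windows and compares words indirectly through cosine similarity), and the function name `npmi_coherence` and the toy corpus are hypothetical, chosen only for illustration.

```python
import math
from itertools import combinations


def npmi_coherence(top_words, documents):
    """Mean NPMI over all pairs of a topic's top words.

    Simplified stand-in for a coherence score: probabilities come from
    document-level co-occurrence, not the sliding windows used by C_v.
    `documents` is a list of tokenized documents (lists of strings).
    """
    if len(top_words) < 2:
        raise ValueError("need at least two top words to form a pair")
    if not documents:
        raise ValueError("need at least one document")

    doc_sets = [set(doc) for doc in documents]
    n_docs = len(doc_sets)

    def prob(*words):
        # Fraction of documents that contain every word in `words`.
        return sum(all(w in d for w in words) for d in doc_sets) / n_docs

    scores = []
    for w1, w2 in combinations(top_words, 2):
        p1, p2, p12 = prob(w1), prob(w2), prob(w1, w2)
        if p12 == 0:
            # The words never appear together: minimum NPMI.
            scores.append(-1.0)
        elif p12 == 1:
            # Degenerate case (both words in every document); treated
            # here as perfect co-occurrence to avoid dividing by zero.
            scores.append(1.0)
        else:
            pmi = math.log(p12 / (p1 * p2))
            scores.append(pmi / -math.log(p12))
    return sum(scores) / len(scores)


# Toy corpus (hypothetical): two documents per underlying theme.
docs = [
    ["topic", "model", "coherence"],
    ["topic", "model", "cluster"],
    ["neural", "network", "training"],
    ["neural", "network", "embedding"],
]

coherent = npmi_coherence(["topic", "model"], docs)
incoherent = npmi_coherence(["topic", "neural"], docs)
mixed = npmi_coherence(["topic", "model", "neural"], docs)
```

In this toy corpus, words that always appear together score higher than words that never do, which is the intuition behind ranking topic models by coherence: a topic whose top words co-occur in the same texts is easier to interpret.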