JUITA: Jurnal Informatika
JUITA Vol. 14 Issue 1, March 2026




Article Info

Publication Date
31 Mar 2026

Abstract

The exponential growth of scientific literature makes manual identification of thematic trends increasingly impractical, creating a need for automated analysis methods. Through a comparative analysis, this study seeks to identify the topic modeling pipeline that yields the most coherent topics from scientific literature. Three pipelines were implemented and evaluated on a corpus of 20,972 scientific article abstracts: a custom pipeline combining SBERT embeddings, UMAP dimensionality reduction, and HDBSCAN clustering; a second configuration using RoBERTa embeddings, PCA, and KMeans; and the integrated BERTopic model. Benchmarked with the C_v coherence score, the integrated BERTopic model achieved the highest score, 0.7012, clearly exceeding the custom SBERT-based pipeline (0.6079) and the RoBERTa-based pipeline (0.4756). These findings indicate that an integrated, purpose-built model such as BERTopic produces more coherent and interpretable thematic structures from scientific text than the modular pipelines tested. The study thus offers empirical guidance for researchers, showing through benchmarking that integrated models provide a more robust solution for large-scale literature analysis than modular pipeline designs.
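The pipelines above are ranked by C_v coherence. To illustrate what that metric rewards, the following is a minimal, standard-library-only sketch of C_v for a single topic, assuming the standard definition: one-set segmentation, boolean sliding-window probabilities with a 110-token window, NPMI-based indirect cosine confirmation, and arithmetic-mean aggregation. The abstract does not say which implementation the authors used; studies commonly rely on gensim's `CoherenceModel` with `coherence='c_v'`, and this sketch is not that implementation. It skips preprocessing, and the names `cv_coherence` and `sliding_windows`, the smoothing constant, and the toy corpus are hypothetical. Common implementations report a model-level score as the mean of per-topic values.

```python
import math
from itertools import combinations

# Smoothing constant added to joint probabilities (an assumption; mirrors the
# small epsilon common C_v implementations use to avoid log(0)).
EPS = 1e-12


def sliding_windows(tokens, size):
    """Yield boolean sliding windows (sets of tokens) over one document.

    A document shorter than `size` is treated as a single window.
    """
    if len(tokens) <= size:
        yield set(tokens)
        return
    for start in range(len(tokens) - size + 1):
        yield set(tokens[start:start + size])


def cv_coherence(topic_words, texts, window_size=110):
    """Sketch of C_v coherence for one topic given its top words."""
    words = list(dict.fromkeys(topic_words))  # drop duplicates, keep order
    single = {w: 0 for w in words}
    joint = {pair: 0 for pair in combinations(words, 2)}
    n_windows = 0

    # Probability estimation: count windows containing each word / word pair.
    for tokens in texts:
        for window in sliding_windows(tokens, window_size):
            n_windows += 1
            present = [w for w in words if w in window]  # keeps `words` order
            for w in present:
                single[w] += 1
            for pair in combinations(present, 2):
                joint[pair] += 1

    if n_windows == 0 or any(count == 0 for count in single.values()):
        raise ValueError("every topic word must occur in the reference texts")

    def prob(a, b=None):
        if b is None or a == b:
            return single[a] / n_windows
        key = (a, b) if (a, b) in joint else (b, a)
        return joint[key] / n_windows

    def npmi(a, b):
        # Normalized pointwise mutual information of two words.
        p_ab = prob(a, b) + EPS
        return math.log(p_ab / (prob(a) * prob(b))) / -math.log(p_ab)

    # Indirect confirmation: each word's context vector holds its NPMI with
    # every top word (gamma = 1); the topic vector is their element-wise sum.
    vectors = [[npmi(a, b) for b in words] for a in words]
    topic_vec = [sum(column) for column in zip(*vectors)]

    def cosine(u, v):
        dot = sum(x * y for x, y in zip(u, v))
        norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
        return dot / norm if norm else 0.0

    # One-set segmentation + arithmetic mean: compare every word with the topic.
    return sum(cosine(vec, topic_vec) for vec in vectors) / len(words)


# Hypothetical toy corpus of pre-tokenized "abstracts".
corpus = [
    "topic model coherence score corpus".split(),
    "topic model coherence evaluation".split(),
    "neural embedding cluster density".split(),
    "embedding cluster umap density".split(),
]

coherent = cv_coherence(["topic", "model", "coherence"], corpus)
incoherent = cv_coherence(["topic", "cluster", "density"], corpus)
print(round(coherent, 3), round(incoherent, 3))  # roughly 1.0 and 0.334
```

Top words that share the same windows give near-identical context vectors, so each word lines up with the topic vector and the score approaches 1. A word drawn from unrelated documents points away from the rest and lowers the mean. This is why C_v tends to favor topics whose top words describe a single theme.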

Copyright © 2026






Journal Info

Abbrev

JUITA


Subject

Computer Science & IT

Description

JUITA: Jurnal Informatika is a scientific journal covering informatics and its applications, presenting articles on ideas and research reflecting the latest developments in the field. JUITA is a peer-reviewed, open-access journal. JUITA is published by the Informatics Engineering Study Program, Universitas Muhammadiyah ...