Bulletin of Electrical Engineering and Informatics
Vol 14, No 5: October 2025

Text clustering for analyzing scientific article using pre-trained language model and k-means algorithm

Firdaus, Firdaus
Nurmaini, Siti
Yusliani, Novi
Rachmatullah, Muhammad Naufal
Darmawahyuni, Annisa
Kunang, Yesi Novaria
Fachrurrozi, Muhammad
Armansyah, Risky



Article Info

Publish Date
01 Oct 2025

Abstract

Text clustering is a data mining technique that can be used to analyze scientific articles. Journals accredited in Indonesia's SINTA index publish in two languages, Indonesian and English. This is the first research focusing on clustering Indonesian and English texts together. In this research, bidirectional encoder representations from transformers (BERT) and IndoBERT are used to represent text data as fixed-length feature vectors. BERT and IndoBERT are pre-trained language models (PLMs) that produce vector representations that account for the position and context of words in a sentence. To cluster the articles, the K-Means algorithm is used. This algorithm converges well and adapts to new examples, which helps improve clustering performance. The best k-value for the K-Means algorithm is determined using the silhouette score, the elbow method, and the Davies-Bouldin index (DBI). The experiments show that the silhouette score yields the best k-value for clustering the articles, with a mean score of 0.597, compared with 0.425 for the elbow method and 0.412 for the DBI. Therefore, the silhouette score optimizes the performance of PLMs and the K-Means algorithm in analyzing scientific articles to determine whether they are in scope or out of scope.
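The k-selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. Synthetic 2-D points stand in for the BERT/IndoBERT feature vectors, the helper names (kmeans, silhouette, davies_bouldin) are hypothetical, and the farthest-point seeding is an assumption chosen to keep the run deterministic. The elbow method is left out because it relies on a visual judgment of the inertia curve.

```python
import math
import random

def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def centroid(points):
    """Component-wise mean of a non-empty list of vectors."""
    return tuple(sum(c) / len(points) for c in zip(*points))

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm with farthest-point seeding (an assumption made
    here for determinism; the source does not state its initialisation)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    for _ in range(max_iter):
        labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
        updated = []
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            updated.append(centroid(members) if members else centers[j])
        if updated == centers:  # assignments are stable, so we have converged
            break
        centers = updated
    return labels, centers

def silhouette(points, labels):
    """Mean silhouette coefficient; higher is better (range -1 to 1)."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    total = 0.0
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            continue  # a singleton cluster contributes a score of 0
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != l)
        total += (b - a) / max(a, b)
    return total / len(points)

def davies_bouldin(points, labels, centers):
    """Davies-Bouldin index; lower is better."""
    ids = sorted(set(labels))
    scatter = {}
    for j in ids:
        members = [p for p, l in zip(points, labels) if l == j]
        scatter[j] = sum(dist(p, centers[j]) for p in members) / len(members)
    worst = [max((scatter[i] + scatter[j]) / dist(centers[i], centers[j])
                 for j in ids if j != i) for i in ids]
    return sum(worst) / len(ids)

# Placeholder "embeddings": three well-separated synthetic blobs.
rng = random.Random(42)
blob_means = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
embeddings = [(rng.gauss(mx, 0.5), rng.gauss(my, 0.5))
              for mx, my in blob_means for _ in range(20)]

# Score each candidate k with both criteria.
scores = {}
for k in range(2, 7):
    labels, centers = kmeans(embeddings, k)
    scores[k] = (silhouette(embeddings, labels),
                 davies_bouldin(embeddings, labels, centers))

best_k_silhouette = max(scores, key=lambda k: scores[k][0])
best_k_dbi = min(scores, key=lambda k: scores[k][1])
print("best k by silhouette:", best_k_silhouette)
print("best k by DBI:", best_k_dbi)
```

On real data the placeholder points would be replaced by the PLM feature vectors, and a library such as scikit-learn (KMeans, silhouette_score, davies_bouldin_score) would likely be preferable to these hand-written helpers.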

Copyrights © 2025






Journal Info

Abbrev

EEI

Subject

Electrical & Electronics Engineering

Description

Bulletin of Electrical Engineering and Informatics (Buletin Teknik Elektro dan Informatika), ISSN: 2089-3191, e-ISSN: 2302-9285, is open to submissions from scholars and experts in the wide areas of electrical, electronics, instrumentation, control, telecommunication and computer engineering from the ...