Journal : Bulletin of Electrical Engineering and Informatics

Text clustering for analyzing scientific article using pre-trained language model and k-means algorithm
Firdaus, Firdaus; Nurmaini, Siti; Yusliani, Novi; Rachmatullah, Muhammad Naufal; Darmawahyuni, Annisa; Kunang, Yesi Novaria; Fachrurrozi, Muhammad; Armansyah, Risky
Bulletin of Electrical Engineering and Informatics Vol 14, No 5: October 2025
Publisher : Institute of Advanced Engineering and Science

DOI: 10.11591/eei.v14i5.9670

Abstract

Text clustering is a data mining technique that can be used to analyze scientific articles. In journals accredited under Indonesia's SINTA system, two languages are used: Indonesian and English. This is the first study to focus on clustering Indonesian and English texts together. In this research, bidirectional encoder representations from transformers (BERT) and IndoBERT are used to represent text data as fixed-length feature vectors. BERT and IndoBERT are pre-trained language models (PLMs) that produce vector representations accounting for the position and context of words in a sentence. To cluster the articles, the K-Means algorithm is applied. This algorithm converges well and adapts to new examples, which helps improve clustering performance. The best k-value for the K-Means algorithm is determined using the silhouette score, the elbow method, and the Davies-Bouldin index (DBI). The experiment shows that the silhouette score yields the best k-value for clustering the articles, with a mean score of 0.597. The mean score for the elbow method is 0.425, and for the DBI it is 0.412. Therefore, the silhouette score optimizes the performance of PLMs and the K-Means algorithm in analyzing scientific articles to determine whether they are in scope or out of scope.
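
The k-selection step described in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the vectors are synthetic 4-dimensional stand-ins for BERT/IndoBERT embeddings, the K-Means initialization is a deterministic farthest-first choice made here for reproducibility, and the elbow is located with a simple second-difference heuristic. All function and variable names are hypothetical.

```python
import math
import random

def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def centroid(points):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(points) for col in zip(*points)]

def groups(points, labels):
    """Map each cluster label to the list of its member vectors."""
    out = {}
    for p, l in zip(points, labels):
        out.setdefault(l, []).append(p)
    return out

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; farthest-first init is an assumption of this sketch."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    for _ in range(max_iter):
        labels = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        g = groups(points, labels)
        # Keep the old center if a cluster ends up empty.
        updated = [centroid(g[c]) if c in g else centers[c] for c in range(k)]
        if updated == centers:
            break
        centers = updated
    return labels

def inertia(points, labels):
    """Within-cluster sum of squared distances (used by the elbow method)."""
    total = 0.0
    for members in groups(points, labels).values():
        center = centroid(members)
        total += sum(dist(p, center) ** 2 for p in members)
    return total

def silhouette(points, labels):
    """Mean silhouette coefficient; higher is better."""
    g = groups(points, labels)
    scores = []
    for p, l in zip(points, labels):
        if len(g[l]) == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        # dist(p, p) is 0, so dividing by n - 1 averages over the other members.
        a = sum(dist(p, q) for q in g[l]) / (len(g[l]) - 1)
        b = min(sum(dist(p, q) for q in m) / len(m) for c, m in g.items() if c != l)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

def davies_bouldin(points, labels):
    """Davies-Bouldin index; lower is better."""
    g = groups(points, labels)
    cents = {c: centroid(m) for c, m in g.items()}
    scatter = {c: sum(dist(p, cents[c]) for p in m) / len(m) for c, m in g.items()}
    worst = [max((scatter[c] + scatter[d]) / dist(cents[c], cents[d])
                 for d in g if d != c) for c in g]
    return sum(worst) / len(worst)

# Synthetic stand-in for document embeddings: three well-separated groups.
rng = random.Random(42)
blob_centers = [(0, 0, 0, 0), (10, 10, 0, 0), (0, 10, 10, 0)]
vectors = [[rng.gauss(m, 0.5) for m in c] for c in blob_centers for _ in range(20)]

ks = range(2, 7)
labelings = {k: kmeans(vectors, k) for k in ks}
sil = {k: silhouette(vectors, lab) for k, lab in labelings.items()}
dbi = {k: davies_bouldin(vectors, lab) for k, lab in labelings.items()}
sse = {k: inertia(vectors, lab) for k, lab in labelings.items()}

best_k_silhouette = max(ks, key=sil.get)
best_k_dbi = min(ks, key=dbi.get)
# Elbow heuristic: the k where the inertia curve bends most sharply.
best_k_elbow = max(range(3, 6), key=lambda k: sse[k - 1] - 2 * sse[k] + sse[k + 1])

print(best_k_silhouette, best_k_dbi, best_k_elbow)
```

On this toy data the three groups are far apart, so all three criteria agree on k = 3. On real embeddings the criteria can disagree, which is the situation the abstract's comparison addresses.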