Building of Informatics, Technology and Science
Vol 7 No 3 (2025): December 2025

Performance Comparison of IndoBERT, IndoBERTweet, and Classical Algorithms in Sentiment Analysis of the Indonesia Gelap Issue

Alvin, Fris
Winarsih, Nurul Anisa Sri



Article Info

Publish Date
08 Dec 2025

Abstract

This study compares the performance of two Transformer-based models, IndoBERT and IndoBERTweet, with three classical machine learning algorithms, Support Vector Machine (SVM), Logistic Regression, and Random Forest, in analyzing public sentiment on the “Indonesia Gelap” issue, which has been widely discussed on social media. The dataset was collected by crawling TikTok user comments containing keywords related to the issue, yielding 5,000 comments. After preprocessing, 4,667 comments were deemed suitable for analysis and were labeled as positive, negative, or neutral using a lexicon-based approach. To address class imbalance, three scenarios were evaluated: no oversampling, oversampling before data splitting, and oversampling after data splitting applied only to the training data. Each model was evaluated using four performance metrics: accuracy, precision, recall, and F1-score. The results show that oversampling before data splitting yielded the best performance across all models, with IndoBERT achieving the highest F1-score of 0.93, followed by IndoBERTweet with 0.91, while the classical algorithms achieved average F1-scores ranging from 0.89 to 0.90. In contrast, both the no-oversampling scenario and oversampling after data splitting on the training data resulted in lower performance, with average F1-scores ranging from 0.70 to 0.78. These findings indicate that Transformer-based models are more effective at capturing the informal language commonly found in social media comments. Furthermore, balancing the dataset before model training significantly improves the stability and performance of sentiment classification on imbalanced data.
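The three data scenarios named in the abstract can be illustrated with a minimal sketch. The abstract does not say which oversampling technique or split ratio was used, so this assumes plain random oversampling (duplicating minority-class rows) and an 80/20 split. The helper names `random_oversample`, `split`, and `build_scenarios`, the toy class ratio, and the seed are all hypothetical and are not taken from the paper.

```python
import random
from collections import Counter


def random_oversample(rows, rng):
    """Duplicate randomly chosen rows of each smaller class until all classes match the largest."""
    by_label = {}
    for row in rows:
        by_label.setdefault(row[1], []).append(row)
    target = max(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(group)
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    return balanced


def split(rows, test_ratio, rng):
    """Shuffle a copy of the rows and return (train, test)."""
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


def build_scenarios(rows, test_ratio=0.2, seed=42):
    """Build (train, test) pairs for the three scenarios (assumed 80/20 split)."""
    rng = random.Random(seed)
    scenarios = {}
    # Scenario 1: no oversampling at all.
    scenarios["none"] = split(rows, test_ratio, rng)
    # Scenario 2: balance the whole dataset first, then split it.
    scenarios["before_split"] = split(random_oversample(rows, rng), test_ratio, rng)
    # Scenario 3: split first, then balance only the training part.
    train, test = split(rows, test_ratio, rng)
    scenarios["after_split"] = (random_oversample(train, rng), test)
    return scenarios


# Hypothetical toy data: (comment_id, label) pairs with an invented class ratio.
toy_rows = (
    [(i, "negative") for i in range(60)]
    + [(i, "neutral") for i in range(60, 90)]
    + [(i, "positive") for i in range(90, 100)]
)

scenarios = build_scenarios(toy_rows)
for name, (train, test) in scenarios.items():
    train_counts = Counter(label for _, label in train)
    shared_ids = {i for i, _ in train} & {i for i, _ in test}
    print(name, dict(train_counts), "ids in both train and test:", len(shared_ids))
```

The printed count of shared ids shows one practical difference between the scenarios: when duplication happens before the split, copies of the same comment can land in both the training and the test portion, whereas splitting first keeps the test portion free of training rows. That is worth keeping in mind when comparing scores across scenarios.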

Copyrights © 2025






Journal Info

Abbrev

bits

Publisher

Subject

Computer Science & IT

Description

Building of Informatics, Technology and Science (BITS) is an open-access journal that publishes scientific articles reporting research results in information technology and computing. Papers submitted to this journal are checked for plagiarism and peer-reviewed to maintain quality. ...