This study compares the performance of Transformer-based models, IndoBERT and IndoBERTweet, with three classical machine learning algorithms, Support Vector Machine (SVM), Logistic Regression, and Random Forest, in analyzing public sentiment on the “Indonesia Gelap” issue, which has been widely discussed on social media. The dataset was collected by crawling TikTok user comments containing keywords related to the issue, yielding 5,000 comments. After preprocessing, 4,667 comments were deemed suitable for analysis and were labeled as positive, negative, or neutral using a lexicon-based approach. To address the imbalanced class distribution, three balancing scenarios were compared: no oversampling, oversampling before the train/test split, and oversampling after the split applied only to the training data. Each model was evaluated on four performance metrics: accuracy, precision, recall, and F1-score. The results show that oversampling before the split yielded the best performance across all models, with IndoBERT achieving the highest F1-score of 0.93, followed by IndoBERTweet at 0.91, while the classical algorithms achieved average F1-scores of 0.89 to 0.90. Both the no-oversampling scenario and oversampling applied only to the training data after the split produced lower performance, with average F1-scores of 0.70 to 0.78. These findings indicate that Transformer-based models capture the informal language common in social media comments more effectively, and that balancing the dataset before model training substantially improves the stability and performance of sentiment classification on imbalanced data.
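To make the distinction between the three balancing scenarios concrete, the sketch below contrasts them on synthetic data. It is a minimal illustration only: the abstract does not specify the oversampling technique or libraries, so random oversampling via imbalanced-learn, a scikit-learn classifier, and synthetic features are all assumptions standing in for the study's actual pipeline.

```python
# Minimal sketch of the three balancing scenarios (assumed tooling:
# scikit-learn + imbalanced-learn; synthetic data stands in for the
# labeled TikTok comments).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from imblearn.over_sampling import RandomOverSampler

# Synthetic imbalanced 3-class dataset (positive/negative/neutral proxy).
X, y = make_classification(n_samples=4667, n_classes=3, n_informative=5,
                           weights=[0.6, 0.3, 0.1], random_state=42)

def macro_f1(X_tr, y_tr, X_te, y_te):
    """Train a simple classifier and report macro-averaged F1 on the test set."""
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    return f1_score(y_te, clf.predict(X_te), average="macro")

ros = RandomOverSampler(random_state=42)

# Scenario 1: no oversampling.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)
print("no oversampling:        ", macro_f1(X_tr, y_tr, X_te, y_te))

# Scenario 2: oversample the full dataset, then split.
X_b, y_b = ros.fit_resample(X, y)
X_tr2, X_te2, y_tr2, y_te2 = train_test_split(
    X_b, y_b, test_size=0.2, stratify=y_b, random_state=42)
print("oversample before split:", macro_f1(X_tr2, y_tr2, X_te2, y_te2))

# Scenario 3: split first, then oversample only the training data.
X_tr3, y_tr3 = ros.fit_resample(X_tr, y_tr)
print("oversample after split: ", macro_f1(X_tr3, y_tr3, X_te, y_te))
```

Note that in scenario 2 the resampling is applied before the split, so duplicated minority-class samples can appear in both the training and test partitions; scenario 3 keeps the test partition untouched.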