Imbalanced class distributions are common in Indonesian sentiment classification and can substantially degrade the performance of classification models. This study compares three approaches: SMOTE combined with a Support Vector Machine (SMOTE + SVM), Baseline IndoBERT, and Class-Weighted IndoBERT. The dataset consists of Google Maps reviews labeled as positive, neutral, or negative. Before model training, the data undergo cleaning, normalization, and tokenization. Model performance is evaluated using confusion matrix analysis and the macro-averaged F1-score. Baseline IndoBERT achieves the highest macro F1-score (0.598), followed by Class-Weighted IndoBERT (0.582), while SMOTE + SVM obtains the lowest (0.545). Despite its slightly lower overall score, Class-Weighted IndoBERT recognizes minority classes in a more balanced way. These findings indicate that incorporating class weighting into transformer-based models can help mitigate bias toward majority classes and improve minority-class recognition. Scientifically, the study provides empirical evidence of how imbalance-aware learning strategies influence the behavior of transformer-based models in imbalanced text classification. It also highlights the importance of macro-averaged evaluation metrics for a fairer and more comprehensive assessment of model performance, particularly in low-resource and imbalanced language settings.
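To make the evaluation metric and the weighting idea concrete, the following is a minimal sketch in plain Python. It computes the macro-averaged F1-score as the unweighted mean of per-class F1 values, and derives inverse-frequency class weights. The abstract does not state which weighting scheme was used, so the "balanced" heuristic here (n_samples / (n_classes * class_count)) is an assumption, and the function names and toy labels are hypothetical illustrations, not data or code from the study.

```python
from collections import Counter

LABELS = ["negative", "neutral", "positive"]


def macro_f1(y_true, y_pred, labels=LABELS):
    """Unweighted mean of per-class F1, so every class counts equally."""
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # A class that is never present and never predicted scores 0.0 here.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def balanced_class_weights(y_true, labels=LABELS):
    """Inverse-frequency weights (assumed scheme, not confirmed by the study).

    Assumes every label in `labels` occurs at least once in `y_true`.
    """
    counts = Counter(y_true)
    n_samples = len(y_true)
    return {
        label: n_samples / (len(labels) * counts[label]) for label in labels
    }


# Toy, made-up labels: a majority-class predictor on an imbalanced set.
y_true = ["positive"] * 6 + ["neutral"] * 2 + ["negative"] * 2
y_pred = ["positive"] * 10

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = macro_f1(y_true, y_pred)
weights = balanced_class_weights(y_true)
```

On this toy input, accuracy is 0.6 while macro F1 is only 0.25, because the two minority classes each contribute an F1 of zero. The resulting weights are larger for the rarer classes, which is the kind of per-class penalty a class-weighted loss would apply during fine-tuning.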
Copyright © 2026