Fitriana Cendra Kasih
Department of Informatics, Faculty of Computer Science, Esa Unggul University, Jakarta, Indonesia

Published: 1 document
Articles


Comparative Analysis of Baseline IndoBERT, Class-Weighted IndoBERT, and SMOTE with Support Vector Machine for Handling Imbalanced Sentiment Classification in Indonesian
Riya Widayanti; Fitriana Cendra Kasih
Jurnal Teknik Informatika (Jutif) Vol. 7 No. 3 (2026): JUTIF Volume 7, Number 3, June 2026
Publisher: Informatika, Universitas Jenderal Soedirman

DOI: 10.52436/1.jutif.2026.7.3.5692

Abstract

Imbalanced class distributions are a common issue in Indonesian sentiment classification and significantly affect the performance of classification models. This study compares three approaches: SMOTE combined with a Support Vector Machine (SMOTE + SVM), Baseline IndoBERT, and Class-Weighted IndoBERT. The dataset consists of Google Maps reviews labeled as positive, neutral, or negative. Before model training, the data undergo preprocessing, including cleaning, normalization, and tokenization. Model performance is evaluated using confusion matrix analysis and the macro-averaged F1-score. Baseline IndoBERT achieves a macro F1-score of 0.598, followed by Class-Weighted IndoBERT with 0.582, while SMOTE + SVM obtains the lowest score at 0.545. Despite its slightly lower macro F1-score, Class-Weighted IndoBERT recognizes minority classes in a more balanced way. These findings indicate that incorporating class weighting into transformer-based models can help mitigate bias toward majority classes and improve minority-class recognition. Scientifically, this study provides empirical evidence on how imbalance-aware learning strategies influence the behavior of transformer-based models in imbalanced text classification. It also highlights the importance of macro-averaged evaluation metrics for a more comprehensive and fair assessment of model performance, particularly for low-resource languages and imbalanced datasets.
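The abstract does not spell out how the class weights or the macro-averaged F1-score are computed. The sketch below shows one common formulation in plain Python, assuming inverse-frequency ("balanced") weights of the form n_samples / (n_classes * count_c), the same formula scikit-learn's `compute_class_weight` uses with `class_weight="balanced"`, and a weighted cross-entropy normalized by the summed target weights, as PyTorch's `CrossEntropyLoss` does with `reduction='mean'`. The function names, label encoding, and toy labels are hypothetical and are not taken from the study.

```python
import math
from collections import Counter

# Hypothetical label encoding; the paper's actual encoding is not stated.
NEGATIVE, NEUTRAL, POSITIVE = 0, 1, 2
NUM_CLASSES = 3


def balanced_class_weights(labels, num_classes=NUM_CLASSES):
    """Assumed inverse-frequency weights: n_samples / (num_classes * count_c)."""
    counts = Counter(labels)
    n = len(labels)
    weights = []
    for c in range(num_classes):
        if counts[c] == 0:
            raise ValueError(f"class {c} has no training examples")
        weights.append(n / (num_classes * counts[c]))
    return weights


def weighted_cross_entropy(logits_batch, targets, weights):
    """Class-weighted cross-entropy, normalized by the summed weights of the targets
    (the normalization PyTorch's CrossEntropyLoss applies with reduction='mean')."""
    total, weight_sum = 0.0, 0.0
    for logits, y in zip(logits_batch, targets):
        m = max(logits)  # subtract the max logit for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        nll = log_sum_exp - logits[y]  # equals -log(softmax(logits)[y])
        total += weights[y] * nll
        weight_sum += weights[y]
    return total / weight_sum


def confusion_matrix(y_true, y_pred, num_classes=NUM_CLASSES):
    """Rows are true classes, columns are predicted classes."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def macro_f1(cm):
    """Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN)."""
    k = len(cm)
    scores = []
    for c in range(k):
        tp = cm[c][c]
        fp = sum(cm[r][c] for r in range(k)) - tp  # predicted c, truly another class
        fn = sum(cm[c]) - tp                        # truly c, predicted another class
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / k


if __name__ == "__main__":
    # Toy, made-up labels (not the study's data): 6 positive, 1 neutral, 1 negative.
    y_true = [POSITIVE] * 6 + [NEUTRAL, NEGATIVE]
    y_pred = [POSITIVE] * 8  # a degenerate model that always predicts the majority class
    cm = confusion_matrix(y_true, y_pred)
    acc = sum(cm[c][c] for c in range(NUM_CLASSES)) / len(y_true)
    print(f"accuracy={acc:.2f} macro-F1={macro_f1(cm):.4f}")  # accuracy=0.75 macro-F1=0.2857
    print([round(w, 3) for w in balanced_class_weights(y_true)])  # [2.667, 2.667, 0.444]
```

The toy run illustrates why macro-averaged F1 is the more informative metric here: a classifier that always predicts the majority class reaches 0.75 accuracy but only about 0.29 macro-F1, because each minority class contributes an F1 of 0. Under the assumed inverse-frequency weights, each minority-class error costs six times as much in the loss as a majority-class error.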