Jurnal Teknik Informatika (JUTIF)
Vol. 7 No. 3 (2026): JUTIF Volume 7, Number 3, June 2026

Comparative Analysis of Baseline IndoBERT, Class-Weighted IndoBERT, and SMOTE with Support Vector Machine for Handling Imbalanced Sentiment Classification in Indonesian

Riya Widayanti (Department of Informatics, Faculty of Computer Science, Esa Unggul University, Jakarta, Indonesia)
Fitriana Cendra Kasih (Department of Informatics, Faculty of Computer Science, Esa Unggul University, Jakarta, Indonesia)



Article Info

Publish Date
15 Jun 2026

Abstract

Imbalanced class distribution is a common issue in Indonesian sentiment classification and significantly affects the performance of classification models. This study compares three approaches: SMOTE combined with a Support Vector Machine (SMOTE + SVM), Baseline IndoBERT, and Class-Weighted IndoBERT. The dataset consists of Google Maps reviews labeled as positive, neutral, or negative. Before model training, the data are cleaned, normalized, and tokenized. Model performance is evaluated using confusion matrix analysis and the macro-averaged F1-score. Baseline IndoBERT achieves the highest macro F1-score (0.598), followed by Class-Weighted IndoBERT (0.582), while SMOTE + SVM obtains the lowest (0.545). Despite its slightly lower macro F1-score, Class-Weighted IndoBERT recognizes minority classes in a more balanced way. These findings indicate that incorporating class weighting into transformer-based models can help mitigate bias toward majority classes and improve minority-class recognition. Scientifically, the study provides empirical evidence of how imbalance-aware learning strategies influence the behavior of transformer-based models in imbalanced text classification. It also highlights the importance of macro-averaged evaluation metrics for a more comprehensive and fair assessment of model performance, particularly in low-resource and imbalanced language settings.
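To make the two central notions in the abstract concrete, the following minimal sketch computes a macro-averaged F1-score and a set of inverse-frequency class weights. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say which weighting formula was used, so the common "balanced" heuristic (n_samples / (n_classes * class_count)) is assumed here, and the labels and predictions are invented toy data rather than the Google Maps reviews. The function names `macro_f1` and `balanced_class_weights` are hypothetical.

```python
from collections import Counter


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1, so every class counts equally."""
    scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def balanced_class_weights(y_true, labels):
    """Assumed heuristic: weight_c = n_samples / (n_classes * count_c).

    Requires every label to appear at least once in y_true.
    """
    counts = Counter(y_true)
    n = len(y_true)
    return {c: n / (len(labels) * counts[c]) for c in labels}


# Toy, made-up data: a majority "positive" class and two minority classes.
labels = ["positive", "neutral", "negative"]
y_true = ["positive"] * 6 + ["neutral"] * 2 + ["negative"] * 2
y_pred = ["positive"] * 6 + ["positive", "neutral"] + ["positive", "negative"]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = macro_f1(y_true, y_pred, labels)
weights = balanced_class_weights(y_true, labels)

print(f"accuracy={accuracy:.3f} macro_f1={score:.3f}")
print(weights)
```

In this toy case the macro F1-score is lower than accuracy because half of the minority-class examples are misclassified as the majority class, which is the kind of bias a macro-averaged metric exposes. Weights of this form are typically passed to a weighted cross-entropy loss during fine-tuning so that errors on rare classes cost more; whether the study did exactly this is not stated in the abstract.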

Copyright © 2026






Journal Info

Abbrev

jurnal

Subject

Computer Science & IT

Description

Jurnal Teknik Informatika (JUTIF) is an Indonesian national journal that publishes high-quality research papers in the broad field of Informatics, Information Systems, and Computer Science, which encompasses software engineering, information system development, computer systems, computer network, ...