Purwanti, Wahyu Noviani
Unknown Affiliation

Evaluating SMOTE Performance for Imbalanced Multi-Label Sentiment Classification in MLSE Usability Testing of Mobile App Reviews
Basri, Hasan; Purwanti, Wahyu Noviani; Alparisi, Ihsan
Jurnal Teknik Informatika (Jutif) Vol. 7 No. 2 (2026): JUTIF Volume 7, Number 2, April 2026
Publisher : Informatika, Universitas Jenderal Soedirman

DOI: 10.52436/1.jutif.2026.7.2.5351

Abstract

Imbalanced data poses a significant challenge in multi-label classification, especially when sentiment analysis is combined with usability testing of mobile application reviews. This study investigates how effectively the Synthetic Minority Over-sampling Technique (SMOTE) improves classification performance on a multi-label dataset of 10,000 Indonesian-language user reviews from the Google Play Store. The classification labels combine usability criteria with sentiment polarity, and several classes are strongly imbalanced. Three machine learning algorithms (SVM, Decision Tree, and Random Forest) were evaluated on datasets of increasing size (1,000 to 10,000 entries), each under both original and SMOTE-balanced conditions, using stratified 10-fold cross-validation with accuracy and F1-score as the primary metrics. The experimental results show that SMOTE significantly improves Decision Tree performance, mainly on smaller datasets, with inconsistent gains as the dataset grows; provides modest, stable improvements for Random Forest; and degrades SVM, which performs consistently better without SMOTE. The study concludes that SMOTE is not a universally effective solution and must be applied selectively based on model characteristics. These findings contribute to the Machine Learning for Software Engineering (ML4SE) domain and to the field of informatics by highlighting the importance of aligning resampling techniques with algorithmic behaviour when dealing with highly imbalanced multi-label text classification tasks.
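
The abstract does not say how SMOTE was implemented, so the following is only a minimal pure-Python sketch of the general technique, not the authors' code. It assumes each review has already been turned into a numeric feature vector and that each combined usability/sentiment label is treated as a single class; the function names `smote` and `balance` and the parameters `k` and `seed` are hypothetical. Each synthetic point is placed at a random position on the line segment between a minority sample and one of its nearest same-class neighbours.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority samples.

    minority: list of equal-length numeric feature vectors from ONE class.
    k: number of nearest same-class neighbours considered per sample.
    (Names and defaults are illustrative assumptions, not from the paper.)
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other samples of this class by Euclidean distance to base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # New point lies at a random position on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def balance(X, y, k=5, seed=0):
    """Oversample every class up to the size of the largest class."""
    by_class = {}
    for features, label in zip(X, y):
        by_class.setdefault(label, []).append(features)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = list(X), list(y)
    for label, rows in by_class.items():
        if len(rows) < target:
            extra = smote(rows, target - len(rows), k=k, seed=seed)
            X_out.extend(extra)
            y_out.extend([label] * len(extra))
    return X_out, y_out
```

When such a step is combined with cross-validation, a common precaution is to call `balance` on the training portion of each fold only, so that synthetic points never leak into the held-out data; whether the study did this is not stated in the abstract.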