Comparative Evaluation of Automatic Labeling and Modeling Strategies for Indonesian Sentiment Analysis: Methodology and Performance Evaluation
Khoiriya Latifa; Agung Handayanto; Nur Latifah Dwi M.S; Rahul Bhandari; Ton Nguyen Trong Hien; Doston Pirnazarov
Advance Sustainable Science Engineering and Technology Vol. 8 No. 3 (2026): May - July
Publisher : Science and Technology Research Centre Universitas PGRI Semarang

DOI: 10.26877/asset.v8i3.2862

Abstract

Sentiment analysis is vital for understanding consumer perception, yet Indonesian sentiment classification is hampered by scarce labeled data and computational constraints. This study advances automatic labeling techniques and establishes performance benchmarks for Indonesian text. It compares two labeling approaches, the InSet Lexicon and an IndoBERT-based Hugging Face pipeline, on 8,447 Tapera-related opinions. The InSet Lexicon produced a highly skewed label distribution (89.66% neutral), while the IndoBERT pipeline produced a more balanced one (47.66% neutral, 38.43% positive, 13.91% negative). An evaluation of modeling strategies showed that combining InSet Lexicon and TF-IDF with Naïve Bayes or Random Forest classifiers achieved scores above 85%. RNN-LSTM reached >90% accuracy but required significant computational resources. Notably, fine-tuning IndoBERT with optimal hyperparameters yielded the most robust performance, achieving 80–90% accuracy with a low validation loss of 0.1. The study concludes that for small datasets (<12,000 samples), the most effective strategies for Indonesian sentiment analysis are either the InSet Lexicon paired with traditional machine learning or automatic labeling with pre-trained models followed by rigorous fine-tuning.
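
To illustrate how lexicon-based automatic labeling of the kind described in the abstract generally works, and why it can skew toward the neutral class, the following is a minimal Python sketch. It is not the authors' implementation: the four-word `TOY_LEXICON`, its weights, the whitespace tokenization, and the sample sentences are all hypothetical stand-ins, and the actual InSet lexicon is far larger. The sketch assumes a label is assigned from the sign of the summed word weights.

```python
from collections import Counter

# Hypothetical toy lexicon (word -> polarity weight), for illustration only.
# These entries and weights are NOT taken from the InSet lexicon.
TOY_LEXICON = {
    "bagus": 3,      # "good"
    "membantu": 2,   # "helpful"
    "buruk": -3,     # "bad"
    "rugi": -2,      # "loss"
}


def label_with_lexicon(text, lexicon=TOY_LEXICON):
    """Sum the weights of known words; the sign of the total gives the label."""
    score = sum(lexicon.get(token, 0) for token in text.lower().split())
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    # A zero score also occurs when no word of the text is in the lexicon.
    return "neutral"


def label_distribution(texts, lexicon=TOY_LEXICON):
    """Return the percentage of texts assigned to each label."""
    counts = Counter(label_with_lexicon(t, lexicon) for t in texts)
    return {label: 100.0 * n / len(texts) for label, n in counts.items()}


# Made-up example opinions, not drawn from the study's dataset.
opinions = [
    "program ini bagus dan membantu",
    "kebijakan ini buruk",
    "iuran dipotong dari gaji",
    "pemerintah mengumumkan aturan baru",
]
print(label_distribution(opinions))
```

In this sketch, any text whose words are all missing from the lexicon falls through to "neutral", which is one plausible mechanism behind a neutral-heavy distribution; the abstract itself does not state the cause of the skew it reports.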