Garuda - Garba Rujukan Digital

P-Index (2021 - 2026): 0.23

This author has published in the following journal:

Advance Sustainable Science, Engineering and Technology (ASSET)

Nur Latifah Dwi M.S

Universitas PGRI Semarang

Author-ID : 10207861

Chemistry; Computer Science & IT; Control & Systems Engineering; Electrical & Electronics Engineering; Industrial & Manufacturing Engineering

Published: 1 document

Articles

Comparative Evaluation of Automatic Labeling and Modeling Strategies for Indonesian Sentiment Analysis: Methodology and Performance Evaluation
Khoiriya Latifa; Agung Handayanto; Nur Latifah Dwi M.S; Rahul Bhandari; Ton Nguyen Trong Hien; Doston Pirnazarov
Advance Sustainable Science Engineering and Technology Vol. 8 No. 3 (2026): May - July
Publisher : Science and Technology Research Centre Universitas PGRI Semarang

DOI: 10.26877/asset.v8i3.2862

Sentiment analysis is vital for understanding consumer perception, yet Indonesian sentiment classification faces challenges due to labeled data scarcity and computational constraints. This study advances automatic labeling techniques and establishes performance benchmarks for Indonesian text. The research compares two labeling approaches, the InSet Lexicon and an IndoBERT-based Hugging Face pipeline, on 8,447 Tapera-related opinions. Results show that the InSet Lexicon produced a highly skewed distribution (89.66% neutral), while the IndoBERT pipeline achieved a more balanced distribution (47.66% neutral, 38.43% positive, 13.91% negative). Evaluation of various modeling strategies revealed that combining InSet Lexicon + TF-IDF with Naïve Bayes or Random Forest achieved scores above 85%. While RNN-LSTM reached >90% accuracy, it required significant computational resources. Notably, fine-tuning IndoBERT with optimal hyperparameters yielded the most robust performance, achieving 80–90% accuracy with a low validation loss of 0.1. The study concludes that for small datasets (<12,000 samples), the most effective strategies for Indonesian sentiment analysis are either the InSet Lexicon paired with traditional machine learning or automatic labeling using pre-trained models followed by rigorous fine-tuning.
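To make the lexicon-based labeling step in the abstract concrete, here is a minimal sketch of how such a labeler might work. It assumes the lexicon maps words to signed polarity weights and that a text is labeled by the sign of its summed score. The words and weights in `TOY_LEXICON`, the function names, and the whitespace tokenizer are all hypothetical illustrations, not the real InSet entries or the authors' pipeline.

```python
from collections import Counter

# Hypothetical toy lexicon: illustrative words and weights only,
# NOT taken from the actual InSet Lexicon.
TOY_LEXICON = {"bagus": 3, "membantu": 2, "buruk": -3, "rugi": -4}


def label_text(text, lexicon=TOY_LEXICON):
    """Label a text by the sign of its summed lexicon score."""
    # Naive whitespace tokenization (an assumption; real pipelines
    # usually normalize and stem Indonesian text first).
    score = sum(lexicon.get(token, 0) for token in text.lower().split())
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    # Texts with no matched words, or with scores that cancel out,
    # fall through to neutral.
    return "neutral"


def label_distribution(texts, lexicon=TOY_LEXICON):
    """Return the percentage of texts assigned to each label."""
    counts = Counter(label_text(t, lexicon) for t in texts)
    total = len(texts)
    return {label: 100 * n / total for label, n in counts.items()}
```

In a sketch like this, any text containing no lexicon words scores zero and is labeled neutral, which is one plausible way a lexicon approach can end up with a neutral-heavy distribution; the abstract itself does not state the cause of the skew it reports.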

Co-Authors: Agung Handayanto; Doston Pirnazarov; Khoiriya Latifa; Rahul Bhandari; Ton Nguyen Trong Hien
