International Journal of Advances in Data and Information Systems
Vol. 6 No. 3 (2025): December 2025

Sentiment Analysis of Tokopedia Customer Reviews Using BiLSTM and IndoBERT with Comparative Analysis of Preprocessing and Labeling Methods

Anadra, Rahmi
Wijayanto, Hari
Sadik, Kusman



Article Info

Publish Date
01 Dec 2025

Abstract

This study addresses key challenges in Indonesian sentiment analysis related to preprocessing, labeling strategies, and class imbalance. It compares the performance of BiLSTM and IndoBERT on user reviews collected from Tokopedia. The dataset was labeled both manually and automatically with GPT, then processed under three preprocessing schemes. Both models were trained with tuned hyperparameters and imbalance-handling techniques and evaluated through twenty rounds of stratified five-fold cross-validation, with balanced accuracy and F1-score as the performance metrics. IndoBERT achieved the highest results, with balanced accuracy up to 0.85 and F1-scores up to 0.83, while BiLSTM reached balanced accuracy up to 0.78 and F1-scores up to 0.76. Applying class weighting and focal loss improved model performance by approximately 2% to 11% over the baseline. BiLSTM trained more efficiently, requiring only 1 to 2.5 minutes per epoch compared with 2.6 to 3.6 minutes for IndoBERT. Although manual labeling remained superior in capturing contextual nuance and emotional cues, GPT-based labeling showed strong agreement with the human annotations. A four-way ANOVA revealed that all main factors and several of their interactions significantly influenced classification outcomes. Overall, BiLSTM trains faster, whereas IndoBERT delivers higher predictive accuracy.
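The abstract reports balanced accuracy and F1-score rather than plain accuracy, which matters when sentiment classes are imbalanced. The following is a minimal pure-Python sketch of both metrics. It assumes macro-averaged F1 (the abstract does not state the averaging scheme), and the labels "pos" and "neg" are hypothetical stand-ins for the study's sentiment classes.

```python
def per_class_counts(y_true, y_pred, labels):
    """Return a dict mapping each label to its (tp, fp, fn) counts."""
    stats = {}
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        stats[c] = (tp, fp, fn)
    return stats


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    labels = sorted(set(y_true))
    stats = per_class_counts(y_true, y_pred, labels)
    recalls = [tp / (tp + fn) for tp, _, fn in stats.values()]
    return sum(recalls) / len(recalls)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (an assumed averaging scheme)."""
    labels = sorted(set(y_true) | set(y_pred))
    stats = per_class_counts(y_true, y_pred, labels)
    f1s = []
    for tp, fp, fn in stats.values():
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    return sum(f1s) / len(f1s)


# Hypothetical imbalanced labels: 8 positive and 2 negative reviews,
# scored against a model that always predicts the majority class.
y_true = ["pos"] * 8 + ["neg"] * 2
y_pred = ["pos"] * 10
print(balanced_accuracy(y_true, y_pred))   # 0.5
print(round(macro_f1(y_true, y_pred), 3))  # 0.444
```

On this toy data, a majority-class predictor reaches 0.8 plain accuracy but only 0.5 balanced accuracy and about 0.44 macro F1, which is why these metrics suit imbalanced review data.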
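The abstract credits class weighting and focal loss with gains of roughly 2% to 11% over the baseline but gives no formulas or settings. The sketch below assumes the common inverse-frequency weighting w_c = N / (K * n_c) and the standard focal loss FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t), with gamma = 2 as an illustrative default; the study's actual weights and gamma may differ, and the labels are hypothetical.

```python
import math
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights w_c = N / (K * n_c); rarer classes weigh more."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


def focal_loss(probs, target, gamma=2.0, alpha=None):
    """Focal loss for one example: -alpha_t * (1 - p_t)**gamma * log(p_t).

    probs  : dict mapping class -> predicted probability (e.g. softmax output)
    target : the true class
    gamma  : focusing parameter (2.0 is an illustrative default, not the study's value)
    alpha  : optional dict of per-class weights, e.g. from balanced_class_weights
    """
    p_t = probs[target]
    alpha_t = 1.0 if alpha is None else alpha[target]
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


labels = ["pos"] * 8 + ["neg"] * 2        # hypothetical imbalanced labels
weights = balanced_class_weights(labels)
print(weights)                             # {'pos': 0.625, 'neg': 2.5}

probs = {"pos": 0.9, "neg": 0.1}           # a confident, correct prediction
ce = focal_loss(probs, "pos", gamma=0.0)   # gamma=0 recovers cross-entropy
fl = focal_loss(probs, "pos", gamma=2.0)
print(round(ce, 4), round(fl, 6))          # 0.1054 0.001054
```

With gamma = 0 and no alpha, the focal loss reduces to ordinary cross-entropy. Raising gamma shrinks the loss on confidently correct examples (at p_t = 0.9 and gamma = 2 it is 100 times smaller), so training concentrates on hard, often minority-class, reviews; passing `alpha=weights` combines both imbalance-handling ideas.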
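Twenty rounds of stratified five-fold cross-validation yield 100 train/test splits, each preserving the overall class proportions. The sketch below shows one simple way to generate them, assuming a per-class round-robin assignment after a seeded shuffle; it is an illustration, not the authors' code. In practice, scikit-learn's `RepeatedStratifiedKFold(n_splits=5, n_repeats=20)` implements the same protocol.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs whose test folds keep class proportions.

    Indices of each class are shuffled with a seeded RNG and dealt
    round-robin into k folds (a simple assumed scheme).
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for j, i in enumerate(idxs):
            folds[j % k].append(i)
    for f in range(k):
        test = sorted(folds[f])
        held_out = set(test)
        train = [i for i in range(len(labels)) if i not in held_out]
        yield train, test


def repeated_stratified_kfold(labels, k=5, repeats=20):
    """Run stratified k-fold `repeats` times, reshuffling with a new seed each round."""
    for r in range(repeats):
        yield from stratified_kfold(labels, k=k, seed=r)


labels = ["pos"] * 40 + ["neg"] * 10      # hypothetical 4:1 imbalanced labels
splits = list(repeated_stratified_kfold(labels))
print(len(splits))                         # 100 (20 rounds x 5 folds)
train, test = splits[0]
print(len(train), len(test), sum(labels[i] == "neg" for i in test))  # 40 10 2
```

With 40 positive and 10 negative hypothetical reviews, every test fold holds 8 positive and 2 negative examples, matching the 4:1 ratio of the full set, and the scores from all 100 folds can then be averaged per configuration.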
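The abstract states that GPT-based labels agreed strongly with the manual annotations but does not name the agreement statistic. Cohen's kappa is one common choice for comparing two annotators, so the sketch below assumes it purely for illustration; the label lists are hypothetical.

```python
from collections import Counter


def cohen_kappa(a, b):
    """Cohen's kappa between two equal-length label sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n                   # observed agreement
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[c] * cb[c] for c in set(a) | set(b)) / (n * n)  # chance agreement
    if p_e == 1.0:  # both annotators used the same single label throughout
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


human = ["pos", "pos", "neg", "neg"]  # hypothetical manual labels
gpt = ["pos", "pos", "neg", "pos"]    # hypothetical GPT labels
print(cohen_kappa(human, gpt))        # 0.5
```

Here raw agreement is 0.75, but chance agreement implied by the label frequencies is 0.5, so kappa is (0.75 - 0.5) / (1 - 0.5) = 0.5.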

Copyright © 2025






Journal Info

Abbrev

IJADIS

Subject

Computer Science & IT; Electrical & Electronics Engineering

Description

International Journal of Advances in Data and Information Systems (IJADIS) (e-ISSN: 2721-3056) is a peer-reviewed journal in the field of data science and information systems that is published twice a year, in April and October. The journal is published for those who wish to share ...