Br Sembiring, Aldina Bonaria Siva
Unknown Affiliation

Published: 1 Document

Articles

Found 1 Document

Comparison of IndoBERT and SVM Performance in Sentiment Analysis of Digital Education Platforms
Br Sembiring, Aldina Bonaria Siva; Robet, M.Kom.; Hoki, Leony, S.Kom., S.A.B., M.M.
SinkrOn: Jurnal dan Penelitian Teknik Informatika, Vol. 10 No. 1 (2026): Article Research, January 2026
Publisher : Politeknik Ganesha Medan

DOI: 10.33395/sinkron.v10i1.15472

Abstract

Sentiment analysis of user-generated reviews is essential for understanding the quality and effectiveness of digital education platforms. This study compares the performance of a Support Vector Machine (SVM) and IndoBERT in classifying the sentiment of Ruangguru user reviews. The original dataset contains 111,838 reviews, from which a stratified sample of 10,000 entries was drawn for experimentation to preserve class proportions. Text preprocessing applied light normalization (case folding; cleaning of URLs, user mentions, and hashtags; squashing of repeated characters) without stopword removal, so that polarity cues were preserved. Automatic labels were validated against 139 manually annotated samples (accuracy 0.763, Cohen's κ 0.644), indicating reliable but imperfect alignment. To ensure a fair, leakage-safe comparison, a fixed 20% held-out test split is shared by all models; within the remaining data, 10% is used for validation, and IndoBERT checkpoints are selected by validation macro-F1 with early stopping. The SVM baseline combines word- and character-level TF-IDF features with a class-balanced LinearSVC tuned by grid search, achieving accuracy 0.888 and macro-F1 0.543: strong on the positive class but weak on the neutral class. IndoBERT yields more balanced performance: the class-weighted variant attains the best macro-F1 of 0.601 (accuracy 0.857), while the baseline variant reaches the highest IndoBERT accuracy (0.867) with macro-F1 of 0.596. These results show that Transformer models offer a more balanced trade-off under severe class imbalance, whereas SVM remains a competitive accuracy-oriented baseline. In practice, platforms should prioritize macro-F1, deploy the optimized IndoBERT when minority opinions matter, and invest in expanded manual labeling and stronger imbalance handling to improve neutral-class detection.
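The light normalization described in the abstract (case folding; handling of URLs, user mentions, hashtags, and character repetition; no stopword removal) could be sketched as follows. The function name and exact regex choices are assumptions for illustration, not the authors' published pipeline.

```python
import re

def normalize_review(text: str) -> str:
    """Light normalization: case folding plus cleaning of URLs, user
    mentions, hashtags, and repeated characters. Stopwords are kept so
    that polarity cues (e.g. negations) survive for the classifier."""
    text = text.lower()                                 # case folding
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # strip URLs
    text = re.sub(r"@\w+", " ", text)                   # strip user mentions
    text = re.sub(r"#(\w+)", r"\1", text)               # keep hashtag word, drop '#'
    text = re.sub(r"(.)\1{2,}", r"\1\1", text)          # squash 3+ repeats to 2
    text = re.sub(r"\s+", " ", text).strip()            # collapse whitespace
    return text
```

Keeping stopwords is deliberate here: Indonesian negators such as "tidak" flip polarity, so removing them would hurt both the SVM and IndoBERT inputs.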
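The label-validation step reports accuracy alongside Cohen's κ, which corrects raw agreement for chance: κ = (p_o − p_e) / (1 − p_e), where p_o is observed agreement and p_e is the agreement expected from the two label distributions. A minimal pure-Python version (the function name is hypothetical; sklearn's `cohen_kappa_score` does the same job):

```python
from collections import Counter

def cohen_kappa(auto_labels, manual_labels):
    """Chance-corrected agreement between automatic and manual labels."""
    n = len(auto_labels)
    # Observed agreement: fraction of items where both annotations match.
    p_o = sum(a == m for a, m in zip(auto_labels, manual_labels)) / n
    # Expected agreement: product of per-class marginal frequencies.
    ca, cm = Counter(auto_labels), Counter(manual_labels)
    p_e = sum(ca[k] * cm[k] for k in ca) / (n * n)
    return (p_o - p_e) / (1 - p_e)
```

On the study's 139-sample check, accuracy 0.763 with κ 0.644 means agreement is well above chance but short of gold-standard labeling, which is why the abstract flags expanded manual annotation as future work.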
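The leakage-safe evaluation protocol (class-proportional subsample, fixed 20% test split shared by all models, 10% of the remainder for validation) might look like this with scikit-learn. Function name, seed, and the toy corpus sizes are assumptions; only the split fractions come from the abstract.

```python
from sklearn.model_selection import train_test_split

def stratified_splits(texts, labels, sample_size=10_000, seed=42):
    """Class-proportional sample, then fixed 20% test split; 10% of the
    remaining data becomes the validation set for checkpoint selection."""
    # Class-proportional subsample of the full corpus.
    if len(texts) > sample_size:
        texts, _, labels, _ = train_test_split(
            texts, labels, train_size=sample_size,
            stratify=labels, random_state=seed)
    # Fixed held-out test split, shared by every model being compared.
    X_rem, X_test, y_rem, y_test = train_test_split(
        texts, labels, test_size=0.20, stratify=labels, random_state=seed)
    # Validation split carved only from the non-test data (no leakage).
    X_train, X_val, y_train, y_val = train_test_split(
        X_rem, y_rem, test_size=0.10, stratify=y_rem, random_state=seed)
    return (X_train, y_train), (X_val, y_val), (X_test, y_test)
```

Fixing the test split once, before any model-specific tuning, is what makes the SVM-vs-IndoBERT comparison fair: neither model's hyperparameter search ever sees test data.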
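The SVM baseline as described (word- and character-level TF-IDF, class-balanced LinearSVC, grid search) could be assembled like this. The n-gram ranges and C grid are illustrative guesses, not the paper's reported settings.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.svm import LinearSVC

def build_svm_baseline(cv=3):
    """Word + character TF-IDF features feeding a class-balanced LinearSVC,
    tuned by grid search on macro-F1 (the study's headline metric)."""
    features = FeatureUnion([
        ("word", TfidfVectorizer(analyzer="word", ngram_range=(1, 2))),
        ("char", TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 5))),
    ])
    pipe = Pipeline([
        ("tfidf", features),
        # class_weight="balanced" reweights the rare neutral class.
        ("clf", LinearSVC(class_weight="balanced")),
    ])
    param_grid = {"clf__C": [0.1, 1.0, 10.0]}  # assumed search grid
    return GridSearchCV(pipe, param_grid, scoring="f1_macro", cv=cv)
```

Character n-grams (`char_wb`) help with the spelling variation common in informal Indonesian reviews, which word-level TF-IDF alone would fragment into unseen tokens.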
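The recommendation to prioritize macro-F1 over accuracy follows from how the two metrics behave under imbalance: accuracy rewards predicting the dominant positive class, while macro-F1 averages per-class F1 scores so the neutral class counts equally. A hand-rolled version (name hypothetical; equivalent to sklearn's `f1_score(average="macro")`):

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so minority classes count equally."""
    classes = sorted(set(y_true) | set(y_pred))
    f1s = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)
```

A classifier that labels everything "positive" on an 8/1/1 toy split scores 0.8 accuracy but under 0.3 macro-F1, which mirrors the SVM's 0.888 accuracy vs 0.543 macro-F1 gap in the study.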