This research develops a cross-lingual sentiment analysis system (RoBERTa-IndoBERT) to monitor public opinion on Bank Indonesia’s 2025 monetary policy on X (Twitter), addressing two obstacles: the scarcity of labeled Indonesian data and the noisiness of social media text. We introduce a "translate-then-classify" pipeline: Indonesian posts are translated into English and auto-labeled by a mature English RoBERTa model, and the resulting labels are then used to fine-tune IndoBERT on the original Indonesian texts. We compare this cross-lingual (CL) approach, with and without back-translation (BT) augmentation, against an Indo-only baseline model. Measured by Accuracy and Macro-F1, the CL pipeline substantially outperforms the baseline. The complete model (IndoBERT + CL + BT) achieves a Macro-F1 of 98.1%, a 2.8 percentage point (pp) improvement over the baseline (95.3%). Qualitative error analysis indicates that the CL model is more stable, less prone to extreme polarity flips, and better at detecting implicit sentiment. This research demonstrates that a CL auto-labeling pipeline is an efficient and resilient solution for Indonesian sentiment analysis in low-resource scenarios.
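The auto-labeling step of the translate-then-classify pipeline can be illustrated with a minimal Python sketch. It is not the authors' implementation: `auto_label`, `toy_translate`, and `toy_classify_en` are hypothetical names, and the toy dictionary and keyword rules merely stand in for a real machine translation system and an English RoBERTa sentiment model. The sketch shows only the key idea that labels predicted on the English translation are attached to the original Indonesian text; fine-tuning IndoBERT on the resulting pairs is not shown.

```python
from typing import Callable, List, Tuple


def auto_label(
    posts: List[str],
    translate: Callable[[str], str],
    classify_en: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Pair each ORIGINAL Indonesian post with a label predicted on its English translation."""
    labeled = []
    for text in posts:
        english = translate(text)      # Indonesian -> English
        label = classify_en(english)   # label comes from the English-side model
        labeled.append((text, label))  # keep the original text, not the translation
    return labeled


# Toy stand-ins (assumptions for illustration only).
# A real pipeline would call an MT system and an English sentiment classifier here.
_TOY_DICTIONARY = {
    "kebijakan BI bagus": "BI policy is good",
    "suku bunga naik, buruk": "interest rates rise, bad",
    "BI mengumumkan suku bunga": "BI announces the interest rate",
}


def toy_translate(text: str) -> str:
    """Look up a canned translation; fall back to the input unchanged."""
    return _TOY_DICTIONARY.get(text, text)


def toy_classify_en(text: str) -> str:
    """Keyword rule standing in for an English sentiment model."""
    lowered = text.lower()
    if "good" in lowered:
        return "positive"
    if "bad" in lowered:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    posts = list(_TOY_DICTIONARY)
    for original, label in auto_label(posts, toy_translate, toy_classify_en):
        print(f"{label:>8}  {original}")
```

Passing the translator and classifier in as callables keeps the labeling logic independent of any particular model, so the stand-ins can be swapped for real components without changing `auto_label`.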
Copyright © 2025