Sentiment analysis for Indonesian regional languages faces two persistent challenges: labeled training data is extremely limited for most regional varieties, and transformer models pre-trained on Bahasa Indonesia do not generalize reliably to languages with substantially different morphological structures. Prior work on the NusaX benchmark has primarily relied on direct fine-tuning, treating each regional language independently and without exploiting linguistic proximity between related languages as a transfer signal. This paper proposes Language-Similarity-Guided Transfer (LSGT), a sequential fine-tuning strategy that first adapts a pre-trained model to a pivot language selected using character trigram similarity, followed by fine-tuning on the target language. Four transformer models are evaluated across all 12 NusaX languages using the official train/validation/test splits: IndoBERT, NusaBERT, mBERT, and XLM-R. Performance is evaluated using four metrics: accuracy, macro F1, macro precision, and macro recall. Experimental results show that LSGT improves macro F1 in 44 of 48 model-language combinations, demonstrating that the fine-tuning strategy itself is a major factor in low-resource cross-lingual sentiment classification. XLM-R benefits most strongly from LSGT, achieving an average improvement of +0.137 macro F1 and a peak gain of +0.298 on Madurese. SHAP-based token attribution analysis further reveals that predictions rely heavily on named entities and domain-specific nouns rather than sentiment-bearing vocabulary, indicating a dataset-level bias inherited from the original SmSA corpus and propagated through the NusaX translation pipeline.
Copyrights © 2026