Journal of Computing Theories and Applications
Vol. 3 No. 4 (2026): JCTA 3(4) 2026

Language-Similarity-Guided Transfer Fine-Tuning of Pre-trained Transformer Models for Sentiment Analysis Across 12 Indonesian Regional Languages

Brian Rizqi Paradisiaca Darnoto (University of Jember)
Dony Bahtera Firmawan (University of Jember)

Article Info

Publish Date
07 May 2026

Abstract

Sentiment analysis for Indonesian regional languages faces two persistent challenges: labeled training data is extremely limited for most regional varieties, and transformer models pre-trained on Bahasa Indonesia do not generalize reliably to languages with substantially different morphological structures. Prior work on the NusaX benchmark has relied primarily on direct fine-tuning, treating each regional language independently rather than exploiting linguistic proximity between related languages as a transfer signal. This paper proposes Language-Similarity-Guided Transfer (LSGT), a sequential fine-tuning strategy that first adapts a pre-trained model to a pivot language selected by character trigram similarity and then fine-tunes it on the target language. Four transformer models (IndoBERT, NusaBERT, mBERT, and XLM-R) are evaluated across all 12 NusaX languages using the official train/validation/test splits. Performance is measured with accuracy, macro F1, macro precision, and macro recall. Experimental results show that LSGT improves macro F1 in 44 of the 48 model-language combinations, indicating that the fine-tuning strategy itself is a major factor in low-resource cross-lingual sentiment classification. XLM-R benefits most from LSGT, with an average improvement of +0.137 macro F1 and a peak gain of +0.298 on Madurese. SHAP-based token attribution analysis further reveals that predictions rely heavily on named entities and domain-specific nouns rather than on sentiment-bearing vocabulary, indicating a dataset-level bias inherited from the original SmSA corpus and propagated through the NusaX translation pipeline.
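
The pivot-selection step of LSGT can be illustrated with a minimal sketch. The abstract does not say how character trigram similarity is computed, so the code below assumes cosine similarity between character trigram frequency profiles built from each language's training text. The function names (`char_trigrams`, `cosine_similarity`, `select_pivot`) and the toy corpora are hypothetical; they are not the authors' implementation and contain no NusaX data.

```python
import math
from collections import Counter


def char_trigrams(text: str) -> Counter:
    """Count character trigrams of a lowercased text padded with one space on each side."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two trigram frequency profiles.

    ASSUMPTION: cosine over raw counts is one plausible reading of
    "character trigram similarity"; the paper may use another measure.
    """
    dot = sum(count * b[gram] for gram, count in a.items())  # missing grams count as 0
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_pivot(target: str, corpora: dict[str, list[str]]) -> str:
    """Return the non-target language whose trigram profile is most similar to the target's."""
    profiles = {lang: char_trigrams(" ".join(texts)) for lang, texts in corpora.items()}
    target_profile = profiles[target]
    scores = {
        lang: cosine_similarity(target_profile, profile)
        for lang, profile in profiles.items()
        if lang != target
    }
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Hypothetical toy corpora: "near" shares most trigrams with "target", "far" shares none.
    toy = {
        "target": ["abcabc"],
        "near": ["abcabd"],
        "far": ["xyzxyz"],
    }
    print(select_pivot("target", toy))  # → near
```

Under this assumption, the highest-scoring candidate would serve as the pivot for the first fine-tuning stage, after which the model is fine-tuned on the target language. A set-based measure such as Jaccard overlap of trigram inventories would be an equally plausible reading of the abstract and would slot into `select_pivot` unchanged.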

Copyright © 2026

Journal Info

Abbrev

jcta

Subject

Computer Science & IT; Decision Sciences, Operations Research & Management

Description

Journal of Computing Theories and Applications (JCTA) is a refereed, international journal that covers all aspects of foundations, theories and the practical applications of computer science. FREE OF CHARGE for submission and publication. All accepted articles will be published online and accessed ...