CommIT (Communication & Information Technology)
Vol. 20 No. 1 (2026): CommIT Journal (in press)

Advancing Cross-Cultural Natural Language Processing with a Focus on Sundanese Language and Contextual Nuances

Anggi Muhammad Rifai (Universitas Pelita Bangsa)
Ema Utami (Universitas Amikom Yogyakarta)
Amali Amali (Universitas Pelita Bangsa)
Muhamad Fatchan (Universitas Pelita Bangsa)
Muhamad Ekhsan (Universitas Pelita Bangsa)



Article Info

Publish Date
09 Apr 2026

Abstract

The Sundanese language, one of Indonesia’s regional languages, holds deep cultural value but remains underrepresented in computational linguistics. This research addresses the gap by developing a translation model between Sundanese and Indonesian using a transformer-based sequence-to-sequence (Seq2Seq) architecture. The model is fine-tuned on a parallel dataset of 3,616 sentence pairs to capture linguistic and contextual subtleties. The evaluation yields strong results: a Bilingual Evaluation Understudy (BLEU) score of 44.12, a Recall-Oriented Understudy for Gisting Evaluation (ROUGE)-1 F1-score of 0.72, and a ROUGE-L F1-score of 0.71. These scores indicate high translation quality despite the limited data. Unlike earlier Sundanese translation studies that rely on Recurrent Neural Networks (RNNs), Long Short-Term Memory (LSTM) networks, or standard transformer models, this research leverages the multilingual pretrained M2M100 Transformer, enabling transfer learning from high-resource languages to improve low-resource performance. These outcomes highlight the model’s potential for real-world applications, such as translation tools for education and cultural exchange. The research emphasizes the importance of improving access to Sundanese texts and promoting the language's digital presence to aid in its preservation. Overall, the research not only advances Natural Language Processing (NLP) for low-resource languages but also reinforces the importance of integrating regional languages like Sundanese into modern technology. Building upon prior studies on Indonesian–Sundanese translation, the novelty of the research lies in fine-tuning a multilingual Seq2Seq Transformer that captures both linguistic and contextual nuances, thereby setting a new benchmark for low-resource language processing.
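
As an illustration of the ROUGE-1 and ROUGE-L F1 metrics reported above, the following is a minimal Python sketch of how such scores can be computed for a single sentence pair. It is not the evaluation code used in the research, which the abstract does not describe. It assumes whitespace tokenization, lowercasing, and the balanced F1 form. The function names and the two Indonesian sentences are hypothetical examples, not items from the 3,616-pair dataset.

```python
from collections import Counter


def _tokens(text):
    # Assumption: lowercase and split on whitespace; real toolkits may tokenize differently.
    return text.lower().split()


def _f1(matches, ref_len, cand_len):
    # Balanced F1 from a match count; returns 0.0 when nothing matches.
    if matches == 0 or ref_len == 0 or cand_len == 0:
        return 0.0
    precision = matches / cand_len
    recall = matches / ref_len
    return 2 * precision * recall / (precision + recall)


def rouge1_f1(reference, candidate):
    """Unigram-overlap F1 (ROUGE-1) between a reference and a candidate."""
    ref, cand = _tokens(reference), _tokens(candidate)
    # Clipped overlap: a token counts at most as often as it occurs in both texts.
    overlap = sum((Counter(ref) & Counter(cand)).values())
    return _f1(overlap, len(ref), len(cand))


def lcs_length(a, b):
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference, candidate):
    """Longest-common-subsequence F1 (ROUGE-L) between a reference and a candidate."""
    ref, cand = _tokens(reference), _tokens(candidate)
    return _f1(lcs_length(ref, cand), len(ref), len(cand))


if __name__ == "__main__":
    # Hypothetical sentence pair, for illustration only.
    reference = "saya pergi ke pasar pagi ini"
    candidate = "saya pergi ke pasar tadi pagi"
    print(round(rouge1_f1(reference, candidate), 4))
    print(round(rouge_l_f1(reference, candidate), 4))
```

ROUGE-1 ignores word order, while ROUGE-L rewards tokens that appear in the same order in both sentences, so two scores that are close together (as with the reported 0.72 and 0.71) suggest that most matched words also appear in the reference order. Corpus-level figures are typically obtained by aggregating such scores over all test sentences.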

Copyrights © 2026






Journal Info

Abbrev

COMMIT

Publisher

Subject

Computer Science & IT

Description

Journal of Communication and Information Technology (CommIT) focuses on a range of topics, including software engineering, mobile technology and applications, robotics, database systems, information engineering, artificial intelligence, interactive multimedia, computer networking, information system ...