Anggi Muhammad Rifai
Universitas Pelita Bangsa

Published: 1 document
Articles


Advancing Cross-Cultural Natural Language Processing with a Focus on Sundanese Language and Contextual Nuances
Anggi Muhammad Rifai; Ema Utami; Amali Amali; Muhamad Fatchan; Muhamad Ekhsan
CommIT (Communication and Information Technology) Journal Vol. 20 No. 1 (2026): CommIT Journal (in press)
Publisher: Bina Nusantara University


Abstract

The Sundanese language, one of Indonesia’s regional languages, holds deep cultural value but remains underrepresented in computational linguistics. This research addresses that gap by developing a translation model between Sundanese and Indonesian using a transformer-based sequence-to-sequence (Seq2Seq) architecture. The model is fine-tuned on a parallel dataset of 3,616 sentence pairs to capture linguistic and contextual subtleties. Evaluation yields a Bilingual Evaluation Understudy (BLEU) score of 44.12, a Recall-Oriented Understudy for Gisting Evaluation (ROUGE)-1 F1-score of 0.72, and a ROUGE-L F1-score of 0.71, indicating high translation quality despite the limited data. Unlike earlier Sundanese translation studies that rely on Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM), or standard transformer models, this research leverages the multilingual pretrained M2M100 Transformer, enabling transfer learning from high-resource languages to improve low-resource performance. These outcomes highlight the model’s potential for real-world applications, such as translation tools for education and cultural exchange. The research emphasizes the importance of improving access to Sundanese texts and promoting the language’s digital presence to aid in its preservation. Overall, the research advances Natural Language Processing (NLP) for low-resource languages and reinforces the importance of integrating regional languages like Sundanese into modern technology. Building upon prior studies on Indonesian–Sundanese translation, its novelty lies in fine-tuning a multilingual Seq2Seq Transformer that captures both linguistic and contextual nuances, thereby setting a new benchmark for low-resource language processing.
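
As an illustration of the ROUGE-1 and ROUGE-L F1 scores reported above, the following is a minimal Python sketch of how such scores can be computed for a single sentence pair. It is not the authors' evaluation code: the whitespace tokenization, the function names, and the two Indonesian example sentences are assumptions made here for illustration only, and published scores are normally produced with an established evaluation library averaged over a full test set.

```python
from collections import Counter


def f1_from_overlap(overlap: int, cand_len: int, ref_len: int) -> float:
    """Harmonic mean of precision (overlap / candidate length) and
    recall (overlap / reference length); 0.0 when nothing matches."""
    if overlap == 0 or cand_len == 0 or ref_len == 0:
        return 0.0
    precision = overlap / cand_len
    recall = overlap / ref_len
    return 2 * precision * recall / (precision + recall)


def tokenize(text: str) -> list:
    # Assumption: lowercase + whitespace split stands in for a real tokenizer.
    return text.lower().split()


def rouge1_f1(candidate: str, reference: str) -> float:
    """ROUGE-1 F1: based on clipped unigram overlap, ignoring word order."""
    cand, ref = tokenize(candidate), tokenize(reference)
    overlap = sum((Counter(cand) & Counter(ref)).values())
    return f1_from_overlap(overlap, len(cand), len(ref))


def lcs_length(a: list, b: list) -> int:
    """Length of the longest common subsequence (row-by-row dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """ROUGE-L F1: based on the longest common subsequence, so order matters."""
    cand, ref = tokenize(candidate), tokenize(reference)
    return f1_from_overlap(lcs_length(cand, ref), len(cand), len(ref))


# Hypothetical example pair (not taken from the paper's dataset).
reference = "saya sedang makan nasi di rumah"
candidate = "saya makan di rumah nasi"

print(f"ROUGE-1 F1: {rouge1_f1(candidate, reference):.4f}")
print(f"ROUGE-L F1: {rouge_l_f1(candidate, reference):.4f}")
```

In this example the candidate contains five of the six reference words, so ROUGE-1 stays high, while the misplaced word shortens the longest common subsequence and lowers ROUGE-L. This is why the two scores are reported separately.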