Ramadhan, Rizky Surya
Unknown Affiliation

Published: 1 Document

Bridging the Linguistic Gap: Challenges in Building AI Models For Non-Standard Dialects
Ramadhan, Rizky Surya; Ria Kusrini, Nurul Azizah; Ardianto, Ardianto
Attaqwa: Jurnal Ilmu Pendidikan Islam Vol. 21 No. 1 (2025): Ilmu Pendidikan Islam
Publisher : Prodi Pendidikan Agama Islam Sekolah Tinggi Agama Islam Daruttaqwa Gresik

DOI: 10.54069/attaqwa.v21i1.978

Abstract

This study examines the challenges of developing Natural Language Processing (NLP) models for non-standard and low-resource Indonesian dialects, with a focus on code-mixing, slang, and regional variations commonly encountered in digital communication. Using a synthetic dataset (NusaDialect benchmark) for sentiment analysis and Named Entity Recognition (NER), we examined the performance of widely used models, including mBERT, IndoBERT, XLM-RoBERTa, and GPT-4. Quantitative results reveal a significant performance gap when models trained on standard Indonesian are applied to dialectal input, with IndoBERT outperforming mBERT but being surpassed by XLM-RoBERTa. In contrast, GPT-4 demonstrates strong resilience in zero-shot settings. Qualitative error analysis further reveals systematic weaknesses related to out-of-vocabulary slang, code-switching ambiguity, morphological complexity, and pragmatic or culturally embedded expressions. To address these limitations, two mitigation strategies were tested: continued pretraining on social media data and data augmentation with back-translation. Findings indicate that while continued pretraining yields the most significant performance gains, augmentation offers a more balanced trade-off by improving dialectal robustness without degrading performance on formal Indonesian. The study concludes that overcoming these linguistic challenges requires not only technical solutions but also culturally informed approaches. Practical implications extend to AI applications in customer service, social media analysis, and digital governance, where inclusivity and accessibility for diverse language users are essential.
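
The performance gap described in the abstract can be illustrated with a minimal sketch. It assumes the gap is measured as the difference in macro-averaged F1 between a standard-Indonesian test set and a dialectal one; the abstract does not state which metric the authors used. The function names `macro_f1` and `dialect_gap` and the toy labels are hypothetical and are not taken from the paper or from the NusaDialect benchmark.

```python
from collections import Counter


def macro_f1(gold, pred):
    """Macro-averaged F1 over every label that occurs in gold or pred."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    labels = sorted(set(gold) | set(pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p, but it was wrong
            fn[g] += 1  # gold label g was missed
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


def dialect_gap(standard, dialectal):
    """Return (standard F1, dialectal F1, gap) for two (gold, pred) pairs."""
    f1_std = macro_f1(*standard)
    f1_dia = macro_f1(*dialectal)
    return f1_std, f1_dia, f1_std - f1_dia


# Toy sentiment labels, invented purely for illustration.
standard = (["pos", "neg", "pos", "neg"], ["pos", "neg", "pos", "neg"])
dialectal = (["pos", "neg", "pos", "neg"], ["pos", "pos", "pos", "neg"])

f1_std, f1_dia, gap = dialect_gap(standard, dialectal)
print(f"standard={f1_std:.3f} dialectal={f1_dia:.3f} gap={gap:.3f}")
```

A positive gap means the model scores lower on dialectal input than on standard input. The same comparison, run before and after a mitigation step such as continued pretraining or back-translation augmentation, would show whether that step narrows the gap and whether it costs accuracy on formal Indonesian.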