Garuda - Garba Rujukan Digital

P-Index (2021 - 2026): 0.23

This author has published in the following journal:

Attaqwa: Jurnal Ilmu Pendidikan Islam

Ramadhan, Rizky Surya

Unknown Affiliation

Author-ID : 9238171

Religion, Humanities, Education, Social Sciences, Other

Published: 1 document

Articles

The Bridging the Linguistic Gap: Challenges in Building AI Models For Non-Standard Dialects
Ramadhan, Rizky Surya; Ria Kusrini, Nurul Azizah; Ardianto, Ardianto
Attaqwa: Jurnal Ilmu Pendidikan Islam Vol. 21 No. 1 (2025): Ilmu Pendidikan Islam
Publisher : Prodi Pendidikan Agama Islam Sekolah Tinggi Agama Islam Daruttaqwa Gresik

DOI: 10.54069/attaqwa.v21i1.978

This study examines the challenges of developing Natural Language Processing (NLP) models for non-standard and low-resource Indonesian dialects, with a focus on code-mixing, slang, and regional variations commonly encountered in digital communication. Using a synthetic dataset (NusaDialect benchmark) for sentiment analysis and Named Entity Recognition (NER), we examined the performance of widely used models, including mBERT, IndoBERT, XLM-RoBERTa, and GPT-4. Quantitative results reveal a significant performance gap when models trained on standard Indonesian are applied to dialectal input, with IndoBERT outperforming mBERT but being surpassed by XLM-RoBERTa. In contrast, GPT-4 demonstrates strong resilience in zero-shot settings. Qualitative error analysis further reveals systematic weaknesses related to out-of-vocabulary slang, code-switching ambiguity, morphological complexity, and pragmatic or culturally embedded expressions. To address these limitations, two mitigation strategies were tested: continued pretraining on social media data and data augmentation with back-translation. Findings indicate that while continued pretraining yields the most significant performance gains, augmentation offers a more balanced trade-off by improving dialectal robustness without degrading performance on formal Indonesian. The study concludes that overcoming these linguistic challenges requires not only technical solutions but also culturally informed approaches. Practical implications extend to AI applications in customer service, social media analysis, and digital governance, where inclusivity and accessibility for diverse language users are essential.
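
The "performance gap" mentioned in the abstract can be illustrated with a minimal sketch. It assumes the gap is measured as the difference in macro-averaged F1 between a standard-Indonesian test set and a dialectal one; the abstract does not state which metric the authors used. The function names `macro_f1` and `performance_gap` and the toy labels are hypothetical, chosen only for illustration, and are not data or code from the study.

```python
def macro_f1(gold, pred):
    """Macro-averaged F1 over every label seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the label never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def performance_gap(standard, dialectal):
    """Each argument is a (gold, pred) pair; returns standard F1 minus dialectal F1."""
    return macro_f1(*standard) - macro_f1(*dialectal)


# Toy, made-up sentiment labels (assumption: not taken from the NusaDialect benchmark).
standard = (["pos", "neg", "pos", "neg"], ["pos", "neg", "pos", "neg"])
dialectal = (["pos", "neg", "pos", "neg"], ["pos", "pos", "pos", "neg"])

gap = performance_gap(standard, dialectal)
print(f"macro-F1 gap (standard - dialectal): {gap:.3f}")
```

A positive gap means the model scores lower on dialectal input than on standard input. The same comparison, run before and after a mitigation step such as back-translation augmentation, would show whether robustness improved without lowering the standard-language score.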

Co-Authors: Ardianto, Ardianto; Ria Kusrini, Nurul Azizah
