This systematic literature review (SLR) investigates the evolution of Natural Language Processing (NLP) for Indonesian regional languages from 2020 to 2025. An analysis of 13 pivotal studies reveals a significant transition from fragmented studies of high-population languages, such as Sundanese and Madurese, toward inclusive, archipelago-wide frameworks that also cover low-resource languages such as Acehnese and Nias. Architecturally, the field has progressed from classical machine learning to Transformer-based models, including pretrained models such as IndoBART and Large Language Models (LLMs) such as GPT. Data sources have likewise shifted from unstructured social media corpora to standardized multilingual benchmarks such as NusaX and NusaCrowd. Despite these advances, gaps in data standardization and large-scale pretraining resources persist. Future research should prioritize cross-lingual transfer learning and specialized benchmarks to ensure the technological sustainability of Indonesia’s diverse linguistic heritage.
Copyright © 2026