JOIV : International Journal on Informatics Visualization
Vol 9, No 5 (2025)

Multilingual Parallel Corpus for Indonesian Low-Resource Languages

Sulistyo, Danang Arbian
Wibawa, Aji Prasetya
Prasetya, Didik Dwi
Ahda, Fadhli Almu’iini
Arya Astawa, I Nyoman Gede
Andika Dwiyanto, Felix



Article Info

Publish Date
30 Sep 2025

Abstract

Indonesia is home to more than 700 regional languages, including Javanese, Madurese, Balinese, Sundanese, and Bugis. Despite this linguistic wealth, digital resources for these languages remain scarce, which makes preserving them and keeping them accessible in digital form a significant challenge. This research addresses the gap by building a multilingual parallel corpus of more than 150,000 phrase pairs extracted from Bible translations in five regional languages of Indonesia. Rigorous preprocessing, normalization, and Unicode tokenization were applied to improve data quality and consistency. The neural machine translation (NMT) models were developed around an encoder-decoder architecture. Evaluation covered the forward and backward translation directions, measured using BLEU scores. The results show that forward translation consistently outperforms backward translation. The Indonesian-Javanese model achieved a score of 0.9939 for BLEU-1 and 0.9844 for BLEU-4, indicating a high level of translation quality. In contrast, reverse translation tasks, such as translating from Sundanese to Indonesian, presented significant challenges, with BLEU-4 scores as low as 0.3173. This illustrates the difficulty of translating from local languages into Indonesian. The work provides a dataset that future research can use, for example with transformer-based models and additional linguistic parameters, to enhance the accuracy of natural language processing (NLP) models for Indonesia's underrepresented regional languages.
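
The abstract reports BLEU-1 and BLEU-4 on a 0 to 1 scale. As a rough illustration of what those two numbers measure, the sketch below computes sentence-level BLEU against a single reference, without smoothing. It is not the authors' evaluation code, which the abstract does not describe; the function names and the example token sequences are assumptions made up for illustration and are not drawn from the corpus.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, reference, n):
    """Clipped n-gram precision of the candidate against one reference."""
    cand = ngrams(candidate, n)
    ref = ngrams(reference, n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    # Each candidate n-gram is credited at most as often as it occurs in the reference.
    clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
    return clipped / total


def bleu(candidate, reference, max_n):
    """Unsmoothed sentence-level BLEU with uniform weights up to max_n.

    Assumption: one reference per sentence. Corpus-level BLEU instead
    aggregates n-gram counts over all sentences before taking the ratio.
    """
    precisions = [modified_precision(candidate, reference, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0  # without smoothing, any zero precision gives a zero score
    log_mean = sum(math.log(p) for p in precisions) / max_n
    c, r = len(candidate), len(reference)
    brevity_penalty = 1.0 if c > r else math.exp(1.0 - r / c)
    return brevity_penalty * math.exp(log_mean)


# Hypothetical example sentences (illustrative tokens only).
reference = "aku arep lunga menyang pasar".split()
candidate = "aku arep lunga menyang omah".split()

bleu1 = bleu(candidate, reference, 1)  # unigram precision only
bleu4 = bleu(candidate, reference, 4)  # geometric mean of 1- to 4-gram precisions
print(f"BLEU-1 = {bleu1:.4f}, BLEU-4 = {bleu4:.4f}")
```

BLEU-4 is never higher than BLEU-1 for the same output when the brevity penalty is equal, because longer n-grams are harder to match; a small gap between the two, as in the Indonesian-Javanese figures above, suggests that word order as well as word choice closely follows the reference.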

Copyrights © 2025






Journal Info

Abbrev

joiv

Subject

Computer Science & IT

Description

JOIV : International Journal on Informatics Visualization is an international peer-reviewed journal dedicated to the interchange of results of high-quality research in all aspects of Computer Science, Computer Engineering, Information Technology and Visualization. The journal publishes state-of-the-art ...