Garuda - Garba Rujukan Digital

P-Index (2021 - 2026): 0.23

This author has published in the following journal:

JOIV : International Journal on Informatics Visualization

Andika Dwiyanto, Felix

Unknown Affiliation

Author-ID : 8945469

Computer Science & IT

Published : 1 Document

Articles

Multilingual Parallel Corpus for Indonesian Low-Resource Languages
Sulistyo, Danang Arbian; Wibawa, Aji Prasetya; Prasetya, Didik Dwi; Ahda, Fadhli Almu’iini; Arya Astawa, I Nyoman Gede; Andika Dwiyanto, Felix
JOIV : International Journal on Informatics Visualization Vol 9, No 5 (2025)
Publisher : Society of Visual Informatics

DOI: 10.62527/joiv.9.5.3412

Indonesia has an extraordinary number of languages, with more than 700 regional languages, including Javanese, Madurese, Balinese, Sundanese, and Bugis. Despite this linguistic wealth, digital resources for these languages remain scarce, which makes their digital preservation and accessibility a significant challenge. This research addresses the gap by building a multilingual parallel corpus of more than 150,000 phrase pairs extracted from Bible translations in five regional languages of Indonesia. Rigorous preprocessing, normalization, and Unicode tokenization were performed to improve data quality and consistency. The neural machine translation (NMT) models were developed with a focus on the encoder-decoder architecture. Evaluation covered the forward and backward translation directions, both measured using BLEU scores. The results show that forward translation consistently outperforms backward translation. The Indonesian-to-Javanese model achieved a BLEU-1 score of 0.9939 and a BLEU-4 score of 0.9844, indicating a high level of translation quality. In contrast, reverse translation tasks, such as translating from Sundanese to Indonesian, presented significant challenges, with BLEU-4 scores as low as 0.3173. This illustrates the complexity of translating between Indonesian and local languages. This work provides a dataset that future research can use, for example with transformer-based models and additional linguistic parameters, to enhance the accuracy of natural language processing (NLP) models for Indonesia's underrepresented regional languages.
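To make the BLEU-1 and BLEU-4 figures in the abstract easier to interpret, the following is a minimal sketch of a sentence-level BLEU computation on the same 0 to 1 scale. It is not the authors' evaluation code: the function names (`tokenize`, `ngrams`, `bleu`), the NFC normalization with whitespace splitting, the single-reference setup, and the absence of smoothing are all assumptions made for illustration, and the example strings are placeholders rather than corpus data.

```python
import math
import unicodedata
from collections import Counter


def tokenize(text):
    # Assumed, simplified scheme: Unicode NFC normalization, lowercasing,
    # and whitespace splitting. The paper's actual tokenizer is not specified here.
    return unicodedata.normalize("NFC", text).lower().split()


def ngrams(tokens, n):
    # Count every contiguous n-gram in the token list.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(reference, hypothesis, max_n=4):
    # Sentence-level BLEU with one reference and no smoothing (illustrative only).
    ref, hyp = tokenize(reference), tokenize(hypothesis)
    if not hyp:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts = ngrams(hyp, n)
        ref_counts = ngrams(ref, n)
        total = sum(hyp_counts.values())
        # Clip each hypothesis n-gram count by its count in the reference.
        clipped = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        if total == 0 or clipped == 0:
            return 0.0
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty: penalize hypotheses shorter than the reference.
    bp = 1.0 if len(hyp) >= len(ref) else math.exp(1 - len(ref) / len(hyp))
    # Geometric mean of the n-gram precisions, scaled by the brevity penalty.
    return bp * math.exp(sum(log_precisions) / max_n)


# Placeholder token strings, not sentences from the corpus.
reference = "a b c d e"
hypothesis = "a b c d x"
print(round(bleu(reference, hypothesis, max_n=1), 4))  # BLEU-1
print(round(bleu(reference, hypothesis, max_n=4), 4))  # BLEU-4
```

With `max_n=1` the score reduces to clipped unigram precision, while `max_n=4` also requires matching word order over longer spans, which is why BLEU-4 is never higher than BLEU-1 for the same sentence pair in this setup.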

Co-Authors: Aji Prasetya Wibawa; Didik Dwi Prasetya; Fadhli Almu’iini Ahda; I Nyoman Gede Arya Astawa; Sulistyo, Danang Arbian
