Journal : JOIV : International Journal on Informatics Visualization

Multilingual Parallel Corpus for Indonesian Low-Resource Languages
Sulistyo, Danang Arbian; Wibawa, Aji Prasetya; Prasetya, Didik Dwi; Ahda, Fadhli Almu’iini; Arya Astawa, I Nyoman Gede; Andika Dwiyanto, Felix
JOIV : International Journal on Informatics Visualization Vol 9, No 5 (2025)
Publisher : Society of Visual Informatics

DOI: 10.62527/joiv.9.5.3412

Abstract

Indonesia has an extraordinary number of languages, with more than 700 regional languages such as Javanese, Madurese, Balinese, Sundanese, and Bugis. Despite this linguistic wealth, digital resources for these languages remain scarce, making their digital preservation and accessibility a significant challenge. This research addresses the gap by building a multilingual parallel corpus of more than 150,000 phrase pairs extracted from Bible translations in five regional languages of Indonesia. Rigorous preprocessing, normalization, and Unicode tokenization were performed to improve data quality and consistency. The NMT model was developed around an encoder-decoder architecture. Evaluation covered the forward and backward translation directions, measured using BLEU scores. The results show that forward translation consistently outperforms backward translation. The Indonesian-Javanese model scored 0.9939 for BLEU-1 and 0.9844 for BLEU-4, indicating a high level of translation quality. In contrast, reverse translation tasks, such as translating from Sundanese to Indonesian, presented significant challenges, with BLEU-4 scores as low as 0.3173. This illustrates the complexity of translating between Indonesian and local languages. This work provides a dataset that future research can use, for example with transformer-based models and additional linguistic parameters, to enhance the accuracy of natural language processing (NLP) models for Indonesia's underrepresented regional languages.
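
The gap between the BLEU-1 and BLEU-4 figures in the abstract above is easier to read with the metric written out. The following is a minimal sketch of sentence-level BLEU with a single reference, uniform n-gram weights, and no smoothing. It is an illustration under those assumptions, not the evaluation code used in the paper, which the abstract does not describe. The function names and the English token lists are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count every contiguous n-gram in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU-max_n: one reference, uniform weights, no smoothing."""
    if not candidate:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand = ngrams(candidate, n)
        ref = ngrams(reference, n)
        total = sum(cand.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
        if total == 0 or clipped == 0:
            # Without smoothing, one empty precision drives the score to zero.
            return 0.0
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty: punish candidates shorter than the reference.
    if len(candidate) > len(reference):
        bp = 1.0
    else:
        bp = math.exp(1 - len(reference) / len(candidate))
    # Geometric mean of the n-gram precisions, scaled by the brevity penalty.
    return bp * math.exp(sum(log_precisions) / max_n)


# Hypothetical example sentences (not taken from the corpus).
reference = "the cat is on the mat".split()
candidate = "the cat sat on the mat".split()

bleu_1 = bleu(candidate, reference, max_n=1)
bleu_2 = bleu(candidate, reference, max_n=2)
bleu_4 = bleu(candidate, reference, max_n=4)
```

A single wrong word leaves most unigrams intact but breaks every 4-gram that spans it, so BLEU-4 is at most equal to BLEU-1 for the same output and usually lower. Library implementations typically add smoothing so that short sentences do not collapse to zero.
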
Comparison of Adam Optimization and RMS prop in Minangkabau-Indonesian Bidirectional Translation with Neural Machine Translation
Ahda, Fadhli Almu'iini; Wibawa, Aji Prasetya; Dwi Prasetya, Didik; Arbian Sulistyo, Danang
JOIV : International Journal on Informatics Visualization Vol 8, No 1 (2024)
Publisher : Society of Visual Informatics

DOI: 10.62527/joiv.8.1.1818

Abstract

Language is a tool humans use to communicate. Still, no single language is shared by everyone: regions and nations each have their own languages. Indonesia, the fourth most populous country in the world, has a great diversity of languages. Nearly 800 regional languages are recorded in Indonesia, but natural language processing research on them is still lacking. Minangkabau is an endangered language spoken by the Minangkabau people in Indonesia's West Sumatra province. According to UNESCO, the Minangkabau language is listed as "definitely endangered," with only around 5 million speakers worldwide. Based on this, the study applies neural machine translation (NMT) to Minangkabau-Indonesian translation. Neural machine translation, in contrast to conventional statistical machine translation, aims to build a single neural network that can be jointly tuned to maximize translation performance. Because it can retain information over long sequences, capture complicated relationships in data, and supply information that is important in determining the translation outcome, LSTM is one of the most powerful machine-learning techniques for translating languages. The BLEU score is used to evaluate the NMT model. The tests used 520 Minangkabau sentences and varied the number of epochs from 100 to 1000; optimization with Adam performed better than optimization with RMSprop. This is evidenced by the best BLEU-1 score of 0.997816, obtained at 1000 epochs.
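
For readers unfamiliar with the two optimizers compared in this abstract, the sketch below writes out their standard update rules for a single scalar parameter. It is only an illustration: the learning rate, decay constants, and the toy objective f(x) = x^2 are assumptions chosen for readability, not the settings or the LSTM model used in the paper, and the function names are hypothetical.

```python
import math


def rmsprop_step(param, grad, state, lr=0.01, rho=0.9, eps=1e-8):
    """One RMSprop update: scale the gradient by a running RMS of past gradients."""
    state["v"] = rho * state["v"] + (1 - rho) * grad ** 2
    return param - lr * grad / (math.sqrt(state["v"]) + eps)


def adam_step(param, grad, state, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update: RMSprop-style scaling plus momentum and bias correction."""
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad       # first moment
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad ** 2  # second moment
    m_hat = state["m"] / (1 - beta1 ** state["t"])  # correct the zero-init bias
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    return param - lr * m_hat / (math.sqrt(v_hat) + eps)


def minimize(step, state, start=5.0, iterations=2000):
    """Run an optimizer on the toy objective f(x) = x^2, whose gradient is 2x."""
    x = start
    for _ in range(iterations):
        x = step(x, 2 * x, state)
    return x


x_rmsprop = minimize(rmsprop_step, {"v": 0.0})
x_adam = minimize(adam_step, {"t": 0, "m": 0.0, "v": 0.0})
```

Both methods divide the gradient by a running estimate of its magnitude. Adam additionally keeps a momentum term and corrects both running averages for their zero initialization, which is the main structural difference between the two. Behavior on this toy problem says nothing about which optimizer is better for a given translation model.
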