Journal of Applied Data Sciences
Vol 4, No 3: SEPTEMBER 2023

LSTM-Based Machine Translation for Madurese-Indonesian

Sulistyo, Danang Arbian
Wibawa, Aji Prasetya
Prasetya, Didik Dwi
Ahda, Fadhli Almu'iini



Article Info

Publish Date
03 Sep 2023

Abstract

Madurese is one of Indonesia's regional languages, spoken predominantly in East Java and on Madura Island in particular. The use of Madurese as a daily language has declined significantly due to a language shift among children and adolescents, caused in part by a sense of prestige and the difficulty of learning Madurese. The scarcity of research and scientific publications on the Madurese language also contributes to declining literacy in the language. Our research focuses on building a Madurese-to-Indonesian machine translation system to maintain and preserve the Madurese language so that it can be learned through digital media. This study uses the latest dataset for the Madurese-Indonesian language pair, a corpus of 30,000 Madurese-Indonesian sentence pairs from the online Bible. This study scraped online Bible pages to build the corpus from the bilingual Indonesian and Madurese Bible. The text was then processed manually to align the scraped results for the two languages, and normalized and tokenized to remove non-printable characters and punctuation from the corpus. To perform neural machine translation (NMT), this study connected an RNN encoder to an RNN decoder of the language model; for training and testing, this study used a sequential model with LSTM, and the BLEU measure was used to assess the accuracy of the translation results. This study used the softmax activation function with the Adam optimizer and added some settings, including using 128 layers in the training process and adding a Dropout layer. Averaged over five tests, the evaluation results were 0.798068 for BLEU-1, 0.680932 for BLEU-2, 0.623489 for BLEU-3, and 0.523546 for BLEU-4. Given the language differences between Madurese and Indonesian, this can be the best approach for Madurese-Indonesian machine translation.
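
The cleaning and scoring steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' code. It assumes lowercasing as part of normalization, whitespace tokenization, one reference per hypothesis, and cumulative BLEU-n with uniform weights and no smoothing. The function names `normalize`, `ngram_counts`, and `corpus_bleu` are hypothetical, and the Indonesian sentences are toy examples that do not come from the corpus.

```python
import math
import string
from collections import Counter


def normalize(sentence):
    """Lowercase (an assumption), strip ASCII punctuation, split on whitespace,
    then drop any remaining non-printable characters from each token."""
    table = str.maketrans("", "", string.punctuation)
    tokens = sentence.lower().translate(table).split()
    tokens = ["".join(ch for ch in tok if ch.isprintable()) for tok in tokens]
    return [tok for tok in tokens if tok]


def ngram_counts(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(references, hypotheses, max_n):
    """Cumulative corpus-level BLEU over orders 1..max_n with uniform weights.

    Assumes exactly one reference per hypothesis and applies no smoothing,
    so the score is 0.0 whenever some n-gram order has no match.
    """
    matches = [0] * max_n
    totals = [0] * max_n
    ref_len = hyp_len = 0
    for ref, hyp in zip(references, hypotheses):
        ref_len += len(ref)
        hyp_len += len(hyp)
        for n in range(1, max_n + 1):
            hyp_counts = ngram_counts(hyp, n)
            ref_counts = ngram_counts(ref, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += sum(hyp_counts.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize hypotheses shorter than the references.
    brevity = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity * math.exp(log_precision)


if __name__ == "__main__":
    # Toy sentences for illustration only.
    reference = normalize("Saya pergi ke pasar hari ini.")
    hypothesis = normalize("saya pergi ke pasar hari itu")
    for n in range(1, 5):
        print(f"BLEU-{n}: {corpus_bleu([reference], [hypothesis], n):.6f}")
```

Because higher-order n-grams are harder to match, cumulative scores computed this way normally fall from BLEU-1 to BLEU-4, which matches the pattern of the reported averages. A library implementation with smoothing may give different values on short sentences.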

Copyrights © 2023






Journal Info

Abbrev

JADS

Subject

Computer Science & IT; Control & Systems Engineering; Decision Sciences, Operations Research & Management
