International Journal of Advances in Applied Sciences
Vol 13, No 4: December 2024

Batak Toba language-Indonesian machine translation with transfer learning using no language left behind

Samuel, Cevin
Ali, Irsan Taufik



Article Info

Publish Date
01 Dec 2024

Abstract

This study focuses on neural machine translation (NMT) for a low-resource language (LRL) pair, Batak Toba-Indonesian (bbc↔ind). The Batak Toba language, spoken by the Batak ethnic group of Indonesia, is critically endangered. Recent advances in machine translation offer potential solutions, with transfer learning emerging as a promising approach for this language pair. We used a publicly available bbc↔ind parallel corpus from the Hugging Face datasets hub and the distilled 600M variant of NLLB-200 as the baseline model. Our models achieved the following sacreBLEU scores: i) 37.10 for bbc→ind (+25.67, up from 11.43) and ii) 30.84 for ind→bbc (+25.82, up from 5.02). These results outperform all previous work on the bbc↔ind machine translation task and demonstrate the validity of our approach.
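
As a quick consistency check on the reported gains, the following minimal sketch recomputes the absolute improvement for each translation direction from the baseline and final sacreBLEU scores quoted in the abstract. The names `REPORTED`, `improvement`, and `summarize` are illustrative only, and the relative-gain ratio is a derived figure that does not appear in the abstract.

```python
from decimal import Decimal

# sacreBLEU scores quoted in the abstract, as (baseline, final) pairs.
# Decimal avoids binary floating-point noise in two-decimal arithmetic.
REPORTED = {
    "bbc->ind": (Decimal("11.43"), Decimal("37.10")),
    "ind->bbc": (Decimal("5.02"), Decimal("30.84")),
}


def improvement(baseline, final):
    """Return (absolute gain in points, final score as a multiple of the baseline)."""
    absolute = final - baseline
    ratio = final / baseline  # derived here; not a figure from the abstract
    return absolute, ratio


def summarize(scores):
    """Map each direction to (absolute gain, ratio rounded to two decimals)."""
    rows = {}
    for direction, (baseline, final) in scores.items():
        absolute, ratio = improvement(baseline, final)
        rows[direction] = (absolute, ratio.quantize(Decimal("0.01")))
    return rows


if __name__ == "__main__":
    for direction, (absolute, ratio) in summarize(REPORTED).items():
        print(f"{direction}: +{absolute} points, {ratio}x the baseline")
```

The absolute gains computed this way should match the +25.67 and +25.82 figures stated in the abstract.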

Copyright © 2024






Journal Info

Abbrev

IJAAS

Subject

Earth & Planetary Sciences; Environmental Science; Materials Science & Nanotechnology; Mathematics; Physics

Description

International Journal of Advances in Applied Sciences (IJAAS) is a peer-reviewed, open-access journal dedicated to publishing significant research findings in the applied and theoretical sciences. The journal is designed to serve researchers, developers, professionals, graduate students and ...