This study focuses on neural machine translation (NMT) for a low-resource language (LRL) pair, Batak Toba-Indonesian (bbc↔ind). Batak Toba is a critically endangered dialect spoken by the Batak, an Indonesian ethnic group. Recent advances in machine translation offer potential solutions, with transfer learning emerging as a promising approach for this language pair. We used a publicly available bbc↔ind parallel corpus from the Hugging Face datasets hub and employed the distilled 600M-parameter variant of NLLB-200 as the baseline model. Our models achieved sacreBLEU scores of (i) 37.10 for bbc→ind (+25.67, up from 11.43) and (ii) 30.84 for ind→bbc (+25.82, up from 5.02). These results outperform all previous work on bbc↔ind machine translation and support the validity of our approach.
Copyright © 2024