The Indonesian Journal of Computer Science
Vol. 14 No. 3 (2025): The Indonesian Journal of Computer Science

A Culture-Aware Bidirectional IsiXhosa-English Neural Machine Translation Model Using MarianMT

Moape, Tebatso
Mohale, Thuto Siyamthanda
Bester, Chimbo



Article Info

Publish Date
09 Jun 2025

Abstract

Machine translation for low-resource African languages faces significant challenges due to limited data availability and complex linguistic features such as rich morphology, agglutinative grammar, and culturally specific expressions. This study proposes and develops a culturally aware machine translation model for the isiXhosa-English language pair, built on the transformer-based MarianMT model. We combine traditional parallel corpora with culturally enriched datasets to address isiXhosa's linguistic intricacies. The proposed model was trained on a carefully curated dataset of 127,690 parallel sentences and uses SentencePiece tokenization to handle agglutinative morphology. Our approach achieved a BLEU score of 58.79, a substantial improvement over previous methods, which typically score between 20.9 and 37.11. The results demonstrate that integrating cultural context and linguistic specificities into the translation model improves translation quality for low-resource languages. The findings suggest that considering cultural context, combined with an appropriate model architecture and data preprocessing strategy, can lead to more accurate and culturally aware machine translation systems.
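
For readers unfamiliar with the metric, the sketch below illustrates how a corpus-level BLEU score on the 0-100 scale can be computed. It is a simplified illustration, not the evaluation code used in the study: the abstract does not say which BLEU implementation, tokenization, or smoothing was applied. The function names (ngrams, corpus_bleu), the whitespace tokenization, the single reference per hypothesis, and the English example sentences are all assumptions made for this example.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Simplified corpus-level BLEU (0-100), one reference per hypothesis.

    Assumptions: whitespace tokenization, uniform n-gram weights,
    no smoothing. Real evaluations usually rely on a standard toolkit.
    """
    matches = [0] * max_n  # clipped n-gram matches, per n-gram order
    totals = [0] * max_n   # number of hypothesis n-grams, per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += sum(h_counts.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # without smoothing, any zero precision gives BLEU = 0
    # Geometric mean of the modified n-gram precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty punishes hypotheses shorter than their references.
    brevity_penalty = 1.0 if hyp_len >= ref_len else math.exp(1 - ref_len / hyp_len)
    return 100.0 * brevity_penalty * math.exp(log_precision)


if __name__ == "__main__":
    hyps = ["the cat sat on the mat"]
    refs = ["the cat sat on a mat"]
    print(round(corpus_bleu(hyps, refs), 2))
```

Because the score is a geometric mean of 1- to 4-gram precisions, a single wrong word lowers every n-gram order that spans it. Scores from different tokenizations or BLEU variants are therefore not directly comparable.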

Copyrights © 2025

Journal Info

Abbrev

ijcs

Publisher

Subject

Computer Science & IT; Electrical & Electronics Engineering; Engineering

Description

The Indonesian Journal of Computer Science (IJCS) is a bimonthly peer-reviewed journal published by AI Society and STMIK Indonesia. IJCS editions will be published at the end of February, April, June, August, October and December. The scope of IJCS includes general computer science, information ...