Journal of Information Systems and Informatics
Vol 6 No 4 (2024): December

Comparison of Conversational Corpus and News Corpus on Gender Bias in Indonesian-English Transformer Model Translation

Wijanarko, Andik
Al Haura, Adzkiyatun Nisa
Puspitaningrum, Indar
Saputra, Dhanar Intan Surya



Article Info

Publish Date
31 Dec 2024

Abstract

Gender bias in machine translation is a significant issue that distorts translated text and shapes how gender is perceived, most visibly through a tendency to default to male pronouns. For example, the gender-neutral Indonesian pronoun "dia" is often translated as "he" rather than "she," even when the context indicates a woman, as in sentences about President Megawati. Reducing this bias requires ongoing research, particularly into how different training corpora affect translation accuracy. Previous studies have shown that formal news corpora, which carry less gender bias, produce different results from conversational corpora, which are more informal and exhibit gender bias. This research uses two Indonesian-English parallel corpora as training data: a conversational corpus from Open Subtitles, which contains many gendered pronouns, and a news corpus from Tanzil, which contains fewer gendered words. Both corpora were obtained from Opus, a source widely used by previous researchers. The test dataset consists of biographies of female presidents, which popular machine translation systems often render with masculine pronouns by default. A Transformer model was trained on each corpus, yielding one translation model per corpus. The gender of each translated sentence was then detected and compared with the gender of the corresponding test sentence to evaluate accuracy. The conversational corpus achieved a gender translation accuracy of 84%, while the news corpus achieved 8%.
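
The evaluation step described in the abstract (detect the gender of each translated sentence, then compare it with the gender of the test data) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how gender was detected, so the pronoun lists, the majority-count rule, and the function names `detect_gender` and `gender_accuracy` are all assumptions, and the sample sentences are invented for demonstration.

```python
import re

# Assumed pronoun lists; the abstract does not specify the detector used.
FEMININE = {"she", "her", "hers", "herself"}
MASCULINE = {"he", "him", "his", "himself"}


def detect_gender(sentence: str) -> str:
    """Label an English sentence by its majority gendered pronoun (hypothetical rule)."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    fem = sum(token in FEMININE for token in tokens)
    masc = sum(token in MASCULINE for token in tokens)
    if fem > masc:
        return "female"
    if masc > fem:
        return "male"
    return "neutral"


def gender_accuracy(translations: list[str], expected: list[str]) -> float:
    """Fraction of translated sentences whose detected gender matches the expected label."""
    if len(translations) != len(expected):
        raise ValueError("translations and expected labels must have the same length")
    if not translations:
        return 0.0
    correct = sum(
        detect_gender(text) == label for text, label in zip(translations, expected)
    )
    return correct / len(translations)


# Invented sample output from a translation model; every reference subject is female.
sample_translations = [
    "She served as president.",
    "He led the party for many years.",
    "She gave a speech to parliament.",
    "Her term ended after the election.",
]
sample_expected = ["female"] * len(sample_translations)

print(gender_accuracy(sample_translations, sample_expected))  # 0.75
```

Running the same function over the output of each trained model would give one accuracy figure per corpus, which is the form in which the abstract reports its 84% and 8% results.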

Copyrights © 2024
Journal Info

Abbrev

isi

Publisher

Subject

Computer Science & IT

Description

Journal-ISI is a scientific journal publishing articles that present ideas and original thinking on the latest research and technological developments in information systems, information technology, informatics engineering, computer science, and industrial engineering ...