Journal : Journal of Information Systems and Informatics

Recurrent Neural Network-Gated Recurrent Unit for Indonesia-Sentani Papua Machine Translation
Achmad, Rizkial; Tokoro, Yokelin; Haurissa, Jusuf; Wijanarko, Andik
Journal of Information System and Informatics Vol 5 No 4 (2023): Journal of Information Systems and Informatics
Publisher : Universitas Bina Darma

DOI: 10.51519/journalisi.v5i4.597

Abstract

The Papuan Sentani language is spoken in the city of Jayapura, Papua. The law requires that regional languages be preserved, and one way to do so is to build an Indonesian-Sentani Papua machine translation system. The problem is how to build such a system and which model to choose. The model chosen is the Recurrent Neural Network with Gated Recurrent Units (RNN-GRU), which has been widely used to build machine translation systems for regional languages in Indonesia. The method is experimental: a parallel corpus is created, the RNN-GRU model is trained on it, and the result is evaluated with the Bilingual Evaluation Understudy (BLEU) score. The parallel corpus contains 281 sentences with an average length of 8 words. Training took 3 hours without a GPU. The research obtained a fairly good BLEU score of 35.3, which means that the RNN-GRU model and the parallel corpus produced adequate translation quality that could still be improved.
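
The abstract reports a BLEU score but does not say how it was computed. As a rough illustration only, the following is a minimal sketch of corpus-level BLEU with one reference per sentence, assuming whitespace tokenization, n-grams up to length 4, and no smoothing. The names `ngram_counts` and `corpus_bleu` are hypothetical, and the example sentences are English placeholders, not data from the paper.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU on a 0-100 scale, one reference per hypothesis.

    Assumptions (not taken from the paper): whitespace tokenization,
    uniform n-gram weights, no smoothing.
    """
    matches = [0] * max_n   # clipped n-gram matches, summed over the corpus
    totals = [0] * max_n    # hypothesis n-gram counts, summed over the corpus
    hyp_len = ref_len = 0

    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngram_counts(h, n), ngram_counts(r, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += sum(h_counts.values())

    # Without smoothing, any zero precision makes the geometric mean zero.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0

    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize hypotheses shorter than the references.
    brevity = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity * math.exp(log_precision)


# Placeholder sentences for illustration only.
score = corpus_bleu(["the cat sat on the mat"], ["the cat sat on a mat"])
print(f"BLEU = {score:.1f}")
```

With a corpus as small as 281 sentence pairs, many sentences may share no 4-gram with their reference, so published tools typically apply smoothing or report corpus-level rather than sentence-level scores; the sketch above omits smoothing to stay short.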

Comparison of Conversational Corpus and News Corpus on Gender Bias in Indonesian-English Transformer Model Translation
Wijanarko, Andik; Al Haura, Adzkiyatun Nisa; Puspitaningrum, Indar; Saputra, Dhanar Intan Surya
Journal of Information System and Informatics Vol 6 No 4 (2024): December
Publisher : Universitas Bina Darma

DOI: 10.51519/journalisi.v6i4.918

Abstract

Gender bias in machine translation is a significant issue that affects both translated text and the perception of gender, often leading to misunderstandings such as defaulting to male pronouns. For example, the Indonesian word "dia" is often translated as "he" rather than "she," even when the context suggests otherwise, as in the case of President Megawati. Reducing this bias requires ongoing research, particularly into how different corpora affect translation accuracy. Studies have shown that formal news corpora, which contain less gender bias, produce different results from conversational corpora, which are more informal and exhibit gender bias. This research uses two training datasets, both sourced from Opus, which has been widely used by previous researchers: the Indonesian-English conversational parallel corpus from Open Subtitles, which contains many gendered pronouns, and a news corpus from Tanzil, which contains fewer gendered words. The test dataset consists of biographies of female presidents, which popular machine translation systems often translate as masculine by default. A Transformer model was trained on each corpus, yielding one translation model per corpus. The gender expressed in each translated sentence was then detected and compared with the gender of the corresponding test sentence to evaluate accuracy. The results showed that the accuracy of gender translation was 84% for the conversational corpus and 8% for the news corpus.
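
The evaluation step described above (detect the gender of each translated sentence, then compare it with the expected gender) can be sketched as follows. This is a minimal illustration, not the authors' method: the abstract does not say how gender was detected, so the pronoun lists and the names `detect_gender` and `gender_accuracy` are hypothetical, and the example sentences are invented placeholders.

```python
import re

# Hypothetical pronoun lists; the abstract does not specify the detector.
FEMALE_PRONOUNS = {"she", "her", "hers", "herself"}
MALE_PRONOUNS = {"he", "him", "his", "himself"}


def detect_gender(sentence):
    """Label an English sentence by its majority gendered pronoun."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    female = sum(token in FEMALE_PRONOUNS for token in tokens)
    male = sum(token in MALE_PRONOUNS for token in tokens)
    if female > male:
        return "female"
    if male > female:
        return "male"
    return "unknown"  # no gendered pronoun, or a tie


def gender_accuracy(translations, expected_genders):
    """Fraction of translations whose detected gender matches the expected one."""
    if len(translations) != len(expected_genders):
        raise ValueError("translations and expected_genders differ in length")
    if not translations:
        return 0.0
    correct = sum(
        detect_gender(text) == gender
        for text, gender in zip(translations, expected_genders)
    )
    return correct / len(translations)


# Placeholder model outputs for a test set about a female subject.
outputs = [
    "She was elected president in 2001.",
    "He served as vice president.",
    "Her father was also president.",
]
print(gender_accuracy(outputs, ["female", "female", "female"]))
```

In this sketch a sentence with no gendered pronoun is labeled "unknown" and counts as incorrect; whether the study counted such sentences the same way is not stated in the abstract.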