Machine translation from local languages to Indonesian is a challenge in natural language processing (NLP) due to limited data and the complexity of the language structures involved. This research aims to build a machine translation model from the Tegalan language to Indonesian using a sequence-to-sequence (Seq2seq) architecture based on the Gated Recurrent Unit (GRU). A structured methodology is applied to develop the Seq2seq GRU-based translation model, covering parallel corpus construction, preprocessing, hyperparameter tuning, model training, model testing, and model evaluation. The model is trained on a dataset of Tegalan-Indonesian sentence pairs and optimized through experiments with hyperparameter variations, including the number of epochs, learning rate, batch size, and dropout. Each hyperparameter configuration is measured with the BLEU (Bilingual Evaluation Understudy) score. The evaluation shows that increasing the number of epochs improves accuracy up to an optimal point before overfitting occurs. A learning rate that is too small slows convergence, while one that is too large causes instability. In addition, smaller batch sizes perform better than larger ones, and a higher dropout rate improves model generalization. The best performance is obtained with 70 epochs, a learning rate of 0.005, a batch size of 64, and a dropout of 0.5, yielding a BLEU score of 17.11.
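As a rough illustration of the architecture described above, the sketch below shows a minimal GRU-based Seq2seq encoder-decoder in PyTorch. Only the hyperparameters (learning rate 0.005, batch size 64, dropout 0.5) come from the reported best configuration; the vocabulary sizes, embedding and hidden dimensions, and the toy data are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of a GRU-based Seq2seq encoder-decoder (PyTorch).
# Vocabulary sizes, embedding/hidden dimensions and the toy data are
# illustrative assumptions; only lr=0.005, batch_size=64 and dropout=0.5
# reflect the reported best configuration.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=256, hid_dim=512, dropout=0.5):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.gru = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.dropout = nn.Dropout(dropout)

    def forward(self, src):                       # src: (batch, src_len)
        embedded = self.dropout(self.embedding(src))
        _, hidden = self.gru(embedded)            # hidden: (1, batch, hid_dim)
        return hidden

class Decoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=256, hid_dim=512, dropout=0.5):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.gru = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.fc_out = nn.Linear(hid_dim, vocab_size)
        self.dropout = nn.Dropout(dropout)

    def forward(self, trg, hidden):               # trg: (batch, trg_len)
        embedded = self.dropout(self.embedding(trg))
        output, hidden = self.gru(embedded, hidden)
        return self.fc_out(output), hidden        # logits: (batch, trg_len, vocab)

class Seq2Seq(nn.Module):
    def __init__(self, encoder, decoder):
        super().__init__()
        self.encoder, self.decoder = encoder, decoder

    def forward(self, src, trg):
        hidden = self.encoder(src)                # encode the Tegalan sentence
        logits, _ = self.decoder(trg, hidden)     # decode into Indonesian tokens
        return logits

# Hypothetical vocabulary sizes for the source and target languages.
SRC_VOCAB, TRG_VOCAB, PAD_IDX = 8000, 8000, 0
model = Seq2Seq(Encoder(SRC_VOCAB), Decoder(TRG_VOCAB))
optimizer = torch.optim.Adam(model.parameters(), lr=0.005)  # reported learning rate
criterion = nn.CrossEntropyLoss(ignore_index=PAD_IDX)

# One toy training step with randomly generated token ids (batch size 64).
src = torch.randint(1, SRC_VOCAB, (64, 12))       # Tegalan input tokens
trg = torch.randint(1, TRG_VOCAB, (64, 14))       # Indonesian reference tokens
logits = model(src, trg[:, :-1])                  # teacher forcing: shifted target
loss = criterion(logits.reshape(-1, TRG_VOCAB), trg[:, 1:].reshape(-1))
loss.backward()
optimizer.step()
print(f"toy training loss: {loss.item():.4f}")
```

In the actual experiments the model would be trained on the Tegalan-Indonesian parallel corpus for the reported 70 epochs, and translation quality on the test set would then be scored with a BLEU implementation such as NLTK's corpus_bleu; the toy step above only demonstrates the data flow.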