Journal : Jurnal Teknik Informatika (JUTIF)

Comparison of IndoNanoT5 and IndoGPT for Advancing Indonesian Text Formalization in Low-Resource Settings
Firdausillah, Fahri; Luthfiarta, Ardytha; Nugraha, Adhitya; Dewi, Ika Novita; Hafiizhudin, Lutfi Azis; Mumtaz, Najma Amira; Syarifah, Ulima Muna
Jurnal Teknik Informatika (Jutif) Vol. 6 No. 5 (2025): JUTIF Volume 6, Number 5, Oktober 2025
Publisher : Informatika, Universitas Jenderal Soedirman

DOI: 10.52436/1.jutif.2025.6.5.4935

Abstract

The rapid growth of digital communication in Indonesia has led to a distinct informal linguistic style that poses significant challenges for Natural Language Processing (NLP) systems trained on formal text. This discrepancy often degrades the performance of downstream tasks such as machine translation and sentiment analysis. This study aims to provide the first systematic comparison of the IndoNanoT5 (encoder-decoder) and IndoGPT (decoder-only) architectures for Indonesian informal-to-formal text style transfer. We conduct comprehensive experiments on the STIF-INDONESIA dataset with rigorous hyperparameter optimization, multiple evaluation metrics, and statistical significance testing. The results demonstrate clear superiority of the encoder-decoder architecture: IndoNanoT5-base achieves a peak BLEU score of 55.99, outperforming IndoGPT's highest score of 51.13 by 4.86 points, a statistically significant improvement (p < 0.001) with a large effect size (Cohen's d = 0.847). This establishes a new performance benchmark, with an improvement of 28.49 BLEU points over previous methods, a 103.6% relative gain. Architectural analysis reveals that bidirectional context processing, explicit input-output separation, and cross-attention mechanisms provide critical advantages for handling Indonesian morphological complexity. Computational efficiency analysis shows important trade-offs between inference speed and output quality. This research advances Indonesian text normalization capabilities and provides empirical evidence for architectural selection in sequence-to-sequence tasks for morphologically rich, low-resource languages.
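
The headline figures in the abstract can be cross-checked with simple arithmetic. The sketch below is illustrative only and is not taken from the paper: the function name `relative_gain` is hypothetical, and the prior-method baseline is not stated in the abstract but is inferred here by subtracting the reported 28.49-point improvement from the peak score.

```python
def relative_gain(new_score: float, absolute_gain: float) -> tuple[float, float]:
    """Return (implied baseline, relative gain in percent).

    Assumes the absolute gain is measured against a single prior
    baseline, so baseline = new_score - absolute_gain.
    """
    baseline = new_score - absolute_gain
    return baseline, 100.0 * absolute_gain / baseline


# Figures reported in the abstract.
peak_indonanot5 = 55.99   # best BLEU, IndoNanoT5-base
peak_indogpt = 51.13      # best BLEU, IndoGPT
gain_over_prior = 28.49   # BLEU improvement over previous methods

# Gap between the two architectures.
gap = round(peak_indonanot5 - peak_indogpt, 2)

# Implied prior baseline (an inference, not a reported number)
# and the relative gain it produces.
baseline, rel = relative_gain(peak_indonanot5, gain_over_prior)

print(f"Architecture gap: {gap:.2f} BLEU")
print(f"Implied prior baseline: {baseline:.2f} BLEU")
print(f"Relative gain: {rel:.1f}%")
```

Under that assumption, the reported 4.86-point gap and 103.6% relative gain are mutually consistent with the stated peak scores.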