Implementation of IndoRoBERTa to Improve the Clarity of the Context of Homograph Words in the Text-to-Speech System for Education Chatbot Early Marriage in Lombok
Fikri Rahmanda Noor; Rifki Wijaya; Ade Romadhony
Indonesian Journal on Computing (Indo-JC) Vol. 10 No. 2 (2026): February, 2026
Publisher : Telkom University

DOI: 10.21108/indojc.v10i2.9709

Abstract

This study presents the implementation of IndoRoBERTa, a pre-trained Indonesian language model, to improve the contextual clarity of homograph words in Text-to-Speech (TTS) systems, particularly for virtual chatbot applications addressing early marriage education in Lombok. The proposed system integrates IndoRoBERTa into the TTS pipeline to classify the context of homographs prior to grapheme-to-phoneme (G2P) conversion, ensuring accurate pronunciation based on meaning. The research was conducted in two fine-tuning phases: the first utilized 500 manually labeled conversational samples, achieving 96% test accuracy, while the second expanded the dataset with 2,000 auto-labeled samples and yielded 88% accuracy. Evaluation metrics including precision, recall, and F1-score demonstrated the model’s effectiveness across 20 homograph categories. Despite strong results, the study acknowledges limitations in data authenticity and challenges in underrepresented classes. Future work is recommended to incorporate real-world dialogue data and enhance the system’s generalization in more complex linguistic settings. This research contributes to the advancement of Indonesian NLP in TTS systems, particularly in socially impactful educational contexts.
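
The idea of classifying a homograph's context before G2P conversion can be illustrated with a minimal Python sketch. It is not the authors' implementation: the lexicon entry, the sense labels, the phoneme strings, and the function names are all hypothetical, and a simple keyword rule stands in for the fine-tuned IndoRoBERTa classifier, which would be plugged in through the `classify` argument.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical lexicon: homograph -> {sense label -> phoneme string}.
# The entry is illustrative only and is not taken from the paper.
HOMOGRAPH_LEXICON: Dict[str, Dict[str, str]] = {
    "apel": {"fruit": "a p ə l", "roll_call": "a p ɛ l"},
}


def keyword_classifier(sentence: str, word: str) -> str:
    """Stand-in for a fine-tuned context classifier (assumption).

    Returns a sense label for `word` based on cue words in the sentence.
    """
    cues = {"makan", "buah", "manis"}  # hypothetical cues for the fruit sense
    tokens = set(sentence.lower().split())
    return "fruit" if tokens & cues else "roll_call"


def resolve_homographs(
    sentence: str,
    classify: Callable[[str, str], str] = keyword_classifier,
) -> List[Tuple[str, str]]:
    """Pair each word with a pronunciation chosen before the G2P step.

    Homographs get the phoneme string of the predicted sense; every other
    word is passed through unchanged for the ordinary G2P converter.
    """
    resolved = []
    for token in sentence.lower().split():
        word = token.strip(".,!?")
        senses = HOMOGRAPH_LEXICON.get(word)
        if senses is None:
            resolved.append((word, word))
        else:
            label = classify(sentence, word)
            resolved.append((word, senses[label]))
    return resolved
```

Calling `resolve_homographs("Saya makan apel manis")` selects the fruit reading for "apel", while a sentence without any of the cue words falls back to the other sense. Passing the classifier as a parameter keeps the lookup logic independent of whichever model supplies the sense label.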