This study presents the implementation of IndoRoBERTa, a pre-trained Indonesian language model, to improve the contextual clarity of homograph words in Text-to-Speech (TTS) systems, particularly for virtual chatbot applications addressing early marriage education in Lombok. The proposed system integrates IndoRoBERTa into the TTS pipeline to classify the context of homographs prior to grapheme-to-phoneme (G2P) conversion, ensuring accurate pronunciation based on meaning. The research was conducted in two fine-tuning phases: the first utilized 500 manually labeled conversational samples, achieving 96% test accuracy, while the second expanded the dataset with 2,000 auto-labeled samples and yielded 88% accuracy. Evaluation metrics including precision, recall, and F1-score demonstrated the model’s effectiveness across 20 homograph categories. Despite strong results, the study acknowledges limitations in data authenticity and challenges in underrepresented classes. Future work is recommended to incorporate real-world dialogue data and enhance the system’s generalization in more complex linguistic settings. This research contributes to the advancement of Indonesian NLP in TTS systems, particularly in socially impactful educational contexts.
Copyrights © 2026