Mental health is a significant global concern. Indonesia has reported high rates of depression and anxiety, compounded by limited emotional outlets. Although AI virtual assistants are prevalent in e-commerce and education, their application in mental health remains underexplored. Existing solutions are predominantly text-based and transactional, which restricts empathetic and natural interactions. This study develops a voice-based assistant by integrating Automatic Speech Recognition (ASR), a generative AI for empathetic responses, and a Text-to-Speech (TTS) module fine-tuned on an Indonesian dataset to adapt accent and prosody. The system underwent both technical evaluation and human testing to assess its feasibility and user experience. The results showed that the TTS model converged effectively with low loss. Human evaluation indicated 'good' interaction (MS = 3.91, SD = 0.02), 'good' AI responses (MS = 3.83, SD = 0.26), and 'fair' TTS naturalness (MOS = 3.27, SD = 0.05). Most participants found the assistant's responses meaningful, pleasant, and helpful in managing low to moderate anxiety. These results suggest that a voice-based assistant has the potential to support mental health in Indonesia. Future work should enhance speech naturalness and utilize a larger participant pool for evaluation.
Copyrights © 2025