This study analyzes changing patterns of human response to machine speech within AI-enabled educational technologies. The phenomenon is framed through Media Equation Theory, Cognitive Load Theory, and the Uncanny Valley Hypothesis to examine how prolonged exposure to synthetic voices affects emotion and trust. Using a phenomenological approach, the author interviewed ten Indonesian university students and asked them to keep reflective journals. Students reported that AI voices facilitate task-focused listening (e.g., providing definitions or pronouncing words) but fail to engage them emotionally. The voices' lack of adaptive, dynamic responsiveness evoked passive participation, attention drift, and erosion of trust. Students described emotional detachment, discomfort with hyper-realistic voice mimicry, and cognitive strain attributable to monotonous, robotic pacing and tone. Compliance deviated from the voice-of-authority phenomenon: participants with stronger collectivist orientations expressed a greater need for emotionally responsive signalling. While functionally convenient, AI tools do not inspire trust as conversational agents. These findings highlight the need to redesign listening-engagement interfaces and educational AI so that they adapt not only to content but also to the user's emotional context. Further research is needed on cross-cultural preferences and on how dependence on AI voices shapes listening and communication skills.
Copyright © 2025