This study investigates the effect of vowel-reduction dubbing practice on the word error rate (WER) of captions generated by automatic speech recognition (ASR) in Zoom-based English presentations. Using a convergent mixed-methods design, it examines how misarticulation related to vowel reduction, particularly the realization of schwa, shapes ASR performance and caption accuracy. Twenty-seven English teacher candidates completed structured dubbing tasks, and their speech was then transcribed by Zoom’s auto-captioning system for analysis. WER was calculated using standard computational formulae, and both quantitative data (percentages, accuracy ratios, and error rates) and qualitative data (error types, substitution patterns, and prosodic deviations) were systematically analyzed. Results indicate that substitution errors remained the dominant ASR error type, especially when vowel reduction occurred without adequate prosodic support. A substantial improvement was observed following the intervention: the average WER fell from 27.4% to 13.2%, a relative reduction of approximately 52%, indicating a strong effect of targeted dubbing practice on caption intelligibility. Participants who maintained clearer vowel quality and rhythmic timing achieved consistently lower WER scores, while those who applied excessive or inconsistent vowel reduction produced higher rates of substitution and deletion errors. The study highlights the pedagogical value of integrating ASR-based feedback with dubbing activities to strengthen learners’ pronunciation intelligibility and schwa awareness. It further underscores the importance of TPACK-informed instruction that cultivates learners’ sensitivity to both human comprehensibility and machine recognition. Overall, these findings contribute to pronunciation pedagogy by connecting technology-enhanced learning with EFL teacher education.
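
The abstract does not spell out the WER computation, so the following is only a minimal sketch of the conventional definition: the word-level edit distance (substitutions, deletions, and insertions) between a reference script and the ASR caption, divided by the number of reference words. The function names `word_error_rate` and `relative_reduction`, the lowercasing and whitespace tokenization, and the example sentences are illustrative assumptions, not the authors' procedure.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: text is lowercased and split on whitespace; the study's
    actual normalization steps are not described in the abstract.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (minus the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing in caption
                curr[j - 1] + 1,     # insertion: extra word in caption
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(before: float, after: float) -> float:
    """Relative decrease from `before` to `after`, as a fraction of `before`."""
    return (before - after) / before


if __name__ == "__main__":
    # Hypothetical example: one substituted word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
    # Mean WER values reported in the abstract (27.4% and 13.2%).
    print(round(relative_reduction(27.4, 13.2) * 100, 1))
```

Under this definition, the reported change from 27.4% to 13.2% is a drop of 14.2 percentage points, which corresponds to the relative reduction of roughly 52% stated above.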
Copyright © 2025