This study examined the effectiveness of AI-assisted transcription as instructional scaffolding for Arabic listening comprehension in Indonesian pesantren, a context where listening skills receive limited pedagogical attention and empirical research on AI-mediated support remains scarce. Previous studies on artificial intelligence in language learning have focused mainly on vocabulary, grammar, or speaking skills in general EFL settings, leaving the pedagogical role of AI transcription in Arabic listening within faith-based education largely unexplored. Using a mixed-methods quasi-experimental design, the study involved forty-five male pesantren students (aged 16–18) divided into experimental and control groups during a twelve-week intervention. Quantitative data were collected through standardized Arabic listening pre-tests and post-tests, and qualitative data were obtained from semi-structured interviews, focus group discussions, classroom observations, and transcription usage logs. Test scores were analyzed using paired and independent samples t-tests, with Cohen’s d as the effect size, and qualitative data were analyzed thematically. The findings indicate that students supported by AI-assisted transcription achieved significantly higher listening comprehension gains (d = 0.42) and reported reduced anxiety, increased confidence, and greater engagement. The study contributes theoretically by extending scaffolding and cognitive load principles to AI-assisted Arabic listening instruction, and practically by offering a culturally grounded model for integrating AI transcription in pesantren classrooms.