EDUVELOP (Journal of English Education and Development)
Vol 9 No 1 (2026): Eduvelop: Journal of English Education and Development

Prompt Engineering to CEFR Alignment: Investigating Generative AI for the Creation of English Listening Assessments

Fikri Asih Wigati
Putri Kamalia Hakim
Nia Pujiawati
Maya Rahmawati



Article Info

Publish Date
31 Mar 2026

Abstract

Meeting the increasing demand for internationally benchmarked English listening exams is difficult, especially in educational settings with limited resources. Within a human–AI collaboration framework, this study investigates the feasibility of using generative artificial intelligence, specifically ChatGPT-4, to support the early development of CEFR-aligned English listening scripts and test items. Using an exploratory research design, the study generated 20 listening scripts and matching multiple-choice questions across CEFR reference levels A2, B1, B2, and C1 through an iterative prompt engineering technique, Progressive-Hint Prompting (PHP). The produced materials were examined using descriptive linguistic metrics from Text Inspector (lexical profile, readability, and script length), alongside qualitative assessments of spoken discourse characteristics, topical coverage, and distractor plausibility. The results show that, when guided by structured prompts and ongoing human evaluation, ChatGPT-4 can perform well as a drafting aid. The generated scripts demonstrated systematic linguistic variation across CEFR reference levels, particularly in lexical range and text complexity. Nevertheless, several drawbacks were noted, including uneven topical distribution, decreased pragmatic naturalness at higher proficiency levels, and inconsistent calibration of spoken discourse features. Item quality required iterative refinement to ensure that distractors were text-based and aligned with assessment criteria. These results imply that iterative human–AI interaction, rather than automated generation alone, determines the quality of AI-generated listening materials. The study underscores the continuing importance of professional human oversight while highlighting the potential of generative AI as a resource-efficient support tool for listening assessment development. To investigate the efficacy of AI-assisted materials in operational assessment contexts, future research should focus on empirical validation with test takers.

Copyright © 2026






Journal Info

Abbrev

eduvelop

Publisher

Subject

Arts, Humanities, Education, Language, Linguistics, Communication & Media

Description

EDUVELOP (Journal of English Education and Development), published by Universitas Sulawesi Barat, invites researchers and academics to publish the results of research related to English education. The journal has been publishing articles since September 2018 with ISSN 2597-713X (print), ISSN 2597-7148 ...