This study investigates the accuracy of Artificial Intelligence (AI) in assessing IELTS Academic Writing Task essays by comparing AI-generated and human examiner scores and feedback. Despite the increasing adoption of AI-based assessment tools, limited empirical evidence exists regarding their validity and reliability in high-stakes IELTS writing evaluation. This study therefore aims to determine whether significant differences exist between AI and human scoring and to examine the qualitative characteristics of the feedback each provides. The research employed a mixed-methods explanatory design involving ten participants who completed a computer-based IELTS prediction test. Their essays were independently evaluated by an AI scoring system and a human rater using the IELTS band descriptors. Quantitative analysis with a paired-sample t-test measured differences in assigned scores, while qualitative content analysis examined the patterns, depth, and focus of the feedback. The findings indicate a statistically significant difference between AI-generated and human-assigned scores (p = 0.022), with a mean difference of 0.4 band points, suggesting that the AI tended to assign higher scores. The feedback analysis reveals that AI feedback primarily addresses technical aspects such as grammar, vocabulary, and sentence structure and offers general improvement suggestions, whereas human feedback demonstrates greater depth and personalization. These results suggest that although AI improves scoring efficiency, it cannot fully replace human evaluative judgment in complex academic writing assessment.
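The paired-sample t-test used in the quantitative analysis can be sketched as follows. Note that the band scores below are entirely hypothetical, invented for illustration only; they are not the study's data, and while the example is constructed to yield the reported 0.4-point mean difference, its t-statistic (and hence p-value) will not match the published result.

```python
import math

# Hypothetical paired band scores for ten essays (illustrative only).
ai_scores    = [6.5, 7.0, 6.0, 6.5, 7.5, 6.0, 7.0, 6.5, 5.5, 7.0]
human_scores = [6.0, 6.5, 6.0, 6.0, 7.0, 5.5, 6.5, 6.0, 5.5, 6.5]

# Paired t-test works on the per-essay score differences.
diffs = [a - h for a, h in zip(ai_scores, human_scores)]
n = len(diffs)

mean_d = sum(diffs) / n                                        # mean difference
sd_d = math.sqrt(sum((d - mean_d) ** 2 for d in diffs) / (n - 1))  # sample SD
t_stat = mean_d / (sd_d / math.sqrt(n))                        # t with n-1 df

print(f"mean difference = {mean_d:.2f}, t({n - 1}) = {t_stat:.2f}")
```

The resulting t-statistic would then be compared against the t-distribution with n − 1 = 9 degrees of freedom (e.g. via `scipy.stats.ttest_rel`) to obtain the two-tailed p-value.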
Copyright © 2026