Garuda - Garba Rujukan Digital

P-Index (2021 - 2026): 0.23

This author has published in the following journal:

Journal of Applied Artificial Intelligence in Education

Qi Xin

University of Pittsburgh

Author-ID : 9880987

Computer Science & IT Education

Published: 1 document

Articles

Auditable Automated Essay Scoring and Formative Feedback: A Rubric-Grounded Pipeline for Secondary and Higher Education
Qi Xin
Journal of Applied Artificial Intelligence in Education Vol 2, No 1 (2026): July 2026
Publisher : Academic Bright Collaboration

DOI: 10.66053/jaaie.v2i1.348

Automated essay scoring in education is increasingly expected to do more than reproduce human holistic scores; classroom use also demands rubric-aligned feedback, transparent evidence, and a way to route uncertain cases to teachers. In this study, “LLM-ready” refers to a system that outputs structured score evidence, weak-trait signals, and document-level anchors that can later be verbalized by a language model without changing the underlying decision trace. This study aimed to evaluate whether a rubric-grounded, LLM-ready pipeline can achieve competitive scoring accuracy while also generating auditable formative feedback and a teacher-controllable review signal. The evaluation used the public ASAP corpus of 12,976 essays across eight prompts and prompt-wise five-fold cross-validation. Four holistic scorers were compared: length-only, rubric forest, prompt-adaptive centroid regressor (PACR), and the final RG-Score ensemble with trait grounding, isotonic calibration, and audit control. Auxiliary analytic scoring was examined on Prompts 2 and 7–8, and feedback experiments were conducted on all 2,292 essays from Prompts 7 and 8. PACR obtained the highest macro QWK of 0.739, while RG-Score reached 0.738 and provided a calibrated, auditable path to feedback. The prompt-level QWK for RG-Score ranged from 0.66 to 0.82, with particularly strong gains on Prompts 6 and 7. Auxiliary analytic scoring yielded QWK values of 0.623 for Prompt 2 domain2, 0.604 on average for Prompt 7 traits, and 0.506 on average for Prompt 8 traits. The rubric-grounded evidence feedback template achieved a Trait Recall@2 of 0.829, a valid evidence rate of 0.912, and an auditability index of 0.893 on Prompts 7 and 8. 
These findings support rubric-grounded AES as a practical assessment-support approach for secondary-school writing and as a structured foundation for higher-education formative feedback workflows, while also indicating that weaker trait models should be treated as advisory rather than fully autonomous.
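
The abstract reports agreement as QWK (quadratic weighted kappa) per prompt and as a macro value across prompts. As a rough illustration of how such figures are typically computed, the sketch below implements the standard QWK formula in plain Python. The function names, the toy scores, and the choice of an unweighted mean for the macro value are assumptions made here for illustration; the abstract does not state how the macro QWK was aggregated, and this is not the authors' code.

```python
from collections import Counter


def quadratic_weighted_kappa(human, model, min_score, max_score):
    """Standard QWK between two lists of integer scores on a shared scale.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(human) != len(model) or not human:
        raise ValueError("score lists must be non-empty and of equal length")
    if max_score <= min_score:
        raise ValueError("the score scale needs at least two levels")

    n = len(human)
    span = max_score - min_score              # number of levels minus one
    observed = Counter(zip(human, model))     # observed agreement counts
    hist_human = Counter(human)               # marginal counts, human rater
    hist_model = Counter(model)               # marginal counts, model

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(min_score, max_score + 1):
        for j in range(min_score, max_score + 1):
            weight = (i - j) ** 2 / span ** 2  # quadratic disagreement penalty
            weighted_observed += weight * observed[(i, j)]
            # Expected count under independence of the two raters.
            weighted_expected += weight * hist_human[i] * hist_model[j] / n

    if weighted_expected == 0.0:
        # Both raters gave one identical constant score; treat as full agreement.
        return 1.0
    return 1.0 - weighted_observed / weighted_expected


def macro_qwk(per_prompt_qwk):
    """Unweighted mean of per-prompt QWK values (an assumed aggregation)."""
    return sum(per_prompt_qwk) / len(per_prompt_qwk)


# Toy scores on a 1-4 scale, invented for the example.
agree = quadratic_weighted_kappa([1, 2, 3, 4], [1, 2, 3, 4], 1, 4)
reverse = quadratic_weighted_kappa([1, 2, 3, 4], [4, 3, 2, 1], 1, 4)
print(agree, reverse, macro_qwk([agree, reverse]))
```

Because the penalty grows with the squared distance between scores, a model that is off by one point is penalized far less than one that is off by three, which is why QWK is a common choice for ordinal essay scores.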
