This study describes the validity, reliability, discriminating power, and difficulty level of an RPA-based assessment instrument designed to measure students' higher-order thinking skills. Following the 4-D model (Define, Design, Develop, and Disseminate), the instrument, consisting of 30 questions, was tested on 240 Grade VIII students at several junior high schools in Jember during the odd semester of the 2024/2025 academic year. The 25 essay questions tested achieved an average score of 81%, and all met the high-validity criterion, indicating that the instrument measures the intended constructs. Reliability was high, with a coefficient of 0.86, indicating consistent results. Discriminating power was strong: 20 questions were rated very good and 5 good at distinguishing among student ability levels. The distribution of difficulty levels (9 easy, 11 medium, and 5 hard questions) provided a balanced challenge across a range of student abilities. The study contributes to authentic assessment design by integrating real-world images to foster higher-order thinking skills, and the instrument can serve as a benchmark for further research or application.
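The item statistics reported above can be illustrated with a minimal Python sketch. The abstract does not state which formulas were used, so everything here is an assumption: the difficulty index is taken as the mean item score divided by the maximum score, the discrimination index as the difference between the upper and lower 27% groups, and the reliability coefficient as Cronbach's alpha. The function names, the maximum score of 4, and the toy `scores` matrix are hypothetical and are not data from the study.

```python
from statistics import pvariance


def difficulty_index(item_scores, max_score):
    """Mean item score as a proportion of the maximum possible score."""
    return sum(item_scores) / (len(item_scores) * max_score)


def discrimination_index(item_scores, total_scores, max_score, fraction=0.27):
    """Upper-group mean minus lower-group mean, scaled by the maximum score.

    Groups are the top and bottom `fraction` of students by total score
    (27% is a common convention, assumed here).
    """
    n = len(item_scores)
    k = max(1, round(n * fraction))
    order = sorted(range(n), key=lambda i: total_scores[i])
    lower, upper = order[:k], order[-k:]
    mean_upper = sum(item_scores[i] for i in upper) / k
    mean_lower = sum(item_scores[i] for i in lower) / k
    return (mean_upper - mean_lower) / max_score


def cronbach_alpha(score_matrix):
    """Cronbach's alpha for a matrix with one row per student, one column per item."""
    k = len(score_matrix[0])
    item_vars = [pvariance([row[j] for row in score_matrix]) for j in range(k)]
    total_var = pvariance([sum(row) for row in score_matrix])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical toy data: 4 students x 3 essay items, each scored 0 to 4.
scores = [
    [4, 3, 4],
    [3, 3, 2],
    [2, 1, 2],
    [1, 0, 1],
]
totals = [sum(row) for row in scores]
first_item = [row[0] for row in scores]

p = difficulty_index(first_item, max_score=4)
d = discrimination_index(first_item, totals, max_score=4)
alpha = cronbach_alpha(scores)
print(f"difficulty={p:.3f} discrimination={d:.3f} alpha={alpha:.3f}")
```

The cut-offs that turn these indices into labels such as "easy", "medium", "hard", "good", or "very good" vary between references, so they are left out of the sketch.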
Copyrights © 2025