This research is motivated by a problem in the preparation of Arabic midterm questions: the questions have not been developed according to the correct procedure, so their quality in measuring Higher Order Thinking Skills (HOTS) is not guaranteed. This study aims to analyze the HOTS dimensions in the Arabic questions; assess the validity, reliability, difficulty level, and discriminating power of the questions; and provide recommendations for developing HOTS-based questions. The method is descriptive quantitative, with 172 class XI students of SMA Nusantara Plus as subjects. Data were obtained from 35 multiple-choice questions, which were analyzed using Multidimensional Item Response Theory (MIRT) in R; validity and reliability were tested with SPSS. The results show that 70% of the questions were effective in measuring HOTS, with questions measuring the analysis dimension predominating. However, some questions still show bias in the evaluation dimension and need to be corrected. Validity was supported by factor loadings > 0.3, and reliability was high, with a Cronbach’s Alpha of 0.85. In addition, 50% of the questions have a medium level of difficulty and good discriminating power. In conclusion, the Arabic test questions are quite effective in measuring HOTS but need a more even distribution across HOTS dimensions. Further research is recommended to develop more varied, higher-quality HOTS-based questions.
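As an illustration of the classical item statistics named in the abstract (difficulty level, discriminating power, and Cronbach's Alpha), the following is a minimal Python sketch. It is not the study's analysis, which was carried out in R and SPSS. The response matrix is simulated, the function names are hypothetical, and the 0.30 to 0.70 band for "medium" difficulty is a common textbook convention assumed here, not a threshold reported by the study.

```python
import numpy as np

def item_difficulty(responses):
    """Classical difficulty index p: proportion of correct answers per item."""
    return responses.mean(axis=0)

def item_discrimination(responses):
    """Corrected item-total correlation: each item against the sum of the other items."""
    total = responses.sum(axis=1)
    n_items = responses.shape[1]
    result = np.empty(n_items)
    for j in range(n_items):
        rest = total - responses[:, j]  # exclude the item itself from the total
        result[j] = np.corrcoef(responses[:, j], rest)[0, 1]
    return result

def cronbach_alpha(responses):
    """Cronbach's alpha from item variances and total-score variance."""
    k = responses.shape[1]
    item_var_sum = responses.var(axis=0, ddof=1).sum()
    total_var = responses.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var_sum / total_var)

# Simulated data only (assumption): 172 students x 35 dichotomous items,
# generated from a simple one-parameter logistic model with a fixed seed.
rng = np.random.default_rng(0)
ability = rng.normal(size=(172, 1))
item_location = np.linspace(-1.5, 1.5, 35)
p_correct = 1.0 / (1.0 + np.exp(-(ability - item_location)))
responses = (rng.random((172, 35)) < p_correct).astype(int)

p = item_difficulty(responses)
d = item_discrimination(responses)
medium = (p >= 0.30) & (p <= 0.70)  # assumed convention for "medium" difficulty

print("share of medium-difficulty items:", medium.mean())
print("mean corrected item-total correlation:", d.mean())
print("Cronbach's alpha:", cronbach_alpha(responses))
```

With real data, `responses` would be replaced by the scored 0/1 answer matrix (rows are students, columns are questions); the multidimensional IRT step itself is not reproduced in this sketch.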
Copyrights © 2025