Evaluation is the process of assigning a value to an object based on defined criteria. This study analyzes the quality of Arabic language skills test items at SD Islam MI Thoriqul Huda Malang and assesses their conformity with the principles of constructing language skills test items. The study uses a qualitative approach with a case study design. The primary data source is the set of Arabic language skills test documents; secondary data consist of interviews with teachers and other supporting documents. Data were collected through documentation and interviews and analyzed using the Miles and Huberman model (data reduction, data display, and conclusion drawing). The results showed that the test items covered the four main language skills: listening, speaking, reading, and writing. The items were presented in several formats, including multiple-choice questions, essay questions, and performance tasks. The teacher compiled the questions with reference to the learning objectives and student abilities, drawing on a question bank. Daily assessments were carried out separately for each skill, while summative assessments tended to combine all skills into a single test that focused more on reading and writing. Measured against assessment principles, the test items were generally valid and relevant; however, the evaluation of speaking and listening skills had shortcomings due to limited resources. The study recommends refining the evaluation tools, improving the clarity of instructions, and aligning test formats so that all language skills are measured optimally.