This study analyzes the quality of multiple-choice test items based on Higher Order Thinking Skills (HOTS) in the subjects of Business Economics and Public Administration for 11th-grade students, using the ANATES software. The analysis covered four components: validity, reliability, difficulty level, and discriminating power. The research used a quantitative descriptive approach with 20 students as subjects. Data were collected through tests developed from curriculum competency indicators. In terms of validity, most items were classified as low (33.33%) or moderate (30.56%), while the very high and very low categories each accounted for only 8.33%. Instrument reliability ranged from moderate to very high, from a low of 0.45 on indicator 7 to a high of 0.94 on indicator 3. The difficulty distribution was dominated by the moderate category (66.67%), followed by difficult (16.67%), easy (9.72%), and very easy (6.94%). For discriminating power, most items fell into the good (54.17%) and very good (30.56%) categories, although 4.17% of the items discriminated poorly and require revision. Overall, the test instrument met the eligibility criteria; however, items with low validity or low discriminating power still need improvement to raise the quality of HOTS-based assessment.
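As an illustration of two of the four components, the sketch below computes a difficulty index (proportion of students answering an item correctly) and a discrimination index (difference in proportion correct between the top and bottom scoring groups). It assumes the common upper/lower 27% grouping from classical item analysis; the function name `item_statistics`, the `group_fraction` parameter, and the 0/1 response matrix are hypothetical and are not taken from the study or from ANATES itself, whose exact computation may differ.

```python
from typing import List, Tuple


def item_statistics(
    responses: List[List[int]], group_fraction: float = 0.27
) -> List[Tuple[float, float]]:
    """Return (difficulty, discrimination) for each item.

    responses[s][i] is 1 if student s answered item i correctly, else 0.
    group_fraction is an assumed setting (27% upper and lower groups).
    """
    n_students = len(responses)
    n_items = len(responses[0])

    # Rank students by total score, highest first.
    ranked = sorted(responses, key=sum, reverse=True)

    # Size of the upper and lower groups (at least one student each).
    k = max(1, round(n_students * group_fraction))
    upper, lower = ranked[:k], ranked[-k:]

    stats = []
    for i in range(n_items):
        # Difficulty index: share of all students who got the item right.
        difficulty = sum(row[i] for row in responses) / n_students
        # Discrimination index: upper-group minus lower-group share correct.
        discrimination = (
            sum(row[i] for row in upper) - sum(row[i] for row in lower)
        ) / k
        stats.append((difficulty, discrimination))
    return stats
```

With 20 students and the assumed 27% fraction, each group would contain 5 students. Mapping the resulting indices to labels such as "moderate" or "good" depends on the cut-off table in use, which the abstract does not state, so the sketch leaves classification out.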