This study addresses the need for a valid and reliable instrument to assess students’ cognitive abilities in physics, particularly in temperature and heat, topics often associated with conceptual difficulties and misconceptions. It evaluated the quality of a cognitive ability test instrument developed from Marzano’s taxonomy by applying the Rasch measurement model. A quantitative design was employed, involving 138 high school students (74 female and 64 male) in grades XI and XII from one school in Bandung City. The instrument comprised 25 multiple-choice items representing five cognitive aspects of Marzano’s taxonomy: retrieval, comprehension, analysis, knowledge utilization, and metacognition. Data were analyzed using Winsteps 3.73 to examine item fit, item difficulty, unidimensionality, reliability, person–item distribution, and gender-based Differential Item Functioning (DIF). The results showed that the instrument generally met Rasch model expectations, with good internal consistency (Cronbach’s alpha = 0.76), very high item reliability (0.92), and fair person reliability (0.69). Most items fit the model, although several showed overfit or unexpected response patterns and require refinement. The item difficulty distribution was dominated by difficult items, and the raw variance explained by the measures was 20.3%. The Wright map indicated that the instrument was reasonably aligned with students’ ability levels, though it was less well suited to very high-ability students. DIF analysis showed that most items were gender-neutral, while a small number indicated potential differential functioning. The novelty of this study lies in the systematic operationalization of Marzano’s taxonomy into item construction and its evaluation using Rasch analysis in the context of temperature and heat.
Overall, the instrument is sufficiently valid for measuring cognitive ability and provides a useful contribution to physics education by offering a psychometrically informed framework for developing more rigorous and meaningful assessment instruments.