Purpose – This study examines the effect of Peer Instruction combined with Concept Tests on reducing students' misconceptions in model evaluation topics, specifically the Confusion Matrix and ROC-AUC. Students frequently misinterpret evaluation metrics, particularly on imbalanced datasets (for instance, a classifier that always predicts the majority class can attain high accuracy while having zero recall on the minority class), leading to flawed analytical reasoning. This study argues that structured peer discussion and targeted conceptual questioning reduce such misconceptions significantly more than conventional lecture-based instruction.

Design/methods/approach – A quasi-experimental non-equivalent control group pretest–posttest design was employed, involving 68 undergraduate students (35 experimental, 33 control) enrolled in a Machine Learning course. A validated two-tier diagnostic test consisting of 20 items was used to measure misconceptions. The experimental group received Peer Instruction with 15 Concept Tests across three sessions, while the control group received conventional lectures. Data were analyzed using paired and independent samples t-tests and normalized gain (α = 0.05; the gain formulation is sketched below).

Findings – The experimental group's misconception level decreased from 58.43% to 21.57%, while the control group's decreased from 56.88% to 39.64%. The normalized gain was significantly higher in the experimental group (g = 0.74) than in the control group (g = 0.38), t(66) = 11.62, p < 0.001, with a large effect size (d = 1.82).

Research implications/limitations – The study was limited to a single institution and a short-term intervention, which may restrict generalizability and precludes conclusions about long-term retention.

Originality/value – This study provides empirical evidence supporting the effectiveness of Peer Instruction in machine learning education and introduces a diagnostic framework for measuring misconception reduction in Confusion Matrix and ROC-AUC topics.
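For reference, normalized gain here presumably follows the standard Hake formulation computed on diagnostic-test scores; the abstract does not state the exact operationalization, so this is an assumption:

$$ g \;=\; \frac{S_{\mathrm{post}} - S_{\mathrm{pre}}}{S_{\mathrm{max}} - S_{\mathrm{pre}}} $$

where $S_{\mathrm{pre}}$ and $S_{\mathrm{post}}$ are a group's mean pretest and posttest scores and $S_{\mathrm{max}}$ is the maximum attainable score, so $g$ expresses the achieved improvement as a fraction of the improvement that was still possible at pretest.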