Phishing remains a dominant cyber-crime vector in higher-education settings, yet most Indonesian campus studies stop at descriptive awareness surveys. This study sets out (i) to build a fully interpretable predictive model that classifies students’ phishing-awareness levels from a concise questionnaire and (ii) to demonstrate how the model’s rules can be mapped to established behavioural theory for targeted educational intervention. Guided by the Cross-Industry Standard Process for Data Mining (CRISP-DM), we encoded the responses of 153 undergraduates (82 male, 71 female) to a ten-item phishing-awareness instrument as a 153 × 10 binary matrix and analysed it with a cost-complexity-pruned Classification and Regression Tree (CART). The optimal tree (depth = 5, 19 leaves) achieved 94.9% accuracy, 93.4% recall, 95.8% precision, and a ROC-AUC of 0.971 under stratified 10-fold cross-validation. These metrics are comparable to those of ensemble methods but were obtained with a glass-box structure that exposes explicit IF-THEN rules. The three most salient splits (URL-domain mismatch, urgency cues, and misconceptions about the HTTPS lock icon) align directly with Protection Motivation Theory constructs, providing actionable targets for micro-learning modules. Because the dataset originates from a single campus and governance prerequisites (fairness audit, GDPR impact assessment, SOP alignment) are still pending, the model will run in “shadow mode” next term to collect longitudinal evidence and monitor concept drift. Overall, the findings show that concise, theory-grounded instruments combined with pruned decision trees can achieve high predictive power and immediate pedagogical value without sacrificing transparency.
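The evaluation protocol named in the abstract (stratified 10-fold cross-validation reporting accuracy, precision, and recall) can be illustrated with a minimal, standard-library-only sketch. Everything in it is an assumption made for illustration: the 153 × 10 binary matrix is synthetic, the labelling rule (six or more items answered correctly) is hypothetical, and a one-split decision stump stands in for the pruned CART, so the printed numbers say nothing about the study's results. Pooling out-of-fold predictions before computing the metrics is also one choice among several; the abstract does not state how fold scores were aggregated.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Deal sample indices into k folds so each fold keeps similar class ratios."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    pos = 0  # running counter keeps overall fold sizes within one of each other
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)
        for idx in idxs:
            folds[pos % k].append(idx)
            pos += 1
    return folds


def fit_stump(X, y):
    """Stand-in classifier: choose the single item and polarity with fewest errors."""
    best = None
    for j in range(len(X[0])):
        for polarity in (0, 1):
            # The stump predicts class 1 whenever item j equals `polarity`.
            errors = sum(
                (1 if row[j] == polarity else 0) != label
                for row, label in zip(X, y)
            )
            if best is None or errors < best[0]:
                best = (errors, j, polarity)
    return best[1], best[2]


def predict_stump(model, X):
    j, polarity = model
    return [1 if row[j] == polarity else 0 for row in X]


def confusion_metrics(y_true, y_pred):
    """Accuracy, precision, and recall for the positive class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Synthetic stand-in data: 153 respondents x 10 binary items (NOT the study data).
rng = random.Random(42)
X = [[rng.randint(0, 1) for _ in range(10)] for _ in range(153)]
# Hypothetical labelling rule: "aware" (1) when at least 6 of 10 items are correct.
y = [1 if sum(row) >= 6 else 0 for row in X]

folds = stratified_folds(y, k=10, seed=0)

# Collect out-of-fold predictions: each sample is predicted by a model
# that never saw it during fitting.
oof_pred = [None] * len(y)
for test_idx in folds:
    held_out = set(test_idx)
    train_idx = [i for i in range(len(y)) if i not in held_out]
    model = fit_stump([X[i] for i in train_idx], [y[i] for i in train_idx])
    for i, pred in zip(test_idx, predict_stump(model, [X[i] for i in test_idx])):
        oof_pred[i] = pred

metrics = confusion_metrics(y, oof_pred)
print(metrics)
```

With a library such as scikit-learn, the stump would typically be replaced by a decision tree whose pruning strength is tuned inside the same folds; the fold construction and metric definitions stay the same.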
Copyrights © 2025