Student graduation prediction supports early academic intervention but commonly suffers from class imbalance, where on-time graduates dominate the dataset. This study evaluates five classifiers—Random Forest (RF), XGBoost, Logistic Regression (LR), k-Nearest Neighbors (k-NN), and Gaussian Naïve Bayes (GNB)—under five class-imbalance handling scenarios: Baseline (no resampling), Random Undersampling (RUS), SMOTE, ADASYN, and Borderline-SMOTE. Experiments were conducted on 796 student records (10 attributes) with an imbalanced distribution (634 on-time vs. 162 not on-time; minority-to-majority ratio ≈ 1:3.9) using Stratified 5-Fold Cross-Validation. Performance was assessed using confusion-matrix metrics and AUC-ROC to reflect minority-class detection. Under the baseline, RF achieved the highest accuracy (0.873) but limited minority recall (0.573), confirming majority-class bias. Resampling consistently improved minority recall across models; for example, LR recall increased to 0.802 with RUS, while GNB reached 0.833 with ADASYN, although accuracy decreased due to the sensitivity–specificity trade-off. Overall, RF and XGBoost showed the most stable discrimination across resampling scenarios based on AUC (RF: 0.870–0.883; XGBoost: 0.847–0.866). The main contribution is a systematic, reproducible comparative evaluation of classifier–resampling combinations for imbalanced graduation prediction, providing practical guidance for selecting robust models to identify students at risk of delayed graduation.
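The abstract does not include implementation details; as an illustration of the Random Undersampling (RUS) idea it evaluates, the following is a minimal stdlib-only Python sketch. The function name `random_undersample` and the toy labels are assumptions for illustration; in practice one would typically use `RandomUnderSampler` from the imbalanced-learn library together with scikit-learn's `StratifiedKFold`, applying resampling only to each training fold.

```python
import random

def random_undersample(X, y, majority_label, seed=0):
    """Sketch of RUS: randomly drop majority-class samples until
    the two classes are balanced. Hypothetical helper, not the
    authors' implementation."""
    rng = random.Random(seed)
    maj = [i for i, lbl in enumerate(y) if lbl == majority_label]
    mino = [i for i, lbl in enumerate(y) if lbl != majority_label]
    # Keep only as many majority indices as there are minority samples.
    keep = sorted(rng.sample(maj, len(mino)) + mino)
    return [X[i] for i in keep], [y[i] for i in keep]

# Toy data mirroring the paper's imbalance (label 1 = on-time majority).
X = [[i] for i in range(20)]
y = [1] * 16 + [0] * 4  # 16 on-time vs. 4 not on-time
Xb, yb = random_undersample(X, y, majority_label=1)
print(yb.count(1), yb.count(0))  # -> 4 4
```

Balancing the training folds this way raises minority recall at the cost of overall accuracy, which matches the sensitivity–specificity trade-off the study reports.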
Copyright © 2026