Student dropout is a critical issue in higher education because it affects institutional performance, resource allocation, and student success. Identifying students at high risk of dropping out early allows institutions to design timely academic and non-academic interventions. Predicting dropout is difficult, however, because of the complexity of the influencing factors and the class imbalance in educational data. This study compares the performance of four machine learning classifiers, K-Nearest Neighbors (KNN), Decision Tree (DT), Naive Bayes (NB), and an ensemble weighted voting classifier, to support the development of an effective dropout prediction model. Because access to complete records of non-dropout students was restricted, the study uses real institutional withdrawal data from 2023–2024 to calibrate dropout characteristics and a transparently generated synthetic dataset for methodological validation. The dataset contains 300 instances, and the Synthetic Minority Over-sampling Technique (SMOTE) is applied to address class imbalance. Model performance is evaluated using accuracy, precision, recall, F1-score, and area under the curve (AUC). On the synthetic validation data, the ensemble model outperforms the individual classifiers, achieving an accuracy of 0.97, precision of 1.00, recall of 0.86, F1-score of 0.92, and AUC of 0.93. These findings highlight the potential of ensemble learning as a robust approach for early-warning systems in higher education and provide a transparent framework for predictive modeling under data-access constraints.
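As an illustration only, the following minimal sketch shows one common form of weighted voting (soft voting, where per-model probabilities are averaged using weights) and checks that the reported F1-score is consistent with the reported precision and recall. The abstract does not say whether the study uses soft or hard voting, nor which weights or decision threshold it uses, so the function names, the example probabilities, the weights, and the 0.5 threshold are all assumptions made for this sketch.

```python
from typing import Sequence


def weighted_soft_vote(probas: Sequence[float], weights: Sequence[float]) -> float:
    """Return the weighted average of per-model dropout probabilities."""
    if len(probas) != len(weights):
        raise ValueError("probas and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(p * w for p, w in zip(probas, weights)) / total


def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (0.0 if both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical dropout probabilities for one student from KNN, DT, and NB,
# with hypothetical weights; none of these values come from the study.
example_probas = [0.70, 0.40, 0.55]
example_weights = [2.0, 1.0, 1.0]

ensemble_proba = weighted_soft_vote(example_probas, example_weights)
predicted_dropout = ensemble_proba >= 0.5  # assumed decision threshold

# Consistency check using the precision and recall reported in the abstract.
reported_f1 = f1_from_precision_recall(1.00, 0.86)
```

With the reported precision of 1.00 and recall of 0.86, the harmonic mean is 2(1.00)(0.86)/(1.00 + 0.86) ≈ 0.92, which matches the reported F1-score.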
Copyrights © 2026