Dropout among university students remains a major challenge in higher education because it affects study continuity, institutional performance, and the efficiency of academic service planning. This study develops a machine learning–based Early Warning System (EWS) that uses data available at enrollment and updates its risk estimates after the first semester. Using the public dataset “Predict Students’ Dropout and Academic Success” (n = 4,424), the original three-class outcome (Dropout, Enrolled, Graduate) is reduced to a binary target, with dropout treated as the positive class. Two feature scenarios are evaluated: (1) enrollment-only features for pre-entry screening and (2) enrollment plus first-semester indicators for updating risk scores. Three models are compared: class-balanced Logistic Regression, class-balanced Random Forest, and Gradient Boosting. Model performance is assessed using accuracy; precision, recall, and F1-score for the dropout class; balanced accuracy; and ROC-AUC. Under the enrollment-only setting, Logistic Regression achieves the best early-warning performance (recall = 0.697; F1-score = 0.651). After first-semester features are incorporated, performance improves (recall = 0.792; F1-score = 0.779). Beyond model comparison, the study adds an operational perspective through confusion-matrix simulation and probability-threshold analysis, which weigh missed at-risk cases against follow-up workload.
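The probability-threshold analysis mentioned above can be illustrated with a minimal sketch. It assumes that a fitted model has already produced a dropout probability for each student and that labels use 1 for dropout (the positive class). The function names `confusion_at_threshold` and `threshold_sweep`, the candidate thresholds, and the toy scores are hypothetical and are not taken from the study.

```python
from typing import Dict, List, Sequence


def confusion_at_threshold(
    y_true: Sequence[int], y_prob: Sequence[float], threshold: float
) -> Dict[str, float]:
    """Confusion-matrix counts and dropout-class metrics at one cut-off.

    y_true uses 1 for dropout (positive class) and 0 otherwise.
    A student is flagged when the predicted probability >= threshold.
    """
    tp = fp = fn = tn = 0
    for label, prob in zip(y_true, y_prob):
        flagged = prob >= threshold
        if flagged and label == 1:
            tp += 1  # at-risk student correctly flagged
        elif flagged:
            fp += 1  # follow-up spent on a student who did not drop out
        elif label == 1:
            fn += 1  # missed at-risk student
        else:
            tn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "threshold": threshold,
        "tp": tp, "fp": fp, "fn": fn, "tn": tn,
        "flagged": tp + fp,  # proxy for follow-up workload
        "missed": fn,        # at-risk students not flagged
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def threshold_sweep(
    y_true: Sequence[int], y_prob: Sequence[float], thresholds: Sequence[float]
) -> List[Dict[str, float]]:
    """Evaluate every candidate threshold on the same scored cohort."""
    return [confusion_at_threshold(y_true, y_prob, t) for t in thresholds]


# Toy, made-up labels and scores for illustration only (not study results).
toy_labels = [1, 1, 1, 0, 0, 0, 0, 1]
toy_scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1, 0.8]

for row in threshold_sweep(toy_labels, toy_scores, [0.3, 0.5, 0.7]):
    print(row)
```

In this sketch, lowering the threshold reduces the number of missed at-risk students but increases the number of students flagged for follow-up, which is the trade-off the operational analysis is meant to expose.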
Copyright © 2025