Student dropout in STEM programs remains a persistent challenge for higher education institutions: it undermines educational quality, lowers retention, and makes the use of institutional resources less efficient. This study develops an interpretable stacking ensemble learning approach that predicts STEM student dropout risk and identifies the key academic and socioeconomic determinants of that risk, so that institutions can plan data-driven early interventions. Following the CRISP-DM framework, we analyze 3,630 student records from the UCI Machine Learning Repository that contain demographic, academic, and socioeconomic attributes. The proposed stacking architecture combines Random Forest, Gradient Boosting, and XGBoost as base learners with Logistic Regression as the meta-learner, and applies SMOTE–Tomek Links resampling to address class imbalance and remove noisy samples near the class boundary. In our experiments, the model achieves 90.91% accuracy and an ROC–AUC of 95.72%, discriminating well between at-risk and other students and outperforming each individual base model. Feature importance analysis shows that early academic trajectory variables are the most influential predictors of dropout risk, in particular first- and second-semester success rates, total approved curricular units, and average grades. By integrating stacking ensemble learning with imbalance handling and trajectory-based feature engineering, the framework provides a practical, interpretable early-warning model that supports actionable intervention planning in higher education.
Copyright © 2026