Student dropout is a critical issue in higher education, affecting both institutional performance and student success. This study develops a classification model for predicting student dropout risk and compares the performance of several machine learning algorithms. A quantitative experimental approach was employed using a dataset that integrates academic records and Learning Management System (LMS) activity. The dataset is imbalanced, with approximately 20% of instances belonging to the dropout class. The classification algorithms evaluated are Naïve Bayes, Decision Tree, Random Forest, and K-Nearest Neighbor (KNN). Model performance was assessed using Accuracy, Precision, Recall, and ROC-AUC to ensure a comprehensive evaluation. The results indicate that Naïve Bayes achieved the best performance, with an accuracy of 86.40% and a ROC-AUC of 0.934, followed by Random Forest with a ROC-AUC of 0.907. All models achieved recall above 90%, indicating a strong capability to identify students at risk of dropout. These findings highlight the importance of selecting appropriate algorithms and evaluation metrics when dealing with imbalanced datasets. The study contributes by using a realistic dataset that retains noise and class imbalance, and by integrating academic and behavioral data to improve prediction performance. The proposed approach can support early intervention strategies to reduce student dropout rates in higher education.
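To illustrate why several metrics are reported rather than accuracy alone, the following minimal sketch computes Accuracy, Precision, Recall, and ROC-AUC on a toy sample in which 20% of the labels belong to the dropout class. The labels, scores, threshold, and function names are hypothetical and chosen only for illustration; they are not the study's data, models, or results.

```python
# Illustrative sketch only: toy labels and scores, not the study's dataset.
# 1 = dropout (minority class, 20% of the sample), 0 = non-dropout.

def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn

def accuracy(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)

def precision(y_true, y_pred):
    tp, fp, _, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(y_true, y_pred):
    tp, _, fn, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0

def roc_auc(y_true, scores):
    """Probability that a random positive outranks a random negative.

    Ties count as half a win (the rank-based definition of ROC-AUC).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical sample: 10 students, 2 of whom dropped out.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

# A trivial baseline that never predicts dropout.
always_negative = [0] * len(y_true)

# Hypothetical predicted dropout probabilities from some classifier,
# converted to labels with an assumed threshold of 0.5.
scores = [0.9, 0.6, 0.7, 0.4, 0.3, 0.2, 0.2, 0.1, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print("baseline accuracy:", accuracy(y_true, always_negative))
print("baseline recall:  ", recall(y_true, always_negative))
print("model accuracy:   ", accuracy(y_true, y_pred))
print("model precision:  ", precision(y_true, y_pred))
print("model recall:     ", recall(y_true, y_pred))
print("model ROC-AUC:    ", roc_auc(y_true, scores))
```

In this toy sample the baseline that never predicts dropout still reaches high accuracy while identifying no at-risk students, which is why recall and ROC-AUC are informative alongside accuracy on imbalanced data.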
Copyright © 2026