The high dropout rate of students in higher education is a problem for educational institutions, affecting quality assessments and accreditation evaluations by BAN-PT (Indonesia's National Accreditation Board for Higher Education). This study aims to develop a model for the early prediction of students at risk of dropping out, using demographic data and a learning analytics approach. Five classification algorithms are compared: Random Forest (RF), Decision Tree (DT), Logistic Regression (LR), Light Gradient Boosting Machine (LGBM), and Support Vector Machine (SVM). The dataset consists of records of undergraduate students at Sebelas Maret University from 2013 (n = 2,476). The data were preprocessed, balanced by resampling with SMOTE (Synthetic Minority Over-sampling Technique), and used to validate the models with K-Fold Cross-Validation. The RF model achieved the best performance, with an accuracy of 96.01%, followed by LGBM (95.26%), DT (91.24%), LR (83.68%), and SVM (83.19%). Feature selection with Recursive Feature Elimination with Cross-Validation (RFE-CV) improved model efficiency by reducing the number of features without significantly degrading performance. Retaining 75% of the features gave the best balance between feature count and model accuracy. The most influential features include IPS_range (semester GPA range), parents' income, and students' regional origin, along with several other demographic factors. This study contributes to the development of early warning systems in higher education by providing accurate predictive models and identifying key risk factors.
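The abstract does not state whether SMOTE was applied before or within the cross-validation split. This matters because oversampling the whole dataset first can leak synthetic copies of test-fold students into training. The following is a minimal NumPy sketch of the within-fold variant, written under our own assumptions rather than taken from the study's code. Each stratified fold keeps its real class ratio for testing, and only the training portion is oversampled by SMOTE-style interpolation between a minority sample and one of its k nearest minority neighbours. The function names (`smote`, `stratified_kfold`, `balance_with_smote`), k = 5, five folds, and the toy data are hypothetical. The five classifiers and the RFE-CV step are omitted.

```python
import numpy as np


def smote(X_min, n_new, k=5, rng=None):
    """Create n_new synthetic minority samples by SMOTE-style interpolation.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    rng = np.random.default_rng(rng)
    n = len(X_min)
    k = min(k, n - 1)  # cannot use more neighbours than other minority points
    # Pairwise squared Euclidean distances between minority samples.
    d2 = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbour
    neighbours = np.argsort(d2, axis=1)[:, :k]
    base = rng.integers(0, n, size=n_new)                  # random seed samples
    nn = neighbours[base, rng.integers(0, k, size=n_new)]  # one neighbour each
    gap = rng.random((n_new, 1))                           # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[nn] - X_min[base])


def stratified_kfold(y, n_splits=5, rng=None):
    """Yield (train_idx, test_idx) pairs that preserve class proportions per fold."""
    rng = np.random.default_rng(rng)
    folds = [[] for _ in range(n_splits)]
    for cls in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == cls))
        for i, chunk in enumerate(np.array_split(idx, n_splits)):
            folds[i].extend(chunk.tolist())
    all_idx = np.arange(len(y))
    for fold in folds:
        test_idx = np.array(sorted(fold), dtype=int)
        train_idx = np.setdiff1d(all_idx, test_idx)
        yield train_idx, test_idx


def balance_with_smote(X, y, minority=1, k=5, rng=None):
    """Oversample the minority class until both classes have equal size."""
    X_min = X[y == minority]
    n_new = int((y != minority).sum() - len(X_min))
    if n_new <= 0:
        return X, y
    X_syn = smote(X_min, n_new, k=k, rng=rng)
    X_bal = np.vstack([X, X_syn])
    y_bal = np.concatenate([y, np.full(n_new, minority)])
    return X_bal, y_bal


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical toy data: 200 students, 4 numeric features, 20 dropouts (label 1).
    X = rng.normal(size=(200, 4))
    y = np.zeros(200, dtype=int)
    y[:20] = 1
    for train_idx, test_idx in stratified_kfold(y, n_splits=5, rng=1):
        # Resample the training fold only; the test fold keeps its real class ratio.
        X_tr, y_tr = balance_with_smote(X[train_idx], y[train_idx], rng=2)
        # Each line shows test-fold size 40 and a balanced training fold [144 144].
        print(len(test_idx), np.bincount(y_tr))
```

In practice, the same fold-local behaviour is usually obtained by placing imbalanced-learn's `SMOTE` and a scikit-learn classifier inside an `imblearn.pipeline.Pipeline` and passing that pipeline to a cross-validation routine. The sampler then runs only when the pipeline is fitted on each training fold.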
Copyright © 2025