Low accuracy in the early identification of at-risk students often hinders timely academic intervention. This study analyzes and compares seven machine learning algorithms for predicting student academic achievement, with the aim of providing a foundation for a reliable early warning model. The dataset covers 2,392 students described by 15 features spanning demographics, learning behavior, and environmental support. Hyperparameters were tuned with GridSearchCV combined with stratified cross-validation to mitigate overfitting. Performance was evaluated using MAE, RMSE, and R². CatBoost performed best (R² = 0.774; RMSE = 0.581; MAE = 0.306), followed by LightGBM (R² = 0.771) and Gradient Boosting (R² = 0.767), while MLP showed the lowest performance. Feature importance analysis identified GPA as the dominant predictor, followed by absenteeism and weekly study time. These findings support the superiority of boosting-based models in capturing complex nonlinear relationships and provide a practical framework for educational institutions building data-driven early warning systems.
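For readers less familiar with the three evaluation metrics, the following is a minimal sketch of their standard definitions in plain Python. The function name `regression_metrics` and the toy values in the usage example are hypothetical illustrations, not the study's code or data; in practice, scikit-learn's `mean_absolute_error`, `mean_squared_error`, and `r2_score` compute the same quantities.

```python
import math
from typing import Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> dict:
    """Return MAE, RMSE and R^2 for paired true/predicted values (hypothetical helper)."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    # MAE: mean of absolute residuals
    mae = sum(abs(e) for e in errors) / n
    # RMSE: square root of the mean squared residual (penalizes large errors more)
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    # R^2: 1 - SS_res / SS_tot, the share of target variance explained by the model
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are identical")
    r2 = 1.0 - ss_res / ss_tot
    return {"mae": mae, "rmse": rmse, "r2": r2}


# Toy usage with made-up values
print(regression_metrics([2.0, 3.0, 1.0, 4.0], [2.5, 2.5, 1.0, 4.0]))
```

Because RMSE squares residuals before averaging, it is always at least as large as MAE, which is consistent with the reported CatBoost values (RMSE = 0.581 vs. MAE = 0.306).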
Copyright © 2025