This study developed a model for predicting student academic achievement from learning habits, using the XGBoost algorithm and SHAP interpretability techniques. The secondary dataset contains 1,000 entries and 16 variables (for example, hours of study per day, mental health, frequency of exercise, social media use, and hours of sleep). The data were pre-processed through cleaning, imputation, encoding, and normalization, then split 80:20 into training and test sets and validated using 5-fold cross-validation. Three models were tested: Linear Regression, Random Forest, and XGBoost. Evaluation using RMSE, MAE, and R² showed that XGBoost achieved RMSE = 0.335, MAE = 0.266, and R² = 0.882, while Linear Regression performed slightly better in certain configurations (R² = 0.888; RMSE = 0.326). SHAP analysis revealed that the most influential features were hours of study per day, mental health scores, exercise frequency, duration of social media use, and hours spent watching Netflix. The findings indicate that students' study habits and psychological conditions are the main determinants of variation in academic achievement, and the SHAP-based feature attributions make the model easier for education stakeholders to interpret. Recommendations for future research include testing the model on longitudinal datasets, integrating socioeconomic factors, and implementing data privacy procedures before institutional-scale deployment.
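For readers less familiar with the three evaluation metrics reported above, the following minimal sketch shows how RMSE, MAE, and R² are conventionally computed from true and predicted values. The function names and the toy values are illustrative assumptions, not the study's code or data, and the study's own implementation may differ.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error: square root of the mean squared residual."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def mae(y_true, y_pred):
    """Mean absolute error: mean of the absolute residuals."""
    absolute = [abs(t - p) for t, p in zip(y_true, y_pred)]
    return sum(absolute) / len(absolute)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical toy values, chosen only to illustrate the calculations.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 2.5, 4.0]

rmse_value = rmse(y_true, y_pred)
mae_value = mae(y_true, y_pred)
r2_value = r2(y_true, y_pred)
```

Lower RMSE and MAE mean smaller prediction errors, while an R² closer to 1 means the model explains more of the variance in the target. Because RMSE and MAE depend on the scale of the target, the reported values should be read against the normalized scale used in pre-processing.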
Copyrights © 2026