BAREKENG: Jurnal Ilmu Matematika dan Terapan
Vol 19 No 4 (2025): BAREKENG: Journal of Mathematics and Its Application

THE EFFECT OF SAMPLE SIZE ON THE STABILITY OF XGBOOST MODEL PERFORMANCE IN PREDICTING STUDENT STUDY PERIOD

Damar Sakti, Muhammad Lintang
Jailani, Jailani
Retnawati, Heri
Hidayati, Kana
Waryanto, Nur Hadi
Ibrahim, Zulfa Safina
Khoirunnisa, Asma’
Satiranandi Wibowo, Firdaus Amruzain
Berlian, Miftah Okta
Batubara, Angella Ananta



Article Info

Publish Date
01 Sep 2025

Abstract

Student success can be defined by the length of study required to graduate from college. Machine learning can be used to predict student success from the factors thought to influence it, but achieving optimal model performance requires attention to sample size. This quantitative study examines the effect of student sample size on the stability of model performance in predicting student success. The data are records of 19,061 students from a university in Yogyakarta, covering 2014 to 2019. The target variable is the student study period in months; the predictor variables are college entrance pathway, GPA from semester 1 to semester 6, and family socioeconomic condition based on the father’s and mother’s income. The study uses an XGBoost model with the best hyperparameters together with a bootstrap approach. Bootstrap samples were drawn from the original data at twenty sample sizes (250, 500, 750, 1000, 1250, 1500, 1750, 2000, 2250, 2500, 2750, 3000, 3250, 3500, 3750, 4000, 4250, 4500, 4750, and 5000), and each was replicated ten times. Model performance was evaluated using the Root Mean Square Error (RMSE). The best XGBoost model, obtained with a split of 90% training data and 10% testing data, has the smallest RMSE, 8.318. Its hyperparameters are n_estimators = 75, max_depth = 8, min_child_weight = 5, eta = 0.07, gamma = 0.2, subsample = 0.8, and colsample_bylevel = 1. With these hyperparameters, the XGBoost model shows its most stable performance at a sample size of 1750 students, as evidenced by consistent RMSE values across the 10 bootstrap replications. This indicates that a sample of this size offers the best balance between prediction accuracy and stability for estimating study duration.
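The bootstrap procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' code. The data are synthetic, a one-feature least-squares fit stands in for XGBoost so that the example runs with the standard library alone, and the function names (`rmse`, `fit_linear_stand_in`, `bootstrap_rmse_stability`) are hypothetical. The abstract does not say how the 90%/10% split was combined with the bootstrap, so splitting each bootstrap sample is an assumption.

```python
import math
import random
import statistics


def rmse(y_true, y_pred):
    """Root Mean Square Error between two equal-length sequences."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )


def fit_linear_stand_in(train):
    """Hypothetical stand-in for the XGBoost model: one-feature least squares.

    `train` is a list of (feature, target) pairs; returns a predict function.
    """
    xs = [x for x, _ in train]
    ys = [y for _, y in train]
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in train)
    slope = sxy / sxx if sxx else 0.0
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def bootstrap_rmse_stability(rows, sample_sizes, n_replications=10,
                             test_fraction=0.1, fit=fit_linear_stand_in,
                             seed=0):
    """For each sample size, draw bootstrap samples and summarise test RMSE.

    Returns {sample_size: (mean RMSE, standard deviation of RMSE)}.
    """
    rng = random.Random(seed)
    summary = {}
    for n in sample_sizes:
        scores = []
        for _ in range(n_replications):
            # Sampling with replacement from the original data.
            sample = rng.choices(rows, k=n)
            # Assumption: each bootstrap sample is split 90% / 10%.
            # Duplicated rows can land on both sides of this split.
            n_test = max(1, round(n * test_fraction))
            test, train = sample[:n_test], sample[n_test:]
            predict = fit(train)
            scores.append(
                rmse([y for _, y in test], [predict(x) for x, _ in test])
            )
        summary[n] = (statistics.fmean(scores), statistics.stdev(scores))
    return summary


# Synthetic data, invented for illustration only: (GPA, study period in months).
data_rng = random.Random(42)
rows = []
for _ in range(19061):
    gpa = data_rng.uniform(2.0, 4.0)
    rows.append((gpa, 60.0 - 5.0 * gpa + data_rng.gauss(0.0, 8.0)))

sample_sizes = list(range(250, 5001, 250))  # the twenty sizes in the abstract
results = bootstrap_rmse_stability(rows, sample_sizes)

for n in sample_sizes:
    mean_rmse, sd_rmse = results[n]
    print(f"n={n:5d}  mean RMSE={mean_rmse:.3f}  SD across replications={sd_rmse:.3f}")
```

The standard deviation of RMSE across replications is only one possible way to summarise stability; the abstract does not state which criterion was used. To work with a gradient-boosted model instead of the stand-in, the `fit` argument could wrap, for example, `xgboost.XGBRegressor` configured with the reported hyperparameters, where `learning_rate` is the scikit-learn-style name for `eta`.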

Copyrights © 2025






Journal Info

Abbrev

barekeng

Subject

Computer Science & IT; Control & Systems Engineering; Economics, Econometrics & Finance; Energy; Engineering; Mathematics; Mechanical Engineering; Physics; Transportation

Description

BAREKENG: Jurnal Ilmu Matematika dan Terapan is a scientific publication medium that publishes articles reporting the results of research or studies in the fields of Pure Mathematics and Applied Mathematics. The focus and scope of BAREKENG: Jurnal Ilmu Matematika dan Terapan are as follows: - Pure ...