IMPACT OF FEATURE SELECTION ON DECISION TREE AND RANDOM FOREST FOR CLASSIFYING STUDENT STUDY SUCCESS
Satiranandi Wibowo, Firdaus Amruzain; Retnawati, Heri; Sakti, Muhammad Lintang Damar; Khoirunnisa, Asma; Batubara, Angella Ananta; Berlian, Miftah Okta; Ibrahim, Zulfa Safina; Jailani, Jailani; Sumaryanto, Sumaryanto; Prasojo, Lantip Diat
BAREKENG: Jurnal Ilmu Matematika dan Terapan Vol 19 No 3 (2025): BAREKENG: Journal of Mathematics and Its Application
Publisher : PATTIMURA UNIVERSITY

DOI: 10.30598/barekengvol19iss3pp2083-2096

Abstract

The advancement of technology has a profound impact on the field of education. Education plays a crucial role in enhancing quality of life, particularly in higher education, where one of the key parameters is student success. This study investigates the influence of feature selection on the performance of machine learning models, particularly Decision Tree and Random Forest, in classifying student academic success. Utilizing a dataset of 19,061 students, the research aims to identify significant variables impacting classification outcomes. Feature selection was conducted using LASSO regression, resulting in a refined dataset of critical predictors. To address data imbalance, Synthetic Minority Over-sampling Technique (SMOTE) was applied, improving the representation of underrepresented classes. Both Decision Tree and Random Forest models were trained on balanced datasets, with performance evaluated using accuracy, precision, recall, and F1-score metrics. The Random Forest model demonstrated superior accuracy (96.41%) compared to the Decision Tree (67.15%), as well as higher AUC values. Model interpretability was enhanced using SHAP (SHapley Additive exPlanations). This study underscores the utility of advanced machine learning techniques in educational analytics, paving the way for data-driven decision-making to support student achievement.
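
The oversampling step mentioned in the abstract can be illustrated with a minimal sketch of the SMOTE idea: each synthetic minority sample is placed at a random point on the segment between a minority sample and one of its nearest minority-class neighbours. This is not the authors' code. The function name `smote`, the toy feature vectors, and the choice of `k` are assumptions made only for illustration, and a real study would more likely use an established library implementation.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` inside the minority class (excluding itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


# Toy, made-up feature vectors (not data from the study)
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
majority_count = 10  # hypothetical size of the majority class

# Add enough synthetic points to match the majority class size
new_points = smote(minority, n_new=majority_count - len(minority), k=3)
balanced_minority = minority + new_points
```

Because every synthetic point is a convex combination of two existing minority samples, the new points stay within the region already occupied by the minority class rather than being exact duplicates of existing rows.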

THE EFFECT OF SAMPLE SIZE ON THE STABILITY OF XGBOOST MODEL PERFORMANCE IN PREDICTING STUDENT STUDY PERIOD
Damar Sakti, Muhammad Lintang; Jailani, Jailani; Retnawati, Heri; Hidayati, Kana; Waryanto, Nur Hadi; Ibrahim, Zulfa Safina; Khoirunnisa, Asma’; Satiranandi Wibowo, Firdaus Amruzain; Berlian, Miftah Okta; Batubara, Angella Ananta
BAREKENG: Jurnal Ilmu Matematika dan Terapan Vol 19 No 4 (2025): BAREKENG: Journal of Mathematics and Its Application
Publisher : PATTIMURA UNIVERSITY

DOI: 10.30598/barekengvol19iss4pp2679-2692

Abstract

Student success can be defined by the length of the study period until graduation from college. Machine learning can be used to predict student success from the factors thought to influence it. Achieving optimal model performance requires attention to sample size. This study aims to determine the effect of student sample size on the stability of model performance in predicting student success. The research is quantitative. The data are student records from a university in Yogyakarta from 2014 to 2019, totaling 19,061 students. The target variable is the student study period in months, while the predictor variables are college entrance pathway, GPA from semester 1 to semester 6, and family socioeconomic condition based on the father’s and mother’s income. The study uses an XGBoost model with the best hyperparameters together with a bootstrap approach. Bootstrapping was performed on the original data at twenty sample sizes: 250, 500, 750, 1000, 1250, 1500, 1750, 2000, 2250, 2500, 2750, 3000, 3250, 3500, 3750, 4000, 4250, 4500, 4750, and 5000. The bootstrap sampling was replicated ten times. Model performance is evaluated using the Root Mean Square Error (RMSE). The best XGBoost model was obtained with a split of 90% training data and 10% testing data, which gave the smallest RMSE value, 8.318. The model uses the following hyperparameters: n_estimators of 75, max_depth of 8, min_child_weight of 5, eta of 0.07, gamma of 0.2, subsample of 0.8, and colsample_bylevel of 1. The XGBoost model with optimal hyperparameters demonstrates peak performance stability at a sample size of 1750 students, as evidenced by consistent RMSE values across 10 bootstrap replications, confirming that this data quantity provides the ideal balance between prediction accuracy and stability for estimating study duration.
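
The bootstrap procedure described in the abstract can be sketched as follows: for each sample size, draw several bootstrap samples with replacement, fit a model, and summarize the spread of the resulting RMSE values. This is a hedged illustration, not the authors' pipeline. A mean predictor stands in for XGBoost, the study-period values are randomly generated, and applying a 90%/10% split inside each bootstrap sample is an assumption, since the abstract does not say how each replication was evaluated. The names `rmse` and `bootstrap_rmse_stability` are hypothetical.

```python
import math
import random
import statistics


def rmse(y_true, y_pred):
    """Root Mean Square Error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def bootstrap_rmse_stability(y, sizes, n_reps=10, seed=0):
    """For each sample size, return (mean RMSE, std of RMSE) over n_reps
    bootstrap replications."""
    rng = random.Random(seed)
    results = {}
    for n in sizes:
        scores = []
        for _ in range(n_reps):
            sample = rng.choices(y, k=n)  # sampling with replacement
            split = int(0.9 * n)  # assumed 90% train / 10% test split
            train, test = sample[:split], sample[split:]
            prediction = statistics.fmean(train)  # stand-in for a fitted model
            scores.append(rmse(test, [prediction] * len(test)))
        results[n] = (statistics.fmean(scores), statistics.stdev(scores))
    return results


# Made-up study periods in months (not data from the study)
data_rng = random.Random(42)
study_months = [data_rng.gauss(54, 8) for _ in range(2000)]

sizes = list(range(250, 5001, 250))  # the twenty sample sizes listed above
results = bootstrap_rmse_stability(study_months, sizes)
for n, (mean_rmse, std_rmse) in results.items():
    print(f"n={n:5d}  mean RMSE={mean_rmse:.3f}  std={std_rmse:.3f}")
```

A small standard deviation of RMSE across replications at a given size is one way to read "stability" here. Note that a bootstrap sample can place copies of the same record in both the training and testing parts, so a careful evaluation would test on records that were not drawn.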