On-time graduation is a key indicator of higher education performance. However, academic datasets used to classify on-time graduation are often class-imbalanced, which can degrade the performance of machine learning models. This study develops a classification model for on-time student graduation using the CatBoost algorithm, tuned with Grid Search Cross-Validation, within an Educational Data Mining framework. The dataset comprises 951 students with a class imbalance ratio of 1.83:1. Class imbalance was handled with the Synthetic Minority Oversampling Technique (SMOTE), and hyperparameters were optimized using stratified 5-fold cross-validation over 108 parameter combinations. The optimized CatBoost model outperformed the default model, achieving an accuracy of 91.10 percent, a weighted F1-score of 91.20 percent, and a ROC-AUC of 96.09 percent, compared with 89.01 percent, 89.20 percent, and 95.39 percent, respectively, for the default model. Feature importance analysis shows that accumulated credits and academic performance in the middle to final semesters are the most influential features. These results indicate that hyperparameter optimization contributes to improving CatBoost performance on class-imbalanced academic data. Because the features with the highest predictive contribution come from the middle to final semesters, the model should be read as a classifier based on students’ longitudinal academic records rather than as an early prediction system based on initial-semester data.
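The figures reported in the abstract can be cross-checked with some simple arithmetic. The following is a minimal sketch, not the study's code. The per-class counts are inferred from the stated total of 951 and the 1.83:1 ratio, so they are an estimate. The parameter grid is a hypothetical one, built from common CatBoost hyperparameter names with illustrative values chosen only so that it yields 108 combinations; the abstract does not state the grid that was actually searched.

```python
from itertools import product

# Figures taken from the abstract.
N_STUDENTS = 951
IMBALANCE_RATIO = 1.83  # majority : minority
N_FOLDS = 5

# Reported metrics in percent: (default model, optimized model).
REPORTED = {
    "accuracy": (89.01, 91.10),
    "weighted_f1": (89.20, 91.20),
    "roc_auc": (95.39, 96.09),
}

# ASSUMPTION: hypothetical grid. The abstract gives only the total of
# 108 combinations, not the parameters or values that were searched.
PARAM_GRID = {
    "iterations": [300, 500, 1000],
    "depth": [4, 6, 8],
    "learning_rate": [0.01, 0.05, 0.1],
    "l2_leaf_reg": [1, 3, 5, 7],
}


def estimate_class_counts(total, ratio):
    """Estimate (majority, minority) counts from a total and a ratio."""
    minority = round(total / (ratio + 1))
    return total - minority, minority


def grid_combinations(grid):
    """Expand a parameter grid into a list of parameter dictionaries."""
    keys = list(grid)
    value_lists = [grid[key] for key in keys]
    return [dict(zip(keys, values)) for values in product(*value_lists)]


majority, minority = estimate_class_counts(N_STUDENTS, IMBALANCE_RATIO)
combos = grid_combinations(PARAM_GRID)
total_fits = len(combos) * N_FOLDS  # one model fit per combination per fold

# Gain of the optimized model over the default, in percentage points.
gains = {
    name: round(optimized - default, 2)
    for name, (default, optimized) in REPORTED.items()
}

print(f"estimated class counts: {majority} vs {minority}")
print(f"grid combinations: {len(combos)}, cross-validation fits: {total_fits}")
print(f"gains in percentage points: {gains}")
```

Under these assumptions, the search would require 540 model fits before the final refit, and the optimized model's gains over the default are about 2.09, 2.00, and 0.70 percentage points for accuracy, weighted F1-score, and ROC-AUC respectively.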
Copyright © 2026