Timely graduation prediction is a crucial issue in higher education, especially when academic, demographic, and behavioral factors interact in complex ways. However, many previous studies rely on default machine learning (ML) parameters and fail to address class imbalance, leading to suboptimal predictions. This study builds a comprehensive framework to evaluate the effectiveness of seven ML algorithms, namely AdaBoost, K-Nearest Neighbors, Naïve Bayes, Neural Network, Random Forest, SVM-RBF, and XGBoost, for predicting on-time graduation by incorporating five resampling techniques and hyperparameter tuning. The five resampling techniques are Random Undersampling (RUS), Random Oversampling (ROS), SMOTENC, and two hybrid approaches (RUS-ROS and SMOTENC-RUS). Hyperparameter tuning was conducted using Grid Search, and model performance was evaluated through cross-validation and hold-out methods. The results show that Random Forest combined with RUS-ROS achieved the best performance, with an average score of 0.948 across the evaluation metrics. Statistical analysis using PERMANOVA (p = 0.009) and Bonferroni-corrected post-hoc pairwise tests confirmed significant differences between certain models. This study contributes to the educational data mining literature by demonstrating that combining resampling and hyperparameter tuning improves classification performance on imbalanced educational datasets.
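To illustrate how the best-performing configuration fits together, the minimal sketch below chains hybrid RUS-ROS resampling with a Random Forest classifier tuned via Grid Search, using scikit-learn and imbalanced-learn. The synthetic dataset, resampling ratios, parameter grid, and scoring metric are illustrative assumptions; the paper's actual features and search space are not given in this abstract.

```python
# Sketch of a hybrid RUS-ROS + Random Forest pipeline with Grid Search tuning.
# All data, ratios, and grid values here are assumed for illustration only.
from imblearn.over_sampling import RandomOverSampler
from imblearn.under_sampling import RandomUnderSampler
from imblearn.pipeline import Pipeline
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic imbalanced stand-in for the graduation dataset (assumption).
X, y = make_classification(
    n_samples=2000, n_features=20, weights=[0.85, 0.15], random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.2, random_state=42  # hold-out split
)

# Hybrid RUS-ROS: undersample the majority class partway, then oversample
# the minority class to parity (the specific ratios are assumptions).
pipeline = Pipeline([
    ("rus", RandomUnderSampler(sampling_strategy=0.5, random_state=42)),
    ("ros", RandomOverSampler(sampling_strategy=1.0, random_state=42)),
    ("rf", RandomForestClassifier(random_state=42)),
])

# Grid Search over an illustrative hyperparameter grid with 5-fold cross-validation.
param_grid = {
    "rf__n_estimators": [100, 300],
    "rf__max_depth": [None, 10, 20],
}
search = GridSearchCV(pipeline, param_grid, scoring="f1", cv=5, n_jobs=-1)
search.fit(X_train, y_train)

print("Best parameters:", search.best_params_)
print("Hold-out accuracy:", search.best_estimator_.score(X_test, y_test))
```

Because the samplers live inside an imbalanced-learn Pipeline, resampling is applied only to the training folds during cross-validation, so the hold-out evaluation remains on the original class distribution.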