Global competition in higher education is intensifying as institutions vie for outstanding students, qualified faculty, and international research collaborations. University ranking systems serve both as strategic instruments for assessing institutional performance and as a basis for public policy. However, traditional rankings built on linear aggregate scores often oversimplify the complex relationships among indicators such as research, internationalization, and graduate outcomes. This study develops a data-driven predictive model that captures the non-linear relationships among university performance indicators. It applies a quantitative predictive analytics approach to a dataset of 52 Japanese universities covering the 2024–2026 period. The predictor variables are Research_Impact_Score, Employment_Rate, Intl_Student_Ratio, Institution_Age, Institution_Type, and Region; the target variable is National_Rank. The research stages comprise data preprocessing (handling missing values, encoding, scaling), feature engineering (including institutional age), development of regression models (Linear, Ridge, Lasso, SVR) and ensemble models (Random Forest, Gradient Boosting), evaluation using RMSE, MAE, and R², and explainability analysis based on feature importance. The Gradient Boosting model delivers the best performance (RMSE 1.175117, MAE 1.087856, R² 0.994988), followed by Random Forest (RMSE 1.436536, R² 0.992510). Traditional linear regression models perform substantially worse (R² 0.657519), which supports the use of non-linear approaches for modeling complex relationships among indicators. Stability testing with K-Fold cross-validation yields an average RMSE of 1.1045 with a difference of 0.4493 between folds, indicating consistent model performance.
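The workflow described above can be sketched in a few lines of scikit-learn. This is a minimal illustration and not the study's implementation. The data are synthetic stand-ins generated in the code. The function names (`make_synthetic_data`, `build_pipeline`, `evaluate_models`), the category labels, the 80/20 split, the five folds, and all hyperparameters are assumptions. The "difference between folds" is interpreted here as the gap between the highest and lowest fold RMSE, which the text does not specify.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import KFold, cross_val_score, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

NUMERIC = ["Research_Impact_Score", "Employment_Rate",
           "Intl_Student_Ratio", "Institution_Age"]
CATEGORICAL = ["Institution_Type", "Region"]
TARGET = "National_Rank"


def make_synthetic_data(n_rows=52, seed=0):
    """Hypothetical stand-in data; NOT the dataset used in the study."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "Research_Impact_Score": rng.uniform(40, 100, n_rows),
        "Employment_Rate": rng.uniform(70, 99, n_rows),
        "Intl_Student_Ratio": rng.uniform(1, 30, n_rows),
        "Institution_Age": rng.integers(20, 150, n_rows),
        "Institution_Type": rng.choice(["National", "Public", "Private"], n_rows),
        "Region": rng.choice(["Kanto", "Kansai", "Kyushu"], n_rows),
    })
    # Assumed target: rank 1 goes to the highest research score.
    df[TARGET] = df["Research_Impact_Score"].rank(ascending=False)
    return df


def build_pipeline(model):
    """Impute and scale numeric columns, one-hot encode categorical ones."""
    numeric = Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ])
    preprocess = ColumnTransformer([
        ("num", numeric, NUMERIC),
        ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL),
    ])
    return Pipeline([("preprocess", preprocess), ("model", model)])


def evaluate_models(df, seed=0):
    """Report hold-out RMSE, MAE, R² and K-Fold RMSE statistics per model."""
    X, y = df[NUMERIC + CATEGORICAL], df[TARGET]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed)
    models = {
        "Linear": LinearRegression(),
        "Random Forest": RandomForestRegressor(random_state=seed),
        "Gradient Boosting": GradientBoostingRegressor(random_state=seed),
    }
    folds = KFold(n_splits=5, shuffle=True, random_state=seed)
    results = {}
    for name, model in models.items():
        pred = build_pipeline(model).fit(X_train, y_train).predict(X_test)
        # cross_val_score clones the pipeline, so each fold is fit from scratch.
        cv_rmse = -cross_val_score(build_pipeline(model), X, y, cv=folds,
                                   scoring="neg_root_mean_squared_error")
        results[name] = {
            "rmse": float(np.sqrt(mean_squared_error(y_test, pred))),
            "mae": float(mean_absolute_error(y_test, pred)),
            "r2": float(r2_score(y_test, pred)),
            "cv_rmse_mean": float(cv_rmse.mean()),
            "cv_rmse_range": float(cv_rmse.max() - cv_rmse.min()),
            "n_folds": len(cv_rmse),
        }
    return results


results = evaluate_models(make_synthetic_data())
for name, metrics in results.items():
    print(name, {k: round(v, 3) for k, v in metrics.items()})
```

Because the data are synthetic, the printed numbers will not match the values reported above; the sketch only shows how the three metrics and the fold-to-fold spread can be computed side by side for each model.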
Feature contribution analysis shows that Research_Impact_Score is the dominant factor, accounting for 97.94% of the contribution, followed by Employment_Rate at 1.81%; internationalization indicators and geographical factors contribute minimally. These findings indicate that research performance is the primary determinant of university rankings, whereas employability and internationalization act as supporting factors. The study demonstrates that ensemble-based machine learning models can predict national rankings both accurately and interpretably. The approach offers a multidimensional evaluation framework that is more representative than linear aggregate scores, and it carries policy implications for improving research quality, curriculum relevance, and the internationalization strategies of higher education institutions.
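Percentage contributions of this kind can be obtained by rescaling a fitted model's feature importances so that they sum to 100. The sketch below assumes impurity-based importances from scikit-learn's `GradientBoostingRegressor`; the text does not state which importance method was used. The data are synthetic, and the helper `importance_percentages` is a hypothetical name introduced for illustration.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

FEATURES = ["Research_Impact_Score", "Employment_Rate",
            "Intl_Student_Ratio", "Institution_Age"]


def importance_percentages(model, feature_names):
    """Return impurity-based importances as percentages, largest first."""
    shares = 100.0 * model.feature_importances_
    order = np.argsort(shares)[::-1]
    return {feature_names[i]: float(shares[i]) for i in order}


# Synthetic stand-in data (NOT the study's dataset): the rank is driven
# almost entirely by the research score, with a small employment effect.
rng = np.random.default_rng(42)
n_rows = 52
X = np.column_stack([
    rng.uniform(40, 100, n_rows),   # Research_Impact_Score
    rng.uniform(70, 99, n_rows),    # Employment_Rate
    rng.uniform(1, 30, n_rows),     # Intl_Student_Ratio
    rng.uniform(20, 150, n_rows),   # Institution_Age
])
score = X[:, 0] + 0.1 * X[:, 1]
y = np.argsort(np.argsort(-score)) + 1  # rank 1 = highest composite score

model = GradientBoostingRegressor(random_state=42).fit(X, y)
contributions = importance_percentages(model, FEATURES)
for name, pct in contributions.items():
    print(f"{name}: {pct:.2f}%")
```

Impurity-based importances describe how much each feature reduced the training loss inside the trees, so a share near 100% for one feature means the model rarely needed the others; it is not by itself evidence of a causal effect.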
Copyrights © 2026