Journal of Applied Data Sciences
Vol 7, No 2: May 2026

Hybrid Machine Learning for Early Prediction of At-Risk Students with Imbalanced Data

Esti Wijayanti (Doctoral Program of Information Systems, Diponegoro University, Semarang, 50275, Indonesia)
Widowati Widowati (Universitas Diponegoro)
Catur Edi Widodo (Universitas Diponegoro)



Article Info

Publish Date
31 May 2026

Abstract

Student dropout remains a major challenge for higher education institutions because it affects both academic performance and institutional reputation. Identifying students at risk of dropping out is often hampered by data imbalance: dropouts are far fewer than active students, so conventional prediction models tend to be biased towards the majority class. This study aims to develop an accurate and reliable framework for detecting at-risk students through a hybrid machine learning approach combined with data balancing techniques. The main contribution of this study is the integration of Support Vector Machine and Extreme Gradient Boosting in a stacked ensemble architecture supported by data balancing optimization techniques. The proposed model leverages the ability of Support Vector Machine to separate complex classification patterns, while Extreme Gradient Boosting improves prediction accuracy through iterative learning and by modeling interactions between variables. Data imbalance is addressed by oversampling the minority class so that the model learning process becomes more balanced. The framework is tested on a dataset of 3,652 students with academic, socioeconomic, and behavioral variables. Experimental results show that the proposed hybrid model outperforms the single model, with an accuracy of 97 percent, a precision of 94 percent, and a recall of 95 percent. These findings suggest that combining complementary machine learning methods with data optimization can significantly improve the prediction of student dropout. The practical implication of this research is a robust decision support system that universities can use to design timely and targeted interventions. By identifying students at risk of dropping out, institutions can strengthen retention strategies, improve student academic success, and reduce dropout rates more effectively.
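
The pipeline described in the abstract (oversample the minority class, then stack a Support Vector Machine with a gradient-boosted tree model) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name the oversampling method, the meta-learner, or any hyperparameters, so random oversampling, a logistic-regression meta-learner, and scikit-learn's `GradientBoostingClassifier` as a stand-in for Extreme Gradient Boosting are all assumed here. The data is synthetic, so the scores it produces say nothing about the reported results.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.utils import resample


def oversample_minority(X, y, random_state=0):
    """Randomly duplicate minority-class rows until both classes are equal in size."""
    classes, counts = np.unique(y, return_counts=True)
    n_extra = counts.max() - counts.min()
    if n_extra == 0:
        return X, y  # already balanced
    minority = classes[np.argmin(counts)]
    X_extra = resample(X[y == minority], replace=True,
                       n_samples=n_extra, random_state=random_state)
    X_bal = np.vstack([X, X_extra])
    y_bal = np.concatenate([y, np.full(n_extra, minority)])
    return X_bal, y_bal


# Synthetic, imbalanced stand-in data (assumption): class 1 plays the role of "dropout".
X, y = make_classification(n_samples=800, n_features=12, n_informative=6,
                           weights=[0.85, 0.15], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Balance the training split only, so the test split keeps its natural class ratio.
X_bal, y_bal = oversample_minority(X_train, y_train)

# Base learners: a scaled RBF SVM and a gradient-boosted tree ensemble
# (stand-in for Extreme Gradient Boosting). Meta-learner: logistic regression (assumed).
base_learners = [
    ("svm", make_pipeline(StandardScaler(), SVC(kernel="rbf"))),
    ("gb", GradientBoostingClassifier(n_estimators=50, random_state=0)),
]
model = StackingClassifier(estimators=base_learners,
                           final_estimator=LogisticRegression(),
                           cv=3)
model.fit(X_bal, y_bal)

# Evaluate with the same three metrics the abstract reports.
y_pred = model.predict(X_test)
acc = accuracy_score(y_test, y_pred)
prec = precision_score(y_test, y_pred)
rec = recall_score(y_test, y_pred)
print(f"accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f}")
```

Oversampling is applied after the train/test split so that duplicated minority rows cannot appear in the evaluation data. One limitation of this simple version is that the duplicates can still land in different internal cross-validation folds of the stacking step; a more careful setup would oversample inside each fold.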

Copyrights © 2026






Journal Info

Abbrev

JADS

Subject

Computer Science & IT; Control & Systems Engineering; Decision Sciences, Operations Research & Management
