Journal of Applied Data Sciences
Vol 7, No 1: January 2026

A Stacking Ensemble Model for Predicting Student High School Graduation Outcomes

Fitriyani, Fitriyani
Alkodri, Ari Amir
Aswin, Fajar

Article Info

Publish Date
19 Dec 2025

Abstract

This study develops and evaluates machine learning models to predict high school graduation outcomes and identify at-risk students for early intervention. Using a quantitative approach, data from 1,017 students across three public high schools were analyzed, encompassing academic performance (average yearly scores), behavioral factors (attendance rates and extracurricular participation), and socio-economic background (proxied by parental occupation). A comparative modeling strategy was applied, beginning with a Decision Tree baseline and advancing to a Stacking Ensemble model that integrated three heterogeneous base learners—Support Vector Machine (SVM), k-Nearest Neighbors (k-NN), and Decision Tree—combined through a Logistic Regression meta-model. Both models were optimized using GridSearchCV and adjusted for class imbalance between graduates (93.4%) and at-risk students (6.6%). The results showed that academic variables, particularly third-year average scores (mean = 82.6, SD = 6.4) and attendance rate (mean = 94.3%), were the strongest predictors of graduation, while socio-economic indicators had minimal impact. The Stacking Ensemble achieved a notable improvement over the Decision Tree, reaching an accuracy of 99.6%, precision of 0.909, recall of 1.000, F1-score of 0.952, and AUC of 1.000, compared to the baseline accuracy of 94.9% (F1-score = 0.519, AUC = 0.83). These findings indicate the superior predictive capability of the ensemble model in identifying students at risk of non-graduation. The study’s novelty lies in combining interpretable and high-performance models to construct a practical early-warning framework that can guide educators and policymakers in targeted academic interventions. However, the near-perfect metrics also suggest potential overfitting, emphasizing the need for validation using external datasets before broader application. 
Overall, this research contributes a robust, data-driven methodology for improving student retention through predictive analytics in educational settings.
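The abstract names the components of the stacking ensemble (SVM, k-NN and Decision Tree base learners, a Logistic Regression meta-model, GridSearchCV tuning, an adjustment for class imbalance) but not their configuration. The sketch below shows one way to assemble those pieces, assuming scikit-learn. It uses synthetic data from `make_classification` with the same size (1,017 rows) and roughly the same class split (93.4% / 6.6%) as the study, because the student records are not available here. The feature count, the 75/25 split, the scaling step for SVM and k-NN, `class_weight="balanced"` as the imbalance adjustment, the grid values and F1 as the search objective are all illustrative assumptions, not the authors' reported settings.

```python
# Illustrative stacking-ensemble sketch. ASSUMPTIONS: scikit-learn, synthetic data,
# and hyperparameter grids chosen for illustration, not taken from the study.
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-in data: 1,017 rows, class 1 ("at risk") is about 6.6% of samples.
X, y = make_classification(
    n_samples=1017, n_features=6, n_informative=4,
    weights=[0.934], random_state=42,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Three heterogeneous base learners; the distance/margin-based ones get feature scaling.
base_learners = [
    ("svm", make_pipeline(StandardScaler(), SVC(class_weight="balanced", random_state=42))),
    ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier())),
    ("tree", DecisionTreeClassifier(class_weight="balanced", random_state=42)),
]

# Logistic Regression meta-model trained on out-of-fold base-learner outputs.
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(class_weight="balanced", max_iter=1000),
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=42),
)

# Small illustrative grid using scikit-learn's '<step>__<param>' naming.
param_grid = {
    "svm__svc__C": [1, 10],
    "knn__kneighborsclassifier__n_neighbors": [3, 7],
    "tree__max_depth": [3, None],
}
search = GridSearchCV(stack, param_grid, scoring="f1", cv=3)
search.fit(X_train, y_train)

# Decision Tree baseline tuned the same way, for comparison.
baseline = GridSearchCV(
    DecisionTreeClassifier(class_weight="balanced", random_state=42),
    {"max_depth": [3, 5, None]}, scoring="f1", cv=3,
).fit(X_train, y_train)

# Evaluate both on the held-out split with imbalance-aware metrics.
best = search.best_estimator_
y_pred = best.predict(X_test)
y_score = best.predict_proba(X_test)[:, 1]
test_f1 = f1_score(y_test, y_pred)
test_auc = roc_auc_score(y_test, y_score)
base_f1 = f1_score(y_test, baseline.predict(X_test))

print("best stacking params:", search.best_params_)
print(f"stacking F1 = {test_f1:.3f}, AUC = {test_auc:.3f}")
print(f"baseline tree F1 = {base_f1:.3f}")
```

Optimizing for F1 rather than accuracy is the key design choice under this imbalance: with 93.4% graduates, a model that always predicts "graduate" already scores about 93.4% accuracy, so the baseline's 94.9% accuracy is only about 1.5 points above that floor, which is consistent with its low F1 of 0.519. The ensemble figures reported in the abstract are also internally consistent: F1 = 2PR / (P + R) = 2(0.909)(1.000) / (0.909 + 1.000) ≈ 0.952.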

Copyright © 2026

Journal Info

Abbrev

JADS

Subject

Computer Science & IT; Control & Systems Engineering; Decision Sciences, Operations Research & Management

Description

One of the current hot topics in science is data: how can datasets be used in scientific and scholarly research in a more reliable, citable and accountable way? Data is of paramount importance to scientific progress, yet most research data remains private. Enhancing the transparency of the processes ...