Journal of Applied Data Sciences
Vol 6, No 4: December 2025

Hybrid Ensemble Learning with SMOTEENN and Soft Voting for Stunting Risk Prediction: A SHAP-Based Explainable Approach

Furqany, Nuwairy El
Subianto, Muhammad
Rusyana, Asep



Article Info

Publish Date
22 Oct 2025

Abstract

Stunting remains a critical public health concern in Indonesia, with long-term consequences for physical growth, cognitive development, and human capital. This study introduces a hybrid machine learning framework for predicting household-level stunting risk. The framework integrates the Synthetic Minority Over-sampling Technique with Edited Nearest Neighbors (SMOTEENN), a soft voting ensemble, and SHapley Additive exPlanations (SHAP). The objective is to improve both predictive accuracy and interpretability when identifying high-risk households. The study used a dataset of 115,579 household records from West Sumatra with 20 demographic, socioeconomic, health, and housing predictors. Preprocessing comprised missing-value handling and categorical encoding. SMOTEENN was then applied to the training set only, so that class imbalance was mitigated without introducing synthetic samples into the evaluation data. On the imbalanced data, the baseline models showed limited sensitivity; the best of them, XGBoost, reached 74.56% accuracy and a 71.08% F1-score. After SMOTEENN, performance improved substantially, with XGBoost achieving 91.82% accuracy and a 91.74% F1-score. Hybridization yielded a further gain: the soft voting ensemble of Random Forest and XGBoost reached 91.95% accuracy and a 92.46% F1-score. These figures are 0.13 and 0.72 percentage points above the SMOTEENN-trained XGBoost model alone. SHAP analysis added interpretability by identifying family members, education level, diverse food consumption, occupation, and drinking water source as the dominant predictors of stunting risk. The novelty of this study lies in integrating SMOTEENN with ensemble learning and SHAP, which provides both robust performance and transparency in feature contributions. The findings show that the proposed framework improves sensitivity to minority classes, achieves higher predictive accuracy than the baseline models, and offers interpretable insights to guide targeted interventions.
By combining methodological rigor with explainability, this research contributes a practical decision-support tool for policymakers, supporting early detection of at-risk households and accelerating stunting reduction efforts in Indonesia.
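The abstract stresses that SMOTEENN was applied to the training split only. The following is a deliberately simple pure-Python sketch of what that step does on a small synthetic two-feature dataset. The toy data, the helper names `split_train_test`, `smote`, `enn` and `smoteenn`, and the neighbour counts (k=5 for SMOTE, k=3 for ENN) are illustrative assumptions, not the authors' code or settings. SMOTE interpolates new minority samples between a minority point and one of its nearest minority neighbours. ENN then removes every sample whose label disagrees with the majority label of its nearest neighbours. This uses Wilson's majority rule; imbalanced-learn's `EditedNearestNeighbours` defaults to a stricter rule under which all neighbours must agree.

```python
import math
import random
from collections import Counter


def split_train_test(X, y, test_frac=0.25, rng=None):
    """Shuffle once and split; the test part is never resampled."""
    rng = rng or random.Random(0)
    idx = list(range(len(X)))
    rng.shuffle(idx)
    cut = int(len(idx) * (1 - test_frac))
    train, test = idx[:cut], idx[cut:]
    return ([X[i] for i in train], [y[i] for i in train],
            [X[i] for i in test], [y[i] for i in test])


def smote(X, y, minority, k=5, rng=None):
    """Add synthetic `minority` samples until that class matches the largest class."""
    rng = rng or random.Random(0)
    minority_pts = [x for x, label in zip(X, y) if label == minority]
    if len(minority_pts) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    n_new = max(Counter(y).values()) - len(minority_pts)
    X_out, y_out = list(X), list(y)
    for _ in range(n_new):
        i = rng.randrange(len(minority_pts))
        base = minority_pts[i]
        # k nearest minority neighbours of `base`, excluding `base` itself
        others = [p for j, p in enumerate(minority_pts) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # random position on the segment base -> nb
        X_out.append([b + gap * (n - b) for b, n in zip(base, nb)])
        y_out.append(minority)
    return X_out, y_out


def enn(X, y, k=3):
    """Wilson's ENN: drop samples whose label differs from the majority
    label of their k nearest neighbours (odd k avoids two-class ties)."""
    keep_X, keep_y = [], []
    for i, (x, label) in enumerate(zip(X, y)):
        others = [j for j in range(len(X)) if j != i]
        nearest = sorted(others, key=lambda j: math.dist(x, X[j]))[:k]
        majority_label = Counter(y[j] for j in nearest).most_common(1)[0][0]
        if majority_label == label:
            keep_X.append(x)
            keep_y.append(label)
    return keep_X, keep_y


def smoteenn(X, y, minority, k_smote=5, k_enn=3, rng=None):
    """SMOTE oversampling followed by ENN cleaning."""
    X_os, y_os = smote(X, y, minority, k=k_smote, rng=rng)
    return enn(X_os, y_os, k=k_enn)


# Hypothetical imbalanced toy data: 180 class-0 vs 20 class-1 points.
rng = random.Random(42)
X = ([[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(180)]
     + [[rng.gauss(2.5, 1), rng.gauss(2.5, 1)] for _ in range(20)])
y = [0] * 180 + [1] * 20

# Split first, then resample the training part only.
X_tr, y_tr, X_te, y_te = split_train_test(X, y, test_frac=0.25, rng=rng)
X_res, y_res = smoteenn(X_tr, y_tr, minority=1, rng=rng)

print("train before:", Counter(y_tr))
print("train after SMOTEENN:", Counter(y_res))
print("test (untouched):", Counter(y_te))
```

The test split is drawn before any resampling, so its class distribution stays as observed. In practice, imbalanced-learn's `SMOTEENN` class (via `fit_resample(X_train, y_train)`) would replace these quadratic-time helpers, which are far too slow for 115,579 records. The abstract does not state which implementation the authors used. The resampled training set would then be used to fit the base learners. Soft voting combines those learners by averaging their predicted class probabilities, as scikit-learn's `VotingClassifier(voting="soft")` does.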

Copyright © 2025






Journal Info

Abbrev

JADS

Subject

Computer Science & IT; Control & Systems Engineering; Decision Sciences, Operations Research & Management
