Journal of Applied Data Sciences
Vol 6, No 1: JANUARY 2025

Optimizing Stunting Detection through SMOTE and Machine Learning: A Comparative Study of XGBoost, Random Forest, SVM, and k-NN

Sugihartono, Tri
Wijaya, Benny
Marini, Marini
Alkayess, Ahmad Paqih
Anugerah, Hendra Agustian

Article Info

Publish Date
01 Jan 2025

Abstract

Stunting is a major public health problem affecting millions of children worldwide, especially in developing countries, where chronic malnutrition impairs physical growth and cognitive development. Early detection is necessary for timely intervention and for reducing long-term effects. This study applies machine learning to detect stunting more accurately, comparing the XGBoost, Random Forest, SVM, and k-NN algorithms. The dataset, sourced from Kaggle, contains 10,000 samples of anthropometric and demographic features and is heavily imbalanced: stunted children account for only 15% of the samples. We addressed this imbalance with SMOTE, which generates synthetic samples of the minority class. Backward elimination was then used for feature selection to improve model performance and interpretability: less impactful features such as "Body Length" and "Breastfeeding" were systematically excluded, placing more emphasis on more predictive variables such as weight, age, and socio-economic indicators. Combining SMOTE with the optimized feature set significantly improved model performance, particularly recall and ROC-AUC, which are critical in healthcare settings where minimizing false negatives is of high importance. XGBoost performed best among the evaluated models, with an accuracy of 0.8574, a recall of 0.8914, and an ROC-AUC of 0.9311, balancing precision and sensitivity better than the other models. These results underline the effectiveness of XGBoost for stunting detection on imbalanced datasets.
The study illustrates the potential of combining machine learning with synthetic data augmentation to improve population health outcomes, and it gives healthcare practitioners and policymakers a basis for identifying at-risk children in time. The findings point to the importance of data-driven approaches in stunting detection and lay the groundwork for future research on machine learning applications to other malnutrition-related public health challenges, which could be crucial for improving child health and well-being worldwide.
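The oversampling step named in the abstract can be sketched as follows. This is a minimal, standard-library illustration of the SMOTE idea (interpolating between a minority sample and one of its nearest minority-class neighbours), not the authors' implementation. The function name `smote`, its parameters, and the toy age/weight data are assumptions made for illustration; only the 15% minority share mirrors the abstract.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority samples
    and their k nearest minority-class neighbours (SMOTE-style sketch)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base point.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy data, [age in months, weight in kg]; not taken from the study.
# 85 majority vs 15 minority rows mimic the 15% minority share in the abstract.
majority = [[12 + i % 48, 9.0 + 0.2 * (i % 40)] for i in range(85)]
minority = [[12 + i * 3, 7.0 + 0.3 * i] for i in range(15)]

new_points = smote(minority, len(majority) - len(minority), k=5, seed=42)
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))
```

Oversampling is usually applied to the training split only, so that synthetic points do not leak into the evaluation data; the abstract does not state how the authors handled this.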

Copyrights © 2025

Journal Info

Abbrev

JADS

Publisher

Subject

Computer Science & IT; Control & Systems Engineering; Decision Sciences, Operations Research & Management

Description

One of the current hot topics in science is data: how can datasets be used in scientific and scholarly research in a more reliable, citable and accountable way? Data is of paramount importance to scientific progress, yet most research data remains private. Enhancing the transparency of the processes ...