Scientific Journal of Informatics
Vol. 12 No. 2: May 2025

Integration of Random Forest, ADASYN, and SHAP for Diabetes Prediction and Interpretation

Aulia, Hozana
Wibowo, Adi
Sutrisno, Sutrisno



Article Info

Publish Date
23 Jun 2025

Abstract

Purpose: Diabetes is a chronic disease whose prevalence is rising globally. Early detection of individuals at risk is essential to prevent long-term complications. This study aims to develop a diabetes prediction model that not only achieves high classification accuracy but also provides transparent explanations of the factors influencing its predictions.

Methods: The study used the Pima Indians Diabetes Dataset, which contains clinical data from 768 female patients aged 21 or older. The methodology comprised data preprocessing (handling of missing values and feature engineering, such as the creation of the Age_BMI and Glucose_BMI features), a 70:30 train-test split, class imbalance handling with the ADASYN technique, model development with the Random Forest algorithm and hyperparameter tuning via GridSearchCV, and model interpretability analysis with SHAP.

Results: The proposed model achieved an accuracy of 79.2% and a recall of 85.2% on the test data. SHAP analysis showed that Glucose, Age_BMI, BMI, and DiabetesPedigreeFunction were the most influential features in predicting diabetes. Furthermore, the SHAP heatmap indicated that individuals aged 30–50 years with obesity were at the highest risk. These findings align with existing medical literature, reinforcing the role of metabolic and age-related factors in diabetes development.

Novelty: This study presents an integrative approach that combines class balancing (ADASYN), classification (Random Forest), and model interpretability (SHAP) in a unified framework for diabetes prediction. It emphasizes the importance of transparent model interpretation for healthcare professionals, giving them not only predictive outcomes but also actionable insights into risk factors. The findings support future research opportunities, including the integration of lifestyle variables and external validation using real-world clinical data from diverse populations.
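To make the two reported metrics concrete, the short Python sketch below shows how accuracy and recall are derived from confusion-matrix counts. The counts are hypothetical and are not reported in the abstract; they were chosen only because they add up to roughly 30% of the 768 records and round to the reported percentages, so the actual confusion matrix may well differ. The function names are likewise illustrative.

```python
def accuracy(tp, fp, fn, tn):
    """Share of all test cases that were classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


def recall(tp, fn):
    """Share of truly diabetic cases that the model flagged (sensitivity)."""
    return tp / (tp + fn)


# Hypothetical counts, NOT taken from the paper: one possible test-set
# outcome that is consistent with the rounded figures in the abstract.
tp, fn = 69, 12   # diabetic cases: correctly flagged / missed
fp, tn = 36, 114  # non-diabetic cases: wrongly flagged / correctly cleared

print(f"accuracy = {accuracy(tp, fp, fn, tn):.1%}")  # accuracy = 79.2%
print(f"recall   = {recall(tp, fn):.1%}")            # recall   = 85.2%
```

Recall ignores false positives entirely, which is why a screening model can show a higher recall than accuracy: it may flag many healthy cases while still missing few diabetic ones.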

Copyright © 2025






Journal Info

Abbrev

sji

Subject

Computer Science & IT; Control & Systems Engineering; Decision Sciences, Operations Research & Management; Electrical & Electronics Engineering; Engineering

Description

Scientific Journal of Informatics (p-ISSN 2407-7658 | e-ISSN 2460-0040), published by the Department of Computer Science, Universitas Negeri Semarang, is a scientific journal of Information Systems and Information Technology which includes scholarly writings on pure research and applied research in the ...