Mobile and Forensics
Vol. 8 No. 1 (2026)

Enhancing Early Diabetes Detection Using Tree-Based Machine Learning Algorithms with SMOTEENN Balancing

Lonang, Syahrani
Putra, Ahmad Fatoni Dwi
Firdaus, Asno Azzawagama
Syuhada, Fahmi
Sa'adati, Yuan



Article Info

Publish Date
07 Feb 2026

Abstract

Diabetes continues to be a critical global health issue, demanding accurate predictive systems to enable preventive interventions. Traditional diagnostic tests are inefficient for large-scale early screening, which has led to growing interest in artificial intelligence solutions. This research proposed a methodology for diabetes classification based on tree-based algorithms enhanced with SMOTEENN balancing. The study employed the Kaggle Diabetes Prediction Dataset with 100,000 instances and eight medical and demographic features. Preprocessing steps included handling missing and duplicate values, encoding categorical variables, and scaling numerical attributes with Min-Max normalization. To address severe class imbalance, SMOTEENN was adopted, producing a cleaner and more balanced dataset. Model evaluation was performed using Stratified 5-Fold cross-validation on six classifiers: Decision Tree, Random Forest, Gradient Boosting, AdaBoost, XGBoost, and CatBoost. Experimental results indicated significant gains after balancing, with ensemble methods outperforming the single-tree baseline. Random Forest delivered the best overall performance (98.93% accuracy, 98.96% F1-score, 99.16% recall, 99.94% AUC), followed by CatBoost and XGBoost with comparable results above 99% AUC. While Decision Tree benefited most from SMOTEENN in relative terms, it remained less competitive. Feature importance analysis revealed HbA1c level and blood glucose level as dominant predictors, indicating that the models learned clinically meaningful patterns. These findings suggest that integrating hybrid resampling with ensemble tree classifiers provides reliable and generalizable predictions for diabetes risk. The approach holds promise for deployment in healthcare decision support systems.
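
To make the scaling and resampling steps described above concrete, here is a minimal pure-Python sketch of Min-Max normalization followed by SMOTE oversampling and Edited Nearest Neighbours (ENN) cleaning on a tiny toy table. The data values, the function names, the neighbour counts (k=5 for SMOTE, k=3 for ENN), and the majority-vote removal rule are all illustrative assumptions. The abstract does not state which implementation or settings the authors used, and a real study would more likely rely on a library such as imbalanced-learn.

```python
import random
from collections import Counter

def min_max_scale(rows):
    """Rescale each column to [0, 1]; constant columns map to 0.0."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [
        [(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]
        for row in rows
    ]

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(idx, X, candidates, k):
    """Indices of the k candidates closest to X[idx], excluding idx itself."""
    others = [j for j in candidates if j != idx]
    return sorted(others, key=lambda j: sq_dist(X[idx], X[j]))[:k]

def smote(X, y, minority, k=5, seed=0):
    """Binary-case SMOTE: add interpolated minority samples until both
    classes have the same size. Assumes at least two minority samples."""
    rng = random.Random(seed)
    min_idx = [i for i, label in enumerate(y) if label == minority]
    n_new = (len(y) - len(min_idx)) - len(min_idx)
    X_out, y_out = list(X), list(y)
    for _ in range(max(n_new, 0)):
        i = rng.choice(min_idx)                   # random minority sample
        j = rng.choice(nearest(i, X, min_idx, k)) # one of its minority neighbours
        gap = rng.random()                        # position along the segment i -> j
        X_out.append([a + gap * (b - a) for a, b in zip(X[i], X[j])])
        y_out.append(minority)
    return X_out, y_out

def enn(X, y, k=3):
    """Keep only samples whose label matches the majority vote of their
    k nearest neighbours (assumed rule; libraries offer other variants)."""
    all_idx = range(len(X))
    keep = []
    for i in all_idx:
        votes = Counter(y[j] for j in nearest(i, X, all_idx, k))
        if votes.most_common(1)[0][0] == y[i]:
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]

def smoteenn(X, y, minority, seed=0):
    """Oversample with SMOTE, then clean both classes with ENN."""
    X_s, y_s = smote(X, y, minority, seed=seed)
    return enn(X_s, y_s)

# Made-up toy rows: [HbA1c level, blood glucose level]; label 1 = diabetic.
X = [
    [5.0, 90.0], [5.2, 100.0], [5.5, 110.0], [5.7, 95.0], [6.0, 120.0],
    [5.4, 105.0], [5.9, 130.0], [6.1, 126.0], [8.0, 220.0],
    [6.6, 160.0], [7.5, 200.0], [9.0, 260.0],
]
y = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1]

X_scaled = min_max_scale(X)
X_bal, y_bal = smoteenn(X_scaled, y, minority=1)
print("before:", Counter(y), "after:", Counter(y_bal))
```

Because ENN removes samples from both classes, the result is cleaner but not necessarily perfectly balanced, which is consistent with the abstract's description of a "cleaner and more balanced dataset". In a cross-validated evaluation, resampling would normally be applied to the training folds only, so that the held-out fold keeps the original class distribution.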

Copyrights © 2026






Journal Info

Abbrev

mf

Publisher

Subject

Computer Science & IT; Control & Systems Engineering; Electrical & Electronics Engineering; Library & Information Science; Neuroscience

Description

Mobile and Forensics (MF) is an online, open-access national journal for applied research in the fields of Mobile Technology and Digital Forensics. The journal invites scientists and researchers from around the world to exchange and disseminate theoretical and practical topics that ...