Journal of Soft Computing Exploration
Vol. 7 No. 1 (2026): March 2026

Enhancing diabetes classification performance using XGBoost integrated with SMOTE and Bayesian hyperparameter optimization

Muhammad Nurul Ihyaul Ulum (Department of Computer Science, Universitas Negeri Semarang, Indonesia)
Jumanto Unjung (Department of Computer Science, Universitas Negeri Semarang, Indonesia)



Article Info

Publish Date
18 Mar 2026

Abstract

Diabetes mellitus is a chronic metabolic disorder whose prevalence is rising worldwide. Identifying people at risk early can help prevent serious complications and improve patient outcomes. Machine learning is often used to predict diabetes, but class imbalance in medical data makes it harder for models to detect positive cases. In this study, we built a diabetes classification model that combines the Extreme Gradient Boosting (XGBoost) algorithm with the Synthetic Minority Over-sampling Technique (SMOTE), and we tuned the model's hyperparameters with Bayesian Optimization. We used the Pima Indians Diabetes Dataset, which contains 768 patient records and eight clinical features. The workflow consisted of preprocessing the data, splitting it into training and testing sets, applying SMOTE to balance the classes in the training data, training the XGBoost model, and tuning hyperparameters using Bayesian Optimization with Stratified 5-Fold Cross-Validation to determine the best configuration. The final model reached an accuracy of 0.88, a precision of 0.79, a recall of 0.91, an F1-score of 0.84, and a ROC-AUC of 0.955. These results indicate that the approach detects diabetes cases effectively, as reflected in the high recall, while maintaining strong overall performance.
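The oversampling step named in the abstract can be illustrated with a minimal sketch. This is not the authors' code and not the implementation from any particular library. It is a simplified, pure-Python version of the SMOTE idea: each synthetic record is placed on the line segment between a minority-class sample and one of its nearest minority-class neighbours. The function name `smote_sketch`, its parameters, and the toy data are assumptions made for illustration only.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Simplified SMOTE: build n_new synthetic points by interpolating between
    a minority-class sample and one of its k nearest minority-class neighbours.
    Hypothetical helper for illustration, not a library API."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base point.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Place the new point at a random position on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy training split (made-up values, two features per record).
train_majority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3], [0.9, 0.7], [1.3, 1.0]]
train_minority = [[3.0, 3.0], [3.2, 2.8], [2.9, 3.3]]

# Oversample the minority class until both classes have the same size.
n_needed = len(train_majority) - len(train_minority)
balanced_minority = train_minority + smote_sketch(train_minority, n_needed, k=2)
```

In a workflow like the one described, oversampling of this kind would typically be applied to the training split only (and, during cross-validation, to the training folds), so that synthetic records do not reach the data used for evaluation.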

Copyrights © 2026






Journal Info

Abbrev

journal

Publisher

Subject

Computer Science & IT; Control & Systems Engineering; Decision Sciences, Operations Research & Management; Electrical & Electronics Engineering

Description

The journal focuses on publishing high-quality, original research and review articles in the field of Soft Computing, Informatics and Computer Science, emphasizing the development, application, and rigorous evaluation of Advanced Computational Methods, Artificial Intelligence (AI), Machine Learning ...