Bulletin of Computer Science Research
Vol. 5 No. 3 (2025): April 2025

A Hybrid K-Means SMOTE and Logistic Regression Approach for Early Detection of Diabetes Mellitus on Imbalanced Data

Salam, Abdus
Azhari, Lukman
Septarini, Ri Sabti
Heriyani, Nofitri



Article Info

Publish Date
25 Apr 2025

Abstract

The increasing global prevalence of Diabetes Mellitus calls for more accurate early detection, particularly through machine learning-based approaches. A main challenge in medical classification is class imbalance: diabetic cases are far fewer than non-diabetic ones. This study develops a hybrid model that integrates Logistic Regression with K-Means SMOTE to improve the sensitivity of early Diabetes Mellitus detection, especially for the minority class. Logistic Regression is chosen for its computational efficiency and interpretability, while K-Means SMOTE balances the class distribution by generating synthetic samples in a structured manner based on clusters of minority class data. The dataset consists of 2,000 records with 9 health-related features, obtained from the Kaggle platform. Evaluation results indicate that the model using K-Means SMOTE achieves the best performance, with an accuracy of 82.00%, an F1-score of 72.73% for the Diabetes class, and the highest ROC-AUC score of 87.48%. Compared with models trained without oversampling and with standard SMOTE, this approach improves model generalization and sensitivity to positive cases. These findings have practical implications for developing fairer and more effective machine learning-based early detection systems, particularly in healthcare facilities with limited resources.
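
The cluster-based oversampling step described in the abstract can be illustrated with a minimal, standard-library-only sketch. This is not the authors' implementation: the function names (`sq_dist`, `kmeans`, `kmeans_smote_sketch`), the toy data, the 50% minority-share filter, and the uniform choice of cluster and sample pair are all assumptions made for illustration. The published K-Means SMOTE algorithm additionally weights clusters by sparsity and interpolates toward nearest neighbours, which is omitted here.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=20, rng=None):
    """Plain Lloyd's k-means; returns one cluster index per point."""
    rng = rng or random.Random(0)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its closest center.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                  for p in points]
        # Move each center to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def kmeans_smote_sketch(X, y, minority=1, k=2, seed=0):
    """Simplified cluster-then-interpolate oversampling (binary labels)."""
    rng = random.Random(seed)
    labels = kmeans(X, k, rng=rng)
    # Keep clusters dominated by the minority class (assumed 50% threshold).
    candidates = []
    for c in range(k):
        idx = [i for i, l in enumerate(labels) if l == c]
        mino = [X[i] for i in idx if y[i] == minority]
        if len(mino) >= 2 and len(mino) / len(idx) >= 0.5:
            candidates.append(mino)
    n_minority = y.count(minority)
    n_needed = (len(y) - n_minority) - n_minority
    synthetic = []
    while candidates and len(synthetic) < n_needed:
        cluster = rng.choice(candidates)
        a, b = rng.sample(cluster, 2)
        t = rng.random()
        # SMOTE-style interpolation on the segment between a and b.
        synthetic.append([ai + t * (bi - ai) for ai, bi in zip(a, b)])
    return X + synthetic, y + [minority] * len(synthetic)

# Hypothetical toy data: 8 majority points (label 0), 3 minority points (label 1).
X = [[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5], [0.2, 0.8], [0.8, 0.2],
     [0.4, 0.6], [9, 9], [9, 10], [10, 9]]
y = [0] * 8 + [1] * 3
X_res, y_res = kmeans_smote_sketch(X, y, minority=1, k=2, seed=0)
print(y_res.count(0), y_res.count(1))
```

In a pipeline like the one the abstract describes, the balanced set would then be used to fit the Logistic Regression classifier. Oversampling is normally applied to the training split only, so that evaluation data contains no synthetic records.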

Copyrights © 2025






Journal Info

Abbrev

bulletincsr

Subject

Computer Science & IT

Description

Bulletin of Computer Science Research covers the whole spectrum of Computer Science, which includes, but is not limited to: • Artificial Immune Systems, Ant Colonies, and Swarm Intelligence • Bayesian Networks and Probabilistic Reasoning • Biologically Inspired Intelligence • Brain-Computer ...