Diabetes mellitus is a chronic disease characterized by elevated blood glucose levels resulting from metabolic disturbances in insulin production, insulin action, or both. If left untreated, it can lead to serious complications, so early and accurate detection is crucial for timely medical intervention. This research aimed to improve the accuracy of a diabetes classification system based on the K-Nearest Neighbors (KNN) algorithm. A baseline KNN model trained on the imbalanced data, without SMOTE oversampling or GridSearchCV tuning, achieved 83% accuracy. Although this figure appears acceptable, performance on the positive class was low (precision 80%, recall 69%, F1-score 74%), indicating a bias towards the majority negative class caused by the class imbalance. To address this, three steps were applied: data preprocessing (handling missing values and normalizing features), hyperparameter optimization with GridSearchCV, and class balancing with SMOTE (Synthetic Minority Over-sampling Technique). With these improvements, accuracy rose to 94%. Performance on the positive class improved substantially (precision 90%, recall 98%, F1-score 94%), and the negative class reached precision 98%, recall 89%, and F1-score 93%. These results indicate that combining preprocessing, hyperparameter optimization, and class balancing improves the KNN algorithm's ability to detect diabetes accurately, and that machine learning with appropriate data processing can support the development of medical decision support systems for early diabetes diagnosis.
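The processing chain summarized above (missing-value handling, normalization, hyperparameter search, SMOTE, KNN, per-class metrics) can be sketched in plain Python to make each mechanism visible. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not specify the imputation method, scaling method, distance metric, or search grid, so this sketch assumes mean imputation, min-max scaling, Euclidean distance, and a grid over the number of neighbours only. It also applies SMOTE to training folds only. All function names are hypothetical. In practice these steps are commonly done with scikit-learn's `KNeighborsClassifier` and `GridSearchCV` and imbalanced-learn's `SMOTE`.

```python
import math
import random
from collections import Counter


def fit_means(rows):
    """Per-feature mean of the observed (non-None) values; fit on training rows only."""
    means = []
    for col in zip(*rows):
        observed = [v for v in col if v is not None]
        means.append(sum(observed) / len(observed))
    return means


def fill_missing(rows, means):
    """Replace missing values (None) with the corresponding training mean."""
    return [[means[j] if v is None else v for j, v in enumerate(r)] for r in rows]


def fit_minmax(rows):
    """Per-feature minima and maxima, learned from training rows only."""
    cols = list(zip(*rows))
    return [min(c) for c in cols], [max(c) for c in cols]


def apply_minmax(rows, lo, hi):
    """Min-max scale with the training ranges (training rows land in [0, 1])."""
    return [[(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(r, lo, hi)]
            for r in rows]


def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k training points closest to x (Euclidean distance)."""
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def smote(X, y, minority, k=5, seed=0):
    """SMOTE-style oversampling (binary labels) until the minority matches the other class.

    Each synthetic point lies on the segment between a random minority sample
    and one of its k nearest minority-class neighbours.
    """
    rng = random.Random(seed)
    pool = [x for x, label in zip(X, y) if label == minority]
    n_needed = (len(y) - len(pool)) - len(pool)
    if len(pool) < 2 or n_needed <= 0:
        return list(X), list(y)  # nothing to interpolate between, or already balanced
    synthetic = []
    for _ in range(n_needed):
        base = rng.choice(pool)
        neighbours = sorted((p for p in pool if p is not base),
                            key=lambda p: math.dist(p, base))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return list(X) + synthetic, list(y) + [minority] * n_needed


def cv_accuracy(X, y, k, minority, folds=3):
    """Cross-validated accuracy; SMOTE is applied to each training fold only."""
    correct = 0
    for f in range(folds):
        val = set(range(f, len(X), folds))
        tr_X = [X[i] for i in range(len(X)) if i not in val]
        tr_y = [y[i] for i in range(len(X)) if i not in val]
        tr_X, tr_y = smote(tr_X, tr_y, minority)
        correct += sum(knn_predict(tr_X, tr_y, X[i], k) == y[i] for i in val)
    return correct / len(X)


def grid_search_k(X, y, candidates, minority):
    """Return the neighbour count with the best CV accuracy (the first one wins ties)."""
    return max(candidates, key=lambda k: cv_accuracy(X, y, k, minority))


def precision_recall_f1(y_true, y_pred, positive):
    """Per-class precision, recall, and F1 (harmonic mean of precision and recall)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

A full run would proceed in this order:

1. Impute and scale the training split using statistics fitted on that split alone.
2. Let `grid_search_k` choose the neighbour count.
3. Balance the training set with `smote`.
4. Score the untouched test split with `knn_predict` and `precision_recall_f1`.

Oversampling inside each training fold, rather than before cross-validation, keeps synthetic points from being interpolated out of validation rows.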
Copyright © 2025