Diabetes mellitus is a chronic disease with increasing global prevalence, and early detection and accurate classification are needed to prevent severe complications. Machine learning has been widely applied to diabetes prediction; however, a major challenge is the class imbalance commonly found in medical datasets. This study focuses on the Multiclass Diabetes Dataset, which consists of 264 samples and exhibits an imbalanced class distribution (Class 2: 128 samples, Class 0: 96 samples, Class 1: 40 samples). Such imbalance can bias a classifier toward the majority classes and reduce its ability to recognize minority classes. To address this, the study applies the Synthetic Minority Over-sampling Technique (SMOTE) and compares classifiers, including Random Forest, K-Nearest Neighbors (KNN), Support Vector Machine (SVM), and Logistic Regression, before and after oversampling. The results indicate that SMOTE effectively improved the models’ ability to classify minority classes, with notable increases in recall and F1-score. Among the tested algorithms, Random Forest achieved the best performance, with an overall accuracy of 98% and an F1-score above 0.98. Although KNN experienced a slight performance drop after SMOTE, the other algorithms, particularly SVM and Logistic Regression, showed clear improvements.
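
To make the oversampling step concrete, the following is a minimal sketch of the core idea behind SMOTE (Synthetic Minority Over-sampling Technique): new minority samples are created by interpolating between an existing minority sample and one of its nearest same-class neighbours. It is an illustration only, not the study's implementation. The class sizes come from the abstract, while the two-dimensional features, the function name `smote_oversample`, and the parameters `k` and `seed` are hypothetical placeholders.

```python
import random
from collections import Counter


def smote_oversample(X, y, k=5, seed=0):
    """SMOTE-style oversampling sketch: grow every class to the majority size
    by interpolating between a sample and one of its k nearest same-class
    neighbours. Original samples are kept unchanged at the front."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        members = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):  # zero iterations for the majority class
            base = rng.choice(members)
            # k nearest same-class neighbours by squared Euclidean distance
            neighbours = sorted(
                (m for m in members if m is not base),
                key=lambda m: sum((a - b) ** 2 for a, b in zip(base, m)),
            )[:k]
            other = rng.choice(neighbours)
            gap = rng.random()  # position along the segment base -> other
            X_out.append([a + gap * (b - a) for a, b in zip(base, other)])
            y_out.append(label)
    return X_out, y_out


# Placeholder data: class sizes from the abstract, random features (assumed).
data_rng = random.Random(42)
sizes = {2: 128, 0: 96, 1: 40}
X, y = [], []
for label, n in sizes.items():
    for _ in range(n):
        X.append([data_rng.gauss(label, 1.0), data_rng.gauss(-label, 1.0)])
        y.append(label)

X_res, y_res = smote_oversample(X, y)
print("before:", dict(Counter(y)))
print("after: ", dict(Counter(y_res)))
```

After resampling, each class holds 128 samples, so the minority classes are no longer outnumbered during training. In practice a library implementation such as `SMOTE` from imbalanced-learn would normally be used, applied to the training split only so that synthetic samples do not leak into evaluation.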
Copyrights © 2025