Class imbalance is a major challenge in diabetes classification because it can bias models toward the majority class. Oversampling methods such as the Synthetic Minority Oversampling Technique (SMOTE) and Adaptive Synthetic Sampling (ADASYN) address this issue by increasing the representation of the minority class. This study compares the two methods using the CatBoost algorithm on a diabetes classification dataset, evaluated with accuracy, precision, recall, F1-score, and ROC-AUC. The baseline CatBoost model already performs strongly, with an accuracy of 0.9720 and a ROC-AUC of 0.9796, but its recall for the minority class is relatively low at 0.6935. SMOTE yields the best results, with an accuracy of 0.9727, precision of 0.9737, recall of 0.6971, and F1-score of 0.8125, while ROC-AUC remains at 0.9796. ADASYN performs slightly below SMOTE, with an accuracy of 0.9719 and a recall of 0.6924, both marginally lower than the corresponding baseline values. Overall, SMOTE is the more effective of the two methods at improving the CatBoost model’s detection of the minority class without compromising overall performance, although the gain over the baseline is small. SMOTE is therefore recommended as the more stable oversampling method for handling imbalanced data in diabetes classification tasks.
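To make the oversampling step concrete, the following is a minimal sketch of SMOTE-style interpolation using only the Python standard library. It is not the implementation used in this study, which is not specified here. The function name `smote_sketch`, its parameters, and the toy data are hypothetical and chosen for illustration only. Each synthetic point is placed at a random position on the line segment between a minority sample and one of its k nearest minority neighbours.

```python
import math
import random


def smote_sketch(minority, n_synthetic, k=5, seed=0):
    """Generate n_synthetic points by SMOTE-style interpolation.

    minority: list of equal-length numeric feature vectors, all from the
    minority class. Illustrative sketch only, not a library API.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other points
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the base point (Euclidean distance),
        # excluding the base point itself by index.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


if __name__ == "__main__":
    # Hypothetical toy minority samples with two features each.
    toy_minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.5], [1.2, 2.2]]
    for point in smote_sketch(toy_minority, n_synthetic=3, k=2):
        print(point)
```

ADASYN follows the same interpolation idea but generates more synthetic points around minority samples that have a larger share of majority-class neighbours. In either case, oversampling is normally applied to the training split only, so that the evaluation metrics are computed on data with the original class distribution.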
Copyright © 2026