This study investigates the predictive performance of machine learning models in diagnosing diabetes using the Pima Indians Diabetes Dataset. Seven models were evaluated: Logistic Regression, Random Forest, Gradient Boosting, XGBoost, LightGBM, a Stacking Classifier, and a Voting Classifier. A 10-fold cross-validation strategy was employed to ensure robust and reliable performance assessment. The evaluation incorporated standard metrics (accuracy, precision, recall, F1 score, and ROC AUC) as well as a custom metric designed to prioritize recall while maintaining precision, reflecting the clinical importance of minimizing false negatives. LightGBM and Random Forest emerged as the top-performing individual models, achieving competitive scores across all metrics. Ensemble methods, particularly the Stacking Classifier, demonstrated robustness by leveraging the complementary strengths of their base models. Statistical validation with the Friedman test confirmed significant differences in model rankings (test statistic 22.77, p = 0.00088). However, pairwise comparisons using the Wilcoxon signed-rank test showed that the differences between the top models, such as LightGBM and Random Forest, were not statistically significant. These results underscore the effectiveness of tree-based and ensemble models for clinical diagnostic tasks and highlight the value of a custom metric that aligns model evaluation with clinical priorities. Future work should explore hybrid modeling approaches and larger datasets to further improve predictive accuracy and generalizability in real-world healthcare applications.
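The abstract does not state the exact form of the recall-prioritizing custom metric. A minimal sketch of one common way to realize such a metric is the F-beta score with beta > 1, which weights recall more heavily than precision while still penalizing poor precision; the function names and the choice beta = 2 below are illustrative assumptions, not the paper's definition.

```python
# Illustrative sketch only: the paper's actual custom metric is not specified.
# F-beta with beta > 1 emphasizes recall (fewer false negatives) while still
# accounting for precision, matching the clinical goal described above.

def confusion_counts(y_true, y_pred):
    """Return (true positives, false positives, false negatives) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, fn

def fbeta_score(y_true, y_pred, beta=2.0):
    """F-beta score; beta=2 weights recall twice as heavily as precision."""
    tp, fp, fn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```

In a scikit-learn pipeline such a function could be wrapped with `make_scorer` and passed to `cross_val_score`, so the same 10-fold protocol scores every model on the custom metric alongside the standard ones.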
Copyright © 2024