Early detection of diabetes mellitus is crucial for preventing severe complications. This study uses a quantitative comparative experimental design to evaluate three machine learning algorithms for diabetes prediction: k-Nearest Neighbors (k-NN), Support Vector Machine (SVM), and Random Forest. The algorithms were chosen to represent distinct learning paradigms (distance-based, margin-based, and ensemble-based, respectively), with the goal of identifying the model best suited to clinical use. The Pima Indians Diabetes dataset was used, comprising 390 patients and 15 clinical features. Performance was measured by accuracy, precision, recall, and F1-score. Random Forest achieved the highest accuracy (89.7%) and the highest F1-score, giving the most balanced classification; SVM followed with an accuracy of 84.6%, and k-NN reached 76.9%. Although k-NN had the highest recall (0.750), its precision was low (0.375), meaning that most of its positive predictions were false positives. Feature importance analysis identified blood glucose level as the most significant predictor, consistent with clinical knowledge. Overall, ensemble techniques such as Random Forest gave the most reliable results in this comparison, underscoring the importance of algorithm selection for early diabetes detection in clinical applications.
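As a brief illustration of how the four reported metrics relate to one another, the following Python sketch computes them from confusion-matrix counts. The counts are hypothetical: they were chosen only because they reproduce the k-NN figures quoted above (recall 0.750, precision 0.375, accuracy of about 76.9%) and are not taken from the study's results. The function names are likewise illustrative.

```python
def safe_div(num, den):
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def precision(tp, fp):
    # Share of predicted positives that are truly positive.
    return safe_div(tp, tp + fp)


def recall(tp, fn):
    # Share of actual positives that the model finds.
    return safe_div(tp, tp + fn)


def f1_score(p, r):
    # Harmonic mean of precision and recall.
    return safe_div(2 * p * r, p + r)


def accuracy(tp, tn, fp, fn):
    # Share of all cases classified correctly.
    return safe_div(tp + tn, tp + tn + fp + fn)


# Hypothetical counts (an assumption, not data from the study), picked so
# that the resulting metrics match the k-NN values quoted in the abstract.
tp, fp, fn, tn = 9, 15, 3, 51

p = precision(tp, fp)
r = recall(tp, fn)
f1 = f1_score(p, r)
acc = accuracy(tp, tn, fp, fn)

print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f} accuracy={acc:.3f}")
```

With a precision of 0.375 and a recall of 0.750, the harmonic mean gives an F1-score of 0.500, which shows how a high recall alone does not yield a balanced classifier when false positives outnumber true positives.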
Copyright © 2026