Diabetes is a chronic disease that poses significant challenges worldwide, and accurate, early identification of risk categories is needed to improve management and prevention strategies. This research evaluates the K-Nearest Neighbors (KNN) algorithm for classifying diabetes risk categories using the Pima Indian Diabetes dataset. The study applies preprocessing steps, including handling of missing values, normalization, and feature engineering, to prepare the dataset for KNN's distance-based calculations. Hyperparameter tuning and a comparison of distance metrics, such as Euclidean and Manhattan, are conducted to improve model accuracy. The KNN model achieves a moderate accuracy of 66%, with a precision of 0.52 and a recall of 0.58 for the diabetic class, indicating that it captures general patterns but has limited ability to handle imbalanced data. The research identifies glucose level and BMI as key predictors and emphasizes the importance of balanced datasets and advanced feature selection techniques. Recommendations for future work include integrating additional clinical features and hybrid models to improve diagnostic accuracy and applicability in clinical settings. The study underscores KNN's potential as a foundational tool in machine learning for medical diagnostics, contributing to the broader effort to improve healthcare outcomes through data-driven decision-making.
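The pipeline summarized above (normalization, a choice of distance metric, majority voting among the k nearest neighbors, and per-class precision and recall) can be illustrated with a minimal pure-Python sketch. It is not the study's implementation: the function names are hypothetical, and the six training rows and two test rows are made-up (glucose, BMI) values chosen only to make the example run, not records from the Pima Indian Diabetes dataset.

```python
from collections import Counter

def fit_min_max(rows):
    """Return per-feature minima and maxima learned from the training rows."""
    cols = list(zip(*rows))
    return [min(c) for c in cols], [max(c) for c in cols]

def apply_min_max(rows, lows, highs):
    """Rescale each feature to [0, 1] so no single feature dominates the distance."""
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]

def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def knn_predict(train_x, train_y, query, k=3, metric=euclidean):
    """Predict the majority label among the k training points closest to query."""
    nearest = sorted(zip(train_x, train_y), key=lambda p: metric(p[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def precision_recall(y_true, y_pred, positive=1):
    """Precision and recall for the positive (here: diabetic) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical toy data: [glucose, BMI]; label 1 = diabetic, 0 = non-diabetic.
train_raw = [[85, 22.0], [90, 24.5], [100, 26.0],
             [150, 33.0], [165, 35.5], [180, 38.0]]
train_y = [0, 0, 0, 1, 1, 1]
test_raw = [[95, 25.0], [170, 36.0]]
test_y = [0, 1]

# Fit the scaler on training data only, then apply it to both splits.
lows, highs = fit_min_max(train_raw)
train_x = apply_min_max(train_raw, lows, highs)
test_x = apply_min_max(test_raw, lows, highs)

# Compare the two distance metrics named in the abstract.
results = {}
for name, metric in [("euclidean", euclidean), ("manhattan", manhattan)]:
    preds = [knn_predict(train_x, train_y, q, k=3, metric=metric) for q in test_x]
    results[name] = (preds, precision_recall(test_y, preds))
    print(name, preds, results[name][1])
```

On real data, k and the metric would be selected by cross-validation rather than fixed by hand. Precision and recall for the diabetic class are worth reporting alongside accuracy because accuracy alone can look acceptable on an imbalanced dataset while the minority class is poorly detected.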
Copyright © 2024