Diabetes mellitus is a chronic disease whose global prevalence continues to rise, driven by modern lifestyle changes. Early detection of diabetes risk is crucial for preventing and mitigating long-term complications. This study clusters individuals by diabetes risk level using the K-Means clustering algorithm, based on lifestyle and health-condition attributes. The dataset, obtained from the Kaggle platform, contains 5,452 entries and 22 attributes. Pre-processing consisted of data cleaning, normalization, and manual feature selection. The optimal number of clusters was determined with the Elbow Method, which indicated k = 3. Cluster quality was evaluated with the Davies-Bouldin Index (DBI), which yielded a score of 0.7678; since lower DBI values indicate more compact and better-separated clusters, this suggests a reasonably good clustering. The final model produced three risk clusters (low, medium, and high) containing 424, 819, and 615 records, respectively. This segmentation is expected to serve as a basis for healthcare institutions to design more targeted, data-driven preventive interventions.
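The pipeline summarized above (normalization, an Elbow search over k, K-Means, then DBI) can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation, and several details are assumptions: the data are synthetic (three hypothetical groups on two attributes with very different scales, standing in for the Kaggle attributes), normalization is taken to be min-max scaling, K-Means uses k-means++ seeding with 10 restarts (the defaults of scikit-learn's `KMeans`), and the elbow is picked automatically with a hypothetical "largest second difference of WCSS" rule rather than by inspecting a plot. The Davies-Bouldin computation follows the standard definition: for each cluster, take the worst ratio (S_i + S_j) / d(c_i, c_j) and average over clusters.

```python
import math
import random


def min_max_normalize(rows):
    """Rescale every column to [0, 1]; constant columns become 0."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [
        [(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]
        for row in rows
    ]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    return [sum(col) / len(points) for col in zip(*points)]


def kmeans_once(points, k, rng, max_iter=100):
    """One run of Lloyd's algorithm with k-means++ seeding (an assumption)."""
    centroids = [list(rng.choice(points))]
    while len(centroids) < k:
        # k-means++: pick the next seed with probability proportional to D(x)^2.
        d2 = [min(sq_dist(p, c) for c in centroids) for p in points]
        centroids.append(list(rng.choices(points, weights=d2, k=1)[0]))
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        if new == labels:
            break  # converged: assignments no longer change
        labels = new
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = centroid(members)
    wcss = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, wcss


def kmeans(points, k, n_init=10, seed=0):
    """Keep the restart with the lowest within-cluster sum of squares (WCSS)."""
    rng = random.Random(seed)
    return min((kmeans_once(points, k, rng) for _ in range(n_init)), key=lambda r: r[1])


def davies_bouldin(points, labels):
    """Average over clusters of max_j (S_i + S_j) / d(c_i, c_j); lower is better."""
    ids = sorted(set(labels))
    groups = {c: [p for p, lab in zip(points, labels) if lab == c] for c in ids}
    cents = {c: centroid(g) for c, g in groups.items()}
    # S_i: mean Euclidean distance of the members of cluster i to its centroid.
    scatter = {c: sum(math.dist(p, cents[c]) for p in g) / len(g) for c, g in groups.items()}
    worst = [
        max((scatter[i] + scatter[j]) / math.dist(cents[i], cents[j]) for j in ids if j != i)
        for i in ids
    ]
    return sum(worst) / len(ids)


def elbow_k(wcss):
    """Hypothetical elbow rule: the k with the largest second difference of WCSS."""
    # wcss[i] is the WCSS for k = i + 1, so interior index i corresponds to k = i + 1.
    bends = {i + 1: wcss[i - 1] - 2 * wcss[i] + wcss[i + 1] for i in range(1, len(wcss) - 1)}
    return max(bends, key=bends.get)


# Hypothetical data: 3 groups x 50 records on two attributes with different scales.
gen = random.Random(42)
centers = [(22, 30), (38, 30), (30, 60)]
raw = [[gen.gauss(cx, 1.5), gen.gauss(cy, 3.0)] for cx, cy in centers for _ in range(50)]
data = min_max_normalize(raw)

wcss_curve = [kmeans(data, k)[1] for k in range(1, 7)]  # Elbow search over k = 1..6
best_k = elbow_k(wcss_curve)
labels, _ = kmeans(data, best_k)
sizes = [labels.count(c) for c in range(best_k)]
dbi = davies_bouldin(data, labels)
print(best_k, sorted(sizes), round(dbi, 4))
```

On well-separated synthetic groups like these, the sketch should recover k = 3 and a low DBI; real health data will overlap far more, which is consistent with a higher score such as the 0.7678 reported above. Naming the resulting clusters "low", "medium", and "high" risk is an interpretation step (for example, by inspecting each centroid's attribute values) that this sketch does not attempt.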
Copyright © 2025