Stroke is a serious medical condition that can cause permanent disability and death. This study applies the DBSCAN algorithm to cluster patient records by stroke risk factors using a public Kaggle dataset (n = 5,110) containing demographic and clinical attributes such as age, gender, hypertension, heart disease, body mass index (BMI), glucose level, and smoking status. Preprocessing comprised median imputation of missing BMI values, categorical encoding, Z-score standardization, and principal component analysis (PCA) for visualization. Parameters were selected using the k-distance plot and Silhouette evaluation, yielding ε = 2.5 and min_samples = 3 with a modest Silhouette Score of 0.2158. The findings indicate that DBSCAN has potential to support stroke prevention strategies, although further parameter tuning and feature optimization are required to improve clustering quality.
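As an illustration of the steps named above (Z-score standardization, the k-distance heuristic, and DBSCAN with ε and min_samples), the following is a minimal pure-Python sketch. It is not the study's code: the function names, the toy rows, and the value eps = 0.2 are hypothetical and chosen only so the example is small enough to check by hand, whereas the study reports ε = 2.5 and min_samples = 3 on the full dataset. It assumes, as scikit-learn does, that min_samples counts the point itself.

```python
import math
from statistics import mean, pstdev

def zscore(rows):
    """Standardize each column to zero mean and unit variance."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(r, mus, sds)] for r in rows]

def k_distances(points, k):
    """Sorted distance from each point to its k-th nearest other point."""
    out = []
    for i, p in enumerate(points):
        d = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        out.append(d[k - 1])
    return sorted(out)

def dbscan(points, eps, min_samples):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    n = len(points)
    # Each neighbourhood includes the point itself (assumed convention).
    nbrs = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
            for i in range(n)]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(nbrs[i]) < min_samples:
            labels[i] = -1  # provisional noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(nbrs[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point
            if labels[j] is not None:
                continue  # already assigned; do not expand again
            labels[j] = cluster
            if len(nbrs[j]) >= min_samples:
                queue.extend(nbrs[j])  # only core points grow the cluster
    return labels

# Hypothetical toy rows (two tight groups and one outlier), not study data.
toy = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10], [50, 50]]
scaled = zscore(toy)
print(k_distances(scaled, k=2))  # a sharp jump in this curve suggests eps
print(dbscan(scaled, eps=0.2, min_samples=3))  # [0, 0, 0, 1, 1, 1, -1]
```

In the sorted k-distance list, the last value is far larger than the rest, so any ε placed in that gap separates the outlier as noise. On real data the curve is smoother and the choice of ε is a judgment call, which is consistent with the study's note that further parameter tuning is needed.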
Copyrights © 2025