This study applies the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm to the UK Smoking Survey Dataset from Kaggle (1,691 records, 13 attributes). Preprocessing consists of missing-value imputation, label encoding, selection of 8 features, and standardization with StandardScaler. The parameters eps = 2.0 and min_samples = 15 were chosen from a k-distance graph. DBSCAN identified four clusters: Cluster 0 (n = 539; male non-smokers, mean age 51.2 years), Cluster 1 (n = 228; female smokers, 11.9 cigarettes per weekday), Cluster 2 (n = 731; female non-smokers, mean age 53.0 years), and Cluster 3 (n = 168; male smokers, 13.5 cigarettes per weekday). The remaining 25 records were labeled as noise and correspond to extreme heavy smokers. Evaluation yielded a Silhouette Score of 0.2032, a Davies-Bouldin Index of 2.0494, and a Calinski-Harabasz Index of 395.68. The results demonstrate that DBSCAN can effectively identify demographic-based smoking behavior patterns.
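The clustering and evaluation steps summarized above can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not the study's actual code: the data are synthetic (`make_blobs` with hand-picked centers) rather than the Kaggle survey, the helper names `k_distances` and `cluster_and_score` are hypothetical, and excluding noise points before computing the three indices is an assumption, since the abstract does not say how noise was treated during evaluation.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.metrics import (
    calinski_harabasz_score,
    davies_bouldin_score,
    silhouette_score,
)
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler


def k_distances(X, k):
    """Sorted distance from each point to its k-th nearest neighbour.

    The query set equals the training set, so each point counts itself
    as its first neighbour, matching how DBSCAN counts min_samples.
    Plotting the result gives the k-distance graph used to pick eps.
    """
    distances, _ = NearestNeighbors(n_neighbors=k).fit(X).kneighbors(X)
    return np.sort(distances[:, -1])


def cluster_and_score(X, eps=2.0, min_samples=15):
    """Run DBSCAN and compute three internal validity indices.

    Assumption: noise points (label -1) are excluded before scoring.
    """
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)
    mask = labels != -1
    scores = {
        "n_clusters": len(set(labels[mask])),
        "n_noise": int((~mask).sum()),
    }
    if scores["n_clusters"] >= 2:  # the indices need at least two clusters
        scores["silhouette"] = silhouette_score(X[mask], labels[mask])
        scores["davies_bouldin"] = davies_bouldin_score(X[mask], labels[mask])
        scores["calinski_harabasz"] = calinski_harabasz_score(X[mask], labels[mask])
    return labels, scores


# Synthetic stand-in for the 8 selected features (NOT the survey data).
centers = np.array([
    [5] * 8,
    [-5] * 8,
    [5] * 4 + [-5] * 4,
    [-5] * 4 + [5] * 4,
], dtype=float)
X_raw, _ = make_blobs(n_samples=1691, centers=centers, cluster_std=1.0,
                      random_state=0)

X = StandardScaler().fit_transform(X_raw)   # zero mean, unit variance
kd = k_distances(X, k=15)                   # inspect the "elbow" to choose eps
labels, scores = cluster_and_score(X, eps=2.0, min_samples=15)
print(scores)
```

On real survey data the index values would differ from those of this synthetic example. The elbow of the sorted `kd` curve is what motivates a particular choice of eps.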
Copyright © 2026