Obesity is a growing public health problem shaped by multiple interacting lifestyle behaviors that single-factor or label-driven analyses cannot always capture adequately. This study therefore applies an unsupervised clustering approach to identify natural behavioral segments associated with obesity risk. The data were obtained from the Poltekkes Kemenkes Semarang Obesity Risk Dataset, which consists of 20,758 records and 16 mixed attributes (numerical and categorical) with the NObeyesdad label. Pre-processing comprised standardizing the numerical features and one-hot encoding the categorical features, followed by dimensionality reduction with PCA to 13 components that retain approximately 95.48% of the variance. Ward clustering was applied in the PCA space, and candidate cluster counts of k=2–10 were evaluated using the Silhouette coefficient, the Davies–Bouldin Index (DBI), and the Calinski–Harabasz (CH) index. Although the average Silhouette coefficient was modest (≈0.2029), the k=5 solution was retained because it offered the best balance between internal validation results and the practical interpretability of cluster-based risk profiles. BMI-based interpretation using WHO Asia criteria identified Cluster 0 as very high risk (mean BMI 33.04; 73.7% obese), Cluster 2 as high risk and characterized by predominant smoking (65.3% obese), Cluster 4 as moderate-to-high risk (34.0% overweight; 30.3% obese), Cluster 1 as a mixed group, and Cluster 3 as relatively low risk (mean BMI 22.21; 8.9% obese). Agreement between the clusters and the NObeyesdad label was low (NMI 0.126; ARI 0.073), indicating that the clusters reflect similarity in behavioral patterns rather than the label classes.
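The workflow summarized above can be illustrated with a minimal sketch, assuming scikit-learn as the toolkit (the abstract does not name the software used). The data here are synthetic, the column names (`age`, `height`, `weight`, `smoke`, `transport`) are hypothetical stand-ins for the dataset's attributes, and `labels_true` is a random placeholder for the NObeyesdad label. PCA is asked to keep enough components to reach 95% of the variance, which is an assumption about how the 13 components were chosen. None of the numbers produced by this sketch correspond to the reported results.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import AgglomerativeClustering
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import PCA
from sklearn.metrics import (
    adjusted_rand_score,
    calinski_harabasz_score,
    davies_bouldin_score,
    normalized_mutual_info_score,
    silhouette_score,
)
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in data; column names and values are hypothetical.
rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "age": rng.normal(30, 8, n),
    "height": rng.normal(1.68, 0.09, n),
    "weight": rng.normal(75, 18, n),
    "smoke": rng.choice(["yes", "no"], n),
    "transport": rng.choice(["walk", "car", "public"], n),
})
labels_true = rng.integers(0, 7, n)  # placeholder for the NObeyesdad label

num_cols = ["age", "height", "weight"]
cat_cols = ["smoke", "transport"]

# Standardize numerical features and one-hot encode categorical ones.
pre = ColumnTransformer(
    [
        ("num", StandardScaler(), num_cols),
        ("cat", OneHotEncoder(handle_unknown="ignore"), cat_cols),
    ],
    sparse_threshold=0.0,  # always return a dense array for PCA
)
X = pre.fit_transform(df)

# Keep the smallest number of components explaining >= 95% of the variance
# (assumed selection rule; the abstract reports 13 components, ~95.48%).
pca = PCA(n_components=0.95, svd_solver="full")
Z = pca.fit_transform(X)

# Evaluate Ward clustering for k = 2..10 with three internal indices.
scores = {}
for k in range(2, 11):
    labels = AgglomerativeClustering(n_clusters=k, linkage="ward").fit_predict(Z)
    scores[k] = {
        "silhouette": silhouette_score(Z, labels),       # higher is better
        "dbi": davies_bouldin_score(Z, labels),          # lower is better
        "ch": calinski_harabasz_score(Z, labels),        # higher is better
    }

# Fit the retained k = 5 solution and compare it with the label.
final_labels = AgglomerativeClustering(n_clusters=5, linkage="ward").fit_predict(Z)
nmi = normalized_mutual_info_score(labels_true, final_labels)
ari = adjusted_rand_score(labels_true, final_labels)

for k, s in scores.items():
    print(k, {name: round(value, 4) for name, value in s.items()})
print("NMI:", round(nmi, 3), "ARI:", round(ari, 3))
```

Because the three internal indices can disagree on the best k, printing them side by side, as here, makes it easier to weigh them against interpretability, which is the trade-off the abstract describes for k=5. With the full 20,758 records, Ward linkage and the Silhouette coefficient both scale roughly quadratically in the number of samples, so memory use may need attention.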
Copyrights © 2026