Akbas, Ibrahim
Unknown Affiliation

Articles


Clustering Performance on Heart Disease Data: Effects of Distance Metrics and Scaling
Akbas, Ibrahim; Taspinar, Yavuz; Koklu, Murat
Journal of Technology and System Information Vol. 3 No. 1 (2026): January
Publisher : Indonesian Journal Publisher

DOI: 10.47134/jtsi.v3i1.5336

Abstract

Cardiovascular diseases (CVD) are among the leading causes of morbidity and mortality worldwide, and identifying early-stage risk groups and characterizing patient profiles in greater detail requires advanced analytical approaches. This study aims to reveal latent patient subgroups associated with CVD by applying unsupervised machine learning methods to clinical data. A dataset of 11 clinical variables from 303 patients who visited the VA Medical Center in Long Beach, California, was analyzed. During preprocessing, records with missing observations were removed, only numerical variables were retained, and both z-score standardization and min–max normalization were applied. Hierarchical clustering was then performed using single, complete, and average linkage with Euclidean and cosine distance measures. The number of clusters for each distance–scaling combination was evaluated using the Elbow and Silhouette methods. The results showed that the 4-cluster solution, particularly under complete and average linkage, represented the data structure in the most clinically explanatory manner. The similarity between the k-means clustering obtained with Euclidean distance on standardized data and that obtained with cosine distance on normalized data was Rand Index (RI) = 0.8179. This value indicates that the cluster structure was largely preserved despite the different distance metrics and scaling strategies. The findings demonstrate that unsupervised clustering approaches are a useful tool for defining meaningful risk classes within heterogeneous patient populations based on clinical datasets and for conducting comparative clinical evaluations between these classes.
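The two scaling schemes and the Rand Index mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the abstract does not name the software used, so the function names (`zscore`, `minmax`, `rand_index`), the use of the population standard deviation, and the toy label vectors below are all assumptions made for illustration. The Rand Index here is the plain (unadjusted) version, computed as the fraction of sample pairs that both clusterings treat the same way.

```python
from itertools import combinations
from statistics import mean, pstdev


def zscore(values):
    """Standardize to zero mean and unit standard deviation.

    Assumption: population standard deviation (divide by n).
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # constant column carries no information
    return [(v - mu) / sigma for v in values]


def minmax(values):
    """Rescale linearly so the smallest value maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def rand_index(labels_a, labels_b):
    """Unadjusted Rand Index between two clusterings of the same samples.

    A pair of samples counts as an agreement when both clusterings put the
    pair in the same cluster, or both put it in different clusters.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("label vectors must have the same length")
    if len(labels_a) < 2:
        raise ValueError("at least two samples are required")
    agree = 0
    total = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        agree += same_a == same_b
        total += 1
    return agree / total


# Hypothetical cluster labels for six samples under two pipelines.
labels_euclidean = [0, 0, 1, 1, 2, 2]
labels_cosine = [1, 1, 0, 0, 0, 2]
ri = rand_index(labels_euclidean, labels_cosine)  # 12 of 15 pairs agree
```

Because the index depends only on pairwise co-membership, it is unaffected by how the clusters happen to be numbered, which is what makes it suitable for comparing results produced by two separate clustering runs.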