This study compares methods for clustering mixed-type data on the basis of accuracy: one-hot encoding; Gower distance combined with the k-means, DBSCAN, and OPTICS algorithms; and k-prototype. The dataset is the chronic kidney disease (CKD) dataset from the UCI Machine Learning Repository. Evaluation with the silhouette index shows that k-prototype with k = 2 clusters performs best, with a silhouette index of 0.3796, the highest among the five methods compared. Cluster 1 contains 175 observations and cluster 2 contains 225 observations. When the clusters are matched against the class labels in the dataset, the clustering achieves an accuracy of 81.25 percent.
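The abstract does not say how cluster assignments were associated with the dataset labels. One common convention, assumed here and not confirmed by the source, is to try every one-to-one mapping from cluster ids to class labels and report the best resulting accuracy. The minimal sketch below illustrates that convention; the function name `clustering_accuracy` and the toy data are hypothetical and are not taken from the study.

```python
from itertools import permutations


def clustering_accuracy(cluster_ids, true_labels):
    """Best accuracy over all one-to-one mappings of cluster ids to labels.

    Assumed convention for illustration; the study's own procedure
    is not described in the abstract.
    """
    if len(cluster_ids) != len(true_labels):
        raise ValueError("inputs must have the same length")
    clusters = sorted(set(cluster_ids))
    labels = sorted(set(true_labels))
    if len(clusters) > len(labels):
        raise ValueError("more clusters than label classes")

    best = 0.0
    # Try each assignment of distinct labels to the clusters.
    for assigned in permutations(labels, len(clusters)):
        mapping = dict(zip(clusters, assigned))
        hits = sum(mapping[c] == y for c, y in zip(cluster_ids, true_labels))
        best = max(best, hits / len(true_labels))
    return best


# Hypothetical toy data, NOT drawn from the CKD dataset.
toy_clusters = [1, 1, 1, 2, 2, 2, 2, 1]
toy_labels = ["ckd", "ckd", "notckd", "notckd", "notckd", "ckd", "notckd", "ckd"]
toy_accuracy = clustering_accuracy(toy_clusters, toy_labels)
print(toy_accuracy)
```

If accuracy is computed over all 400 observations (175 + 225), the reported 81.25 percent corresponds to 325 observations falling in the cluster matched to their label. Exhaustive search over mappings is adequate for k = 2; for larger k an assignment solver would be the more practical choice.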
Copyrights © 2023