Cluster analysis aims to classify data into several groups based on similarity. The most widely applied clustering algorithms are k-means, k-medoids and hierarchical clustering. This study compared these three methods using healthcare data on attributes related to tuberculosis (TB). The best method was selected based on the accuracy of each algorithm and the number of clusters. The clustering analysis comprised four main steps: feature selection, application of the clustering algorithm, cluster validation and interpretation. The algorithms used were k-means, k-medoids and hierarchical clustering, each fitted with 2, 3 and 4 clusters. The results showed that k-medoids had the highest accuracy of the three algorithms on both the training and testing data. K-medoids outperformed the other two algorithms because it was more robust to the noise and outliers found in the datasets. In terms of the number of clusters, the two-cluster model was better than the three-cluster and four-cluster models because it separated the groups most clearly. This result was consistent across the k-means, k-medoids and hierarchical clustering methods, with the smallest sum-of-squares value, 24.7%, obtained for k-means. In the k-medoids models, the smallest diameter and average dissimilarity were found in group 1, indicating that group 1 was more compact and more homogeneous than the other groups in all algorithms.
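The robustness argument above can be illustrated with a minimal sketch. It is not the study's code or data: the toy points, the starting centers and all function names are assumptions made for illustration, and the k-medoids variant shown is the simple alternating (Voronoi-style) update rather than any specific implementation the authors may have used. The diameter and average dissimilarity are computed as the largest pairwise distance within a cluster and the mean distance from members to the medoid, which is assumed to match the quantities reported.

```python
import math

def assign(points, centers):
    """Label each point with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points]

def fit(points, centers, update, iters=100):
    """Alternate between assignment and center update until centers stop moving."""
    centers = list(centers)
    for _ in range(iters):
        labels = assign(points, centers)
        new = []
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new.append(update(members) if members else centers[j])
        if new == centers:
            break
        centers = new
    return centers, assign(points, centers)

def mean_center(members):
    """k-means update: coordinate-wise mean (need not be a data point)."""
    return tuple(sum(c) / len(members) for c in zip(*members))

def medoid_center(members):
    """k-medoids update: the member with the smallest total distance to the rest."""
    return min(members, key=lambda c: sum(math.dist(c, p) for p in members))

def cluster_summary(points, labels, centers, j):
    """Diameter (largest pairwise distance) and average distance to the center."""
    members = [p for p, lab in zip(points, labels) if lab == j]
    diameter = max(math.dist(a, b) for a in members for b in members)
    avg_diss = sum(math.dist(p, centers[j]) for p in members) / len(members)
    return diameter, avg_diss

# Hypothetical toy data: two compact groups plus one outlier at (30, 30).
points = [(1, 1), (1, 2), (2, 1), (2, 2),
          (8, 8), (8, 9), (9, 8), (9, 9),
          (30, 30)]
start = [points[0], points[4]]  # assumed starting centers, one per group

km_centers, km_labels = fit(points, start, mean_center)
med_centers, med_labels = fit(points, start, medoid_center)

print("k-means centers:  ", km_centers)
print("k-medoids centers:", med_centers)
for j in range(2):
    print("k-medoids cluster", j, cluster_summary(points, med_labels, med_centers, j))
```

In this toy setting both methods produce the same partition, but the k-means center of the second cluster is dragged toward the outlier, whereas the medoid remains one of the points inside the compact group. This is the sense in which a medoid-based center is less sensitive to outliers.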
Copyrights © 2023