Journal : Journal of Big Data Analytic and Artificial Intelligence

Optimalisasi Pengelompokkan Konsumen dengan Multi Internal Metric Validation dan Boxplot Analysis (Optimizing Customer Clustering with Multi Internal Metric Validation and Boxplot Analysis)
Fitriyanto, Rachmad; Nurindah, Nurindah
Journal of Big Data Analytic and Artificial Intelligence Vol 8 No 1 (2025): JBIDAI June 2025
Publisher : STMIK PPKIA Tarakanita Rahmawati

DOI: 10.71302/jbidai.v8i1.67

Abstract

Using several internal validation metrics at once to determine the optimal number of clusters in K-Means clustering often yields different K values, which can confuse data practitioners when extracting insights such as customer characteristics. This study develops an evaluation framework to resolve the ambiguity that arises when internal validation metrics disagree on K. The proposed framework has two stages. In the first stage, five internal validation metrics, namely the Davies-Bouldin Index (DBI), Silhouette Score, Elbow Method, Dunn Index, and Calinski-Harabasz Index, act as filters that generate up to five candidate K values. The second stage uses boxplot analysis, the interquartile range (IQR), and elbow visualization to examine the cohesiveness and stability of the resulting clusters. The first stage yielded four candidate cluster counts: K = 2, 5, 7, and 10. In the second stage, the elbow graph of the average interquartile range identified K = 5 as the best of these candidates. These results indicate that using more internal validation metrics may increase the likelihood of obtaining multiple K values, and that a higher number of clusters does not necessarily guarantee better clustering quality. The findings highlight the importance of a layered evaluation approach when determining the optimal number of clusters, especially when multiple internal validation metrics are employed. The proposed framework can help data practitioners make more informed decisions and reduce ambiguity in the clustering process. In future work, the framework could be extended with external validation metrics or adapted to other clustering algorithms.
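
The two-stage idea described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation, and it rests on several assumptions. The metric scores and inertia values are computed elsewhere and passed in as dictionaries keyed by K. Each metric contributes its single best K as a candidate. The elbow is read as the largest second difference of inertia, which is one common heuristic. The IQR is taken over point-to-centroid distances within each cluster, whereas the abstract does not say which quantity the IQR is computed on. All function and key names are hypothetical.

```python
from statistics import mean, quantiles

# Direction in which each metric improves (standard definitions of the metrics).
HIGHER_IS_BETTER = {
    "silhouette": True,
    "dunn": True,
    "calinski_harabasz": True,
    "davies_bouldin": False,
}


def best_k(scores, higher_is_better):
    """Return the K with the best score for a single metric.

    scores maps K -> metric value.
    """
    pick = max if higher_is_better else min
    return pick(scores, key=scores.get)


def elbow_k(inertia):
    """Return the K at the sharpest bend of the inertia curve.

    Assumption: the bend is measured by the second difference of inertia.
    Needs at least three consecutive K values.
    """
    ks = sorted(inertia)
    bends = {
        ks[i]: inertia[ks[i - 1]] - 2 * inertia[ks[i]] + inertia[ks[i + 1]]
        for i in range(1, len(ks) - 1)
    }
    return max(bends, key=bends.get)


def stage_one_candidates(metric_scores, inertia):
    """Stage 1: every metric votes for one K; duplicates collapse into a set."""
    candidates = {
        best_k(scores, HIGHER_IS_BETTER[name])
        for name, scores in metric_scores.items()
    }
    candidates.add(elbow_k(inertia))
    return sorted(candidates)


def iqr(values):
    """Interquartile range (Q3 - Q1) of a list of at least two numbers."""
    q1, _, q3 = quantiles(values, n=4)
    return q3 - q1


def average_iqr(distances_by_cluster):
    """Stage 2 (assumed form): mean IQR over the clusters of one candidate K.

    distances_by_cluster maps cluster label -> point-to-centroid distances.
    """
    return mean(iqr(d) for d in distances_by_cluster.values())
```

Under these assumptions, `stage_one_candidates` would be called once on the scores for the whole K range, and `average_iqr` once per surviving candidate. Plotting the resulting averages against K gives a curve on which an elbow can be looked for, in the spirit of the second stage.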