The growing volume and complexity of data make big data processing difficult, particularly when patterns and relationships must be identified manually. In data mining, clustering methods such as the K-Means algorithm are widely used to group data by similar characteristics. However, K-Means relies on randomly selected initial centroids, which can yield suboptimal clusters. This study compares the evaluation results and iteration time of three optimization methods for K-Means: Elbow, Particle Swarm Optimization (PSO), and Sum of Square Error (SSE). The dataset is Online Retail II from the UCI Machine Learning Repository. The Davies-Bouldin Index (DBI), for which lower values indicate more compact and better-separated clusters, is used to assess the validity of the resulting clusters. The Elbow and SSE methods both achieved a DBI of 0.8500 with shorter iteration times than PSO. PSO achieved the best DBI, 0.7376, but required significantly longer iteration time. These results are intended as a reference for selecting an optimization method for K-Means according to time constraints and clustering quality.
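To make the evaluation criteria concrete, the following is a minimal, dependency-free sketch of K-Means with random initial centroids, scored by SSE and DBI over a range of cluster counts. It is not the study's implementation: the synthetic three-blob data stands in for Online Retail II, the function names (`kmeans`, `sse`, `davies_bouldin`), the restart count, and the range of k are all illustrative assumptions, and PSO is not covered.

```python
import math
import random


def kmeans(points, k, rng, max_iter=100):
    """Lloyd's algorithm with random initial centroids.

    Returns (centroids, labels, n_iter).
    """
    centroids = rng.sample(points, k)
    labels = None
    for n_iter in range(1, max_iter + 1):
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels, n_iter


def sse(points, centroids, labels):
    """Sum of squared distances from each point to its assigned centroid."""
    return sum(math.dist(p, centroids[l]) ** 2 for p, l in zip(points, labels))


def davies_bouldin(points, centroids, labels):
    """Davies-Bouldin Index; lower values mean tighter, better-separated clusters."""
    k = len(centroids)
    scatter = []
    for j in range(k):
        members = [p for p, l in zip(points, labels) if l == j]
        mean_dist = sum(math.dist(p, centroids[j]) for p in members) / len(members) if members else 0.0
        scatter.append(mean_dist)
    worst_ratios = [
        max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in range(k) if j != i
        )
        for i in range(k)
    ]
    return sum(worst_ratios) / k


# Synthetic stand-in data (assumption): three Gaussian blobs in 2-D.
rng = random.Random(0)
blob_centers = [(0.0, 0.0), (6.0, 0.0), (3.0, 5.0)]
points = [
    (rng.gauss(cx, 0.5), rng.gauss(cy, 0.5))
    for cx, cy in blob_centers
    for _ in range(50)
]

# For each k, keep the lowest-SSE run out of a few random restarts.
results = {}
for k in range(2, 6):
    runs = [kmeans(points, k, rng) for _ in range(5)]
    centroids, labels, n_iter = min(runs, key=lambda r: sse(points, r[0], r[1]))
    results[k] = {
        "sse": sse(points, centroids, labels),
        "dbi": davies_bouldin(points, centroids, labels),
        "n_iter": n_iter,
    }

for k, row in results.items():
    print(k, round(row["sse"], 2), round(row["dbi"], 4), row["n_iter"])
```

Plotting the `sse` column against k and looking for the bend is the usual Elbow heuristic, while the `dbi` column gives a separate validity score for each k. Because the initial centroids are random, both values can differ between restarts, which is the sensitivity the optimization methods aim to reduce.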
Copyrights © 2024