This study segments wholesale customers using the K-Means clustering algorithm and examines how outliers and data imbalance affect the clustering results. The data come from the Wholesale Customers Dataset in the UCI Machine Learning Repository, which contains 440 customers described by eight numerical attributes representing annual purchase amounts. Preprocessing consists of exploratory data analysis, outlier detection using Z-Scores and boxplot visualization, winsorizing of extreme values, and Z-Score normalization to make the attribute scales comparable. The number of clusters is determined using the Elbow Method. Applying K-Means with k = 2 produces two highly imbalanced clusters: Cluster 0 with 437 customers and Cluster 1 with 3 customers. Cluster 0 represents regular customers whose purchasing patterns are close to the overall average, while Cluster 1 consists of customers with very high purchases, especially in the Frozen and Delicassen categories. Evaluation using the average within-centroid distance and the Davies–Bouldin Index shows that, after outlier handling and normalization, the cluster structure becomes more stable and easier to interpret. The resulting segmentation can support differentiated marketing and service strategies for regular and high-spending customers, and it highlights the importance of proper preprocessing when applying K-Means.
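The preprocessing and clustering steps summarized above can be sketched roughly as follows. This is an illustrative outline only, not the implementation used in the study: it relies on the Python standard library, runs on synthetic two-column data rather than the UCI dataset, and assumes a |z| > 3 outlier threshold and 5th/95th percentile winsorizing cut-offs, none of which are stated in the abstract. All function and variable names are hypothetical.

```python
import math
import random
import statistics

def zscores(values):
    """Standardize values to mean 0 and population standard deviation 1."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

def winsorize(values, lower=0.05, upper=0.95):
    """Clip values to nearest-rank percentiles (cut-offs are assumptions)."""
    ordered = sorted(values)
    last = len(ordered) - 1
    lo, hi = ordered[int(lower * last)], ordered[int(upper * last)]
    return [min(max(v, lo), hi) for v in values]

def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points]

def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's algorithm started from k distinct random data points."""
    centroids = random.Random(seed).sample(sorted(set(points)), k)
    for _ in range(iters):
        labels = assign(points, centroids)
        updated = []
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            # Keep the old centroid if a cluster happens to be empty.
            updated.append(tuple(map(statistics.fmean, zip(*members)))
                           if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return assign(points, centroids), centroids

def inertia(points, labels, centroids):
    """Within-cluster sum of squared distances, as plotted for the Elbow Method."""
    return sum(math.dist(p, centroids[l]) ** 2 for p, l in zip(points, labels))

def davies_bouldin(points, labels, centroids):
    """Davies-Bouldin Index (lower is better); assumes no cluster is empty."""
    k = len(centroids)
    scatter = [statistics.fmean([math.dist(p, centroids[c])
                                 for p, l in zip(points, labels) if l == c])
               for c in range(k)]
    ratios = [max((scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
                  for j in range(k) if j != i)
              for i in range(k)]
    return statistics.fmean(ratios)

# Synthetic stand-in data: 57 "regular" and 3 "high-spending" customers
# with two hypothetical spending columns (not the real dataset).
rng = random.Random(42)
raw = [(rng.gauss(3000, 800), rng.gauss(1500, 400)) for _ in range(57)]
raw += [(rng.gauss(30000, 2000), rng.gauss(20000, 1500)) for _ in range(3)]
columns = list(zip(*raw))

# 1) Flag rows whose Z-Score exceeds the assumed threshold in any column.
flagged = sorted({i for col in columns
                  for i, z in enumerate(zscores(col)) if abs(z) > 3})

# 2) Winsorize each column, then 3) normalize it with Z-Scores.
prepared = list(zip(*(zscores(winsorize(col)) for col in columns)))

# 4) Inertia for several k (Elbow Method), then 5) cluster and evaluate.
elbow = {k: inertia(prepared, *kmeans(prepared, k)) for k in range(1, 5)}
labels, centroids = kmeans(prepared, 2)
dbi = davies_bouldin(prepared, labels, centroids)
print("flagged rows:", flagged)
print("inertia by k:", elbow)
print("Davies-Bouldin Index:", dbi)
```

Running the same functions on the raw columns instead of `prepared` gives a simple way to compare cluster sizes and the Davies–Bouldin Index before and after outlier handling, which is the kind of comparison the study reports.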
Copyrights © 2026