Insight : International Journal of Social Research
Vol. 4 No. 1 (2026): Insight : International Journal of Social Research

Comparison Of Clustering Algorithms In Analyzing E-Commerce Data From Kaggle

Anwar, Syahrul (Unknown)
Iskandar, Erwin (Unknown)



Article Info

Publish Date
20 Feb 2026

Abstract

The rapid growth of the e-commerce industry produces a huge volume of transaction data, so appropriate data analysis techniques are needed to extract information that supports business decision-making. This study compares the performance of three clustering algorithms, namely K-Means, DBSCAN, and Hierarchical Clustering, in analyzing an e-commerce dataset sourced from the Kaggle platform. The dataset used is "Online Retail II", published by Daqing Chen through the UCI Machine Learning Repository and Kaggle, containing 541,909 transactions from an online retail company in the UK. After data cleansing, 406,829 valid transactions from 4,372 unique customers were used as the basis for analysis. The data were analyzed using the RFM (Recency, Frequency, Monetary) approach, which supplied the clustering features for customer segmentation. Algorithm performance was evaluated using three internal validation metrics: the Silhouette Score, the Davies-Bouldin Index (DBI), and the Calinski-Harabasz Index (CHI). The results showed that K-Means with k=3 produced the best performance, with a Silhouette Score of 0.612 and the lowest DBI of 0.842, followed by Hierarchical Clustering with the Ward method and then DBSCAN. K-Means also excelled in computational efficiency, with an execution time of 1.23 seconds compared with 8.72 seconds for Hierarchical Clustering. The resulting segmentation identified three main customer groups: High-value Customers (31.0%), Medium-value Customers (43.3%), and Low-value/At-Risk Customers, who are passive or at risk (25.7%). These findings have practical implications for e-commerce businesses, particularly in designing segmented marketing strategies, loyalty programs, and customer reactivation campaigns, and in choosing a clustering algorithm that matches their data characteristics and business analytics needs.
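The workflow summarized in the abstract (RFM features, three clustering algorithms, three internal validation metrics) can be illustrated with a minimal sketch using pandas and scikit-learn. This is not the authors' code. The toy transactions, the column names (`customer_id`, `invoice`, `invoice_date`, `amount`), the snapshot date, the use of `StandardScaler`, and the DBSCAN parameters are all assumptions made for illustration, and the scores it produces on synthetic data have no relation to the figures reported above.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import DBSCAN, AgglomerativeClustering, KMeans
from sklearn.metrics import (
    calinski_harabasz_score,
    davies_bouldin_score,
    silhouette_score,
)
from sklearn.preprocessing import StandardScaler


def make_toy_transactions(seed=0):
    """Build a small synthetic transaction table (hypothetical, not the Kaggle data)."""
    rng = np.random.default_rng(seed)
    snapshot = pd.Timestamp("2011-12-10")  # assumed reference date for recency
    # (typical days since last order, typical number of orders, typical order value)
    profiles = [(10, 20, 200.0), (60, 8, 60.0), (250, 2, 20.0)]
    rows, cid = [], 0
    for recency, n_orders, value in profiles:
        for _ in range(100):
            cid += 1
            last_gap = int(rng.integers(max(recency - 5, 1), recency + 5))
            n = int(rng.integers(max(n_orders - 1, 1), n_orders + 2))
            for k in range(n):
                rows.append({
                    "customer_id": cid,
                    "invoice": f"{cid}-{k}",
                    "invoice_date": snapshot - pd.Timedelta(days=last_gap + 3 * k),
                    "amount": float(rng.normal(value, value * 0.1)),
                })
    return pd.DataFrame(rows), snapshot


def build_rfm(tx, snapshot):
    """Aggregate transactions into one Recency/Frequency/Monetary row per customer."""
    grouped = tx.groupby("customer_id")
    return pd.DataFrame({
        "recency": (snapshot - grouped["invoice_date"].max()).dt.days,
        "frequency": grouped["invoice"].nunique(),
        "monetary": grouped["amount"].sum(),
    })


def evaluate(X, labels):
    """Compute the three internal metrics, excluding DBSCAN noise points (label -1)."""
    mask = labels != -1
    if len(set(labels[mask])) < 2:
        return None  # the metrics are undefined for fewer than two clusters
    return {
        "silhouette": silhouette_score(X[mask], labels[mask]),
        "davies_bouldin": davies_bouldin_score(X[mask], labels[mask]),
        "calinski_harabasz": calinski_harabasz_score(X[mask], labels[mask]),
    }


tx, snapshot = make_toy_transactions()
rfm = build_rfm(tx, snapshot)
X = StandardScaler().fit_transform(rfm)  # scaling is an assumed preprocessing step

models = {
    "kmeans": KMeans(n_clusters=3, n_init=10, random_state=0),
    "ward": AgglomerativeClustering(n_clusters=3, linkage="ward"),
    "dbscan": DBSCAN(eps=0.5, min_samples=5),  # illustrative parameters only
}
results = {name: evaluate(X, model.fit_predict(X)) for name, model in models.items()}

for name, scores in results.items():
    print(name, scores)
```

A higher Silhouette Score and Calinski-Harabasz Index indicate better-separated clusters, while a lower Davies-Bouldin Index is better, which is why the abstract reports the "lowest DBI" alongside the highest Silhouette Score.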

Copyrights © 2026






Journal Info

Abbrev

pi

Subject

Humanities; Computer Science & IT; Earth & Planetary Sciences; Economics, Econometrics & Finance; Environmental Science

Description

Insight : International Journal of Social Research is an open-access scientific journal that publishes research. This journal is published once a month by PT. Worldwide Research Publishing. Insight : International Journal of Social Research provides a means for ongoing discussion of ...