Journal : Journal Innovations Computer Science

Clustering and Classification of Retail Sales Data: A Big Data and Data Mining Analysis
Almagribi, Ahmad Bilal; Redjeki, Sri
Journal Innovations Computer Science Vol. 4 No. 2 (2025): November
Publisher : Yayasan Kawanad

DOI: 10.56347/jics.v4i2.303

Abstract

In the evolving retail landscape, data-driven decision-making has become essential for understanding customer behavior and predicting sales trends. This study combines clustering and classification techniques to analyze a retail sales dataset of 1,000 transactions obtained from Kaggle. Using the K-Means algorithm, with the number of clusters chosen by the Elbow Method, three customer clusters were identified, with an average within-centroid distance of 25,272.635 and a Davies–Bouldin Index of 0.443, indicating clear cluster separation. The subsequent classification phase compared the predictive performance of three algorithms (Naïve Bayes, Decision Tree, and Random Forest) on a 70:30 training-to-testing split. Naïve Bayes attained 94.67% accuracy, while Decision Tree and Random Forest both achieved 100% classification accuracy. These findings highlight the robustness and adaptability of tree-based models for complex retail datasets, which outperformed the probabilistic method in accuracy and generalization. The results suggest that integrating clustering and classification gives retailers a powerful analytical framework for identifying high-value customer segments, optimizing marketing strategies, and enhancing inventory management. Despite these strong outcomes, the study acknowledges the limitations of its dataset and recommends future research on larger and more diverse datasets, with additional features, to improve model scalability and predictive precision.
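
To make the clustering metrics in the abstract concrete, the following is a minimal sketch of K-Means, the within-cluster sum of squares that an elbow plot is usually built from, and the Davies–Bouldin Index. It is not the authors' pipeline: the abstract does not name the tool, features, or settings used, so the toy 2-D points, the function names (`kmeans`, `inertia`, `davies_bouldin`), and the deterministic farthest-point initialisation are all assumptions made for illustration. The inertia computed here is also not necessarily the same quantity as the "average within-centroid distance" reported above.

```python
import math

def dist(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def centroid(pts):
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(c) / len(pts) for c in zip(*pts))

def kmeans(points, k, iters=100):
    """Lloyd's algorithm. Initialisation is farthest-point (an assumption,
    chosen so the sketch is deterministic; real tools usually randomise)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # keep the old center if a cluster ends up empty
                centers[j] = centroid(members)
    return labels, centers

def inertia(points, labels, centers):
    """Within-cluster sum of squared distances (the usual elbow-plot quantity)."""
    return sum(dist(p, centers[l]) ** 2 for p, l in zip(points, labels))

def davies_bouldin(points, labels, centers):
    """Mean over clusters of the worst (S_i + S_j) / d(c_i, c_j) ratio.
    Lower is better. Assumes k >= 2 and no empty clusters."""
    k = len(centers)
    scatter = []
    for j in range(k):
        members = [p for p, l in zip(points, labels) if l == j]
        scatter.append(sum(dist(p, centers[j]) for p in members) / len(members))
    worst = [
        max((scatter[i] + scatter[j]) / dist(centers[i], centers[j])
            for j in range(k) if j != i)
        for i in range(k)
    ]
    return sum(worst) / k

# Hypothetical toy data: three well-separated groups of four points each.
points = [(0, 0), (0, 2), (2, 0), (2, 2),
          (20, 0), (20, 2), (22, 0), (22, 2),
          (10, 20), (10, 22), (12, 20), (12, 22)]

# Elbow scan: inertia drops sharply until k reaches the natural group count.
for k in range(1, 5):
    labels, centers = kmeans(points, k)
    line = f"k={k}  inertia={inertia(points, labels, centers):.2f}"
    if k >= 2:
        line += f"  DBI={davies_bouldin(points, labels, centers):.3f}"
    print(line)
```

On data like this, the elbow appears where adding another cluster stops reducing inertia substantially. A Davies–Bouldin value well below 1, such as the 0.443 reported in the abstract, means that each cluster's internal scatter is small relative to the distance to its nearest neighbouring centroid.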