This research explores the use of synthetic data in Length, Recency, Frequency, Monetary (LRFM) analysis and K-Means clustering for customer segmentation. Because accurate and comprehensive customer data is often difficult to access, this study generates synthetic data using Time-series Generative Adversarial Networks (TimeGAN) to supplement or replace the original data. LRFM analysis characterizes customers along the dimensions of Length, Recency, Frequency, and Monetary, and the resulting features are then clustered using the K-Means algorithm. Clustering quality is evaluated using the Silhouette Coefficient (higher is better) and the Davies-Bouldin Index (lower is better). The Silhouette Coefficient is 0.42 for the synthetic data, slightly higher than the 0.41 obtained for the original data. The Davies-Bouldin Index is 0.90 for the synthetic data, slightly higher than the 0.89 obtained for the original data. Both differences are only 0.01, which indicates that synthetic data can mimic the characteristics of real data without materially compromising the quality of clustering. By combining synthetic data, LRFM analysis, and K-Means clustering, this research provides in-depth insights into customer segmentation. The findings are expected to help companies develop more effective marketing strategies, enhance customer retention, and optimize the overall customer experience. This study concludes that synthetic data is a valid alternative to real data in customer analysis.
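As an illustration of the LRFM step described above, the following minimal sketch derives the four features from a list of transactions. It is not the study's implementation: the function name `lrfm`, the transaction layout `(customer_id, purchase_date, amount)`, the sample records, and the reference date are all hypothetical. The feature definitions are also assumptions, since they vary across the literature: here Length is the number of days between a customer's first and last purchase, Recency is the number of days from the last purchase to the reference date, Frequency is the transaction count, and Monetary is total spend.

```python
from collections import defaultdict
from datetime import date


def lrfm(transactions, reference_date):
    """Return {customer_id: (length, recency, frequency, monetary)}.

    Assumed definitions (hypothetical, not taken from the study):
      length    = days between first and last purchase
      recency   = days between last purchase and reference_date
      frequency = number of transactions
      monetary  = total amount spent
    """
    # Group (purchase_date, amount) pairs by customer.
    by_customer = defaultdict(list)
    for customer_id, purchase_date, amount in transactions:
        by_customer[customer_id].append((purchase_date, amount))

    features = {}
    for customer_id, rows in by_customer.items():
        days = [d for d, _ in rows]
        first, last = min(days), max(days)
        features[customer_id] = (
            (last - first).days,           # Length
            (reference_date - last).days,  # Recency
            len(rows),                     # Frequency
            sum(a for _, a in rows),       # Monetary
        )
    return features


# Hypothetical sample data, for illustration only.
transactions = [
    ("A", date(2024, 1, 1), 50.0),
    ("A", date(2024, 3, 1), 30.0),
    ("B", date(2024, 2, 15), 20.0),
]
features = lrfm(transactions, date(2024, 3, 31))
```

In a pipeline like the one described, the resulting feature vectors would typically be standardized before K-Means is applied. The two quality metrics could then be computed on the cluster labels, for example with scikit-learn's `silhouette_score` and `davies_bouldin_score`.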
Copyrights © 2025