Customer churn remains one of the most pressing issues in the e-commerce sector because it directly erodes revenue and reduces customer lifetime value. This study proposes an interpretable machine learning approach designed not only to predict churn but also to uncover practical insights that can inform retention strategies. The analysis draws on a publicly available dataset of customer behavior and transaction records. Data preparation involved handling missing values, label-encoding categorical features, and addressing class imbalance with SMOTE (Synthetic Minority Over-sampling Technique). Five classification models (Logistic Regression, Random Forest, XGBoost, Support Vector Machine, and Gradient Boosting) were trained on an 80:20 stratified train-test split and evaluated on accuracy, precision, recall, F1-score, and area under the ROC curve (AUC). XGBoost performed most consistently across these metrics, achieving 96% accuracy, 95% precision, 92% recall, and a near-perfect AUC of 0.999, followed closely by Random Forest; Logistic Regression produced the lowest AUC, at 0.886. To make the models' decisions transparent, SHAP (SHapley Additive exPlanations) was applied, identifying Tenure, Complain, and CashbackAmount as the most influential predictors. Longer customer relationships were associated with lower churn risk, whereas customer complaints and higher cashback usage were associated with a greater likelihood of leaving. By combining strong predictive performance with interpretability, these findings can help e-commerce businesses design more targeted and proactive customer retention measures.
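The preprocessing and evaluation steps summarized above (stratified 80:20 split, SMOTE oversampling, and the accuracy, precision, recall, F1 and AUC metrics) can be illustrated with a minimal, dependency-free sketch. It is not the study's code: the function names (`stratified_split`, `smote`, `classification_metrics`, `roc_auc`) and the toy data are hypothetical, the five models and SHAP are not reproduced, and applying SMOTE only to the training portion after the split is an assumption (common practice to avoid leaking synthetic rows into the test set), since the abstract does not state the order.

```python
import math
import random
from collections import defaultdict


def stratified_split(X, y, test_frac=0.2, seed=0):
    """Hold out test_frac of each class so train and test keep the same class mix."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = round(len(idx) * test_frac)
        test_idx += idx[:n_test]
        train_idx += idx[n_test:]

    def take(ids):
        return [X[i] for i in ids], [y[i] for i in ids]

    return take(train_idx), take(test_idx)


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic rows, each on the segment between a random
    minority row and one of its k nearest minority neighbours (Euclidean)."""
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # assumes at least two minority rows
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1): position along the segment
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, plus precision, recall and F1 for the positive (churn) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true, scores, positive=1):
    """AUC as the probability that a random churner outscores a random
    non-churner (ties count half); O(n_pos * n_neg), fine for a sketch."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy usage (illustrative values, not the study's data): 1 = churned, 0 = retained.
X = [[float(i), float(i % 3)] for i in range(20)]
y = [1 if i < 4 else 0 for i in range(20)]
(X_tr, y_tr), (X_te, y_te) = stratified_split(X, y)

# Oversample the churn class in the training split only, up to a 1:1 ratio.
minority = [x for x, t in zip(X_tr, y_tr) if t == 1]
X_bal = X_tr + smote(minority, n_new=y_tr.count(0) - len(minority), k=2)
y_bal = y_tr + [1] * (len(X_bal) - len(X_tr))

print(classification_metrics([1, 0, 1, 0], [1, 0, 0, 0]))
print(roc_auc([1, 0, 1, 0], [0.9, 0.2, 0.4, 0.3]))
```

In practice these steps would more likely rely on established libraries (for example scikit-learn's stratified splitting and metrics and imbalanced-learn's SMOTE); the hand-written versions here only make the mechanics explicit, such as SMOTE interpolating between existing minority rows rather than duplicating them.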
Copyright © 2025