High customer churn is a significant challenge for the banking industry: it causes substantial financial losses and raises the cost of acquiring new customers. Identifying customers who are likely to churn before they leave is essential for effective retention strategies. This study addresses the problem by implementing and comparing three machine learning classification algorithms: Logistic Regression, Random Forest, and XGBoost. It uses a secondary dataset of 10,000 bank customer profiles with characteristics such as credit score, account balance, and transaction activity. The methodology follows the Cross-Industry Standard Process for Data Mining (CRISP-DM) framework. The models were evaluated using Accuracy, Precision, Recall, F1-Score, and ROC-AUC. The findings indicate that the ensemble models substantially outperformed the linear model (Logistic Regression), which achieved an F1-Score of only 0.286. Random Forest was the best-performing model in this study, achieving the highest Accuracy (0.864), F1-Score (0.590), and ROC-AUC (0.852). XGBoost was competitive, with an F1-Score of 0.579 and a ROC-AUC of 0.832. The study concludes that Random Forest provides the best overall performance and the strongest capability for identifying at-risk customers in this dataset.
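As a reading aid for the reported metrics, the following minimal sketch shows how Accuracy, Precision, Recall, and F1-Score are derived from a confusion matrix. The function name `classification_metrics` and the counts passed to it are hypothetical and are not taken from the study's data or code. ROC-AUC is omitted because it requires predicted scores rather than counts.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute standard binary-classification metrics from confusion-matrix counts.

    tp: churners correctly flagged, fp: non-churners wrongly flagged,
    fn: churners missed,            tn: non-churners correctly passed.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts for an imbalanced churn problem (illustration only):
# 200 actual churners out of 1,000 customers, most of them missed by the model.
metrics = classification_metrics(tp=40, fp=20, fn=160, tn=780)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these illustrative counts, accuracy stays high because most customers do not churn, while the F1-Score is low because most churners are missed. This is one way a model can report reasonable accuracy alongside a weak F1-Score on an imbalanced dataset.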
Copyright © 2026