Media Statistika
Vol 18, No 1 (2025): Media Statistika

EVALUATING RANDOM FOREST AND XGBOOST FOR BANK CUSTOMER CHURN PREDICTION ON IMBALANCED DATA USING SMOTE AND SMOTE-ENN

Andespa, Reyuli
Sadik, Kusman
Suhaeni, Cici
Soleh, Agus M



Article Info

Publish Date
14 Oct 2025

Abstract

The banking industry faces significant challenges in retaining customers, as churn can critically affect both revenue and reputation. This study introduces a robust churn prediction framework by comparing the performance of the XGBoost and Random Forest algorithms under imbalanced data conditions. The novelty of this research lies in integrating the SMOTE and SMOTE-ENN resampling techniques with these machine learning algorithms to enhance model performance and reliability on highly imbalanced datasets. Unlike conventional approaches that rely solely on oversampling or undersampling, this study demonstrates that combining XGBoost with SMOTE provides superior predictive accuracy, stability, and efficiency. Hyperparameter optimization using GridSearchCV was conducted to identify the most effective parameter configurations for both algorithms. Model performance was evaluated using the F1-Score and the Area Under the Curve (AUC). The results indicate that XGBoost with SMOTE achieved the best performance, with an F1-Score of 0.8730 and an AUC of 0.9828, showing an optimal balance between precision and recall. Feature importance analysis identified Months_Inactive_12_mon, Total_Trans_Amt, and Total_Relationship_Count as the most influential predictors. Overall, this approach outperforms traditional resampling and modeling techniques, providing practical insights for data-driven customer retention strategies in the banking industry.
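
The two ideas the abstract relies on most, SMOTE oversampling and the F1-Score, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which SMOTE library was used, so the code below is a simplified, standard-library-only approximation. The function names `smote_oversample` and `f1_score`, the toy data, and the parameter choices (`k=3`, `seed=0`) are all assumptions made for illustration. Unlike reference SMOTE, which generates a fixed number of synthetic points per minority sample, this sketch picks the base sample at random.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority points (simplified SMOTE-style sketch).

    Each synthetic point lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point, excluding itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: 20 majority (retained) and 4 minority (churned) points.
majority = [(float(i), float(i % 3)) for i in range(20)]
minority = [(10.0, 10.0), (11.0, 10.5), (10.5, 11.0), (11.5, 11.5)]

# Oversample the minority class until both classes have the same size.
synthetic = smote_oversample(minority, n_new=len(majority) - len(minority), k=3)
balanced_minority = minority + synthetic
```

In a real workflow, resampling of this kind is applied only to the training folds, never to the held-out data on which the F1-Score and AUC are computed; otherwise synthetic points leak into evaluation and inflate the metrics. SMOTE-ENN adds a cleaning step after oversampling (Edited Nearest Neighbours), which this sketch does not cover.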

Copyrights © 2025