Understanding customer behavior is strategically important for business decision-making, particularly in the automotive sector, where competition is intense and product lines are diverse. Previous studies often rely on a limited set of demographic variables, such as age and gender. This research extends that work by combining ensemble logistic regression with Bayesian Optimization for hyperparameter tuning and SHAP-based interpretability. The proposed model incorporates features beyond demographics, including vehicle category, product type, vehicle year, dealer branch, and transaction source, to improve predictive accuracy. The methodology comprises data preprocessing (encoding and cleaning), class balancing using SMOTE combined with undersampling, and a stratified 80:20 train-test split. The baseline Logistic Regression achieved an accuracy of 80%, a ROC AUC of 0.89, precision of 0.47/0.96, recall of 0.84/0.79, and F1-scores of 0.59/0.89. Ensemble logistic regression tuned with Bayesian Optimization improved performance to 84% accuracy, a ROC AUC of 0.92, precision of 0.51/0.98, recall of 0.83/0.84, and F1-scores of 0.63/0.92. SHAP analysis confirmed that the additional features contribute significantly to the prediction outcomes. The novelty of this study lies in combining Ensemble Logistic Regression with Bayesian Optimization and SHAP explainability in the automotive domain. The approach offers not only improved accuracy but also interpretability and fairness for business decision-making, and it provides actionable insights for targeted marketing strategies and product management. Future studies may incorporate broader behavioral and transactional variables to capture more nuanced customer decision patterns.
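As an illustration of two steps named above, the following is a minimal sketch of a stratified 80:20 split and of how paired figures such as "0.47/0.96" can arise when precision, recall, and F1 are computed once per class. It is not the authors' code. The function names (`stratified_split`, `per_class_metrics`), the labels, and the predictions are all hypothetical, and reading each reported pair as one value per class is an assumption, since the abstract does not state which class comes first.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) with each class split in the same proportion."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


def per_class_metrics(y_true, y_pred, positive):
    """Precision, recall, and F1 when `positive` is treated as the target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical imbalanced labels: 20 positives (1) and 80 negatives (0).
labels = [1] * 20 + [0] * 80
train_idx, test_idx = stratified_split(labels)

# Hypothetical predictions, hard-coded only to exercise the metric code.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]
for cls in (1, 0):
    p, r, f1 = per_class_metrics(y_true, y_pred, positive=cls)
    print(f"class {cls}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

Because the split is done per class, the test set keeps the same class ratio as the full data, which matters when one class is rare. In a pipeline like the one described, resampling such as SMOTE would normally be applied to the training portion only, so that the test metrics reflect the original class distribution; whether the study did so is not stated in the abstract.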
Copyrights © 2025