This study evaluates three gradient boosting algorithms, XGBoost, LightGBM, and CatBoost, for customer segmentation in the automotive industry. Using a dataset of 8,068 training and 2,627 testing observations with 11 demographic and behavioral variables, the models classify customers into four segments. The methodology comprises preprocessing (handling missing values and encoding), hyperparameter tuning via randomized search with cross-validation, and evaluation using ROC AUC. XGBoost achieves the highest AUC on the testing data when trained on the significant variables (0.5837), narrowly ahead of LightGBM (0.5834) and CatBoost (0.5759). Feature selection proves important, with Age, Profession, and Spending Score the most influential predictors. The study concludes that XGBoost is the most robust model for segmentation tasks, although all three models struggle to distinguish overlapping classes. These insights can guide data-driven marketing strategies in the automotive and related sectors.
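Because the target has four segments, a single ROC AUC figure requires some form of averaging across classes. The abstract does not state which scheme was used, so the following is only a minimal sketch under the assumption of a macro-averaged one-vs-rest AUC. The function names, the segment labels "A" to "D", and the toy probabilities are hypothetical and do not come from the study.

```python
from typing import List, Sequence


def binary_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive is scored above
    a random negative; ties count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(
    y_true: Sequence[str],
    proba: Sequence[Sequence[float]],
    classes: Sequence[str],
) -> float:
    """Macro-averaged one-vs-rest AUC (assumed averaging scheme).

    proba[i][k] is the predicted probability that sample i belongs
    to classes[k].
    """
    per_class: List[float] = []
    for k, cls in enumerate(classes):
        labels = [1 if y == cls else 0 for y in y_true]
        scores = [row[k] for row in proba]
        per_class.append(binary_auc(labels, scores))
    return sum(per_class) / len(per_class)


# Hypothetical segment labels and toy predictions, for illustration only.
classes = ["A", "B", "C", "D"]
y_true = ["A", "B", "C", "D", "A", "B", "C", "D"]

# A classifier that puts all probability on the correct segment.
perfect = [[1.0 if c == y else 0.0 for c in classes] for y in y_true]

# A classifier that cannot tell the segments apart at all.
uninformative = [[0.25, 0.25, 0.25, 0.25] for _ in y_true]

auc_perfect = macro_ovr_auc(y_true, perfect, classes)
auc_uninformative = macro_ovr_auc(y_true, uninformative, classes)
```

On this scale, 1.0 corresponds to perfect ranking of every segment against the rest and 0.5 to ranking no better than chance, which gives a reference frame for reading the reported test scores.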
Copyright © 2026