BAREKENG: Jurnal Ilmu Matematika dan Terapan
Vol 20 No 3 (2026): BAREKENG: Journal of Mathematics and Its Application

LEVERAGING XGBOOST, LIGHTGBM, AND CATBOOST FOR ENHANCED CUSTOMER SEGMENTATION IN THE AUTOMOTIVE INDUSTRY

Novri Suhermi (Department of Statistics, Faculty of Science and Data Analytics, Institut Teknologi Sepuluh Nopember, Indonesia)
Rahida Rihhadatul Aisy (Department of Statistics, Faculty of Science and Data Analytics, Institut Teknologi Sepuluh Nopember, Indonesia)
Aulia Afifatur Rohmah (Department of Statistics, Faculty of Science and Data Analytics, Institut Teknologi Sepuluh Nopember, Indonesia)
Anis Alif Nurhayati (Department of Statistics, Faculty of Science and Data Analytics, Institut Teknologi Sepuluh Nopember, Indonesia)
Agnes Nathania Pramesty (Department of Statistics, Faculty of Science and Data Analytics, Institut Teknologi Sepuluh Nopember, Indonesia)
Aura Lovi Ardanika (Department of Statistics, Faculty of Science and Data Analytics, Institut Teknologi Sepuluh Nopember, Indonesia)
Fauziyah Nurul Isnaini (Department of Statistics, Faculty of Science and Data Analytics, Institut Teknologi Sepuluh Nopember, Indonesia)



Article Info

Publish Date
08 Apr 2026

Abstract

This study evaluates the performance of three gradient boosting algorithms (XGBoost, LightGBM, and CatBoost) for customer segmentation in the automotive industry. Using a dataset of 8,068 training and 2,627 testing observations with 11 demographic and behavioral variables, the research aims to classify customers into four segments. The methodology comprises preprocessing (handling missing values and encoding), hyperparameter tuning via randomized search with cross-validation, and evaluation using ROC AUC. Results indicate that XGBoost outperforms the other models, achieving an AUC of 0.5837 on the testing data with the significant variables, while LightGBM and CatBoost scored 0.5834 and 0.5759, respectively. Key findings highlight the importance of feature selection, with Age, Profession, and Spending Score being the most influential predictors. The study concludes that XGBoost is the most robust of the three models for segmentation tasks, though all models have difficulty distinguishing overlapping classes. These insights can guide data-driven marketing strategies in the automotive and related sectors.
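
For readers unfamiliar with how ROC AUC extends to a four-segment problem, the following is a minimal sketch of one common convention: a one-vs-rest AUC per class, averaged with equal weight (macro average). The abstract does not state which averaging scheme the study used, so this convention is an assumption. The segment labels "A" to "D", the function names, and the toy probabilities are likewise hypothetical illustrations, not the study's data. Libraries such as scikit-learn provide an equivalent through `roc_auc_score(..., multi_class="ovr")`; the version below uses only the standard library.

```python
from typing import Sequence


def binary_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Binary ROC AUC via the rank-sum (Mann-Whitney U) identity.

    Tied scores receive the average rank of their tie group.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative example")

    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current group of tied scores.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def macro_ovr_auc(
    y_true: Sequence[str],
    proba: Sequence[Sequence[float]],
    classes: Sequence[str],
) -> float:
    """Macro-averaged one-vs-rest AUC (assumed convention, see lead-in).

    proba[i][c] is the predicted probability that sample i belongs to classes[c].
    """
    per_class = []
    for c_idx, c in enumerate(classes):
        labels = [1 if y == c else 0 for y in y_true]
        scores = [row[c_idx] for row in proba]
        per_class.append(binary_auc(labels, scores))
    return sum(per_class) / len(per_class)


# Hypothetical toy example with four segments.
classes = ["A", "B", "C", "D"]
y_true = ["A", "B", "C", "D", "A", "C"]
proba = [
    [0.40, 0.30, 0.20, 0.10],
    [0.30, 0.35, 0.20, 0.15],
    [0.25, 0.25, 0.30, 0.20],
    [0.30, 0.20, 0.20, 0.30],
    [0.20, 0.30, 0.30, 0.20],
    [0.10, 0.20, 0.45, 0.25],
]
print(f"macro one-vs-rest AUC: {macro_ovr_auc(y_true, proba, classes):.4f}")
```

Under this convention a value of 0.5 corresponds to scores that carry no ranking information, which gives a reference point for reading the reported values of roughly 0.58.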

Copyrights © 2026






Journal Info

Abbrev

barekeng

Subject

Computer Science & IT; Control & Systems Engineering; Economics, Econometrics & Finance; Energy; Engineering; Mathematics; Mechanical Engineering; Physics; Transportation

Description

BAREKENG: Jurnal Ilmu Matematika dan Terapan is a scientific publication that publishes articles reporting the results of research or studies in Pure Mathematics and Applied Mathematics. The focus and scope of BAREKENG: Jurnal Ilmu Matematika dan Terapan are as follows: - Pure ...