Parameter: Journal of Statistics
Vol. 5 No. 2 (2025)

COMPARISON BETWEEN XGBOOST, CATBOOST, RANDOM FOREST, AND LIGHTGBM IN INDONESIAN WOMEN’S BREAST CANCER DATASET

Pramita Izati, Prajna (Unknown)
Aniniyah, Nuchaila (Unknown)
Isnawaty, Devi Putri (Unknown)



Article Info

Publish Date
30 Dec 2025

Abstract

Breast cancer is the most prevalent cancer among women in Indonesia and remains a major public health concern, making the identification of key risk factors essential for early detection. This study applies four machine learning classification algorithms (XGBoost, Random Forest, CatBoost, and LightGBM) to classify breast cancer cases based on risk factors, using a dataset of 400 samples. The data were preprocessed prior to analysis and split into 75% training and 25% testing data, and 10-fold cross-validation was applied. Model performance was evaluated using accuracy, precision, recall, F1-score, and area under the curve (AUC). The results show that CatBoost outperforms the other models, achieving the highest AUC of 0.72. Feature importance analysis indicates that a high-fat diet, menopausal status, and working status are the most influential risk factors, while breastfeeding shows a protective effect. These findings indicate that CatBoost gives the best predictive performance among the four models compared and identifies key factors associated with breast cancer risk in Indonesia.
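
The evaluation protocol summarized in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' code. It assumes that the 10 folds are drawn from the 75% training portion (the abstract does not say how the split and cross-validation were combined), that AUC means area under the ROC curve, and that predictions are thresholded at 0.5. The function names (`holdout_with_folds`, `confusion_metrics`, `roc_auc`) and the toy labels and scores are hypothetical and are not taken from the study.

```python
import random


def holdout_with_folds(n_samples, test_fraction=0.25, n_folds=10, seed=0):
    """Return (test_idx, folds): a hold-out test set plus K validation folds
    drawn from the remaining training indices (an assumed protocol)."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    # Deal training indices round-robin so fold sizes differ by at most one.
    folds = [train_idx[k::n_folds] for k in range(n_folds)]
    return test_idx, folds


def confusion_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def roc_auc(y_true, scores):
    """ROC AUC as the probability that a random positive outscores a random
    negative, with ties counted as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# 400 samples, as in the abstract: 100 test samples, 10 folds of 30 each.
test_idx, folds = holdout_with_folds(400)
print(len(test_idx), [len(f) for f in folds])

# Hypothetical toy labels and predicted probabilities, not study data.
toy_true = [1, 0, 1, 1, 0, 0, 1, 0]
toy_scores = [0.9, 0.4, 0.65, 0.3, 0.55, 0.2, 0.8, 0.1]
toy_pred = [1 if s >= 0.5 else 0 for s in toy_scores]
print(confusion_metrics(toy_true, toy_pred))
print(roc_auc(toy_true, toy_scores))
```

AUC is computed from the predicted scores rather than from thresholded labels, so it can rank models differently from accuracy or F1. This is why the abstract reports it as a separate criterion.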

Copyrights © 2025






Journal Info

Abbrev

parameter

Subject

Computer Science & IT; Decision Sciences, Operations Research & Management; Mathematics

Description

Parameter: Journal of Statistics is a refereed journal committed to original research articles, reviews and short communications of Statistics and its ...