COMPARISON BETWEEN XGBOOST, CATBOOST, RANDOM FOREST, AND LIGHTGBM IN INDONESIAN WOMEN’S BREAST CANCER DATASET
Pramita Izati, Prajna; Aniniyah, Nuchaila; Isnawaty, Devi Putri
Parameter: Journal of Statistics Vol. 5 No. 2 (2025)
Publisher : Fakultas Matematika dan Ilmu Pengetahuan Alam Universitas Tadulako

DOI: 10.22487/27765660.2025.v5.i2.17658

Abstract

Breast cancer is the most prevalent cancer among women in Indonesia and remains a major public health concern, so identifying its key risk factors is essential for early detection. This study applies four machine learning classification algorithms (XGBoost, Random Forest, CatBoost, and LightGBM) to classify breast cancer risk using a dataset of 400 samples. The data were preprocessed before analysis and split into 75% training and 25% testing data, and 10-fold cross-validation was used. Model performance was evaluated using accuracy, precision, recall, F1-score, and area under the curve (AUC). CatBoost outperformed the other models, achieving the highest AUC of 0.72. Feature importance analysis indicates that a high-fat diet, menopause status, and working status are the most influential risk factors, while breastfeeding shows a protective effect. These findings demonstrate that CatBoost provides strong predictive performance and effectively identifies key factors associated with breast cancer risk in Indonesia.
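To make the evaluation metrics listed in the abstract concrete, here is a minimal sketch in plain Python for a binary classifier. It assumes labels coded 0/1 with 1 as the positive (cancer) class, a predicted score per sample (for example a predicted probability), and a decision threshold of 0.5. The helper name `binary_metrics` and the toy data are hypothetical and are not taken from the study. AUC is computed here as the probability that a randomly chosen positive is scored above a randomly chosen negative, which is equivalent to the area under the ROC curve.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_score: Sequence[float],
                   threshold: float = 0.5) -> Dict[str, float]:
    """Accuracy, precision, recall, F1 and ROC AUC for 0/1 labels.

    Hypothetical helper for illustration; the positive class is assumed to be 1
    and the 0.5 threshold is an assumption, not the study's setting.
    """
    if len(y_true) != len(y_score):
        raise ValueError("y_true and y_score must have the same length")
    y_pred = [1 if s >= threshold else 0 for s in y_score]

    # Confusion-matrix counts at the chosen threshold.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)

    # Threshold-independent AUC: fraction of (positive, negative) pairs in
    # which the positive gets the higher score, with ties counted as 1/2.
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    auc = wins / (len(pos) * len(neg))

    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "auc": auc}


if __name__ == "__main__":
    # Toy labels and scores, invented for illustration only.
    y_true = [1, 0, 1, 1, 0, 0, 1, 0]
    y_score = [0.9, 0.4, 0.35, 0.8, 0.2, 0.6, 0.7, 0.1]
    for name, value in binary_metrics(y_true, y_score).items():
        print(f"{name}: {value:.3f}")
```

On the toy data this prints 0.750 for accuracy, precision, recall, and F1, and 0.875 for AUC. The pairwise AUC loop costs O(n_pos × n_neg), which is negligible for a test set of about 100 samples (25% of 400); for larger data a sort-based rank computation is the usual choice.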