Breast cancer is the most prevalent cancer among women in Indonesia and remains a major public health concern, so identifying key risk factors is essential for early detection. This study applies four machine learning classification algorithms (XGBoost, Random Forest, CatBoost, and LightGBM) to classify breast cancer risk using a dataset of 400 samples. After preprocessing, the dataset was split into 75% training and 25% testing data, and 10-fold cross-validation was applied. Model performance was evaluated using accuracy, precision, recall, F1-score, and area under the curve (AUC). CatBoost outperformed the other models, achieving the highest AUC of 0.72. Feature importance analysis indicates that a high-fat diet, menopausal status, and working status are the most influential risk factors, while breastfeeding shows a protective effect. These findings suggest that CatBoost offers the best predictive performance among the four models and can identify key factors associated with breast cancer risk in Indonesia.
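The evaluation metrics named above can be illustrated with a minimal sketch in plain Python. It is not the study's code: the function names (`confusion_counts`, `classification_metrics`, `roc_auc`), the 0.5 decision threshold, and the eight toy labels and scores are all assumptions chosen for illustration, and AUC is computed here as the area under the ROC curve using the pairwise-ranking definition.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1-score from hard predictions."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    # Guard against empty denominators (no predicted or no actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical, not from the study): true labels and model scores.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2, 0.7, 0.5]

# Assumed decision threshold of 0.5 to turn scores into class predictions.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, auc)
```

Accuracy, precision, recall, and F1-score depend on the chosen threshold, whereas AUC is computed from the raw scores and summarizes ranking quality across all thresholds, which is one reason it is often used to compare models.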
Copyrights © 2025