Breast cancer remains one of the leading causes of cancer-related mortality worldwide and is highly prevalent among women in Indonesia. Accurate and efficient diagnostic models are essential to support early detection and reduce mortality. This study develops a predictive model for breast cancer classification using CatBoost, a gradient boosting algorithm known for handling categorical features natively and for reducing overfitting through ordered boosting. The dataset consists of diagnostic features of breast tumors. It was preprocessed by checking for completeness and by transforming the numerical attributes into categorical bins to better capture the distribution of their values. To address the class imbalance between benign and malignant cases, SMOTE (Synthetic Minority Over-sampling Technique) was applied, yielding a balanced training set. The CatBoost hyperparameters were tuned with Bayesian optimization, with tree depth, learning rate, and L2 regularization as the key parameters. The model was then trained and evaluated using recall, accuracy, and F1-score, with a confusion matrix used to assess prediction quality. CatBoost achieved a recall of 1.0, an accuracy of 98.6%, and an F1-score of 0.99, matching or outperforming benchmark models such as SVM, Neural Network, and XGBoost. These findings highlight the reliability and effectiveness of CatBoost in supporting medical decision-making for breast cancer diagnosis.
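As a reading aid, the three reported metrics can all be derived from the four cells of a binary confusion matrix. The following is a minimal sketch in plain Python, not the study's evaluation code. The class name `ConfusionMatrix`, the function names, and the counts in `example` are hypothetical and chosen only for illustration; they are not the study's data. Malignant is assumed to be the positive class.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Binary confusion matrix; malignant is assumed to be the positive class."""
    tp: int  # malignant cases predicted malignant
    fp: int  # benign cases predicted malignant
    fn: int  # malignant cases predicted benign
    tn: int  # benign cases predicted benign


def recall(cm: ConfusionMatrix) -> float:
    """Share of malignant cases the model catches: TP / (TP + FN)."""
    positives = cm.tp + cm.fn
    return cm.tp / positives if positives else 0.0


def accuracy(cm: ConfusionMatrix) -> float:
    """Share of all cases classified correctly: (TP + TN) / total."""
    total = cm.tp + cm.fp + cm.fn + cm.tn
    return (cm.tp + cm.tn) / total if total else 0.0


def f1_score(cm: ConfusionMatrix) -> float:
    """Harmonic mean of precision and recall: 2TP / (2TP + FP + FN)."""
    denominator = 2 * cm.tp + cm.fp + cm.fn
    return 2 * cm.tp / denominator if denominator else 0.0


# Hypothetical counts for illustration only (not taken from the study).
example = ConfusionMatrix(tp=40, fp=10, fn=0, tn=50)

if __name__ == "__main__":
    print(f"recall   = {recall(example):.3f}")
    print(f"accuracy = {accuracy(example):.3f}")
    print(f"F1-score = {f1_score(example):.3f}")
```

With these illustrative counts, recall is perfect because no malignant case is missed (FN = 0), while accuracy and F1-score are lower because of the false positives. This is why recall is usually read together with the other two metrics in a diagnostic setting.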
Copyrights © 2025