Building of Informatics, Technology and Science
Vol 7 No 1 (2025): June (2025)

Breast Cancer Prediction Using the Synthetic Minority Oversampling Technique and the Categorical Boosting Classifier

Mandala, Muhamad Bintang
Witanti, Wina
Komarudin, Agus



Article Info

Publish Date
25 Jun 2025

Abstract

Breast cancer remains one of the leading causes of mortality worldwide, with high prevalence among women in Indonesia. Accurate and efficient diagnostic models are essential to support early detection and reduce mortality. This study aims to develop a predictive model for breast cancer classification using the CatBoost algorithm, a gradient boosting method known for handling categorical features natively and for reducing overfitting through ordered boosting. The dataset consists of diagnostic features of breast tumors, which were preprocessed by checking completeness and transforming numerical attributes into categorical bins to capture the value distribution more effectively. To address the class imbalance between benign and malignant cases, the SMOTE (Synthetic Minority Over-sampling Technique) method was applied, resulting in a balanced training set. Optimal hyperparameters for the CatBoost model were obtained using Bayesian optimization, with key parameters including depth, learning rate, and L2 regularization. The model was then trained and evaluated using recall, accuracy, and F1-score, with a confusion matrix used to assess prediction quality. The results show that CatBoost achieved a recall of 1.0, an accuracy of 98.6%, and an F1-score of 0.99, outperforming or matching benchmark models such as SVM, Neural Network, and XGBoost. These findings highlight the reliability and effectiveness of CatBoost in supporting medical decision-making for breast cancer diagnosis.
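
As an illustration of two steps named in the abstract, the following is a minimal sketch in plain Python of SMOTE-style oversampling (interpolating between a minority sample and one of its nearest minority neighbours) and of deriving recall, accuracy, and F1-score from confusion-matrix counts. It is not the authors' implementation, which presumably relied on library code; the function names `smote_oversample` and `classification_metrics`, the neighbour count `k`, and all toy values are assumptions made for this example.

```python
import random
from math import dist


def smote_oversample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points, SMOTE-style.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    Assumes `minority` holds at least two distinct tuple objects.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority samples
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def classification_metrics(tp, fp, fn, tn):
    """Recall, accuracy, and F1 from confusion-matrix counts."""
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"recall": recall, "accuracy": accuracy, "f1": f1}


# Hypothetical toy data: three minority samples, balanced up to six.
toy_minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2)]
toy_synthetic = smote_oversample(toy_minority, n_new=3, k=2)

# Hypothetical confusion-matrix counts, not results from the study.
toy_scores = classification_metrics(tp=5, fp=1, fn=0, tn=4)
```

Recall is listed first because, in a diagnostic setting, a false negative (a missed malignant case) is usually the costliest error; a recall of 1.0 means no positive case in the test set was missed.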

Copyrights © 2025






Journal Info

Abbrev

bits

Publisher

Subject

Computer Science & IT

Description

Building of Informatics, Technology and Science (BITS) is an open-access medium for publishing scientific articles that report research results in information technology and computing. Papers submitted to this journal are first checked for plagiarism and peer-reviewed to maintain quality. ...