Suryaputri, Cantika Okzen
Unknown Affiliation

Articles

Journal : JOURNAL OF APPLIED INFORMATICS AND COMPUTING

Analysis of Gradient Boosted Trees Algorithm in Breast Cancer Classification
Suryaputri, Cantika Okzen; Rahardi, Majid
Journal of Applied Informatics and Computing Vol. 10 No. 1 (2026): February 2026
Publisher : Politeknik Negeri Batam

DOI: 10.30871/jaic.v10i1.11875

Abstract

Early and accurate classification of breast cancer is essential to support clinical diagnostic processes and improve patient outcomes. This study proposes a comprehensive machine learning pipeline based on Gradient Boosted Tree algorithms to classify breast tumors into benign and malignant categories. The proposed framework integrates several preprocessing stages, including outlier handling using the Local Outlier Factor (LOF), feature normalization with StandardScaler, class imbalance handling using SMOTE, and feature selection through ANOVA-based SelectKBest. Five ensemble learning models—XGBoost, LightGBM, CatBoost, HistGradientBoosting, and GradientBoosting—were trained and evaluated using accuracy, precision, recall, F1-score, and ROC-AUC metrics. The experimental results show that all models achieved strong and comparable classification performance. Among them, CatBoost obtained the highest ROC-AUC value of 0.9960, along with an accuracy of 0.9649, precision of 0.9750, recall of 0.9286, and F1-score of 0.9512. Statistical evaluation using the DeLong test indicated that the differences in ROC-AUC among the evaluated models were not statistically significant (p > 0.05), suggesting similar discriminative capabilities across models. To enhance model interpretability, SHAP (SHapley Additive exPlanations) was applied to the CatBoost model as a representative classifier. The results show that features related to nuclear size and shape, such as radius, area, perimeter, and concavity, contributed most significantly to malignant predictions. This study demonstrates that the integration of robust preprocessing techniques, Gradient Boosted Tree models, and explainable machine learning provides an accurate and interpretable approach for breast cancer classification. However, the evaluation was conducted on a single public dataset without external validation, and further studies using independent and real-world datasets are required before clinical deployment.
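
The CatBoost figures quoted in the abstract (accuracy, precision, recall, F1-score) all derive from the same four confusion-matrix counts, so a short sketch can show how they relate. The abstract does not report the confusion matrix or the test-set size; the counts below are hypothetical, chosen only because they reproduce the four reported values to four decimal places, and they assume that malignant is treated as the positive class. The names `ConfusionCounts` and `classification_metrics` are illustrative and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts, with malignant assumed to be the positive class."""
    tp: int  # malignant cases predicted malignant
    fp: int  # benign cases predicted malignant
    fn: int  # malignant cases predicted benign
    tn: int  # benign cases predicted benign


def classification_metrics(c: ConfusionCounts) -> dict:
    """Compute accuracy, precision, recall, and F1-score from raw counts."""
    total = c.tp + c.fp + c.fn + c.tn
    precision = c.tp / (c.tp + c.fp)
    recall = c.tp / (c.tp + c.fn)
    return {
        "accuracy": (c.tp + c.tn) / total,
        "precision": precision,
        "recall": recall,
        # F1 is the harmonic mean of precision and recall.
        "f1": 2 * precision * recall / (precision + recall),
    }


# HYPOTHETICAL counts (not taken from the paper): one 114-sample split
# that is consistent with the metrics reported in the abstract.
HYPOTHETICAL_COUNTS = ConfusionCounts(tp=39, fp=1, fn=3, tn=71)
metrics = classification_metrics(HYPOTHETICAL_COUNTS)

if __name__ == "__main__":
    for name, value in metrics.items():
        print(f"{name}: {value:.4f}")
```

Under these assumed counts, a recall of 0.9286 would correspond to three malignant cases classified as benign, which is the error type that matters most in a screening setting. ROC-AUC is not included because it depends on the ranking of predicted scores rather than on a single set of thresholded counts.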