Journal of Applied Data Sciences
Vol 6, No 1: JANUARY 2025

The Development of Stacking Techniques in Machine Learning for Breast Cancer Detection

Van FC, Lucky Lhaura
Anam, M. Khairul
Bukhori, Saiful
Mahamad, Abd Kadir
Saon, Sharifah
Nyoto, Rebecca La Volla



Article Info

Publish Date
27 Dec 2024

Abstract

This study addresses the challenge of accurately detecting breast cancer with machine learning (ML) models, particularly on imbalanced datasets, which tend to bias a model toward the majority class. To counter this, the Synthetic Minority Over-sampling Technique (SMOTE) was applied both to balance the class distribution and to improve the model's sensitivity to malignant tumors, which are underrepresented in the dataset. SMOTE generated synthetic samples for the minority class without introducing overfitting and improved the model's generalization to unseen data. Additionally, AdaBoost was employed as the meta model in the stacking framework, chosen for its ability to focus on misclassified instances during training, thereby boosting the overall performance of the combined base models. The study evaluates several models and combinations. K-Nearest Neighbors (KNN) + SMOTE achieved 97% on accuracy, precision, recall, and F1-score. Similarly, C4.5 + Hyperparameter Tuning + SMOTE reached 95% on all four metrics. The stacking model with Logistic Regression (LR) as the meta model and SMOTE also achieved 97% on accuracy, precision, recall, and F1-score. The best result was obtained with Stacking AdaBoost + Hyperparameter Tuning + SMOTE, which reached an accuracy of 98%. These findings highlight the effectiveness of combining SMOTE with stacking techniques to develop robust predictive models for medical applications. The novelty of this study lies in the integration of SMOTE with advanced stacking methods, particularly those using AdaBoost and Logistic Regression as meta models, to address class imbalance in medical datasets. Future work will explore deploying this model in clinical settings for accurate and timely breast cancer detection.
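
The oversampling step described in the abstract can be illustrated with a minimal sketch of the core SMOTE idea: each synthetic sample is placed at a random point on the line segment between a minority sample and one of its k nearest minority neighbours. The function names (smote, balance), the toy data, and the parameter defaults below are hypothetical and chosen only for illustration; the abstract does not state which implementation, dataset, or settings the authors used, and the sketch assumes a binary problem.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority samples."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than exist
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the base point, excluding itself
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # The new point lies on the segment between base and neighbour
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def balance(X, y, minority_label, k=5, seed=0):
    """Oversample the minority class until both classes have equal size.

    Assumes a binary problem (hypothetical helper, not from the paper).
    """
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_majority = len(y) - len(minority)
    n_new = max(n_majority - len(minority), 0)
    new_points = smote(minority, n_new, k=k, seed=seed)
    return X + new_points, y + [minority_label] * len(new_points)


# Toy data: six majority samples (label 0) and two minority samples (label 1)
X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.1, 0.0], [0.2, 0.3],
     [2.0, 2.0], [2.2, 2.1]]
y = [0, 0, 0, 0, 0, 0, 1, 1]

X_bal, y_bal = balance(X, y, minority_label=1)
print(y_bal.count(0), y_bal.count(1))
```

In a workflow like the one the abstract describes, oversampling of this kind is normally applied to the training split only, so that the held-out data used to report accuracy, precision, recall, and F1-score contains no synthetic samples.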

Copyrights © 2025






Journal Info

Abbrev

JADS

Publisher

Subject

Computer Science & IT; Control & Systems Engineering; Decision Sciences, Operations Research & Management

Description

One of the current hot topics in science is data: how can datasets be used in scientific and scholarly research in a more reliable, citable and accountable way? Data is of paramount importance to scientific progress, yet most research data remains private. Enhancing the transparency of the processes ...