Breast cancer represents a critical health problem due to its increasing prevalence and the importance of detecting recurrence early. Machine Learning (ML) approaches have been widely applied to support diagnosis and prediction; however, class imbalance remains a major challenge. In the Breast Cancer Dataset (BCD), the majority class (“no-recurrence-events”) significantly outnumbers the minority class (“recurrence-events”), which can bias models toward the majority class so that they fail to detect recurrence cases. This study evaluates the effectiveness of handling class imbalance with the Synthetic Minority Over-sampling Technique (SMOTE) for several ML models: Decision Tree, Naïve Bayes, k-Nearest Neighbors (k-NN), and Random Forest. The dataset consists of 286 records with 9 features obtained from the UCI Machine Learning Repository. Data preprocessing included handling missing values and outliers, followed by class balancing using SMOTE. Models were evaluated using 10-fold cross-validation with accuracy, precision, recall, and F1-score as performance metrics. The results show that applying SMOTE significantly improves model performance, with an average accuracy increase of 11.85%. Among the evaluated models, Random Forest combined with SMOTE achieved the best performance, with an accuracy of 79.79%. In contrast, Naïve Bayes and k-NN demonstrated relatively lower performance. Overall, this study confirms that handling class imbalance with SMOTE can enhance classification performance, particularly the detection of the minority class in breast cancer recurrence prediction.
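
To make the balancing step concrete, the following is a minimal sketch of the core idea behind SMOTE: each synthetic point is placed on the line segment between a minority sample and one of its k nearest minority neighbours. It is not the authors' implementation. The function names `smote` and `balance`, the toy data, and the assumption that all features are already numerically encoded are illustrative choices made here, not details taken from the study.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority samples.

    minority: list of numeric feature vectors (assumed already encoded as floats).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the base sample, excluding itself
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment from base to other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def balance(X, y, minority_label, k=5, seed=0):
    """Oversample minority_label in a two-class problem until counts are equal."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_new = max((len(y) - len(minority)) - len(minority), 0)
    new_points = smote(minority, n_new, k=k, seed=seed)
    return X + new_points, y + [minority_label] * len(new_points)


# Toy data (hypothetical): five majority samples (0) and three minority samples (1).
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5],
     [5.0, 5.0], [6.0, 5.0], [5.0, 6.0]]
y = [0, 0, 0, 0, 0, 1, 1, 1]
X_bal, y_bal = balance(X, y, minority_label=1, k=2)
```

After balancing, both classes in the toy data have the same number of samples, and every synthetic point lies between two existing minority samples. In practice an established library implementation would normally be used instead of a hand-written one; this sketch only illustrates the interpolation step.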
Copyrights © 2026