Journal : Sistemasi: Jurnal Sistem Informasi

Comparative Analysis of Oversampling and SMOTEENN Techniques in Machine Learning Algorithms for Breast Cancer Prediction
Authors : Yulian, Tri; Susanto, Erliyan Redy
Sistemasi: Jurnal Sistem Informasi, Vol 14, No 3 (2025)
Publisher : Program Studi Sistem Informasi Fakultas Teknik dan Ilmu Komputer

DOI: 10.32520/stmsi.v14i3.5146

Abstract

Breast cancer is the leading cause of cancer-related death among women, and class imbalance in medical datasets is one of the major challenges in developing predictive models for it. This imbalance hinders detection of the minority class (patients with cancer), which is critical for early diagnosis. This study analyzes the performance of Support Vector Machine (SVM) and Random Forest algorithms in predicting breast cancer when the data are preprocessed with oversampling or SMOTEENN. The dataset used is the SEER Breast Cancer Dataset, which was balanced using each of the two techniques. Model performance was evaluated using accuracy, precision, recall, and F1-score. SVM with oversampling achieved the highest accuracy, 98.97%, followed by SVM with SMOTEENN at 97.20%. Random Forest reached an accuracy of 96.63% with oversampling and 95.90% with SMOTEENN. SVM proved more effective at identifying both classes with minimal error, particularly when combined with oversampling. These findings indicate that selecting an appropriate model and data preprocessing technique, such as oversampling or SMOTEENN, can significantly enhance predictive accuracy. This research contributes to the development of more accurate and reliable breast cancer prediction systems, supporting early diagnosis and clinical decision-making in medical applications.
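
The balancing and evaluation steps named in the abstract can be illustrated with a minimal sketch. It assumes that "oversampling" means random duplication of minority-class rows, which the abstract does not specify, and it uses a made-up toy dataset, not the SEER data. The function names `random_oversample` and `binary_metrics` are hypothetical, and SMOTEENN and the SVM and Random Forest models are not reproduced here.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest one (assumed technique)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for label, count in counts.items():
        pool = [r for r, l in zip(rows, labels) if l == label]
        for _ in range(target - count):
            out_rows.append(rng.choice(pool))
            out_labels.append(label)
    return out_rows, out_labels


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy imbalanced data (illustrative only): 8 negatives, 2 positives.
rows = [[float(i)] for i in range(10)]
labels = [0] * 8 + [1] * 2

bal_rows, bal_labels = random_oversample(rows, labels)
balanced_counts = Counter(bal_labels)  # each class now has 8 rows

# Hypothetical predictions, used only to exercise the metric formulas.
scores = binary_metrics([1, 1, 0, 0, 0, 0], [1, 0, 0, 0, 0, 1])
```

In practice, resampling is usually applied only to the training split so that duplicated rows do not also appear in the test data. Whether the study did this is not stated in the abstract.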