Journal : Sistemasi: Jurnal Sistem Informasi

Comparative Analysis of Oversampling and SMOTEENN Techniques in Machine Learning Algorithms for Breast Cancer Prediction
Authors: Yulian, Tri; Susanto, Erliyan Redy
Sistemasi: Jurnal Sistem Informasi, Vol 14, No 3 (2025)
Publisher: Program Studi Sistem Informasi Fakultas Teknik dan Ilmu Komputer

DOI: 10.32520/stmsi.v14i3.5146

Abstract

Breast cancer is the leading cause of cancer-related death among women, and class imbalance in medical datasets is one of the major challenges in developing predictive models. This imbalance hinders the detection of the minority class (patients with cancer), which is critical for early diagnosis. This study analyzes the performance of the Support Vector Machine (SVM) and Random Forest algorithms in predicting breast cancer using two preprocessing techniques, oversampling and SMOTEENN. The dataset used is the SEER Breast Cancer Dataset, which was balanced using both techniques. Model performance was evaluated using metrics such as accuracy, precision, recall, and F1-score. The results showed that SVM with oversampling achieved the highest accuracy of 98.97%, followed by SVM with SMOTEENN at 97.20%. Random Forest reached an accuracy of 96.63% with oversampling and 95.90% with SMOTEENN. SVM proved more effective in identifying both classes with minimal error, particularly when combined with oversampling. These findings highlight that selecting an appropriate model and data preprocessing technique, such as oversampling or SMOTEENN, can significantly enhance predictive accuracy. This research contributes to the development of more accurate and reliable breast cancer prediction systems, supporting early diagnosis and clinical decision-making in medical applications.
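
The balancing and evaluation steps described in this abstract can be illustrated with a minimal, standard-library sketch. It assumes plain random oversampling (duplicating minority rows); the abstract does not say which oversampling variant the authors used, and SMOTEENN (SMOTE followed by Edited Nearest Neighbours cleaning) is not implemented here. The function names `random_oversample` and `binary_metrics` and the toy data are hypothetical and are not taken from the paper.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of each smaller class
    until every class is as large as the largest one."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(rows))
            y_out.append(label)
    return X_out, y_out


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical toy data: 8 majority-class rows (0) and 2 minority-class rows (1).
X = [[float(i)] for i in range(10)]
y = [0] * 8 + [1] * 2

X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))

# A model that always predicts the majority class looks accurate
# on the imbalanced data but never finds a minority-class case.
print(binary_metrics(y, [0] * len(y)))
```

The second printout shows why accuracy alone is not enough on imbalanced data, which is the reason for also reporting precision, recall, and F1-score. In practice, resampling is normally applied to the training split only, so that duplicated rows do not leak into the test set.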

Comparison of Machine Learning Models for Predicting Lung Cancer Severity
Authors: Lestari, Ninik; Susanto, Erliyan Redy
Sistemasi: Jurnal Sistem Informasi, Vol 14, No 6 (2025)
Publisher: Program Studi Sistem Informasi Fakultas Teknik dan Ilmu Komputer

DOI: 10.32520/stmsi.v14i6.5258

Abstract

This study compares the performance of four machine learning algorithms, Random Forest (RF), Support Vector Machine (SVM), Logistic Regression (LR), and K-Nearest Neighbors (KNN), in predicting lung cancer severity based on patient medical data. The dataset includes clinical information with the target variable categorized into three severity levels: low, medium, and high. Experiments were conducted using an 80:20 train-test split without feature scaling. The results show that RF achieved 100% accuracy, LR 99%, KNN 82%, and SVM 43%. The superior performance of Random Forest can be attributed to its ensemble of decision trees, which mitigates overfitting in medium-dimensional numerical features, whereas SVM (kernel = RBF, C = 1.0, gamma = "scale") failed to adapt due to the absence of scaling and hyperparameter tuning. Recall, precision, and F1-score further confirm the dominance of RF and LR. This study provides insights into the effectiveness of machine learning algorithms in lung cancer diagnosis and highlights the contribution of a multi-algorithm approach. The findings recommend using RF as the primary model and LR as a complementary control within clinical decision support systems, enabling physicians to make earlier, more personalized treatment decisions and ultimately improve lung cancer patient prognosis.
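
The abstract attributes the weak SVM result to the absence of feature scaling. A small standard-library sketch can show why an RBF kernel is sensitive to this: the kernel depends on squared Euclidean distance, so a feature with a large numeric range dominates it. The two features used here (an age-like column and a 1 to 9 score), the helper names, and the gamma value are all hypothetical; the paper's actual features and code are not given in the abstract.

```python
import math
import random
from statistics import mean, pstdev


def train_test_split_80_20(rows, seed=0):
    """Shuffle a copy of the rows and cut it 80:20."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(0.8 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def fit_standardizer(train):
    """Per-feature mean and standard deviation, from training rows only."""
    cols = list(zip(*train))
    means = [mean(c) for c in cols]
    stds = [pstdev(c) or 1.0 for c in cols]  # guard constant columns
    return means, stds


def standardize(rows, means, stds):
    """Rescale each feature to zero mean and unit variance."""
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]


def squared_distance_shares(a, b):
    """Fraction of the squared Euclidean distance contributed by each feature."""
    parts = [(x - y) ** 2 for x, y in zip(a, b)]
    total = sum(parts)
    return [p / total for p in parts] if total else [0.0] * len(parts)


def rbf_kernel(a, b, gamma):
    """RBF kernel: exp(-gamma * squared Euclidean distance)."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical rows: [age-like value, symptom score from 1 to 9].
rows = [[20 + 3 * i, 1 + (i % 9)] for i in range(20)]
train, test = train_test_split_80_20(rows)

means, stds = fit_standardizer(train)
train_z = standardize(train, means, stds)
test_z = standardize(test, means, stds)  # reuse the training statistics

a, b = train[0], train[1]
az, bz = train_z[0], train_z[1]
print("raw shares:   ", squared_distance_shares(a, b))
print("scaled shares:", squared_distance_shares(az, bz))
print("raw kernel:   ", rbf_kernel(a, b, gamma=0.5))
print("scaled kernel:", rbf_kernel(az, bz, gamma=0.5))
```

Fitting the scaler on the training split and reusing its statistics on the test split keeps test information out of preprocessing. Tree-based models such as Random Forest split on one feature at a time and are largely unaffected by feature ranges, which is consistent with the gap the abstract reports between RF and the unscaled SVM.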