Breast cancer remains one of the most pressing global public health challenges, with approximately 2.3 million women diagnosed worldwide in 2022 and around 670,000 deaths attributed to the disease. Although machine learning algorithms are widely applied to breast cancer classification, findings vary considerably across studies, and there is still no consensus on which algorithm performs best for breast cancer diagnosis. This study analyzes and compares the performance of four machine learning algorithms in predicting breast cancer: Logistic Regression, Support Vector Machine (SVM), Random Forest, and K-Nearest Neighbors (KNN). The dataset used was the Breast Cancer Wisconsin (Diagnostic) Data Set obtained from Kaggle, which contains morphological characteristics of tumor cells. Data preprocessing involved cleaning, label encoding, feature normalization using StandardScaler, and an 80:20 train-test split. Model performance was evaluated using the confusion matrix, precision, recall, F1-score, accuracy, and ROC-AUC. All four models performed well, with overall accuracy ranging from 95.61% to 97.37%. SVM was the most accurate model (97.37%), with perfect recall (1.00) for the Benign class. Logistic Regression achieved the highest ROC-AUC (0.9960), indicating excellent discriminative ability. Random Forest and KNN performed slightly worse, particularly in detecting Malignant cases, where recall was 0.90. These findings indicate that machine learning can serve as an effective tool to support breast cancer diagnosis, with the choice of algorithm depending on data characteristics and clinical priorities.
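To illustrate how the reported evaluation metrics relate to a confusion matrix, the following is a minimal Python sketch, not the study's actual code. It assumes the standard 569-sample version of the dataset, so that an 80:20 split leaves 114 test samples; under that assumption, 111 correct predictions give 97.37% accuracy and 109 give 95.61%. The class counts (72 Benign, 42 Malignant) and the placement of the three errors are hypothetical, chosen only to be consistent with the reported SVM accuracy and perfect Benign recall. The function name `binary_metrics` is likewise an illustrative choice.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 for the positive class
    from the four cells of a binary confusion matrix."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against division by zero when a class is never predicted or absent.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical test-set counts (assumption, not taken from the study):
# 72 Benign and 42 Malignant samples, with 3 Malignant cases predicted as Benign.
malignant = binary_metrics(tp=39, fp=0, fn=3, tn=72)  # Malignant as positive class
benign = binary_metrics(tp=72, fp=3, fn=0, tn=39)     # Benign as positive class

for name, scores in (("Malignant", malignant), ("Benign", benign)):
    print(name, {key: round(value, 4) for key, value in scores.items()})
```

In this hypothetical case, every Benign sample is recovered, but the three missed Malignant cases lower Malignant recall. This is why per-class recall matters alongside overall accuracy when clinical priorities guide model selection.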
Copyrights © 2025