Image processing and machine learning are used in biomedical applications as supporting tools for detecting and diagnosing certain diseases. Breast cancer is one such disease, and researchers have devoted great effort to it for decades. Both image-based and feature-based public datasets are available for this task. Several factors, such as hardware limitations or preprocessing, can introduce noise into images. This noise can produce anomalies or outliers in the dataset, which may decrease detection accuracy and mislead medical staff during diagnosis. This study therefore examines the effect of removing outliers from the dataset on breast cancer detection accuracy. The proposed method removes outliers detected through z-score analysis. The remaining data are normalized, and the classification accuracies of ten methods are obtained through direct implementation: XGBoost, Neural Network, CNN, RNN, AdaBoost, LSTM, GRU, Random Forest, SVM, and Logistic Regression. The public Wisconsin Diagnostic Breast Cancer (WDBC) dataset was used. An ablation study was conducted by varying the threshold value of the z-score method, and the best accuracy was obtained with a threshold of 3. Additionally, the results obtained on the entire dataset were compared with those obtained after outlier removal. The average accuracy of all classifiers was 98.08%. In conclusion, the findings indicate that removing outliers from the dataset increases the overall accuracy of breast cancer detection.
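The preprocessing described above (z-score outlier removal followed by normalization) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that z-scores are computed per feature with the population standard deviation, that a sample is discarded when any of its features exceeds the threshold, and that normalization means min-max scaling to [0, 1]; the abstract specifies none of these details. The function names `remove_outliers_zscore` and `min_max_normalize` are hypothetical, and the demo uses synthetic data rather than WDBC.

```python
import numpy as np


def remove_outliers_zscore(X, y, threshold=3.0):
    """Drop rows in which any feature has |z-score| above the threshold.

    Assumption: per-feature z-scores using the population standard deviation.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant features
    z = np.abs((X - mean) / std)
    keep = (z <= threshold).all(axis=1)
    return X[keep], y[keep]


def min_max_normalize(X):
    """Scale each feature to [0, 1] (assumed form of normalization)."""
    X = np.asarray(X, dtype=float)
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span[span == 0] = 1.0  # constant features map to 0
    return (X - lo) / span


if __name__ == "__main__":
    # Synthetic stand-in for a feature matrix; not the WDBC data.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    X[0, 2] = 50.0  # inject one obvious outlier
    y = rng.integers(0, 2, size=200)

    X_clean, y_clean = remove_outliers_zscore(X, y, threshold=3.0)
    X_scaled = min_max_normalize(X_clean)
    print("rows before:", X.shape[0], "rows after:", X_clean.shape[0])
```

Repeating the call with different `threshold` values and retraining each classifier would mirror the kind of threshold ablation the study reports. In a real experiment the statistics would normally be estimated on the training split only.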
Copyrights © 2025