Journal : TIN: TERAPAN INFORMATIKA NUSANTARA

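Optimizing Machine Learning Algorithms Using XGBoost Feature Selection for Breast Cancer Classification (original title: Optimasi Algoritma Machine Learning Menggunakan Seleksi Fitur Xgboost Untuk Klasifikasi Kanker Payudara)
Ramadhan, Naufal Cahya; H, Hanny Hikmayanti; Rohana, Tatang; Siregar, Amril Mutoi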
TIN: Terapan Informatika Nusantara Vol 5 No 2 (2024): July 2024
Publisher : Forum Kerjasama Pendidikan Tinggi (FKPT)

DOI: 10.47065/tin.v5i2.5408

Abstract

This research analyzes the performance of the K-Nearest Neighbors (KNN), Naïve Bayes, and Random Forest algorithms in classifying breast cancer diagnoses using the Wisconsin Breast Cancer dataset. The problem addressed is how to improve the accuracy of breast cancer diagnosis classification through appropriate preprocessing techniques. The objective is to evaluate and compare the performance of the three algorithms after preprocessing, which includes data cleaning (handling missing values, duplicate records, and outliers), feature selection using XGBoost, and SMOTE oversampling. Feature selection is applied to identify the most relevant features, and SMOTE is applied to balance the class distribution in the dataset. Evaluation based on a confusion matrix shows that Random Forest performs best, with high accuracy, precision, recall, and F1-score, and reaches an AUC of 98% after SMOTE is applied. The combination of feature selection and SMOTE was shown to significantly improve model performance, although the effect of SMOTE differed by algorithm: KNN showed a decrease in performance, while Naïve Bayes improved considerably. This study demonstrates the importance of preprocessing techniques in developing machine learning models for medical applications, emphasizing that appropriate techniques can significantly improve classification performance and lead to more accurate diagnoses.
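
The abstract names SMOTE as the technique used to balance the class distribution but does not say which implementation was used. As a rough illustration of the idea only, the following minimal sketch creates synthetic minority samples by interpolating between a minority sample and one of its nearest minority neighbours. The function name `smote`, its parameters, and the toy data are assumptions made for this example; they are not taken from the paper, and a real study would more likely rely on an established library implementation.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Simplified SMOTE sketch (hypothetical helper, not the paper's code).

    For each new point: pick a random minority sample, pick one of its k
    nearest minority neighbours, and place a synthetic point at a random
    position on the line segment between the two.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to base.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy two-feature data, invented for illustration (not the Wisconsin dataset).
majority = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [1.1, 1.3], [0.8, 0.9], [1.3, 1.0]]
minority = [[3.0, 3.0], [3.2, 2.9], [2.8, 3.1]]

# Generate enough synthetic samples to match the majority class size.
new_points = smote(minority, n_new=len(majority) - len(minority), k=2)
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))
```

Because each synthetic point lies between two existing minority samples, oversampling of this kind adds no information from outside the minority region. In practice it is usually applied to the training split only, so that the test data contains real samples alone.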