Lung cancer is one of the deadliest diseases and a major global health issue. Early detection is crucial to improving survival rates; however, class imbalance in medical datasets continues to limit prediction accuracy. This study analyzes the K-Nearest Neighbors (KNN) algorithm combined with the Synthetic Minority Oversampling Technique (SMOTE) for early detection of lung cancer. The dataset was obtained from Kaggle and consists of 1,000 patient records with 26 clinical and demographic features. The research followed the CRISP-DM methodology, which comprises the business understanding, data understanding, data preparation, modeling, evaluation, and deployment stages. In the modeling phase, SMOTE was first applied to balance the class distribution, and KNN was then trained with k=3. The model achieved an accuracy of 99.50%, with nearly perfect precision, recall, and F1-score values. These results indicate that combining KNN with SMOTE is effective for predicting lung cancer severity levels on this dataset, and that the approach has the potential to be developed into a medical decision support system in the future.
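The modeling step described above (SMOTE to balance the classes, followed by KNN with k=3) can be illustrated with a minimal sketch. It is not the study's implementation: it uses only the Python standard library, the function names `nearest`, `smote`, and `knn_predict` are hypothetical, the toy data and the "Low"/"High" labels are invented for illustration, and the SMOTE neighbor count of 5 is an assumption, since the abstract does not state it.

```python
import math
import random
from collections import Counter


def nearest(points, query, k, skip_index=None):
    """Return indices of the k points closest to `query` (Euclidean distance)."""
    order = sorted(
        (i for i in range(len(points)) if i != skip_index),
        key=lambda i: math.dist(points[i], query),
    )
    return order[:k]


def smote(X, y, k=5, seed=0):
    """Oversample each minority class up to the size of the largest class.

    Every synthetic point lies on the segment between a minority sample
    and one of its k nearest neighbors from the same class.
    """
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        members = [x for x, lab in zip(X, y) if lab == label]
        if len(members) < 2:
            continue  # cannot interpolate from a single sample
        for _ in range(target - n):
            i = rng.randrange(len(members))
            neighbors = nearest(members, members[i],
                                min(k, len(members) - 1), skip_index=i)
            j = rng.choice(neighbors)
            gap = rng.random()  # position along the segment, in [0, 1)
            X_out.append([a + gap * (b - a)
                          for a, b in zip(members[i], members[j])])
            y_out.append(label)
    return X_out, y_out


def knn_predict(X_train, y_train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    votes = Counter(y_train[i] for i in nearest(X_train, query, k))
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two features, six "Low" records and two "High" records.
X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3],
     [0.2, 0.4], [0.4, 0.1], [5.0, 5.0], [5.2, 5.1]]
y = ["Low"] * 6 + ["High"] * 2

X_bal, y_bal = smote(X, y, k=5, seed=0)
print(Counter(y_bal))                               # class counts after SMOTE
print(knn_predict(X_bal, y_bal, [5.1, 5.0], k=3))   # k=3 as in the abstract
```

In a full evaluation, oversampling would normally be fitted on the training split only, so that synthetic points derived from test records do not inflate the reported scores; the abstract does not say how the split and SMOTE were ordered.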
Copyright © 2025