This study develops a lung cancer risk prediction system based on the K-Nearest Neighbors (KNN) algorithm. The dataset consists of 5,000 patient medical records. Preprocessing included handling missing values, normalizing the data, label encoding, and selecting the features relevant to prediction. The Interquartile Range (IQR) method identified 61 outlier entries, which were removed, leaving 4,939 clean entries. The model was trained while tuning the k parameter; the best performance was achieved at k = 20, with a training accuracy of 89%. Evaluation on the test data yielded an accuracy of 88%, along with high precision, recall, and F1-score for both classes. The trained model was then integrated into a mobile application called LungHealth, which allows users to assess their lung cancer risk. The system is expected to support fast, accurate, and efficient early detection, enabling individuals, especially those with high-risk factors, to take preventive action promptly and become more aware of their lung health.
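The two core steps named above, IQR-based outlier removal and choosing k for KNN, can be illustrated with a minimal sketch. It uses only the Python standard library and rests on assumptions the abstract does not confirm: the conventional 1.5 × IQR fences, Euclidean distance, majority voting, and selection of k by accuracy on a held-out split. All function names are hypothetical and are not taken from the study's implementation.

```python
import math
import statistics
from collections import Counter


def iqr_bounds(values, factor=1.5):
    """Return the (low, high) fences Q1 - factor*IQR and Q3 + factor*IQR.

    The 1.5 factor is the common convention, assumed here.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - factor * iqr, q3 + factor * iqr


def remove_outliers(rows, column):
    """Keep only rows whose value in `column` lies inside the IQR fences."""
    low, high = iqr_bounds([row[column] for row in rows])
    return [row for row in rows if low <= row[column] <= high]


def knn_predict(train_X, train_y, x, k):
    """Predict the majority label among the k nearest training points."""
    nearest = sorted(
        zip(train_X, train_y),
        key=lambda pair: math.dist(pair[0], x),  # Euclidean distance (assumed)
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train_X, train_y, eval_X, eval_y, k):
    """Fraction of evaluation points whose predicted label matches the true one."""
    hits = sum(
        knn_predict(train_X, train_y, x, k) == y for x, y in zip(eval_X, eval_y)
    )
    return hits / len(eval_y)


def best_k(train_X, train_y, val_X, val_y, candidates):
    """Return the candidate k with the highest validation accuracy."""
    return max(candidates, key=lambda k: accuracy(train_X, train_y, val_X, val_y, k))


if __name__ == "__main__":
    # Toy data only; these values are illustrative, not from the study's dataset.
    rows = [(1.0,), (2.0,), (3.0,), (4.0,), (100.0,)]
    print(remove_outliers(rows, 0))  # the 100.0 row falls outside the fences

    train_X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
    train_y = [0, 0, 0, 1, 1, 1]
    print(knn_predict(train_X, train_y, (0.5, 0.5), k=3))
```

In practice, features would be normalized before the distance computation, as the preprocessing described above indicates, since unscaled features with large ranges would otherwise dominate the Euclidean distance.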
Copyrights © 2025