According to the 2020 Global Cancer Statistics, lung cancer is the deadliest malignancy worldwide, with the most cases occurring in Southeast Asia. Because deaths from lung disease remain very high, prevention efforts need to be strengthened, for example by improving the performance of prediction models. Machine learning methods applied to the lung cancer survey datasets commonly used by researchers for lung disease prediction, including in the development of assistive devices, still do not efficiently handle missing values, noisy data, unbalanced classes, or data validation. This study therefore proposes mean/mode imputation to replace missing values, Min-Max Normalization to smooth noisy data, K-Fold Cross Validation for data validation, and a hyperparameter tuning approach that unifies the performance of each machine learning method when making classification decisions and helps to reduce class imbalance. The results indicate that the proposed method provides an accuracy of 0.9% and improves the accuracy of the machine learning methods, with differences of 0.95% for Logistic Regression, 0.9% for KNeighborsClassifier, and 0.34% for Gaussian NB.
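The preprocessing and validation steps named above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the study's implementation: the helper names (`impute_column`, `min_max_normalize`, `k_fold_indices`) and the toy columns are hypothetical, and the classifiers and hyperparameter tuning are omitted.

```python
from collections import Counter
from statistics import mean


def impute_column(values, categorical=False):
    """Replace None entries with the column mode (categorical) or mean (numeric)."""
    observed = [v for v in values if v is not None]
    if categorical:
        fill = Counter(observed).most_common(1)[0][0]
    else:
        fill = mean(observed)
    return [fill if v is None else v for v in values]


def min_max_normalize(values):
    """Rescale numeric values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no spread; map every entry to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous, unshuffled folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size


# Hypothetical toy columns, not taken from the study's dataset.
ages = impute_column([60, None, 70, 50])
smoker = impute_column(["yes", "no", None, "yes"], categorical=True)
ages_scaled = min_max_normalize(ages)
folds = list(k_fold_indices(len(ages), k=2))
```

Each fold serves once as the test set while the remaining samples form the training set, so every record is used for validation exactly once. In a real pipeline the imputation and scaling statistics would normally be computed on the training folds only, to avoid leaking information from the test fold.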
Copyrights © 2022