Hafiz Kurniawan, Muhammad
Unknown Affiliation

Published : 1 Documents Claim Missing Document
Claim Missing Document
Check
Articles

Found 1 Documents
Search

A Comparative Evaluation of Predictive Models for Lung Cancer: Insights from Logistic Regression, Naive Bayes, and Random Forest Hafiz Kurniawan, Muhammad; Misinem, Misinem
International Journal of Advances in Artificial Intelligence and Machine Learning Vol. 2 No. 1 (2025): International Journal of Advances in Artificial Intelligence and Machine Learni
Publisher : CV Media Inti Teknologi

Show Abstract | Download Original | Original Source | Check in Google Scholar | DOI: 10.58723/ijaaiml.v2i1.378

Abstract

This study aims to evaluate the performance of three machine learning models-Logistic Regression, Naive Bayes, and Random Forest-in predicting lung cancer using a publicly available dataset from Kaggle. The data used included demographic information, risk factors, and diagnostic imaging features, with significant class imbalance between benign and malignant cases. To address this imbalance, the Synthetic Minority Sampling Technique (SMOTE) was applied. In addition, Principal Component Analysis (PCA) and Recursive Feature Elimination (RFE) were used for dimensionality reduction and feature selection to improve model performance. The results showed that Random Forest, especially when combined with PCA, outperformed the other models with the highest accuracy of 96.77% and a balanced F1 score of 0.50 for the minority class. Although Logistic Regression achieved high accuracy, it was less effective in predicting minority classes, especially when combined with RFE. Meanwhile, Naive Bayes showed moderate performance but was limited by the assumption of feature independence. The application of SMOTE significantly improved the model's ability to handle class imbalance, while PCA proved more effective than RFE in improving model performance. This study highlights the importance of selecting appropriate machine learning models and preprocessing techniques for lung cancer prediction. Random Forest, with its ability to model complex relationships and handle imbalanced data, emerged as the most effective model for this task. These findings underscore the potential of machine learning in medical diagnostics and provide valuable insights for future research.