This study compares the performance of four machine learning algorithms, Random Forest (RF), Support Vector Machine (SVM), Logistic Regression (LR), and K-Nearest Neighbors (KNN), in predicting lung cancer severity from patient medical data. The dataset contains clinical information, with the target variable categorized into three severity levels: low, medium, and high. Experiments were conducted using an 80:20 train-test split without feature scaling. RF achieved 100% accuracy, LR 99%, KNN 82%, and SVM 43%. The superior performance of RF can be attributed to its ensemble of decision trees, which reduces overfitting on medium-dimensional numerical features, whereas SVM (kernel = RBF, C = 1.0, gamma = "scale") performed poorly because the features were not scaled and the hyperparameters were not tuned. Recall, precision, and F1-score further confirm the dominance of RF and LR. The study provides insight into the effectiveness of machine learning algorithms in lung cancer diagnosis and highlights the value of comparing multiple algorithms on the same data. The findings support using RF as the primary model and LR as a complementary control within clinical decision support systems, enabling physicians to make earlier, more personalized treatment decisions and, ultimately, to improve the prognosis of lung cancer patients.
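The sensitivity of an RBF-kernel SVM to unscaled features, which the study gives as a reason for the 43% result, can be illustrated with a minimal pure-Python sketch. The two features (age and a symptom score), the four patient rows, and all helper names are hypothetical and are not taken from the study's dataset. The `gamma_scale` helper mimics the gamma = "scale" heuristic as documented for scikit-learn, 1 / (n_features × variance of all feature values); it is an illustration under these assumptions, not the authors' code.

```python
import math
from statistics import mean, pstdev


def rbf_kernel(a, b, gamma):
    """RBF kernel: k(a, b) = exp(-gamma * ||a - b||^2)."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * sq_dist)


def gamma_scale(rows):
    """Mimic gamma="scale": 1 / (n_features * variance of all values)."""
    values = [v for row in rows for v in row]
    mu = mean(values)
    var = sum((v - mu) ** 2 for v in values) / len(values)
    return 1.0 / (len(rows[0]) * var)


def standardize(rows):
    """Rescale each column to zero mean and unit (population) variance."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in rows]


# Hypothetical patients: [age in years, symptom score on a 1-8 scale].
patients = [
    [35.0, 1.0],  # patient 0
    [36.0, 8.0],  # similar age, opposite end of the symptom scale
    [70.0, 1.0],  # much older, same symptom score as patient 0
    [72.0, 8.0],
]

# Raw features: age has the larger numeric range, so it dominates the distance.
g_raw = gamma_scale(patients)
k_raw_symptom = rbf_kernel(patients[0], patients[1], g_raw)
k_raw_age = rbf_kernel(patients[0], patients[2], g_raw)

# Standardized features: both columns contribute on a comparable scale.
scaled = standardize(patients)
g_std = gamma_scale(scaled)
k_std_symptom = rbf_kernel(scaled[0], scaled[1], g_std)
k_std_age = rbf_kernel(scaled[0], scaled[2], g_std)

print(f"raw:          symptom-only diff {k_raw_symptom:.3f}, age-only diff {k_raw_age:.3f}")
print(f"standardized: symptom-only diff {k_std_symptom:.3f}, age-only diff {k_std_age:.3f}")
```

On the raw values, the kernel rates patients 0 and 1 as nearly identical even though their symptom scores sit at opposite ends of the scale, because the wide-ranged age column dominates the squared distance. After standardization, the two kinds of difference yield similar kernel values. Tree-based models such as RF split on one feature at a time and are not affected by this rescaling.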
Copyrights © 2025