Colorectal cancer remains a leading cause of cancer mortality worldwide, which makes effective tools for predicting patient survival necessary. Machine learning algorithms such as K-Nearest Neighbors (KNN) can use patient data for this prediction, but standard KNN implementations often suffer from the curse of dimensionality and from overfitting, leading to unreliable performance on complex medical datasets. This study evaluates and optimizes the KNN algorithm by integrating Principal Component Analysis (PCA) for dimensionality reduction and K-Fold Cross-Validation (KFCV) for a more stable performance estimate. The research applied a quantitative approach to a global colorectal cancer dataset, processing demographic and clinical features through a pipeline of imputation, encoding, and normalization. Three model configurations were compared systematically: Standard KNN, KNN combined with PCA, and an optimized KNN model using both PCA and KFCV across several neighbor values. The results show a distinct trade-off between predictive sensitivity and model stability. The Standard KNN and PCA-enhanced models achieved higher recall, indicating a strong ability to identify survivors on a single data split, whereas the fully optimized KNN+PCA+KFCV model provided the most stable and generalizable accuracy, with minimal deviation across folds. These findings indicate that PCA reduces computational complexity without substantial information loss, and that cross-validation is crucial for obtaining a reliable assessment of model performance. This research contributes to clinical informatics by highlighting the need to weigh high sensitivity against generalization stability when developing survival prediction models for complex medical data whose classes are not cleanly separable.
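The cross-validated evaluation described above can be illustrated with a minimal sketch. It is not the study's implementation: the data are synthetic (the hypothetical `make_toy_data` stands in for the clinical features), the neighbor values 3, 5, and 7 and the choice of 5 folds are assumptions, and the PCA step is omitted to keep the example dependency-free. In practice a library implementation would normally be used. The sketch shows only how KNN accuracy can be reported as a mean and standard deviation across folds, with normalization fitted on the training folds alone.

```python
import random
import statistics
from collections import Counter


def make_toy_data(n=200, seed=0):
    """Synthetic two-class data (hypothetical stand-in for clinical features)."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        # Overlapping Gaussian clusters, so the classes are not cleanly separable.
        X.append([rng.gauss(label * 1.5, 1.0), rng.gauss(label * 1.5, 1.0)])
        y.append(label)
    return X, y


def minmax_fit(X):
    """Return (min, max) per feature column."""
    return [(min(col), max(col)) for col in zip(*X)]


def minmax_apply(X, bounds):
    """Scale each feature using bounds learned elsewhere (the training folds)."""
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, (lo, hi) in zip(row, bounds)]
        for row in X
    ]


def knn_predict(X_train, y_train, x, k):
    """Majority vote among the k nearest training points (squared Euclidean)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(X_train, y_train)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def kfold_accuracy(X, y, k_neighbors, n_folds=5, seed=0):
    """Mean and standard deviation of accuracy over n_folds held-out folds."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    scores = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        # Fit the scaler on training rows only, to avoid leaking test statistics.
        bounds = minmax_fit([X[i] for i in train])
        X_tr = minmax_apply([X[i] for i in train], bounds)
        y_tr = [y[i] for i in train]
        X_te = minmax_apply([X[i] for i in fold], bounds)
        correct = sum(
            knn_predict(X_tr, y_tr, x, k_neighbors) == y[i]
            for x, i in zip(X_te, fold)
        )
        scores.append(correct / len(fold))
    return statistics.mean(scores), statistics.stdev(scores)


X, y = make_toy_data()
results = {k: kfold_accuracy(X, y, k) for k in (3, 5, 7)}
for k, (mean_acc, std_acc) in results.items():
    print(f"k={k}: mean accuracy={mean_acc:.3f}, std={std_acc:.3f}")
```

A single train/test split would yield only one of the five fold scores, which is why a single-split metric can look better or worse than the cross-validated mean.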
Copyright © 2026