Reza, Bob Subhan
Unknown Affiliation

Published: 1 document

Journal: Jurnal Teknik Informatika (JUTIF)

Optimized KNN Performance with PCA and K-Fold Cross-Validation for Colorectal Cancer Survival Prediction
Manza, Yuke; Rosnelly, Rika; Furqan, Mhd; Reza, Bob Subhan
Jurnal Teknik Informatika (Jutif) Vol. 7 No. 1 (2026): JUTIF Volume 7, Number 1, February 2026
Publisher: Informatika, Universitas Jenderal Soedirman

DOI: 10.52436/1.jutif.2026.7.1.5422

Abstract

Colorectal cancer remains one of the leading causes of cancer mortality worldwide, creating a need for effective tools to predict patient survival. Machine learning algorithms such as K-Nearest Neighbors (KNN) can predict survival from patient data, but standard KNN implementations often suffer from the curse of dimensionality and from overfitting, which makes their performance unreliable on complex medical datasets. This study evaluates and optimizes the KNN algorithm by integrating Principal Component Analysis (PCA) for dimensionality reduction and K-Fold Cross-Validation (KFCV) to improve model stability. Using a quantitative approach on a global colorectal cancer dataset, demographic and clinical features were processed through a pipeline of imputation, encoding, and normalization. Three model configurations were compared across a range of neighbor values: standard KNN, KNN combined with PCA, and an optimized KNN model that uses both PCA and KFCV. The results show a clear trade-off between predictive sensitivity and model stability. The standard KNN and PCA-enhanced models achieved higher recall, indicating a strong ability to identify survivors on a single train-test split, whereas the fully optimized KNN+PCA+KFCV model produced the most stable and generalizable accuracy, with minimal deviation. These findings indicate that PCA reduces computational complexity without substantial loss of information, while cross-validation is essential for an honest estimate of model performance. This research contributes to clinical informatics by highlighting the need to prioritize between high sensitivity and generalization stability when developing survival prediction models for complex, non-separable medical data.
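The pipeline described in the abstract (normalization, optional PCA, KNN, and k-fold cross-validation over several neighbor values) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses hypothetical synthetic two-class data in place of the colorectal cancer dataset, min-max scaling as the normalization step, PCA computed by power iteration with deflation, majority-vote KNN with Euclidean distance, and a hypothetical `cross_validate` helper that reports the mean and standard deviation of accuracy over the folds. Imputation and categorical encoding are omitted because the synthetic features are complete and numeric; all function names are illustrative.

```python
import math
import random
from collections import Counter
from statistics import mean, pstdev

def make_data(n_per_class=60, d=4, seed=42):
    """Hypothetical synthetic data: two Gaussian classes (not the study's dataset)."""
    rng = random.Random(seed)
    X, y = [], []
    for label, center in ((0, 0.0), (1, 3.0)):
        for _ in range(n_per_class):
            X.append([rng.gauss(center, 1.0) for _ in range(d)])
            y.append(label)
    return X, y

def min_max_fit(rows):
    """Per-feature (min, max) learned from training rows only."""
    return [(min(col), max(col)) for col in zip(*rows)]

def min_max_apply(rows, bounds):
    """Rescale each feature with training bounds; constant features map to 0."""
    return [[(v - lo) / (hi - lo) if hi > lo else 0.0
             for v, (lo, hi) in zip(row, bounds)] for row in rows]

def pca_fit(rows, n_components, iters=500, seed=1):
    """Top principal components via power iteration on the covariance matrix,
    deflating the covariance after each component is found."""
    d, n = len(rows[0]), len(rows)
    mu = [mean(col) for col in zip(*rows)]
    centered = [[v - m for v, m in zip(row, mu)] for row in rows]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    rng = random.Random(seed)
    comps = []
    for _ in range(n_components):
        v = [rng.uniform(0.1, 1.0) for _ in range(d)]
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w)) or 1.0
            v = [x / norm for x in w]
        # Rayleigh quotient: eigenvalue estimate for the converged unit vector
        lam = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        comps.append(v)
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return mu, comps

def pca_apply(rows, mu, comps):
    """Project mean-centered rows onto the principal components."""
    return [[sum((v - m) * c for v, m, c in zip(row, mu, comp)) for comp in comps]
            for row in rows]

def knn_predict(train_x, train_y, x, k):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    nearest = sorted((math.dist(x, tx), ty) for tx, ty in zip(train_x, train_y))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def k_fold_indices(n, n_folds, seed=0):
    """Shuffle indices once, then deal them into n_folds disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::n_folds] for f in range(n_folds)]

def cross_validate(X, y, k, n_folds=5, n_components=None):
    """Mean and standard deviation of KNN accuracy across folds; scaling and
    optional PCA are fitted on each training fold only."""
    scores = []
    for test_idx in k_fold_indices(len(X), n_folds):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        tr_x = [X[i] for i in train_idx]
        te_x = [X[i] for i in test_idx]
        bounds = min_max_fit(tr_x)
        tr_x, te_x = min_max_apply(tr_x, bounds), min_max_apply(te_x, bounds)
        if n_components:
            mu, comps = pca_fit(tr_x, n_components)
            tr_x, te_x = pca_apply(tr_x, mu, comps), pca_apply(te_x, mu, comps)
        tr_y = [y[i] for i in train_idx]
        correct = sum(knn_predict(tr_x, tr_y, xt, k) == y[i]
                      for xt, i in zip(te_x, test_idx))
        scores.append(correct / len(test_idx))
    return mean(scores), pstdev(scores)

if __name__ == "__main__":
    X, y = make_data()
    for k in (3, 5, 7):
        for name, n_comp in (("KNN", None), ("KNN+PCA", 2)):
            acc, sd = cross_validate(X, y, k, n_folds=5, n_components=n_comp)
            print(f"{name:8s} k={k}: mean accuracy {acc:.3f} (std {sd:.3f})")
```

In this sketch, scaling and PCA are fitted inside each training fold and only applied to the held-out fold, so the cross-validated score does not leak information from the test data. The spread of fold accuracies plays the role of the stability measure that the abstract contrasts with single-split recall.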