Lung adenocarcinoma (LUAD) is a major cause of cancer-related mortality worldwide. This study aims to identify potential LUAD biomarkers and to develop robust classification models using the GSE151101 microarray dataset. Preprocessing comprised RMA normalization, ComBat batch-effect correction, and feature filtering based on annotation completeness, variability, and statistical significance. Support Vector Machine (SVM) and Gaussian Process Classification (GPC) models were constructed, and the polynomial-kernel GPC model achieved the best performance (accuracy 97.92%; F1-score 97.96%). Repeated 10-fold cross-validation confirmed the stability of this model (mean accuracy 96.88%, SD 1.97%, coefficient of variation 2.03%), and it outperformed linear SVM, GPC-RBF, and Multiple Kernel Learning (MKL). Functional enrichment analysis showed that the key discriminative genes (CDH13, CDKN2A, BCL2L11, MYL9, COL1A1, and AKT3) were enriched in pathways related to epithelial–mesenchymal transition, extracellular matrix remodelling, focal adhesion, PI3K/AKT signalling, and cell-cycle regulation, all of which are central to LUAD progression. Overall, polynomial-kernel GPC offers a stable and effective approach to transcriptome classification and biomarker ranking. Nevertheless, the translational potential of these signatures requires further validation in independent, clinically controlled cohorts.
Copyright © 2025