Liver disease remains a major global health problem, and early, accurate diagnosis is needed to prevent severe clinical complications and mortality. In recent years, Support Vector Machines (SVMs) combined with Principal Component Analysis (PCA) have been widely applied to liver disease classification. However, existing studies are often limited by small or moderately sized datasets, a lack of systematic comparison among SVM kernel functions, and insufficient discussion of clinical relevance and data representativeness. These limitations restrict model generalizability and hinder practical clinical adoption. To address these gaps, this study evaluates a PCA–SVM classification framework on a large-scale Liver Disease Patient Dataset comprising 30,691 clinical records, thereby improving robustness and population representativeness. The main contribution of this research is a systematic and controlled comparison of four SVM kernel functions (linear, radial basis function (RBF), polynomial, and sigmoid) under identical preprocessing and dimensionality reduction conditions. PCA is applied to reduce feature redundancy while retaining over 97% of the information (explained variance) in the clinical features, supporting efficient learning without increasing model complexity. Experimental results indicate that kernel selection has a substantial impact on diagnostic performance. The RBF kernel consistently outperforms the other kernels, achieving an accuracy of 83.63% and an area under the ROC curve of 92.09%, while maintaining strong generalization on unseen data. From a clinical perspective, these findings demonstrate that the proposed PCA–SVM framework has significant potential as a clinical decision support tool for early liver disease screening based on routine laboratory data, offering a balance between predictive performance, computational efficiency, and practical applicability.
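The controlled kernel comparison described above can be illustrated with a minimal sketch. It assumes scikit-learn, standardization before PCA, an 80/20 stratified split, and default SVM hyperparameters; none of these choices are stated in the abstract. The data are synthetic stand-ins generated with `make_classification`, not the Liver Disease Patient Dataset, so the printed scores say nothing about the reported results.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# ASSUMPTION: synthetic placeholder data, not the 30,691-record clinical dataset.
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=6, random_state=0
)

# ASSUMPTION: an 80/20 stratified hold-out split; the study's split is not given here.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

results = {}
for kernel in ("linear", "rbf", "poly", "sigmoid"):
    # Every kernel sees the same preprocessing and the same PCA setting,
    # so the kernel is the only factor that varies between runs.
    model = make_pipeline(
        StandardScaler(),
        # A float n_components keeps the smallest number of components
        # whose cumulative explained variance reaches that fraction.
        PCA(n_components=0.97, svd_solver="full"),
        SVC(kernel=kernel),  # default C and gamma (assumption)
    )
    model.fit(X_train, y_train)

    # Signed distances to the separating surface serve as ranking scores for AUC.
    scores = model.decision_function(X_test)
    results[kernel] = {
        "accuracy": accuracy_score(y_test, model.predict(X_test)),
        "auc": roc_auc_score(y_test, scores),
    }

for kernel, metrics in results.items():
    print(f"{kernel:8s} accuracy={metrics['accuracy']:.4f} auc={metrics['auc']:.4f}")
```

Wrapping the scaler, PCA, and classifier in one pipeline means the projection is fitted on the training split only, which keeps the held-out evaluation free of information from the test records.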
Copyrights © 2026