JTAM (Jurnal Teori dan Aplikasi Matematika)
Vol 10, No 3 (2026): July

A Comparative Study of PCA-Based Dimensionality Reduction and Best Subset Selection in Disease Classification

Andreas Rony Wijaya (Department of Statistics, Universitas Sebelas Maret)
Atika Ratna Dewi (Department of Data Sciences, Universitas Telkom)
Muhammad Bayu Nirwana (Department of Statistics, Universitas Sebelas Maret)
Respatiwulan Respatiwulan (Department of Statistics, Universitas Sebelas Maret)
Sri Sulistijowati Handajani (Department of Statistics, Universitas Sebelas Maret)



Article Info

Publish Date
08 Jun 2026

Abstract

Real-world datasets often contain many variables, some of which are irrelevant or redundant. Building an effective classification model therefore requires simplifying the data so that only the most influential features remain. Feature selection is a common way to do this, but when there are many variables, removing some of them can discard information. It is therefore also worth considering methods that simplify the model while retaining most of the information in the original variables; dimensionality reduction is one such approach. This study uses a comparative quantitative approach to evaluate principal component analysis (PCA), a dimensionality reduction method, and best subset selection, a feature selection method, in terms of their effect on classification performance. The case study is a heart disease dataset from the UCI Machine Learning Repository with 303 observations and 13 predictor variables. Both approaches are applied to reduce the number of predictor variables and make the model more interpretable. Three classification models (logistic regression, naïve Bayes, and linear discriminant analysis) are then trained and evaluated using accuracy, recall, precision, and F1-score, and the results are further illustrated with ROC curves. Best subset selection yields a combination of seven of the most significant predictors, whereas PCA requires eight principal components to explain 80% of the total variation. The best classification performance is obtained with the feature-selected dataset, which achieves an accuracy of 87% and an AUC of 0.93 and outperforms the models fitted to both the original dataset and the PCA-reduced dataset.
These results show that feature selection using best subset selection provides a better balance between simplicity and classification performance. Furthermore, the models obtained after feature reduction, both from best subset selection and PCA, still maintain good predictive ability as indicated by their relatively high AUC values.
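
As a minimal illustration of two quantities named in the abstract, the Python sketch below shows how a number of principal components can be chosen from a cumulative explained-variance threshold such as 80%, and how accuracy, precision, recall, and F1-score follow from predicted and true labels. The function names `components_for_variance` and `classification_metrics` are hypothetical, and the eigenvalues and labels in the usage lines are made-up placeholders, not values from the study. It is a sketch under these assumptions, not the authors' implementation.

```python
from itertools import accumulate


def components_for_variance(eigenvalues, threshold=0.80):
    """Smallest number of leading components whose cumulative share of the
    total variance reaches `threshold` (hypothetical helper)."""
    ordered = sorted(eigenvalues, reverse=True)  # largest variance first
    total = sum(ordered)
    for k, running in enumerate(accumulate(ordered), start=1):
        if running / total >= threshold:
            return k
    return len(ordered)


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for a binary problem, computed
    from confusion-matrix counts (hypothetical helper)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Placeholder inputs for illustration only (not taken from the study).
example_eigenvalues = [4.0, 3.0, 2.0, 1.0]
example_true = [1, 1, 1, 0, 0, 0, 1, 0]
example_pred = [1, 1, 0, 0, 0, 1, 1, 0]

n_components = components_for_variance(example_eigenvalues, threshold=0.80)
metrics = classification_metrics(example_true, example_pred)
```

The same threshold rule applies to eigenvalues obtained from any PCA routine. Under this rule, eight components would be retained when seven explain less than 80% of the total variation and eight explain at least 80%.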

Copyrights © 2026






Journal Info

Abbrev

jtam

Subject

Mathematics

Description

Jurnal Teori dan Aplikasi Matematika (JTAM) is managed by the Mathematics Education Study Program, FKIP, Universitas Muhammadiyah Mataram, with ISSN (Print) 2597-7512 and ISSN (Online) 2614-1175. The Editorial Team accepts research results, ideas, and studies on (1) Development of methods or models ...