Garuda - Garba Rujukan Digital

P-Index from 2021 - 2026: 0.23

This author has published in these journals:

JTAM (Jurnal Teori dan Aplikasi Matematika)

Andreas Rony Wijaya

Department of Statistics, Universitas Sebelas Maret

Author-ID : 10182579

Mathematics

Published: 1 document

Articles

A Comparative Study of PCA-Based Dimensionality Reduction and Best Subset Selection in Disease Classification
Andreas Rony Wijaya; Atika Ratna Dewi; Muhammad Bayu Nirwana; Respatiwulan Respatiwulan; Sri Sulistijowati Handajani
JTAM (Jurnal Teori dan Aplikasi Matematika) Vol 10, No 3 (2026): July
Publisher : Universitas Muhammadiyah Mataram

DOI: 10.31764/jtam.v10i3.38265

Real-world datasets often contain many variables, some of which may be irrelevant or redundant. To build an effective classification model, it is important to simplify the data by keeping only the most influential features. Feature selection is a common way to do this. However, when there are many variables, removing some may discard information, so it is also worth considering methods that simplify the model while retaining most of the information in the original variables. Dimensionality reduction is one such approach. This study uses a comparative quantitative approach to evaluate principal component analysis (PCA) as a dimensionality reduction method and best subset selection as a feature selection method for improving classification performance. The case study is a heart disease dataset from the UCI Machine Learning Repository with 303 observations and 13 predictor variables. Both approaches are applied to reduce the number of predictor variables and make the model more interpretable. Three classification models (logistic regression, naïve Bayes, and linear discriminant analysis) are then trained and evaluated using accuracy, recall, precision, and F1-score, and the results are further illustrated with ROC curves. Best subset selection yields a seven-variable combination of the most significant predictors, whereas PCA requires eight principal components to explain 80% of the total variation. The best classification performance is obtained with the feature-selected dataset, which achieves an accuracy of 87% and an AUC of 0.93, outperforming the models built on both the original dataset and the PCA-reduced dataset.
These results show that feature selection using best subset selection provides a better balance between simplicity and classification performance. Furthermore, the models obtained after feature reduction, both from best subset selection and PCA, still maintain good predictive ability as indicated by their relatively high AUC values.
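
The two reduction steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the eigenvalues, the feature names (x1 to x4), and the scoring rule are hypothetical, and the abstract does not say which criterion the study used to rank subsets. The first function picks the smallest number of principal components whose cumulative share of variance reaches a threshold (80% in the abstract). The second scores every non-empty subset of features and keeps the best one.

```python
from itertools import combinations


def components_for_variance(eigenvalues, threshold=0.80):
    """Return the smallest number of principal components whose
    cumulative share of the total variance reaches `threshold`."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    cumulative = 0.0
    for k, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total >= threshold:
            return k
    return len(ordered)


def best_subset(features, score_fn):
    """Score every non-empty subset of `features` and return the
    (subset, score) pair with the highest score."""
    best, best_score = None, float("-inf")
    for size in range(1, len(features) + 1):
        for subset in combinations(features, size):
            score = score_fn(subset)
            if score > best_score:
                best, best_score = subset, score
    return best, best_score


# Hypothetical eigenvalues of a 5-variable correlation matrix
# (illustrative only, not taken from the study).
toy_eigenvalues = [2.5, 1.2, 0.7, 0.4, 0.2]
k = components_for_variance(toy_eigenvalues, threshold=0.80)

# Hypothetical scoring rule: reward two "useful" features and
# penalise subset size. A real analysis would plug in a model-based
# criterion such as cross-validated accuracy or an information criterion.
useful = {"x1", "x3"}


def toy_score(subset):
    return sum(1.0 for f in subset if f in useful) - 0.1 * len(subset)


subset, score = best_subset(["x1", "x2", "x3", "x4"], toy_score)
print(k, subset, round(score, 2))
```

Exhaustive search evaluates 2^p - 1 subsets, which is 8,191 for the 13 predictors mentioned in the abstract, so it stays tractable at this scale.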

Co-Authors: Atika Ratna Dewi; Muhammad Bayu Nirwana; Respatiwulan Respatiwulan; Sri Sulistijowati Handajani
