Dwianto, Rio
Unknown Affiliation

Published: 1 document
Articles

The Effect of Feature Selection on Machine Learning Classification
Pardede, Jasman; Dwianto, Rio
JOIV : International Journal on Informatics Visualization Vol 9, No 4 (2025)
Publisher : Society of Visual Informatics

DOI: 10.62527/joiv.9.4.2926

Abstract

High-dimensional datasets can lead to overfitting and computationally expensive model building in machine learning. This study applies feature selection, a dimensionality reduction technique, to address these problems. Five feature selection methods were used: Chi-Square (CS), Information Gain (IG), Genetic Algorithm (GA), Particle Swarm Optimization (PSO), and Least Absolute Shrinkage and Selection Operator (LASSO). Three classifiers were used: Naïve Bayes, Extreme Gradient Boosting (XGB), and Random Forest (RF). The dataset is the Heart Attack Analysis & Prediction Dataset. Three feature selection scenarios were evaluated: (1) selecting the best features with a single feature selection method, (2) taking the intersection of the best features selected by methods in the same category, and (3) taking the intersection of the best features selected by all five methods. Model performance was measured using accuracy, precision, recall, F1-score, AUC, and training time. The results show that feature selection improves the performance of the prediction models. The best results were obtained with CS and IG, both in the filter category, combined with the XGB model. The selected features improved accuracy, precision, recall, F1-score, and AUC by 1.7%, 1%, 2.3%, 1.6%, and 0.2%, respectively, while training time decreased by 23.5%. Selecting features with a single method performed better than taking the intersection of features selected by methods in the same category or by all five methods.
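
As a rough illustration of the second scenario in the abstract (intersecting the best features chosen by two filter-category methods), the sketch below scores toy categorical features with Chi-Square and Information Gain and keeps the features that both rankings place in the top k. It is not the authors' implementation: the feature names, the toy data, the value of k, and the helper functions are all hypothetical, and the abstract does not say how many features were kept or how continuous features were handled.

```python
from collections import Counter
from math import log2


def chi_square(feature, labels):
    """Pearson chi-square statistic between a categorical feature and the class."""
    n = len(labels)
    feature_counts = Counter(feature)
    class_counts = Counter(labels)
    observed = Counter(zip(feature, labels))
    stat = 0.0
    for f, nf in feature_counts.items():
        for c, nc in class_counts.items():
            expected = nf * nc / n
            stat += (observed[(f, c)] - expected) ** 2 / expected
    return stat


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    n = len(values)
    return -sum((k / n) * log2(k / n) for k in Counter(values).values())


def information_gain(feature, labels):
    """Class entropy minus the class entropy conditioned on the feature."""
    n = len(labels)
    conditional = 0.0
    for f, nf in Counter(feature).items():
        subset = [c for x, c in zip(feature, labels) if x == f]
        conditional += (nf / n) * entropy(subset)
    return entropy(labels) - conditional


def top_k(scores, k):
    """Names of the k highest-scoring features."""
    return set(sorted(scores, key=scores.get, reverse=True)[:k])


# Hypothetical toy data: three binary features and a binary class label.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "a": [1, 1, 1, 1, 0, 0, 0, 0],  # perfectly aligned with the class
    "b": [1, 1, 1, 0, 1, 0, 0, 0],  # partly aligned with the class
    "c": [1, 0, 1, 0, 1, 0, 1, 0],  # independent of the class
}

k = 2  # assumed cut-off, chosen only for this example
chi_scores = {name: chi_square(col, labels) for name, col in features.items()}
ig_scores = {name: information_gain(col, labels) for name, col in features.items()}

# Scenario 2: keep only the features that both filter methods rank in the top k.
selected = top_k(chi_scores, k) & top_k(ig_scores, k)
print(sorted(selected))
```

On this toy data both filter methods agree, so the intersection keeps "a" and "b" and drops the uninformative "c". When the rankings differ, the intersection is smaller than either individual selection, which may be one reason a single method could outperform the intersection scenarios reported in the abstract.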