Femi Dwi Astuti
Informatics, Universitas Teknologi Digital Indonesia, Daerah Istimewa Yogyakarta

Published: 1 document
Articles

THE EFFECTS OF FEATURE SELECTION METHODS ON THE CLASSIFICATIONS OF IMBALANCED DATASETS
Femi Dwi Astuti; Indra Yatini Buryadi
IJISCS (International Journal of Information System and Computer Science) Vol 6, No 3 (2022)
Publisher : Bakti Nusantara Institute

DOI: 10.56327/ijiscs.v6i3.1279

Abstract

Imbalanced data often leads to suboptimal classification. Datasets with a large number of attributes also tend to degrade classification results. One way to improve classification accuracy is to perform pre-processing that selects the features to be used in the classification. This research uses the information gain and gain ratio feature selection algorithms for the pre-processing stage and the Naïve Bayes algorithm for the classification. The tests determine the accuracy, precision, and recall of classification without feature selection, and the accuracy obtained with information gain, gain ratio, and CBFS feature selection. The results are then compared to determine which feature selection algorithm gives the best results when applied to data with imbalanced classes. The classification accuracy on the default of credit card client dataset using the Naïve Bayes algorithm was 64.27%. Information gain feature selection increased the accuracy by 5.27 percentage points (from 64.27% to 69.54%), while gain ratio feature selection increased it by 14.19 percentage points (from 64.27% to 78.46%). In this case, gain ratio is more suitable for data with greatly varied attribute values.
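
To illustrate the two feature selection scores named in the abstract, the following is a minimal sketch of information gain and gain ratio for discrete attributes, using their standard textbook definitions. It is not the authors' implementation. The function names and the toy data (`labels`, `record_id`, `flag`) are hypothetical, and the toy data is not drawn from the credit card dataset.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())


def information_gain(feature, labels):
    """Reduction in label entropy after splitting on a discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    # Weighted average entropy of the labels within each feature value.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def gain_ratio(feature, labels):
    """Information gain normalized by the feature's own entropy (split information)."""
    split_info = entropy(feature)
    if split_info == 0:  # a constant feature carries no information
        return 0.0
    return information_gain(feature, labels) / split_info


# Hypothetical toy data: an imbalanced class column (6 vs. 2),
# an ID-like attribute with all-distinct values, and a two-valued attribute.
labels = [0, 0, 0, 0, 0, 0, 1, 1]
record_id = [1, 2, 3, 4, 5, 6, 7, 8]
flag = ["a", "a", "a", "a", "a", "a", "b", "b"]

for name, column in [("record_id", record_id), ("flag", flag)]:
    print(name, information_gain(column, labels), gain_ratio(column, labels))
```

On this toy data both attributes receive the same information gain, because each one separates the classes perfectly, but gain ratio ranks the two-valued `flag` above the all-distinct `record_id`. This normalization by split information is what makes gain ratio less biased toward attributes with many distinct values.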