Poltak Poltak
Master's Program in Informatics Engineering, Faculty of Computer Science and Information Technology, Universitas Sumatera Utara, Indonesia

COMPARISON OF FEATURE SELECTION TO PERFORMANCE IMPROVEMENT OF K-NEAREST NEIGHBOR ALGORITHM IN DATA CLASSIFICATION
Iswanto Iswanto; Tulus Tulus; Poltak Poltak
Jurnal Teknik Informatika (Jutif) Vol. 3 No. 6 (2022): JUTIF Volume 3, Number 6, December 2022
Publisher : Informatika, Universitas Jenderal Soedirman

DOI: 10.20884/1.jutif.2022.3.6.471

Abstract

The K-Nearest Neighbor (K-NN) algorithm is one of the most widely used data classification methods. It classifies a new data point by computing its distance to the training data, selecting the K closest neighbors, and assigning the class that receives the most votes among those K neighbors. However, the performance of this method is still lower than that of other data classification methods. The causes are the majority-voting system used to determine the class of new data and the influence of features that are less relevant to the dataset. This study compares several feature selection methods to see their effect on the performance of the K-NN algorithm in data classification. The feature selection methods examined are Information Gain, Gain Ratio, and Gini Index. The methods were tested on the Water Quality dataset from the Kaggle repository to identify the most effective feature selection method. The test results show that feature selection improves the performance of the K-NN algorithm. Averaged over K=1 to K=15, accuracy increased by 1.17% with Information Gain, 0.69% with Gain Ratio, and 1.19% with Gini Index. The highest accuracy on the Water Quality dataset was 89.66%, obtained at K=13 with Information Gain feature selection.
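The ideas named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes each feature has already been discretized before scoring, uses Euclidean distance for K-NN, and takes the Gini index score of a feature to be the decrease in Gini impurity after splitting on it. The abstract specifies none of these choices. The function names (entropy, gini, feature_scores, knn_predict) and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (base 2) of a list of discrete values."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def feature_scores(values, labels):
    """Score one discrete feature against the class labels.

    Returns (information_gain, gain_ratio, gini_decrease).
    Assumption: `values` is already discretized (e.g. binned).
    """
    n = len(labels)
    groups = defaultdict(list)
    for v, y in zip(values, labels):
        groups[v].append(y)
    # Impurity remaining after splitting on the feature, weighted by group size.
    cond_entropy = sum(len(g) / n * entropy(g) for g in groups.values())
    cond_gini = sum(len(g) / n * gini(g) for g in groups.values())
    info_gain = entropy(labels) - cond_entropy
    # Split information: entropy of the feature's own value distribution.
    split_info = entropy(values)
    gain_ratio = info_gain / split_info if split_info > 0 else 0.0
    return info_gain, gain_ratio, gini(labels) - cond_gini


def knn_predict(train_X, train_y, query, k):
    """Majority vote among the k training rows closest to `query`.

    Ties resolve to the class seen first among the nearest neighbors.
    """
    order = sorted(range(len(train_X)),
                   key=lambda i: math.dist(train_X[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the Water Quality dataset.
    y = ["safe", "safe", "unsafe", "unsafe"]
    informative = ["low", "low", "high", "high"]  # separates the classes
    irrelevant = ["a", "b", "a", "b"]             # unrelated to the class
    print(feature_scores(informative, y))
    print(feature_scores(irrelevant, y))
    X = [(0.0, 0.0), (0.1, 0.2), (1.0, 1.0), (0.9, 1.1)]
    print(knn_predict(X, y, (0.2, 0.1), k=3))
```

Under these assumptions, a feature selection step would rank features by one of the three scores and pass only the highest-ranked features to K-NN. Repeating the classification for K=1 to K=15 would then allow the kind of comparison the abstract reports.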