Jurnal Teknik Informatika (JUTIF)
Vol. 3 No. 6 (2022): JUTIF Volume 3, Number 6, December 2022

COMPARISON OF FEATURE SELECTION TO PERFORMANCE IMPROVEMENT OF K-NEAREST NEIGHBOR ALGORITHM IN DATA CLASSIFICATION

Iswanto Iswanto (Master's Program in Informatics Engineering, Faculty of Computer Science and Information Technology, Universitas Sumatera Utara, Indonesia)
Tulus Tulus (Master's Program in Informatics Engineering, Faculty of Computer Science and Information Technology, Universitas Sumatera Utara, Indonesia)
Poltak Poltak (Master's Program in Informatics Engineering, Faculty of Computer Science and Information Technology, Universitas Sumatera Utara, Indonesia)



Article Info

Publication Date
26 Dec 2022

Abstract

One of the most widely used data classification methods is the K-Nearest Neighbor (K-NN) algorithm. K-NN classifies a new data point by computing its distance to the training data and taking the K closest training points as its neighbors; the class of the new point is then decided by majority vote among those K neighbors. However, the performance of this method is still lower than that of other data classification methods. Two causes are the simple majority vote used to assign the class and the influence of features that are less relevant to the dataset. This study compares several feature selection methods to measure their effect on the performance of the K-NN algorithm in data classification. The feature selection methods examined are Information Gain, Gain Ratio, and Gini Index. They were tested on the Water Quality dataset from the Kaggle repository to determine which feature selection method is most effective. The results show that feature selection improves the performance of the K-NN algorithm. Averaged over K=1 to K=15, accuracy increased by 1.17% with Information Gain, 0.69% with Gain Ratio, and 1.19% with Gini Index. The highest accuracy on the Water Quality dataset, 89.66%, was obtained at K=13 with Information Gain feature selection.
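The abstract combines two ingredients: K-NN classification by majority vote among the K nearest training points, and ranking features by Information Gain, Gain Ratio, or Gini Index before classifying. The plain-Python sketch below illustrates both. It is not the authors' implementation and rests on several assumptions: Euclidean distance; feature columns that already hold discrete values (the three criteria are written here over discrete values, so continuous measurements would first need to be binned); the Gini Index scored as the reduction in Gini impurity, which is one common convention; and a small made-up toy dataset. Helper names such as `select_features` and `project` are hypothetical.

```python
import math
from collections import Counter

# --- Split criteria (assumption: each feature column holds discrete values) ---

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def partition(feature, labels):
    """Group the class labels by the value the feature takes on each row."""
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    return list(groups.values())


def information_gain(feature, labels):
    """Entropy of the labels minus the weighted entropy after splitting on the feature."""
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in partition(feature, labels))
    return entropy(labels) - remainder


def gain_ratio(feature, labels):
    """Information gain divided by the split information of the feature."""
    n = len(labels)
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in partition(feature, labels))
    return information_gain(feature, labels) / split_info if split_info > 0 else 0.0


def gini_gain(feature, labels):
    """Assumed Gini Index score: reduction in Gini impurity after the split."""
    n = len(labels)
    weighted = sum(len(g) / n * gini(g) for g in partition(feature, labels))
    return gini(labels) - weighted


# --- Feature selection and K-NN ---

def select_features(X, y, score_fn, m):
    """Hypothetical helper: indices of the m columns with the highest score_fn."""
    n_features = len(X[0])
    scores = [score_fn([row[j] for row in X], y) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:m])


def project(X, columns):
    """Hypothetical helper: keep only the selected columns of each row."""
    return [[row[j] for j in columns] for row in X]


def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k training rows closest to x (Euclidean distance)."""
    neighbors = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], x))
    votes = Counter(label for _, label in neighbors[:k])
    # On a tied vote, most_common(1) returns the class seen first, i.e. the nearer one.
    return votes.most_common(1)[0][0]


# Toy data (made up for illustration): column 0 separates the classes,
# columns 1 and 2 are noise.
X_train = [[0, 0, 1], [0, 1, 0], [0, 0, 0], [1, 1, 1], [1, 0, 0], [1, 1, 0]]
y_train = [0, 0, 0, 1, 1, 1]

cols = select_features(X_train, y_train, information_gain, m=1)  # → [0]
query = project([[1, 0, 1]], cols)[0]
pred = knn_predict(project(X_train, cols), y_train, query, k=3)  # → 1
```

Sorting every training point per query costs O(n log n), which is acceptable for a sketch; for real experiments, scikit-learn's `KNeighborsClassifier` applies the same uniform majority vote with faster neighbor search.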

Copyright © 2022






Journal Info

Subject

Computer Science & IT

Description

Jurnal Teknik Informatika (JUTIF) is an Indonesian national journal that publishes high-quality research papers in the broad field of Informatics, Information Systems, and Computer Science, which encompasses software engineering, information system development, computer systems, computer networks, ...