Jurnal Nasional Pendidikan Teknik Informatika (JANAPATI)
Vol. 13 No. 2 (2024)

A Comparative Study on the Impact of Feature Selection and Dataset Resampling on the Performance of the K-Nearest Neighbors (KNN) Classification Algorithm

Gunadi, I Gede Aris
Rachmawati, Dewi Oktofa



Article Info

Publish Date
31 Jul 2024

Abstract

This study evaluates the impact of dataset balancing and feature selection on the performance of the K-Nearest Neighbors (KNN) classification algorithm. The primary objective is to determine how different training data balance ratios affect classification performance. The study also analyzes the relative contributions of feature selection and data balancing to overall classification performance. Three datasets sourced from Kaggle (Titanic, Wine Quality, and Heart Diseases) were used. After preprocessing, each dataset was subjected to three resampling scenarios with balance ratios of 0.3, 0.6, and 0.9. Feature selection combined correlation test values and information gain values, each weighted at 50%; features were selected when the combined (summed) value of correlation and information gain was positive. The KNN classification algorithm was then applied to the datasets with and without feature selection. The results indicate that a perfectly balanced ratio (ratio = 1) is not essential for improving classification performance: a balance ratio of 0.6 yielded results comparable to those of a perfect balance. Furthermore, feature selection had a greater impact on classification performance than data balancing. Specifically, data with a balance ratio of 0.3 and feature selection outperformed data with a balance ratio of 0.6 without feature selection.
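
The pipeline described in the abstract (resample to a target balance ratio, score features by an equally weighted sum of correlation and information gain, then classify with KNN) can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions the abstract does not state: the balance ratio is taken as minority count divided by majority count, resampling is done by random oversampling of the minority class, the correlation is a signed Pearson coefficient against 0/1 labels, information gain is computed after equal-width binning, and the task is binary. All function names are hypothetical.

```python
import math
import random
from collections import Counter


def oversample_to_ratio(rows, labels, ratio, seed=0):
    """Duplicate random minority rows until minority/majority >= ratio.

    Assumption: binary labels, ratio = minority count / majority count.
    """
    rng = random.Random(seed)
    (_, n_major), (minority, n_minor) = Counter(labels).most_common(2)
    minority_rows = [r for r, lab in zip(rows, labels) if lab == minority]
    target = math.ceil(ratio * n_major)
    extra = [rng.choice(minority_rows) for _ in range(max(0, target - n_minor))]
    return rows + extra, labels + [minority] * len(extra)


def pearson(xs, ys):
    """Signed Pearson correlation; returns 0.0 for a constant input."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(xs, labels, bins=4):
    """Entropy reduction after equal-width binning (binning is an assumption)."""
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / bins or 1.0  # constant feature: one bin, zero gain
    groups = {}
    for x, lab in zip(xs, labels):
        b = min(int((x - lo) / width), bins - 1)
        groups.setdefault(b, []).append(lab)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def select_features(columns, labels, weight=0.5):
    """Keep features whose weighted correlation + information gain is positive."""
    scores = {
        name: weight * pearson(xs, labels)
        + (1 - weight) * information_gain(xs, labels)
        for name, xs in columns.items()
    }
    return [name for name, s in scores.items() if s > 0], scores


def knn_predict(train_rows, train_labels, query, k=3):
    """Majority vote among the k nearest training rows (Euclidean distance)."""
    nearest = sorted(
        zip(train_rows, train_labels), key=lambda p: math.dist(p[0], query)
    )[:k]
    return Counter(lab for _, lab in nearest).most_common(1)[0][0]
```

A typical call order under these assumptions would be `oversample_to_ratio(rows, labels, 0.6)`, then `select_features` on the resampled columns, then `knn_predict` on the retained columns only. Because information gain is never negative, only the signed correlation term can push the combined score below zero in this sketch; whether the study used signed or absolute correlation is not stated in the abstract.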

Copyrights © 2024






Journal Info

Abbrev

janapati

Subject

Computer Science & IT, Education, Engineering

Description

Jurnal Nasional Pendidikan Teknik Informatika (JANAPATI) is a collection of scientific articles in the broad field of Informatics / ICT Education and in the field of Information Technology, published and managed by Jurusan Pendidikan Teknik Informatika, Fakultas Teknik dan Kejuruan, Universitas ...