Computer Science and Information Technologies
Vol 4, No 2: July 2023

Investigating the impact of data scaling on the k-nearest neighbor algorithm

Muasir Pagan (Universitas Sumatera Utara)
Muhammad Zarlis (Universitas Sumatera Utara)
Ade Candra (Universitas Sumatera Utara)



Article Info

Publish Date
01 Jul 2023

Abstract

This study investigates the impact of data scaling techniques on the performance of the k-nearest neighbor (KNN) algorithm using ten datasets from various domains. Three commonly used data scaling techniques (min-max normalization, Z-score normalization, and decimal scaling) are evaluated by the performance of the resulting KNN models in terms of accuracy, precision, recall, F1-score, runtime, and memory usage. The study aims to provide insights into the applicability and effectiveness of each scaling technique in different contexts, to aid the design and implementation of machine learning systems, and to identify the strengths and weaknesses of each technique and its suitability for specific types of data. The results show that data scaling significantly affects the performance of the KNN algorithm and that the choice of scaling method can have important implications for practical applications. Moreover, the performance of the three scaling techniques varies across datasets, suggesting that the scaling technique should be chosen based on the specific characteristics of the data. Overall, this study provides a comprehensive analysis of the impact of data scaling techniques on KNN performance and can help practitioners and researchers in the machine learning community make informed decisions when designing and implementing machine learning systems.
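To make the three techniques concrete, here is a minimal pure-Python sketch, not the authors' implementation. It uses the standard definitions: min-max normalization x' = (x - min) / (max - min), Z-score normalization x' = (x - mean) / std, and decimal scaling x' = x / 10^j, where j is the smallest integer that makes max|x'| < 1. Several details are assumptions not stated in the abstract. Scaler parameters are fit on the training rows only. The classifier uses Euclidean distance. The decimal-scaling exponent is restricted to j >= 0. The toy data and the helper names (`fit_min_max`, `scale`, `knn_predict`, and so on) are hypothetical.

```python
import math
import statistics
from collections import Counter


def fit_min_max(column):
    """Min-max normalization: x' = (x - min) / (max - min), fit on one training column."""
    lo, hi = min(column), max(column)
    span = hi - lo
    # A constant column carries no information; map it to 0 to avoid dividing by zero.
    return lambda v: (v - lo) / span if span else 0.0


def fit_z_score(column):
    """Z-score normalization: x' = (x - mean) / std (population standard deviation)."""
    mu = statistics.fmean(column)
    sigma = statistics.pstdev(column)
    return lambda v: (v - mu) / sigma if sigma else 0.0


def fit_decimal_scaling(column):
    """Decimal scaling: x' = x / 10**j, with j the smallest j >= 0 giving max|x'| < 1 (assumed j >= 0)."""
    m = max(abs(v) for v in column)
    j = 0
    while m / 10 ** j >= 1:
        j += 1
    return lambda v: v / 10 ** j


def scale(train_X, test_X, fit):
    """Fit one scaler per feature on train_X only, then apply it to both sets."""
    n_features = len(train_X[0])
    scalers = [fit([row[i] for row in train_X]) for i in range(n_features)]

    def apply(X):
        return [[f(v) for f, v in zip(scalers, row)] for row in X]

    return apply(train_X), apply(test_X)


def knn_predict(train_X, train_y, query, k=3):
    """Majority label among the k training rows closest to query (Euclidean distance)."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: feature 0 separates the classes, feature 1 is large-valued noise.
train_X = [[0.1, 1000], [0.2, 3000], [0.9, 1100], [0.8, 2900]]
train_y = ["a", "a", "b", "b"]
query = [0.85, 1000]  # feature 0 is close to the "b" rows

print("unscaled:", knn_predict(train_X, train_y, query, k=1))  # unscaled: a
for name, fit in [("min-max", fit_min_max), ("z-score", fit_z_score),
                  ("decimal", fit_decimal_scaling)]:
    tr, te = scale(train_X, [query], fit)
    print(name + ":", knn_predict(tr, train_y, te[0], k=1))  # prints b for each
```

On this toy data, the large-valued second feature dominates the unscaled Euclidean distance, so the unscaled 1-NN picks a neighbor from the wrong class. Each of the three scalings restores the influence of the informative feature. On real datasets the three methods need not agree, which is consistent with the study's finding that their performance varies across datasets.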

Copyright © 2023






Journal Info

Abbrev

csit

Subject

Computer Science & IT Engineering

Description

Computer Science and Information Technologies ISSN 2722-323X, e-ISSN 2722-3221 is an open-access, peer-reviewed international journal that publishes original research articles, review papers, and short communications that will have an immediate impact on ongoing research in all areas of Computer ...