Indonesian Applied Research Computing and Informatics
Vol. 1 No. 2: December (2025)

Impact of Data Normalization on K-Nearest Neighbor Classification Performance: A Case Study on Date Fruit Dataset

Muhammad Jauhar Vikri (Computer Science, Universitas Nahdlatul Ulama Sunan Giri)
Afril Efan Pajri (Computer System, Universitas Nahdlatul Ulama Sunan Giri)
Putri Liana (Computer System, Universitas Nahdlatul Ulama Sunan Giri)



Article Info

Publish Date
25 Jan 2026

Abstract

Data normalization is a crucial preprocessing step for distance-based classification algorithms such as K-Nearest Neighbor (KNN), because features with larger numeric ranges can dominate distance calculations and distort classification results. This study investigates the impact of data normalization on KNN classification performance using the Date Fruit Dataset as a case study. Three preprocessing scenarios are evaluated: raw data without normalization, Min–Max normalization, and Z-score standardization. In addition, standard KNN is compared with distance-weighted KNN to assess the contribution of distance weighting under each preprocessing condition. The experiments use stratified 10-fold cross-validation, and model performance is reported as accuracy together with its standard deviation. The statistical significance of performance differences is examined with a paired t-test, and a sensitivity analysis examines the effect of varying the number of nearest neighbors (k). The results show that normalization substantially improves classification performance compared with raw data. Z-score standardization achieves the highest and most stable accuracy, followed by Min–Max normalization. Distance-weighted KNN consistently produces slightly higher accuracy than standard KNN; however, after normalization the improvement is not statistically significant. The sensitivity analysis indicates that normalized data yields a wider and more stable range of optimal k values. These findings demonstrate that data normalization plays a more dominant role than distance weighting in improving KNN performance. The study provides empirical evidence that proper preprocessing is essential for reliable KNN-based classification and establishes a robust baseline for further enhancements such as feature weighting and metaheuristic optimization.
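The scale effect described in the abstract can be illustrated with a minimal sketch. It is not the authors' code and does not use the Date Fruit Dataset: the six training points, the query, and all function names (`fit_min_max`, `apply_min_max`, `fit_z_score`, `apply_z_score`, `knn_predict`) are hypothetical and chosen only for illustration. The sketch assumes Euclidean distance, inverse-distance vote weights, the population standard deviation for Z-scores, and scaling parameters learned from the training rows only; the abstract does not state these details.

```python
import math
from collections import defaultdict


def fit_min_max(rows):
    """Learn per-feature (min, max) from the training rows."""
    return [(min(col), max(col)) for col in zip(*rows)]


def apply_min_max(rows, params):
    """Map each feature to [0, 1]; a constant feature maps to 0.0."""
    return [
        [(x - lo) / (hi - lo) if hi > lo else 0.0 for x, (lo, hi) in zip(row, params)]
        for row in rows
    ]


def fit_z_score(rows):
    """Learn per-feature (mean, population std) from the training rows."""
    params = []
    for col in zip(*rows):
        mean = sum(col) / len(col)
        std = math.sqrt(sum((x - mean) ** 2 for x in col) / len(col))
        params.append((mean, std))
    return params


def apply_z_score(rows, params):
    """Center and scale each feature; a constant feature maps to 0.0."""
    return [
        [(x - mean) / std if std > 0 else 0.0 for x, (mean, std) in zip(row, params)]
        for row in rows
    ]


def knn_predict(train_X, train_y, query, k=3, weighted=False):
    """Vote among the k nearest rows; optionally weight votes by 1/distance."""
    neighbors = sorted(
        (math.dist(row, query), label) for row, label in zip(train_X, train_y)
    )[:k]
    votes = defaultdict(float)
    for dist, label in neighbors:
        votes[label] += 1.0 / (dist + 1e-9) if weighted else 1.0
    return max(votes, key=votes.get)


# Hypothetical toy data: feature 0 separates the classes but has a small range;
# feature 1 is uninformative but has a range in the thousands.
train_X = [[0.10, 1000], [0.15, 5000], [0.20, 9000],
           [0.80, 1200], [0.85, 5200], [0.90, 8800]]
train_y = ["A", "A", "A", "B", "B", "B"]
query = [0.12, 5190]  # feature 0 places it with class "A"

mm = fit_min_max(train_X)
zs = fit_z_score(train_X)

predictions = {
    "raw": knn_predict(train_X, train_y, query),
    "min-max": knn_predict(apply_min_max(train_X, mm), train_y,
                           apply_min_max([query], mm)[0]),
    "z-score": knn_predict(apply_z_score(train_X, zs), train_y,
                           apply_z_score([query], zs)[0]),
}
print(predictions)
```

In this toy setup the raw-data vote is decided almost entirely by the large-range feature, while both scaled variants let the informative feature contribute. Passing `weighted=True` changes how the k votes are counted but not which neighbors are selected, which is consistent with the abstract's observation that weighting matters less than normalization.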

Copyrights © 2025






Journal Info

Abbrev

iarci

Publisher

Subject

Computer Science & IT Control & Systems Engineering

Description

Focus and Scope: Indonesian Applied Research Computing and Informatics is a scientific journal that publishes applied research in the fields of computing and informatics. The journal aims to serve as a platform for academics, researchers, and ...