Missing values are a significant challenge in data mining because they can hinder knowledge extraction. Incomplete data not only reduces the efficiency of data management and analysis but can also bias decision-making. This study aims to improve the performance of the C4.5 algorithm on data with missing values by applying imputation techniques and GridSearchCV optimization. We handle missing values with several imputation methods: minimum, maximum, mean-mode, median, and k-Nearest Neighbors (k-NN). These methods are applied to the Chronic Kidney Disease dataset obtained from the UCI Repository. After imputation, we use GridSearchCV to search for the best hyperparameter combination for the C4.5 algorithm. Experimental results show that imputation combined with GridSearchCV optimization improves the classification accuracy of C4.5: accuracy increases by 2.25% compared with the model built without missing-value handling. This indicates that handling missing values, together with appropriate GridSearchCV optimization, can improve the prediction quality of the model.
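The simple imputation strategies named above (minimum, maximum, mean-mode, median) can be illustrated with a minimal sketch using only the Python standard library. This is not the study's implementation: the helper names `impute_column` and `impute_mean_mode`, the use of `None` to mark a missing cell, and the toy records with their column names are all assumptions made for illustration. k-NN imputation and the GridSearchCV step are not shown.

```python
from statistics import mean, median, mode

MISSING = None  # assumption: a missing cell is represented as None


def impute_column(values, strategy):
    """Return a copy of `values` with missing cells filled using `strategy`.

    Numeric strategies: "min", "max", "mean", "median".
    Categorical strategy: "mode" (most frequent observed value).
    """
    observed = [v for v in values if v is not MISSING]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    fillers = {"min": min, "max": max, "mean": mean, "median": median, "mode": mode}
    fill = fillers[strategy](observed)
    return [fill if v is MISSING else v for v in values]


def impute_mean_mode(rows, numeric_cols, categorical_cols):
    """Mean-mode imputation: mean for numeric columns, mode for categorical ones."""
    columns = {name: [row[name] for row in rows] for name in rows[0]}
    for name in numeric_cols:
        columns[name] = impute_column(columns[name], "mean")
    for name in categorical_cols:
        columns[name] = impute_column(columns[name], "mode")
    return [{name: columns[name][i] for name in columns} for i in range(len(rows))]


# Toy records with hypothetical column names; not taken from the real dataset.
rows = [
    {"age": 48, "bp": 80, "appetite": "good"},
    {"age": None, "bp": 70, "appetite": "good"},
    {"age": 62, "bp": None, "appetite": None},
    {"age": 52, "bp": 90, "appetite": "poor"},
]
filled = impute_mean_mode(rows, ["age", "bp"], ["appetite"])
```

In this sketch each fill value is computed from the observed cells of a single column. In an evaluation setting, the fill values would normally be computed on the training split only and then reused on the test split, so that no information leaks from the held-out data.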
Copyrights © 2025