Missing data is a common problem in data processing and can degrade the quality of analysis results if left unaddressed. This study evaluates the performance of two imputation methods, Random Forest (RF) imputation and Classification and Regression Tree (CART) imputation, at missing-value proportions of 5%, 10%, 15%, and 20%. The data are simulated Bivariate Gamma data comprising 200 observations on two variables, generated using RStudio software. Performance was evaluated using the correlation between the imputed data and the original data, together with two error measures, the Mean Absolute Percentage Error (MAPE) and the Root Mean Square Error (RMSE). At 5% and 10% missing values, the CART method produced the smallest MAPE and RMSE values and was therefore the better method, although at 10% missing values its difference from the RF method was not significant. At 15% and 20% missing values, the RF method performed better, with smaller MAPE and RMSE values than CART. Overall, the CART method is more suitable when the proportion of missing values is low, whereas the RF method gives more stable performance when the proportion is high. These results offer guidance for selecting an imputation method according to the level of missing data.
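The evaluation criteria named above can be illustrated with a minimal Python sketch. It is not the study's code, which was written in R; the function names (`rmse`, `mape`, `pearson`, `mask_indices`) and the toy values are hypothetical and chosen only for illustration. It assumes that MAPE is expressed in percent, that all original values are strictly positive (as Gamma-distributed values are), and that missing positions are drawn uniformly at random, none of which is stated in the abstract.

```python
import math
import random


def rmse(actual, imputed):
    """Root Mean Square Error between original and imputed values."""
    squared = [(a - p) ** 2 for a, p in zip(actual, imputed)]
    return math.sqrt(sum(squared) / len(actual))


def mape(actual, imputed):
    """Mean Absolute Percentage Error, in percent.

    Assumes every original value is nonzero.
    """
    ratios = [abs((a - p) / a) for a, p in zip(actual, imputed)]
    return 100.0 * sum(ratios) / len(actual)


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def mask_indices(n, proportion, seed=0):
    """Pick which of n observations to set missing (assumed uniform at random)."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(n), round(n * proportion)))


# Toy values, not data from the study.
original = [2.0, 4.0, 5.0, 10.0]
imputed = [2.5, 3.0, 5.0, 9.0]
print(rmse(original, imputed))     # about 0.75
print(mape(original, imputed))     # about 15.0 (percent)
print(pearson(original, imputed))  # close to 1 when imputations track the originals

# Number of values removed at each proportion for 200 observations.
for proportion in (0.05, 0.10, 0.15, 0.20):
    print(proportion, len(mask_indices(200, proportion)))
```

Under these assumptions, lower MAPE and RMSE and a correlation closer to 1 indicate imputed values that lie closer to the originals, which is the sense in which the two methods are compared at each missing-value level.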
Copyright © 2025