Non-response is a common challenge in large-scale company-based surveys and one of the main causes of missing data. One way to deal with missing data is imputation. Until now, imputation in the Annual Survey of Manufacturing Industries (STPIM) has relied on a combination of approaches, including historical, auxiliary unit, and clerical imputation. These methods, however, tend to be inefficient and offer no way to measure the accuracy of the imputed results. In light of these limitations, we introduce a machine learning approach to data imputation. A comparison of several machine learning methods shows that K-nearest neighbors imputation is the most accurate method for imputing output data in STPIM 2021, while the linear Support Vector Machine (SVM) has the shortest processing time.
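To make the idea of K-nearest neighbors imputation concrete, the following is a minimal sketch rather than the procedure used in the study. It assumes each responding unit is a dictionary, that a missing output value is stored as `None`, and that distance is Euclidean over range-scaled auxiliary fields. The function name `knn_impute`, the field names, and the toy figures are all hypothetical and chosen only for illustration.

```python
import math


def knn_impute(records, target, features, k=3):
    """Fill missing `target` values with the mean of the k nearest donors.

    Illustrative only: `records` is a list of dicts, and a missing
    target value is represented by None (an assumption of this sketch).
    """
    donors = [r for r in records if r[target] is not None]
    if not donors:
        raise ValueError("no complete records to borrow from")

    # Scale each feature by its range so no single field dominates the distance.
    spans = {}
    for f in features:
        values = [r[f] for r in records]
        spans[f] = (max(values) - min(values)) or 1.0

    def distance(a, b):
        return math.sqrt(sum(((a[f] - b[f]) / spans[f]) ** 2 for f in features))

    completed = []
    for r in records:
        filled = dict(r)  # copy, so the input records are left untouched
        if r[target] is None:
            nearest = sorted(donors, key=lambda d: distance(r, d))[:k]
            filled[target] = sum(d[target] for d in nearest) / len(nearest)
        completed.append(filled)
    return completed


# Invented toy data: two small firms, two large firms, one non-respondent.
firms = [
    {"workers": 20, "input_cost": 100.0, "output": 150.0},
    {"workers": 25, "input_cost": 120.0, "output": 170.0},
    {"workers": 200, "input_cost": 1000.0, "output": 1500.0},
    {"workers": 210, "input_cost": 1100.0, "output": 1700.0},
    {"workers": 22, "input_cost": 110.0, "output": None},
]

result = knn_impute(firms, "output", ["workers", "input_cost"], k=2)
print(result[-1]["output"])  # 160.0, the mean of the two small firms' outputs
```

In this toy case the non-responding firm borrows its value from the two firms closest to it in size and input cost. Accuracy could then be assessed by masking known values, imputing them, and comparing the imputed figures with the true ones, which is something the traditional approaches described above do not readily provide.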
Copyrights © 2024