Widyawati, Amalia Safira
Unknown Affiliation

Published: 1 document

Performance of Multivariate Missing Data Imputation Methods on Climate Data
Widyawati, Amalia Safira; Fitrianto, Anwar; Silvianti, Pika
Journal of Applied Informatics and Computing Vol. 9 No. 6 (2025): December 2025
Publisher : Politeknik Negeri Batam

DOI: 10.30871/jaic.v9i6.11316

Abstract

Climate data plays an important role in many aspects of life. However, climate records often contain missing values, which can interfere with data processing and reduce the quality of analysis, so appropriate handling methods are needed to keep analysis results valid. This study compares the performance of several imputation methods for multivariate missing data, using missing-data patterns identified in real observations, and determines the appropriate imputation method based on the missing-data mechanism. It also applies the best method to data with the actual missing-data pattern to assess how imputation changes the descriptive statistics required for further climatological analysis. The methods compared are monthly averages, missRanger, k-Nearest Neighbor (k-NN), and Iterative Robust Model-based Imputation (IRMI). The missing-data information was taken from Global Surface Summary of the Day (GSOD) data, namely the temperature, precipitation, humidity, pressure, and wind speed variables, recorded daily over 11 years with an overall missing proportion of 11.4%. These missing-data patterns were then applied to the relatively complete NASA Power data so that the imputed values could be evaluated against known values. The results show that IRMI handles extreme missingness poorly, namely 17 rows in which every variable is missing. In contrast, k-NN, missRanger, and monthly averages performed well under both extreme and non-extreme conditions. Of the four methods, monthly averages were selected because they filled the missing values while preserving the multivariate structure, with an sMAPE of 58% and a relative difference of 2.64%.
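The evaluation design described in the abstract (copy the GSOD missingness pattern onto a complete NASA Power series, impute, then score the imputed cells) can be sketched as follows. This is a minimal illustration, not the authors' code. Several details are assumptions not stated in the abstract: the monthly average pools all years for the same calendar month; each variable is processed on its own; sMAPE uses the common symmetric form, with pairs where both the actual and the imputed value are 0 (e.g. dry days for precipitation) scored as 0; and the relative difference is computed on the series mean. All function names are hypothetical.

```python
from datetime import date
from statistics import mean
from typing import Optional

Series = list[Optional[float]]  # None marks a missing day


def apply_mask(complete: list[float], pattern: Series) -> Series:
    """Copy the missingness of `pattern` (e.g. GSOD) onto a complete series (e.g. NASA Power)."""
    if len(complete) != len(pattern):
        raise ValueError("series must be aligned day by day")
    return [None if p is None else c for c, p in zip(complete, pattern)]


def monthly_mean_impute(dates: list[date], values: Series) -> list[float]:
    """Fill each gap with the mean of observed values in the same calendar month.

    Assumption: all years are pooled per calendar month.
    """
    by_month: dict[int, list[float]] = {}
    for d, v in zip(dates, values):
        if v is not None:
            by_month.setdefault(d.month, []).append(v)
    month_mean = {m: mean(vs) for m, vs in by_month.items()}
    filled = []
    for d, v in zip(dates, values):
        if v is None:
            if d.month not in month_mean:
                raise ValueError(f"no observed values for month {d.month}")
            filled.append(month_mean[d.month])
        else:
            filled.append(v)
    return filled


def smape(actual: list[float], predicted: list[float]) -> float:
    """Symmetric MAPE in percent; a 0/0 pair contributes 0 (assumed convention)."""
    if not actual:
        raise ValueError("no values to score")
    terms = []
    for a, f in zip(actual, predicted):
        denom = (abs(a) + abs(f)) / 2
        terms.append(0.0 if denom == 0 else abs(f - a) / denom)
    return 100 * mean(terms)


def relative_difference(true_stat: float, imputed_stat: float) -> float:
    """Percent change of a descriptive statistic caused by imputation."""
    return 100 * abs(imputed_stat - true_stat) / abs(true_stat)


def evaluate(dates: list[date], complete: list[float], pattern: Series):
    """Mask, impute, and score only the cells that were artificially removed."""
    masked = apply_mask(complete, pattern)
    imputed = monthly_mean_impute(dates, masked)
    idx = [i for i, v in enumerate(masked) if v is None]
    score = smape([complete[i] for i in idx], [imputed[i] for i in idx])
    rd = relative_difference(mean(complete), mean(imputed))
    return imputed, score, rd


if __name__ == "__main__":
    days = [date(2020, 1, 1), date(2020, 1, 2), date(2020, 1, 3),
            date(2020, 2, 1), date(2020, 2, 2)]
    complete = [10.0, 12.0, 14.0, 20.0, 22.0]   # toy "complete" series
    pattern = [1.0, None, 1.0, None, 1.0]       # toy missingness pattern
    imputed, score, rd = evaluate(days, complete, pattern)
    print(imputed, round(score, 2), round(rd, 2))
```

On the toy data, the January gap is filled with 12.0 and the February gap with 22.0. Only the two masked cells enter the sMAPE, while the relative difference compares the mean of the full series before and after imputation. The zero-denominator rule matters for daily precipitation, where many true values are exactly 0.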