Accurate weather prediction supports strategic decisions in many fields, from agriculture to disaster management. A fundamental challenge in building automatic prediction models, however, is that meteorological datasets are often imbalanced in their class distribution. This imbalance causes conventional machine learning algorithms to favor the dominant class and to detect the rare class (rain) poorly, as reflected in low sensitivity (recall) values. This study aims to overcome this bias and improve the accuracy of daily rainfall classification by comparing four algorithms: Random Forest, K-Nearest Neighbor (KNN), LightGBM, and XGBoost. As the main method for handling data imbalance, the Synthetic Minority Over-sampling Technique (SMOTE) was applied to generate new samples for the underrepresented class. Model performance was evaluated using a confusion matrix, a One-vs-Rest (OvR) strategy, and conventional evaluation metrics. In the experiments, the baseline models failed to detect the minority class, with very low Recall and F1-Score values (< 0.30). Applying SMOTE significantly improved Recall and F1-Score compared with the baseline models trained without SMOTE. LightGBM with SMOTE was the best-performing model, achieving a balance across all evaluation metrics.
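The oversampling step can be illustrated with a minimal sketch of the interpolation idea behind SMOTE. This is not the study's implementation: the function name `smote_sketch`, the toy feature values, and the parameter choices are hypothetical, and in practice a maintained implementation such as the `SMOTE` class in the imbalanced-learn library would normally be used.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority samples.

    Illustrative only: each synthetic point lies on the line segment between
    a randomly chosen minority sample and one of its k nearest minority
    neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other points
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the base point (itself excluded)
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical toy data: two made-up features for four rainy days
rain_days = [(0.90, -1.2), (0.85, -0.8), (0.95, -1.5), (0.80, -1.0)]
new_rain_days = smote_sketch(rain_days, n_new=6, k=3)
```

Because every synthetic point is an interpolation between two existing minority samples, the new samples stay within the range of the observed rain-day features rather than duplicating existing rows. Oversampling of this kind is normally applied to the training split only, so that evaluation metrics are computed on untouched data.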
Copyright © 2026