Floods are among the most frequent natural disasters in Indonesia. The city of Samarinda, for example, saw a significant increase in flood events between 2018 and 2021. Machine learning, in particular the Support Vector Machine (SVM) algorithm, can be used to predict future flood events, but its accuracy is limited by two problems: class imbalance and high-dimensional data. This study combines SVM with Random Oversampling (ROS) to address class imbalance and with Recursive Feature Elimination (RFE) feature selection to reduce dimensionality, with the aim of improving classification accuracy on Samarinda City flood data. The model is validated with 10-fold cross-validation, and its performance is evaluated with a confusion matrix from which accuracy is calculated. The data were obtained from BPDB and BMKG Samarinda City for the 2021-2023 period and consist of 11 attributes and 1,095 records. The results show that RFE identified the five most important features: minimum temperature (Tn), maximum temperature (Tx), average temperature (Tavg), average humidity (RH_avg), and wind direction at maximum wind speed (ddd_x). With SVM, ROS, and RFE combined, flood data classification accuracy increased by 0.78 percentage points, from 97.14% to 97.92%.
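As an illustration of two steps named above, the following minimal sketch shows how random oversampling balances classes and how accuracy is derived from a confusion matrix. It is not the study's implementation: the function names (`random_oversample`, `confusion_counts`, `accuracy`) and the toy data are hypothetical, and the SVM and RFE steps are omitted. In a 10-fold setup, oversampling would normally be applied to the training folds only, which is an assumption here since the abstract does not state it.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until every
    class has as many samples as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # Candidate rows of this class, taken from the original data only.
        pool = [r for r, l in zip(rows, labels) if l == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return tp, fp, fn, tn


def accuracy(tp, fp, fn, tn):
    """Accuracy = correctly classified samples / all samples."""
    return (tp + tn) / (tp + fp + fn + tn)


# Hypothetical toy data: 9 "no flood" days (0) and 3 "flood" days (1).
toy_rows = [[float(i)] for i in range(12)]
toy_labels = [0] * 9 + [1] * 3
balanced_rows, balanced_labels = random_oversample(toy_rows, toy_labels)
balanced_counts = Counter(balanced_labels)

# Hypothetical predictions, used only to show the accuracy calculation.
tp, fp, fn, tn = confusion_counts([1, 1, 0, 0, 0], [1, 0, 0, 0, 1])
toy_accuracy = accuracy(tp, fp, fn, tn)
```

On the toy data, both classes end up with 9 samples after oversampling, and the five toy predictions give an accuracy of 0.6 (three of five correct).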
Copyright © 2024