Data with a large number of observations and dimensions is known as big data. Processing big data poses several challenges, one of which is the imbalanced dataset. In classification modeling, an imbalanced dataset is a common problem: class predictions tend to be accurate for the majority class and inaccurate for the minority class. Three approaches have been extensively researched as solutions: the data-level approach, the algorithm-level approach, and the ensemble approach. Data-level methods include SMOTE, undersampling, and oversampling; NWKNN is an algorithm-level method; and UnderBagging, RUSBoost, SMOTEBoost, and SMOTEBagging are ensemble methods. The goal of this study is to determine the best method for handling each case of the imbalanced dataset. Three cases of imbalance are considered: mild, moderate, and extreme. A simulation study was conducted for each case to evaluate the performance of each method. Based on the AUC, SMOTEBagging is the best method for mild imbalance (AUC = 0.9581) and for moderate imbalance (AUC = 0.9033), while UnderBagging gives the best performance for extreme imbalance.
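To make the data-level idea concrete, the following is a minimal sketch of SMOTE-style oversampling, not the reference implementation (which is available in the `imbalanced-learn` library): a synthetic minority sample is created by interpolating between a minority point and one of its k nearest minority neighbors. The function name `smote_sketch` and all parameters are illustrative assumptions.

```python
import numpy as np

def smote_sketch(X_min, n_new, k=3, seed=0):
    """Illustrative SMOTE-style oversampling: generate n_new synthetic
    minority samples by linear interpolation between a randomly chosen
    minority point and one of its k nearest minority neighbors."""
    rng = np.random.default_rng(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        # Euclidean distances from sample i to every minority sample
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        neighbors = np.argsort(d)[1:k + 1]  # skip the point itself
        j = rng.choice(neighbors)
        gap = rng.random()                  # interpolation factor in [0, 1)
        synthetic.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.vstack(synthetic)

# Toy minority class: 5 points in 2-D; generate 10 synthetic samples
X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]])
X_new = smote_sketch(X_min, n_new=10)
print(X_new.shape)  # (10, 2)
```

Because each synthetic point lies on a segment between two existing minority points, the new samples stay inside the region already occupied by the minority class; SMOTEBoost and SMOTEBagging combine this resampling step with boosting and bagging, respectively.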
Copyright © 2026