Classification is one of the core tasks in data mining, and one of the problems most often encountered in it is class imbalance. Class imbalance is a condition in which the class distribution of a dataset is uneven: the data are split into a majority class and a minority class, with varying degrees of severity. Because the classifier is biased toward the majority class, instances of the minority class are often misclassified. This makes classification difficult and leads to sub-optimal performance, with much higher accuracy on the majority class than on the minority class. This study applies Random Oversampling, Chi-Square, and AdaBoost to overcome class imbalance and optimize the performance of the C5.0 classifier. When dealing with imbalanced datasets, performance evaluation needs to focus on the positive class, so the more suitable metric for assessing the results is recall (also called sensitivity or true positive rate, TPR). The results show that Random Oversampling alone improved the recall/sensitivity/TPR of standard C5.0. Chi-Square alone did not improve the performance of C5.0, but performance did improve once Random Oversampling was also applied. The combination of all three, namely Random Oversampling, Chi-Square, and AdaBoost, increased the recall/sensitivity/TPR relative to standard C5.0.
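To make two of the ideas above concrete, the following is a minimal Python sketch of random oversampling and of recall (TPR = TP / (TP + FN)). It is not the study's implementation: the function names `random_oversample`, `recall`, and `accuracy`, the `seed` parameter, and the toy labels are all assumptions made for illustration, and the sketch assumes the positive class is the minority class labelled `1`.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest class (illustrative sketch)."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        # Indices of the original rows that belong to this class.
        idx = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - n):
            j = rng.choice(idx)  # sample with replacement
            X_out.append(X[j])
            y_out.append(label)
    return X_out, y_out


def recall(y_true, y_pred, positive=1):
    """Recall / sensitivity / TPR = TP / (TP + FN) for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


def accuracy(y_true, y_pred):
    """Fraction of all predictions that are correct."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Hypothetical toy data: 8 majority (0) and 2 minority (1) instances.
X = [[i] for i in range(10)]
y = [0] * 8 + [1] * 2

# A degenerate classifier that always predicts the majority class.
always_majority = [0] * len(y)
print(accuracy(y, always_majority))  # 0.8
print(recall(y, always_majority))    # 0.0

# After oversampling, both classes have the same number of rows.
X_bal, y_bal = random_oversample(X, y, seed=1)
print(sorted(Counter(y_bal).items()))  # [(0, 8), (1, 8)]
```

The always-majority predictor reaches 80% accuracy on the toy labels while finding none of the positive instances, which is why recall is the more informative metric here. In practice, oversampling would normally be applied only to the training split, so that duplicated rows do not leak into the evaluation data.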
Copyrights © 2023