Journal : JOIV : International Journal on Informatics Visualization

Hybrid Approach with Distance Feature for Multi-Class Imbalanced Datasets
Hartono, Hartono; Ongko, Erianto
JOIV : International Journal on Informatics Visualization Vol 7, No 1 (2023)
Publisher : Society of Visual Informatics

DOI: 10.30630/joiv.7.1.1292

Abstract

The multi-class imbalance problem is more complex than the binary-class problem. The difficulty stems from the larger number of classes, which increases the likelihood of overlap between classes. Many approaches have been proposed for multi-class problems. One is a hybrid approach that combines a data-level approach with an algorithm-level approach: an ensemble of classifiers together with oversampling of the minority class. SMOTE is an oversampling method that performs well, but it requires determining the best samples to use in the interpolation process that generates new samples. This need arises from the overlap between classes that always accompanies the multi-class imbalance problem. Because of this overlap, a safe region must be determined in which SMOTE synthesizes samples during oversampling. The safe region is considered the best place to synthesize samples because samples there are less likely to overlap with other classes. The safe region can be determined by constructing distance features. The sample with the best distance and the lowest imbalance ratio is selected for oversampling with SMOTE. The main contribution of this research is the proposed Hybrid Approach with Distance Feature, which identifies safe samples; its main advantage is that, in addition to handling multi-class imbalance, it also handles overlap better. The method is compared with Multiple Random Balance (MultiRandBal), which performs random oversampling. The results show that the Augmented R-Value, Class Average Accuracy, Class Balance Accuracy, and Hamming Loss obtained with the proposed method are better than those obtained with random oversampling. These results also show that the Hybrid Approach with Distance Feature handles multi-class imbalance better than MultiRandBal.
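
The idea of restricting SMOTE interpolation to a safe region can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how the distance features are built, so the sketch assumes a simple neighbourhood score (the fraction of a sample's k nearest neighbours that share its class) and a hypothetical `threshold` parameter. The imbalance-ratio criterion from the abstract is not modelled. The function names `safe_scores` and `smote_from_safe` are illustrative only.

```python
import numpy as np

def safe_scores(X, y, k=5):
    """Fraction of each sample's k nearest neighbours that share its class.

    Assumption: this stands in for the paper's distance feature, which the
    abstract does not define.
    """
    # Pairwise Euclidean distances between all samples.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)           # a sample is not its own neighbour
    nn = np.argsort(d, axis=1)[:, :k]     # indices of the k nearest neighbours
    return (y[nn] == y[:, None]).mean(axis=1)

def smote_from_safe(X, y, minority, n_new, k=5, threshold=0.5, seed=0):
    """SMOTE-style interpolation using only 'safe' minority samples."""
    rng = np.random.default_rng(seed)
    scores = safe_scores(X, y, k)
    safe = X[(y == minority) & (scores >= threshold)]
    if len(safe) < 2:
        raise ValueError("not enough safe minority samples to interpolate")

    # Nearest safe minority neighbours of each safe minority sample.
    d = np.linalg.norm(safe[:, None, :] - safe[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    kk = min(k, len(safe) - 1)
    nn = np.argsort(d, axis=1)[:, :kk]

    # Pick a base sample and one of its neighbours, then interpolate.
    base_idx = rng.integers(len(safe), size=n_new)
    mate_idx = nn[base_idx, rng.integers(kk, size=n_new)]
    gap = rng.random((n_new, 1))          # interpolation factor in [0, 1)
    return safe[base_idx] + gap * (safe[mate_idx] - safe[base_idx])

# Toy data: a majority cluster near the origin, a minority cluster near (5, 5),
# and one minority point sitting inside the majority cluster (unsafe).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5], [0.2, 0.8],
              [5, 5], [5, 6], [6, 5], [6, 6], [0.4, 0.4]], dtype=float)
y = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
new_samples = smote_from_safe(X, y, minority=1, n_new=20, k=3)
```

In this toy example the minority point at (0.4, 0.4) is surrounded by majority samples, so it is excluded and every synthetic sample falls inside the minority cluster rather than in the overlapping area.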

Avoiding Overfitting dan Overlapping in Handling Class Imbalanced Using Hybrid Approach with Smoothed Bootstrap Resampling and Feature Selection
Hartono, Hartono; Ongko, Erianto
JOIV : International Journal on Informatics Visualization Vol 6, No 2 (2022)
Publisher : Society of Visual Informatics

DOI: 10.30630/joiv.6.2.985

Abstract

Datasets are often imbalanced: one class (the majority) has far more instances than the other classes (the minority). As a result, a classifier may fail to identify the minority class even when its overall accuracy is high. In handling class imbalance, both diversity and classifier performance must be considered. Hence, the Hybrid Approach, which combines a sampling method with classifier ensembles, gives satisfactory results. The Hybrid Approach generally uses oversampling, which is prone to overfitting. Overfitting is indicated by high accuracy on the training data that is not matched on the testing data. Therefore, this study uses Smoothed Bootstrap Resampling as the oversampling method in the Hybrid Approach, which can prevent overfitting. However, class imbalance is not the only cause of declining classifier performance; overlap between classes must also be considered. Feature Selection can address overlap by minimizing the overlap degree. This research combines Feature Selection with Hybrid Approach Redefinition, which modifies the use of Smoothed Bootstrap Resampling, to handle class imbalance in medical datasets. The preprocessing stage of the proposed method uses Smoothed Bootstrap Resampling and Feature Selection, with Feature Assessment by Sliding Thresholds (FAST) as the Feature Selection method, while the processing stage uses Random Under Sampling and SMOTE. Overlap is measured with the Augmented R-Value, and classifier performance with the Balanced Error Rate, Precision, Recall, and F-Value. The Balanced Error Rate expresses the combined error of the majority and minority classes in the 10-Fold Validation test, which allows each subset to become training data. The results show that the proposed method performs better than the comparison method.
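
To make the evaluation measures concrete, the following sketch computes the Balanced Error Rate, Precision, Recall, and F-Value for a binary problem. It assumes the commonly used definition of the Balanced Error Rate as the mean of the error rates on the two classes, which may differ in detail from the formulation in the paper. The function name `binary_metrics` and the toy labels are illustrative only.

```python
import numpy as np

def binary_metrics(y_true, y_pred, positive=1):
    """Balanced Error Rate, precision, recall and F-value for two classes.

    Assumption: BER is the mean of the per-class error rates, the usual
    definition; the abstract does not give the exact formula.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_true == positive) & (y_pred == positive)))
    fn = int(np.sum((y_true == positive) & (y_pred != positive)))
    fp = int(np.sum((y_true != positive) & (y_pred == positive)))
    tn = int(np.sum((y_true != positive) & (y_pred != positive)))

    def ratio(num, den):
        # Return 0.0 instead of dividing by zero when a count is empty.
        return num / den if den else 0.0

    recall = ratio(tp, tp + fn)
    precision = ratio(tp, tp + fp)
    f_value = ratio(2 * precision * recall, precision + recall)
    ber = 0.5 * (ratio(fn, tp + fn) + ratio(fp, fp + tn))
    accuracy = ratio(tp + tn, tp + tn + fp + fn)
    return {"accuracy": accuracy, "ber": ber, "precision": precision,
            "recall": recall, "f_value": f_value}

# Toy labels: 2 minority (1) and 8 majority (0) samples.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
```

Here the accuracy is 0.8 while half of the minority class is misclassified, which the Balanced Error Rate of 0.3125 reflects; this is the situation the abstract describes, where high accuracy hides poor minority-class performance.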