Emerging Science Journal
Vol 7, No 5 (2023): October

IRS-BAG-Integrated Radius-SMOTE Algorithm with Bagging Ensemble Learning Model for Imbalanced Data Set Classification

Lilis Yuningsih (Department of Information System, Faculty Computer and Informatics, Institut Teknologi dan Bisnis STIKOM Bali, Denpasar 80234)
Gede Angga Pradipta (Post Graduate Department of Information System, Faculty Computer and Informatics, Institut Teknologi dan Bisnis STIKOM Bali, Denpasar 80234)
Dadang Hermawan (Department of Digital Business, Faculty Business and Vocation, Institut Teknologi dan Bisnis STIKOM Bali, Denpasar 80234)
Putu Desiana Wulaning Ayu (Post Graduate Department of Information System, Faculty Computer and Informatics, Institut Teknologi dan Bisnis STIKOM Bali, Denpasar 80234)
Dandy Pramana Hostiadi (Post Graduate Department of Information System, Faculty Computer and Informatics, Institut Teknologi dan Bisnis STIKOM Bali, Denpasar 80234)
Roy Rudolf Huizen (Post Graduate Department of Information System, Faculty Computer and Informatics, Institut Teknologi dan Bisnis STIKOM Bali, Denpasar 80234)



Article Info

Publish Date
01 Oct 2023

Abstract

Imbalanced learning problems arise when data samples are unevenly distributed among classes, which makes classification difficult. The Synthetic Minority Over-Sampling Technique (SMOTE) is one of the best-known data pre-processing methods for this setting. Oversampling with SMOTE, however, can introduce noise, small disjuncts, and overfitting when the dataset has a high imbalance ratio. With a high imbalance ratio and low variance, the synthetic samples tend to cluster in narrow areas and in regions where classes conflict, which makes machine learning methods susceptible to overfitting during training. This research therefore proposes a combination of Radius-SMOTE and the Bagging algorithm, called the IRS-BAG model. Each sub-sample generated by bootstrapping is oversampled using Radius-SMOTE. Oversampling at the sub-sample level is intended to mitigate the overfitting that might otherwise occur. Experiments compared the performance of the IRS-BAG model with various earlier oversampling methods on imbalanced public data. The results, obtained with three different classifiers, showed that every classifier improved notably when combined with the proposed IRS-BAG model, compared with previous state-of-the-art oversampling methods.

DOI: 10.28991/ESJ-2023-07-05-04
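The overall procedure the abstract describes (bootstrap sub-samples, oversample each one, train a classifier per sub-sample, combine by voting) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the Radius-SMOTE generation rule, so `oversample_minority` is a stand-in that uses plain interpolation between minority points, and the nearest-centroid base learner is a toy placeholder for the three classifiers used in the paper. All function names are hypothetical.

```python
import random
from collections import Counter

def oversample_minority(X, y, rng):
    """Stand-in oversampler (NOT Radius-SMOTE): interpolate between random
    pairs of minority points until the two class counts match."""
    counts = Counter(y)
    if len(counts) < 2:          # a bootstrap bag may miss the minority class
        return list(X), list(y)
    minority = min(counts, key=counts.get)
    target = max(counts.values())
    pool = [x for x, label in zip(X, y) if label == minority]
    X_new, y_new = list(X), list(y)
    for _ in range(target - len(pool)):
        a, b = rng.choice(pool), rng.choice(pool)
        t = rng.random()         # position on the segment between a and b
        X_new.append([ai + t * (bi - ai) for ai, bi in zip(a, b)])
        y_new.append(minority)
    return X_new, y_new

def fit_centroids(X, y):
    """Toy base learner (assumption): one mean vector per class."""
    counts = Counter(y)
    sums = {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def predict_centroid(model, x):
    """Return the class whose centroid is closest to x (squared Euclidean)."""
    return min(model, key=lambda c: sum((a - b) ** 2 for a, b in zip(model[c], x)))

def fit_bagged(X, y, n_bags=11, seed=0):
    """Draw n_bags bootstrap samples, oversample each one, fit one model per bag."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_bags):
        idx = [rng.randrange(n) for _ in range(n)]   # sample with replacement
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        Xb, yb = oversample_minority(Xb, yb, rng)    # oversample inside the bag
        models.append(fit_centroids(Xb, yb))
    return models

def predict_bagged(models, x):
    """Aggregate the per-bag predictions by majority vote."""
    votes = Counter(predict_centroid(m, x) for m in models)
    return votes.most_common(1)[0][0]
```

A call such as `models = fit_bagged(X, y)` followed by `predict_bagged(models, x)` exercises the whole pipeline. Because oversampling happens inside each bag rather than once on the full training set, every base learner sees a different set of synthetic points, which is the mechanism the abstract credits with reducing overfitting.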

Copyrights © 2023






Journal Info

Abbrev

ESJ

Subject

Environmental Science

Description

Emerging Science Journal is not limited to a specific aspect of science and engineering but is instead devoted to a wide range of subfields in engineering and the sciences. While it encourages a broad spectrum of contributions in engineering and the sciences, articles of interdisciplinary nature are ...