Suwannik, Worasait
Unknown Affiliation

Published: 1 document

Under-sampling technique for imbalanced data using minimum sum of euclidean distance in principal component subset
Kasemtaweechok, Chatchai; Suwannik, Worasait
IAES International Journal of Artificial Intelligence (IJ-AI) Vol 13, No 1: March 2024
Publisher: Institute of Advanced Engineering and Science

DOI: 10.11591/ijai.v13.i1.pp305-318

Abstract

Imbalanced datasets are characterized by a substantially smaller number of data points in the minority class than in the majority class. This imbalance often leads to poor predictive performance when classification models are applied in real-world scenarios. There are three main approaches to handling imbalanced data: over-sampling, under-sampling, and hybrid approaches. Over-sampling methods duplicate or synthesize minority class data, whereas under-sampling methods remove majority class data. Hybrid methods combine the noise-removal benefits of under-sampling the majority class with the synthetic minority class generation of over-sampling. In this research, we applied principal component (PC) analysis, which is normally used for dimensionality reduction, to reduce the amount of majority class data. The proposed method was compared with eight state-of-the-art under-sampling methods using three classification models: support vector machine, random forest, and AdaBoost. In experiments on 35 datasets, the proposed method achieved higher average sensitivity, G-mean, Matthews correlation coefficient (MCC), and receiver operating characteristic (ROC) curve values than the other under-sampling methods.
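The abstract reports results in terms of sensitivity, G-mean, and MCC. As a minimal sketch (not the authors' evaluation code), these metrics can be computed from a binary confusion matrix, assuming the minority class is labeled as the positive class `1`. G-mean is taken here as the square root of sensitivity times specificity, the usual definition. The function names `confusion_counts` and `imbalance_metrics` are hypothetical, and returning 0.0 when a denominator is zero is a convention chosen for this sketch.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FN, FP, TN, treating `positive` (assumed: the minority class) as positive."""
    tp = fn = fp = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        else:
            if p == positive:
                fp += 1
            else:
                tn += 1
    return tp, fn, fp, tn


def imbalance_metrics(y_true, y_pred, positive=1):
    """Return sensitivity, specificity, G-mean, and MCC for binary labels."""
    tp, fn, fp, tn = confusion_counts(y_true, y_pred, positive)
    # Sensitivity (recall on the minority class); 0.0 if there are no positives (convention).
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity (recall on the majority class); 0.0 if there are no negatives (convention).
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # G-mean balances performance on both classes.
    g_mean = math.sqrt(sensitivity * specificity)
    # Matthews correlation coefficient; 0.0 when any marginal count is zero (convention).
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "g_mean": g_mean,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Toy example: 3 minority (1) and 7 majority (0) samples.
    y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 1, 0]
    print(imbalance_metrics(y_true, y_pred))
```

In the toy example there are 2 true positives, 1 false negative, 1 false positive, and 6 true negatives, so sensitivity is 2/3, specificity is 6/7, G-mean is the square root of 4/7, and MCC is 11/21. Unlike plain accuracy, none of these metrics can be maximized by predicting the majority class for every sample, which is why they suit imbalanced data.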