IAES International Journal of Artificial Intelligence (IJ-AI)
Vol 13, No 1: March 2024

Under-sampling technique for imbalanced data using minimum sum of euclidean distance in principal component subset

Kasemtaweechok, Chatchai (Unknown)
Suwannik, Worasait (Unknown)



Article Info

Publish Date
01 Mar 2024

Abstract

Imbalanced datasets are characterized by a substantially smaller number of data points in the minority class compared to the majority class. This imbalance often leads to poor predictive performance of classification models when applied in real-world scenarios. There are three main approaches to handle imbalanced data: over-sampling, under-sampling, and hybrid approach. The over-sampling methods duplicate or synthesize data in the minority class. On the other hand, the under-sampling methods remove majority class data. Hybrid methods combine the noise-removing benefits of under-sampling the majority class with the synthetic minority class creation process of over-sampling. In this research, we applied principal component (PC) analysis, which is normally used for dimensionality reduction, to reduce the amount of majority class data. The proposed method was compared with eight state-of-the-art under-sampling methods across three different classification models: support vector machine, random forest, and AdaBoost. In the experiment, conducted on 35 datasets, the proposed method had higher average values for sensitivity, G-mean, the Matthews correlation coefficient (MCC), and receiver operating characteristic curve (ROC curve) compared to the other under-sampling methods. 

Copyrights © 2024






Journal Info

Abbrev

IJAI

Publisher

Subject

Computer Science & IT Engineering

Description

IAES International Journal of Artificial Intelligence (IJ-AI) publishes articles in the field of artificial intelligence (AI). The scope covers all artificial intelligence area and its application in the following topics: neural networks; fuzzy logic; simulated biological evolution algorithms (like ...