Scientific Journal of Informatics
Vol 10, No 1 (2023): February 2023

Simulation Study of Imbalanced Classification on High-Dimensional Gene Expression Data

Masithoh Yessi Rochayani (Department of Statistics, Universitas Diponegoro, Indonesia)
Umu Sa'adah (Department of Mathematics, Universitas Brawijaya, Indonesia)
Ani Budi Astuti (Department of Statistics, Universitas Brawijaya, Indonesia)

Article Info

Publish Date
01 Feb 2023

Abstract

Purpose: Classification of gene expression data helps in the study of disease. However, it faces two obstacles: imbalanced classes and high dimensionality. This study examines the effectiveness of undersampling before feature selection on high-dimensional data with imbalanced classes.

Methods: The Least Absolute Shrinkage and Selection Operator (Lasso) performs feature selection and can therefore handle high-dimensional data modeling. Random undersampling (RUS) can be used to deal with imbalanced classes. The Classification and Regression Trees (CART) algorithm is used to construct the classification model because it produces an interpretable model. Thirty simulated datasets with varying imbalance ratios are used to test the two proposed approaches, Lasso-CART and RUS-Lasso-CART. The simulated data are generated from parameters of real gene expression data.

Results: The simulation results show that the Lasso-CART method is appropriate when the minority class accounts for more than 25% of the observations. Meanwhile, RUS-Lasso-CART is effective when the minority class contains at least 20 observations.

Novelty: The novelty of this simulation study is the use of the RUS-Lasso-CART hybrid method to address the classification of high-dimensional gene expression data with imbalanced classes.
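The random undersampling (RUS) step named in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `random_undersample`, the 1:1 target ratio, the seed values, and the toy data (90 majority and 10 minority observations with 5 features) are all assumptions made for illustration, since the abstract does not state these details.

```python
import random
from collections import Counter


def random_undersample(X, y, seed=0):
    """Hypothetical helper: randomly keep, for every class, only as many
    rows as the smallest class has (an assumed 1:1 target ratio)."""
    rng = random.Random(seed)
    counts = Counter(y)
    n_min = min(counts.values())  # size of the minority class
    keep = []
    for label in counts:
        # Indices of all observations belonging to this class
        idx = [i for i, lab in enumerate(y) if lab == label]
        # Sample without replacement; the minority class is kept whole
        keep.extend(rng.sample(idx, n_min))
    keep.sort()  # preserve the original row order
    return [X[i] for i in keep], [y[i] for i in keep]


# Toy data (assumed): 90 majority (0) and 10 minority (1) observations
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1) for _ in range(5)] for _ in range(100)]
y = [0] * 90 + [1] * 10

X_bal, y_bal = random_undersample(X, y)
print(Counter(y), "->", Counter(y_bal))
```

In a RUS-Lasso-CART pipeline of the kind the abstract describes, the balanced data would then be passed to a Lasso-based feature selection step and the selected features to a CART classifier; those two steps are omitted here because they would require a statistics or machine learning library.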

Copyrights © 2023

Journal Info

Abbrev

SJI

Publisher

Subject

Computer Science & IT

Description

Scientific Journal of Informatics is published by the Department of Computer Science, Semarang State University. It is a scientific journal of Information Systems and Information Technology that includes scholarly writings on pure research and applied research in the field of information systems and ...