Umu Sa'adah
Department Of Mathematics, Universitas Brawijaya, Indonesia

Published: 2 Documents
Articles


Knowledge discovery from gene expression dataset using bagging lasso decision tree
Umu Sa'adah; Masithoh Yessi Rochayani; Ani Budi Astuti
Indonesian Journal of Electrical Engineering and Computer Science Vol 21, No 2: February 2021
Publisher : Institute of Advanced Engineering and Science

DOI: 10.11591/ijeecs.v21.i2.pp1151-1159

Abstract

Classifying high-dimensional data is a challenging task in data mining. Gene expression data is a type of high-dimensional data with thousands of features. This study proposes a method for extracting knowledge from high-dimensional gene expression data by combining feature selection with classification. Lasso was used to select features, and the classification and regression tree (CART) algorithm was used to construct the decision tree model. To examine the stability of the lasso decision tree, we performed bootstrap aggregating (bagging) with 50 replications. The gene expression data was an ovarian tumor dataset with 1,545 observations, 10,935 gene features, and a binary class label. The findings showed that the lasso decision tree could produce an interpretable model that is theoretically correct, with an accuracy of 89.32%. The model obtained from the majority vote gave an accuracy of 90.29%, an increase of about 1 percentage point (0.97) over the single lasso decision tree model. This small gain in accuracy indicates that the lasso decision tree classifier is stable.
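The bagging step described in this abstract (fit a model on each bootstrap replicate, then combine predictions by majority vote) can be sketched in plain Python. This is only an illustration under stated assumptions, not the authors' implementation: `fit_stump` is a hypothetical one-feature threshold classifier standing in for the lasso plus CART pipeline, the toy data is invented, and all function names are illustrative. Only the 50 replications and the majority vote come from the abstract.

```python
import random
from collections import Counter


def bootstrap_indices(n, rng):
    """Draw n row indices with replacement (one bootstrap replicate)."""
    return [rng.randrange(n) for _ in range(n)]


def majority_vote(labels):
    """Return the most common label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(X, y):
    """Hypothetical base learner (stand-in for lasso + CART).

    Thresholds the first feature at the midpoint of the two class means.
    """
    pos = [row[0] for row, label in zip(X, y) if label == 1]
    neg = [row[0] for row, label in zip(X, y) if label == 0]
    if not pos or not neg:
        # A replicate may contain one class only: predict that class.
        constant = 1 if pos else 0
        return lambda row: constant
    mean_pos = sum(pos) / len(pos)
    mean_neg = sum(neg) / len(neg)
    threshold = (mean_pos + mean_neg) / 2
    if mean_pos >= mean_neg:
        return lambda row: 1 if row[0] > threshold else 0
    return lambda row: 1 if row[0] < threshold else 0


def bagging_predict(fit, X, y, X_new, n_replicates=50, seed=0):
    """Fit one model per bootstrap replicate and majority-vote the predictions."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_replicates):
        idx = bootstrap_indices(len(X), rng)
        models.append(fit([X[i] for i in idx], [y[i] for i in idx]))
    return [majority_vote([model(row) for model in models]) for row in X_new]


# Toy data (invented for illustration): one feature, two well-separated classes.
X_toy = [[0.1], [0.2], [0.3], [0.9], [1.0], [1.1]]
y_toy = [0, 0, 0, 1, 1, 1]
predictions = bagging_predict(fit_stump, X_toy, y_toy, [[0.0], [1.2]])
print(predictions)
```

Comparing the accuracy of the voted predictions with that of a single fitted model is the kind of stability check the abstract reports (90.29% versus 89.32%).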
Simulation Study of Imbalanced Classification on High-Dimensional Gene Expression Data
Masithoh Yessi Rochayani; Umu Sa'adah; Ani Budi Astuti
Scientific Journal of Informatics Vol 10, No 1 (2023): February 2023
Publisher : Universitas Negeri Semarang

DOI: 10.15294/sji.v10i1.40589

Abstract

Purpose: Classification of gene expression data helps in the study of disease. However, it faces two obstacles: imbalanced classes and high dimensionality. The motivation of this study is to examine the effectiveness of undersampling before feature selection on high-dimensional data with imbalanced classes.

Methods: The Least Absolute Shrinkage and Selection Operator (Lasso), which can select features, can handle high-dimensional data modeling. Random undersampling (RUS) can be used to deal with imbalanced classes. The Classification and Regression Tree (CART) algorithm is used to construct the classification model because it can produce an interpretable model. Thirty simulated datasets with varying imbalance ratios are used to test the proposed approaches, Lasso-CART and RUS-Lasso-CART. The simulated data are generated from parameters of real gene expression data.

Results: The simulation results show that the Lasso-CART method is appropriate when the minority class accounts for more than 25% of the observations. Meanwhile, RUS-Lasso-CART is effective when the minority class contains at least 20 observations.

Novelty: The novelty of this simulation study is the use of the RUS-Lasso-CART hybrid method to address the classification of high-dimensional gene expression data with imbalanced classes.
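As a rough illustration of the random undersampling (RUS) step that precedes Lasso and CART in the RUS-Lasso-CART approach, the sketch below keeps every minority-class row and draws an equally sized random subset from each larger class. The abstract does not state the target class ratio, so balancing to 1:1 is an assumption here, and the function name and toy data are hypothetical.

```python
import random
from collections import Counter


def random_undersample(X, y, seed=0):
    """Shrink every class to the minority-class size by random selection.

    Assumption: classes are balanced to a 1:1 ratio; the source does not
    specify the ratio used.
    """
    rng = random.Random(seed)
    by_class = {}
    for i, label in enumerate(y):
        by_class.setdefault(label, []).append(i)
    n_min = min(len(indices) for indices in by_class.values())
    keep = []
    for indices in by_class.values():
        # Minority rows are all kept; larger classes are sampled without replacement.
        keep.extend(indices if len(indices) == n_min else rng.sample(indices, n_min))
    keep.sort()  # preserve the original row order
    return [X[i] for i in keep], [y[i] for i in keep]


# Toy data (invented): 8 majority rows (class 0) and 2 minority rows (class 1).
X_toy = [[i] for i in range(10)]
y_toy = [0] * 8 + [1] * 2
X_bal, y_bal = random_undersample(X_toy, y_toy)
print(Counter(y_bal))
```

In a full pipeline, feature selection and tree fitting would then run on the balanced rows only, which is why the abstract's result ties the method's usefulness to the absolute size of the minority class.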