Articles

Found 4 Documents

Hybrid methods to identify ovarian cancer from imbalanced high-dimensional microarray data
Sapitri, Ni Kadek Emik; Sa'adah, Umu; Shofianah, Nur
IAES International Journal of Artificial Intelligence (IJ-AI) Vol 14, No 2: April 2025
Publisher : Institute of Advanced Engineering and Science

DOI: 10.11591/ijai.v14.i2.pp1173-1182

Abstract

Scientists have used microarray data to distinguish healthy people from patients with various types of cancer, including ovarian cancer. Ovarian cancer is the most dangerous of the cancers that attack the female reproductive organs. The right combination of methods is needed to identify ovarian cancer from microarray data because this type of data is high-dimensional and imbalanced. This research proposes two hybrid methods that combine infinite feature selection (IFS) as the feature selector with classification and regression tree (CART) as the classifier. IFS can work in two separate scenarios, namely supervised infinite feature selection (SIFS) and unsupervised infinite feature selection (UIFS). This research also compares the performance of the two proposed hybrid methods (SIFS-CART and UIFS-CART) with CART without IFS. The data used is the OVA_ovary dataset, which has 10937 columns and 1545 rows. The results show that SIFS-CART achieves its maximum performance using 1000 features and UIFS-CART using 5000 features, whereas CART without IFS uses all 10935 features. The balanced accuracy results show that SIFS-CART can outperform both CART without IFS and UIFS-CART. Because it needs fewer features to reach the highest balanced accuracy, SIFS is more effective than UIFS at feature selection on the OVA_ovary dataset.

THE STUDY OF ECCENTRICITY SPECTRUM AND ENERGY IN PATH AND CYCLE GRAPHS
Sapitri, Ni Kadek Emik; Krisnawati, Vira Hari
BAREKENG: Jurnal Ilmu Matematika dan Terapan Vol 17 No 4 (2023): BAREKENG: Journal of Mathematics and Its Applications
Publisher : PATTIMURA UNIVERSITY

DOI: 10.30598/barekengvol17iss4pp2081-2094

Abstract

The eccentricity matrix is one of the matrices used to represent graphs. It serves as the basis for calculating the eccentricity spectrum and the eccentricity energy. This article studies the concepts of eccentricity spectrum and energy in simple graphs. As special cases, we also discuss the eccentricity spectrum and energy of paths and cycles. Throughout the article, examples are provided to help the reader understand the concepts studied. In addition, this article corrects mistakes in a lemma about the eccentricity spectrum of paths and in a theorem about the eccentricity energy of odd-order cycles from the reference articles. The corrections are made by indicating where the errors occur in the referenced articles, providing counterexamples, correcting the inaccurate lemma and theorem, and giving short proofs. The article closes with an open problem that outlines research ideas that can be developed from the concepts of eccentricity spectrum and energy.
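
The abstract above does not spell out the definitions, so the following is only a minimal sketch, assuming the usual ones: the eccentricity matrix keeps the distance d(u, v) when it equals min(e(u), e(v)) and is zero otherwise, the eccentricity spectrum is the set of eigenvalues of that matrix, and the eccentricity energy is the sum of their absolute values. All function names are hypothetical, the graph is assumed to be connected, and the eigenvalues are obtained with a plain Jacobi rotation routine chosen only to keep the example dependency-free; it is not taken from the article.

```python
import math
from collections import deque

def path_graph(n):
    """Adjacency lists of the path on vertices 0..n-1."""
    return {v: [u for u in (v - 1, v + 1) if 0 <= u < n] for v in range(n)}

def cycle_graph(n):
    """Adjacency lists of the cycle on vertices 0..n-1 (n >= 3)."""
    return {v: [(v - 1) % n, (v + 1) % n] for v in range(n)}

def distances_from(adj, source):
    """Breadth-first search distances from source (connected graph assumed)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for u in adj[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return dist

def eccentricity_matrix(adj):
    """Keep d(u, v) where it equals min(e(u), e(v)); put 0 elsewhere."""
    nodes = sorted(adj)
    d = {v: distances_from(adj, v) for v in nodes}
    ecc = {v: max(d[v].values()) for v in nodes}
    return [[d[u][v] if d[u][v] == min(ecc[u], ecc[v]) else 0 for v in nodes]
            for u in nodes]

def symmetric_eigenvalues(matrix, sweeps=50, tol=1e-24):
    """Eigenvalues of a real symmetric matrix via cyclic Jacobi rotations."""
    n = len(matrix)
    a = [[float(x) for x in row] for row in matrix]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle that zeroes the (p, q) entry.
                if a[p][p] == a[q][q]:
                    theta = math.pi / 4
                else:
                    theta = 0.5 * math.atan(2 * a[p][q] / (a[q][q] - a[p][p]))
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))

def eccentricity_energy(adj):
    """Sum of absolute values of the eccentricity eigenvalues."""
    return sum(abs(x) for x in symmetric_eigenvalues(eccentricity_matrix(adj)))

if __name__ == "__main__":
    print(eccentricity_matrix(path_graph(4)))
    print(symmetric_eigenvalues(eccentricity_matrix(cycle_graph(4))))
    print(eccentricity_energy(path_graph(3)))
```

For larger graphs, a library eigenvalue solver would be the more practical choice; the hand-written routine here only serves to make the sketch self-contained.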

IDENTIFYING IMPORTANT GENES IN OVARIAN CANCER FROM HIGH-DIMENSIONAL MICROARRAY DATA USING SIFS-CART METHOD
Sapitri, Ni Kadek Emik; Sa'adah, Umu; Shofianah, Nur
BAREKENG: Jurnal Ilmu Matematika dan Terapan Vol 18 No 3 (2024): BAREKENG: Journal of Mathematics and Its Application
Publisher : PATTIMURA UNIVERSITY

DOI: 10.30598/barekengvol18iss3pp1909-1918

Abstract

Ovarian cancer can be identified from microarray data using machine learning. Many studies focus only on improving machine learning classification algorithms to achieve higher performance. However, the purpose of classification is not only to obtain high performance but also to gain new knowledge from the results. This research focuses on both. Using a hybrid of Supervised Infinite Feature Selection (SIFS) and Classification and Regression Tree (CART), called SIFS-CART, this research aims to predict ovarian cancer and identify potentially important genes in ovarian cancer cases. The data used is the OVA_ovary dataset. SIFS in the best SIFS-CART model reduced the 10935 genes in the initial OVA_ovary dataset to 1000 genes, and CART was then built with these 1000 genes. Based on the balanced accuracy (BA) metric for imbalanced microarray data, the best SIFS-CART model achieves 85.7% BA in training and 83.2% in testing. The optimal CART in the best SIFS-CART model needs only four of the 1000 selected genes: STAR, WT1, PEG3, and ASPN. Based on a review of the medical literature, STAR, WT1, and PEG3 play an important role in ovarian cancer cases. However, the relationship between ASPN and ovarian cancer has not yet been studied in detail by medical researchers.
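
To illustrate why a classification tree can end up using only a handful of features, the sketch below shows the kind of split search a CART-style tree performs on a single feature: it tries each midpoint between consecutive sorted values and keeps the threshold with the lowest weighted Gini impurity. This is a simplified illustration, not the authors' implementation; the function names and the toy expression values are hypothetical and are not taken from the OVA_ovary dataset.

```python
def gini(labels):
    """Gini impurity of a list of binary labels (0 or 1)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p1 = sum(labels) / n
    return 1.0 - p1 ** 2 - (1.0 - p1) ** 2

def best_split(values, labels):
    """Return (threshold, weighted Gini) of the best split 'value <= threshold'.

    Returns (None, impurity of the unsplit node) if no split improves on it.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold can separate identical values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = ((lo + hi) / 2, score)
    return best

if __name__ == "__main__":
    # Hypothetical expression levels of one gene and made-up class labels.
    expression = [1.0, 2.0, 3.0, 8.0, 9.0, 10.0]
    is_cancer = [0, 0, 0, 1, 1, 1]
    print(best_split(expression, is_cancer))
```

A full tree repeats this search over every candidate feature at every node and is then pruned, so features that never yield the best split, however many were passed in, do not appear in the final tree.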

Knowledge Discovery from Confusion Matrix of Pruned CART in Imbalanced Microarray Data Ovarian Cancer Classification
Sapitri, Ni Kadek Emik; Sa’adah, Umu; Shofianah, Nur
Scientific Journal of Informatics Vol 11, No 1 (2024): February 2024
Publisher : Universitas Negeri Semarang

DOI: 10.15294/sji.v11i1.50077

Abstract

Purpose: The results of microarray data analysis are important in cancer diagnosis, especially for cancers that are asymptomatic in their early stages, such as ovarian cancer. One of the challenges in analyzing microarray data is the problem of imbalanced data. Unfortunately, research on cancer classification from microarray data often ignores this challenge and therefore does not use appropriate evaluation metrics, which biases the results towards the majority class. This study uses the popular evaluation metric “accuracy” and a metric suited to imbalanced data, “balanced accuracy (BA)”, to extract information from the confusion matrix about the accuracy and BA values in ovarian cancer classification. Methods: This study uses Classification and Regression Tree (CART) as the classifier. CART is optimized by pruning. The optimal CART is determined from the results of the CART complexity analysis and the confusion matrix. Results: The confusion matrix and CART interpretations in this research show that CART with low complexity is still able to predict majority-class respondents well. However, when none of the data in the minority class was classified correctly, the accuracy was still quite high, namely 86.97% at the training stage and 88.03% at the testing stage, while the BA at both stages was only 50%. Novelty: It is very important to ensure that the evaluation metrics used match the characteristics of the data being processed. This research illustrates the difference between accuracy and BA. It concludes that classification of an imbalanced dataset without resampling can use BA as the evaluation metric because, based on the results, BA is fairer to both classes.
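
The gap between accuracy and BA described above can be reproduced from a confusion matrix in a few lines. The sketch below assumes the standard binary definitions, with BA taken as the mean of sensitivity and specificity. The function name and the counts are hypothetical and were chosen only to mimic the situation in which every minority-class sample is misclassified; they are not the study's data.

```python
def accuracy_and_balanced_accuracy(tp, fn, fp, tn):
    """Return (accuracy, balanced accuracy) from binary confusion-matrix counts.

    Both classes must contain at least one sample, otherwise a division by
    zero occurs.
    """
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)  # recall on the positive (here: minority) class
    specificity = tn / (tn + fp)  # recall on the negative (here: majority) class
    return accuracy, (sensitivity + specificity) / 2

if __name__ == "__main__":
    # Hypothetical counts: all 130 minority samples are misclassified,
    # all 870 majority samples are classified correctly.
    acc, ba = accuracy_and_balanced_accuracy(tp=0, fn=130, fp=0, tn=870)
    print(f"accuracy = {acc:.2%}, balanced accuracy = {ba:.2%}")
```

With these made-up counts the accuracy is 87% while the BA is 50%, the same value a classifier that always predicts the majority class would obtain.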