Nusantara Science and Technology Proceedings
Multi-Conference Proceeding Series E

Feature Reduction of Lung Cancer Microarray Data Using Mutual Information Selection and PyCaret-Supported Recursive Feature Elimination

Andrew Jonathan Brahms Simangunsong (Universitas Indonesia, Depok 16911, Indonesia)
Valha Tsabita Hidayat (Universitas Indonesia, Depok 16911, Indonesia)



Article Info

Publish Date
21 Dec 2023

Abstract

Lung cancer remains a leading cause of cancer-related mortality worldwide, and rising pollution levels in Indonesia make better early detection increasingly urgent. Molecular diagnosis using DNA microarrays is one detection method that has proven effective. However, microarray data contain a vast number of features, and this complexity hinders timely and accurate detection. This study seeks to reduce the data to an optimal set of features in order to improve classification performance. Our approach combines Mutual Information feature selection with Recursive Feature Elimination (RFE), using the PyCaret library to train and evaluate machine learning models. Mutual Information first reduces the feature set to improve computational efficiency, after which PyCaret trains a range of machine learning models on the reduced data. The two best-performing models for each dataset are then used to perform RFE in search of the optimal feature subset, and a Support Vector Machine is used in the same way for comparison. This yields three feature subsets, plus a fourth that combines the features of the other three. Finally, PyCaret is used again to train machine learning models on every feature subset. The results show that the other models can select fewer features than the Support Vector Machine while still maintaining strong predictive power, with accuracy of 95% to 98%. In conclusion, our research offers a new approach to selecting optimal features for microarray analysis, with implications for more effective and timely cancer diagnosis.
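The two-stage selection described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: it uses scikit-learn directly rather than PyCaret, runs on synthetic data in place of a real microarray dataset, and every concrete choice in it is an assumption made for illustration (the 50-feature Mutual Information cutoff, the 10-feature RFE target, and logistic regression and a random forest as stand-ins for the "two best-performing models").

```python
from functools import partial

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE, SelectKBest, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for microarray data: few samples, many features (assumption).
X, y = make_classification(n_samples=120, n_features=500, n_informative=15,
                           n_redundant=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Stage 1: Mutual Information filter keeps the 50 highest-scoring features
# (the cutoff of 50 is illustrative, not taken from the paper).
mi_filter = SelectKBest(partial(mutual_info_classif, random_state=0), k=50)
mi_filter.fit(X_train, y_train)
mi_idx = np.flatnonzero(mi_filter.get_support())

# Stage 2: RFE on the filtered features with three estimators. The first two
# are hypothetical stand-ins for the two best models; the SVM is the comparison.
estimators = {
    "logreg": LogisticRegression(max_iter=1000),
    "forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "svm": SVC(kernel="linear"),
}
subsets = {}
for name, estimator in estimators.items():
    rfe = RFE(estimator, n_features_to_select=10, step=5)
    rfe.fit(X_train[:, mi_idx], y_train)
    subsets[name] = mi_idx[rfe.support_]  # map back to original column indices

# Stage 3: a fourth subset that combines (unions) the three RFE subsets.
subsets["combined"] = np.unique(np.concatenate(list(subsets.values())))

# Stage 4: retrain a classifier on each subset and score it on held-out data.
scores = {}
for name, idx in subsets.items():
    clf = LogisticRegression(max_iter=1000).fit(X_train[:, idx], y_train)
    scores[name] = clf.score(X_test[:, idx], y_test)
    print(f"{name:>8}: {len(idx):2d} features, test accuracy {scores[name]:.2f}")
```

Both selection stages are fitted on the training split only, so the held-out accuracy is not inflated by features chosen with knowledge of the test samples. In a PyCaret workflow, the model comparison and retraining steps would typically be handled by the library's own setup and comparison utilities instead of the hand-written loops shown here.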

Copyrights © 2023





