Bulletin of Electrical Engineering and Informatics
Vol 11, No 2: April 2022

Profiling DNA Sequence of SARS-Cov-2 Virus Using Machine Learning Algorithm

Lailil Muflikhah (Brawijaya University)
Muh. Arif Rahman (Brawijaya University)
Agus Wahyu Widodo (Brawijaya University)

Article Info

Publish Date
01 Apr 2022

Abstract

Coronavirus disease 2019 (COVID-19) is spreading rapidly because it is an infectious disease. The disease is caused by a virus belonging to the DNA virus type, with very diverse genetics. This study proposes a feature extraction method that uses k-mers to obtain nucleotide frequencies in protein-coding sequences. To profile the viral DNA sequences, the study groups them by similarity across countries using hierarchical k-means, in which the results of hierarchical clustering are averaged to find the initial cluster centers. The experimental results show silhouette, purity, and entropy values of 0.867, 0.208, and 0.892, respectively. The Gini index is then applied as a feature selection criterion to identify the components that characterize each country. The selected components are evaluated with an ensemble method, Random Forest. The experiments show high performance in terms of sensitivity, accuracy, specificity, and area under the curve (AUC).
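
The k-mer feature extraction step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the value of k, the alphabet, or how ambiguous bases are handled, so the function name `kmer_frequencies`, the default `k=3`, the A/C/G/T alphabet, and the skipping of windows that contain other symbols are all assumptions made for illustration, not the authors' implementation.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(sequence, k=3):
    """Return the relative frequency of every possible k-mer over A, C, G, T.

    Hypothetical helper: the value of k and the treatment of ambiguous
    bases are assumptions, not taken from the paper.
    """
    alphabet = "ACGT"
    sequence = sequence.upper()
    counts = Counter()
    # Slide a window of length k over the sequence, one base at a time.
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        # Assumption: skip windows containing ambiguous symbols such as N.
        if all(base in alphabet for base in kmer):
            counts[kmer] += 1
    total = sum(counts.values())
    # Emit a fixed-length feature vector (4**k entries) so that sequences
    # of different lengths can be compared by a clustering algorithm.
    vocabulary = ["".join(p) for p in product(alphabet, repeat=k)]
    return {kmer: (counts[kmer] / total if total else 0.0) for kmer in vocabulary}


# Toy sequence, not real viral data.
freqs = kmer_frequencies("ATGCGATGNA", k=2)
```

Each sequence then becomes a vector of 4**k frequencies, which is the kind of fixed-length input that a clustering method or a Random Forest classifier expects.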

Copyrights © 2022

Journal Info

Abbrev

EEI

Subject

Electrical & Electronics Engineering

Description

Bulletin of Electrical Engineering and Informatics (Buletin Teknik Elektro dan Informatika) ISSN: 2089-3191, e-ISSN: 2302-9285 is open to submission from scholars and experts in the wide areas of electrical, electronics, instrumentation, control, telecommunication and computer engineering from the ...