Knowledge Engineering and Data Science
Vol 6, No 2 (2023)

Comparison of Machine Learning Algorithms for Species Family Classification using DNA Barcode

Riza, Lala Septem
Rahman, M Ammar Fadhlur
Prasetyo, Yudi
Zain, Muhammad Iqbal
Siregar, Herbert
Hidayat, Topik
Samah, Khyrina Airin Fariza Abu
Rosyda, Miftahurrahma

Article Info

Publish Date
03 Oct 2023

Abstract

Classifying plant species within the Liliaceae and Amaryllidaceae families is challenging because of their complex genetic diversity and the overlapping morphological traits among species. To improve the precision of family-level classification in these taxonomically intricate families, this study compares 11 supervised learning algorithms applied to DNA barcode data. The gene for the ribulose-1,5-bisphosphate carboxylase/oxygenase large subunit (rbcL), which has been selected as a DNA barcode locus for plants, is used to represent species within the Amaryllidaceae and Liliaceae families. The experimental results show that all tested models except Naïve Bayes classify species into the correct family with an accuracy above 97%. In terms of computational time, the Random Forest model requires significantly more training time than the other models. In terms of memory usage, the Least Squares Support Vector Machine with a polynomial kernel and Regularized Logistic Regression consume more memory than the other models. On the test dataset, the family predictions of these machine learning models agree closely with NCBI's classifications, effectively assigning species to the Amaryllidaceae and Liliaceae families.
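The abstract does not state how the rbcL sequences were encoded as features or how the 11 models were configured. The following is a minimal Python sketch of the general idea only, assuming a k-mer frequency encoding and using a k-nearest-neighbour majority vote as a stand-in classifier; neither choice is necessarily the authors'. The sequences, the labels `family_A`/`family_B`, and the functions `kmer_vector`, `knn_predict`, and `accuracy` are hypothetical and exist purely for illustration.

```python
import math
from collections import Counter
from itertools import product

# Hypothetical toy sequences, NOT real rbcL data: "family_A" stands in for one
# family and "family_B" for the other. Real barcodes are several hundred bp long.
TRAIN = [
    ("ATATATATATATAT", "family_A"),
    ("AATTAATTAATTAA", "family_A"),
    ("TTATTATTATTATT", "family_A"),
    ("GCGCGCGCGCGCGC", "family_B"),
    ("GGCCGGCCGGCCGG", "family_B"),
    ("CCGCCGCCGCCGCC", "family_B"),
]


def kmer_vector(seq, k=3):
    """Encode a DNA sequence as relative frequencies of all 4**k k-mers over ACGT."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Only k-mers made of A, C, G, T are counted; windows containing N etc. are ignored.
    total = sum(counts[km] for km in kmers) or 1
    return [counts[km] / total for km in kmers]


def knn_predict(train, query, n_neighbors=3, k=3):
    """Return the label held by the majority of the n nearest training sequences."""
    q = kmer_vector(query, k)
    # Euclidean distance between k-mer frequency vectors, nearest first.
    dists = sorted(
        (math.dist(q, kmer_vector(seq, k)), label) for seq, label in train
    )
    votes = Counter(label for _, label in dists[:n_neighbors])
    return votes.most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    queries = ["ATTATTATTATTAT", "CGGCCGGCCGGCCG"]
    preds = [knn_predict(TRAIN, s) for s in queries]
    print(preds, accuracy(["family_A", "family_B"], preds))
    # prints: ['family_A', 'family_B'] 1.0
```

Swapping in another learner (for example Random Forest or Naïve Bayes) would only change `knn_predict`; the k-mer encoding and the accuracy check stay the same. Training time and peak memory, the other two criteria the abstract compares, could be measured around each model's training step with Python's `time.perf_counter` and `tracemalloc` modules.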

Copyright © 2023

Journal Info

Abbrev

keds

Subject

Computer Science & IT Engineering

Description

Knowledge Engineering and Data Science (2597-4637), KEDS, brings together researchers, industry practitioners, and potential users to promote collaborations, exchange ideas and practices, discuss new opportunities, and investigate analytics frameworks on data-driven and knowledge base ...