Emerging Science Journal
Vol. 9 No. 6 (2025): December

Feature Transformation on Big Data for Species Classification in Machine Learning

Yow, Li Wen
Ong, Lee Yeng
Tan, Joon Liang



Article Info

Publish Date
01 Dec 2025

Abstract

Classification of bacterial species, particularly closely related taxa, remains a major challenge in fields such as public health and the food industry. The difficulty stems mainly from overlapping genetic features among organisms and from the complexity of the data. This study introduces a bacterial taxonomic identification framework that integrates genome-derived motif sequences with machine learning. Two hundred and forty genome sequences of Salmonella enterica, representing six subspecies and ten serovars, were used for modelling. Sequence motifs were predicted from the single-copy orthologous core genes of the downloaded genomes. Single nucleotide polymorphisms (SNPs) within these motifs were extracted and numerically encoded as machine learning features. The 20 most informative predictors identified by feature selection were used to train Random Forest and Support Vector Machine models. Across the analyses, the Random Forest model achieved the highest accuracy, 97.92%, reliably differentiating Salmonella at both the subspecies and serovar levels. This research presents two key innovations: (i) the use of sequence motifs as molecular signatures for bacterial classification; and (ii) a novel feature engineering method that transforms genome-derived data into machine learning-readable features. The proposed framework offers a practical and scalable solution for fine-level bacterial classification and has high potential for application to other microbial taxa.
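The abstract does not say how the motifs were aligned, how SNP alleles were mapped to numbers, or which scores the feature selection used. The sketch below illustrates one plausible version of the encoding and top-20 selection steps under stated assumptions: the motif strings are already aligned to equal length, alleles use a hypothetical ordinal code (A, C, G, T mapped to 1 to 4, anything else to 0), and mutual information is used as the ranking score. All function names are hypothetical, and none of these choices should be read as the authors' method.

```python
import math
from collections import Counter

# Hypothetical ordinal allele code; the abstract does not state the real scheme.
NUCLEOTIDE_CODE = {"A": 1, "C": 2, "G": 3, "T": 4}


def snp_columns(aligned_motifs):
    """Return indices of alignment columns where the genomes disagree (candidate SNP sites)."""
    length = len(aligned_motifs[0])
    if any(len(seq) != length for seq in aligned_motifs):
        raise ValueError("motif sequences must be aligned to equal length")
    return [i for i in range(length)
            if len({seq[i] for seq in aligned_motifs}) > 1]


def encode_snps(aligned_motifs, columns):
    """Encode each genome's bases at the SNP columns as integers (0 = gap or ambiguous base)."""
    return [[NUCLEOTIDE_CODE.get(seq[i].upper(), 0) for i in columns]
            for seq in aligned_motifs]


def mutual_information(feature, labels):
    """Mutual information, in bits, between one discrete feature column and the class labels."""
    n = len(labels)
    joint = Counter(zip(feature, labels))
    fx = Counter(feature)
    fy = Counter(labels)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        mi += p_xy * math.log2(p_xy / ((fx[x] / n) * (fy[y] / n)))
    return mi


def top_k_features(matrix, labels, k=20):
    """Rank feature columns by mutual information with the labels and keep the top k indices."""
    scores = []
    for j in range(len(matrix[0])):
        column = [row[j] for row in matrix]
        scores.append((mutual_information(column, labels), j))
    # Highest score first; ties broken by the lower column index for determinism.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [j for _, j in scores[:k]]


if __name__ == "__main__":
    # Toy data for illustration only: four aligned motif strings and two subspecies labels.
    motifs = ["ACGTA", "ACGTT", "AGGTA", "AGGTT"]
    labels = ["enterica", "enterica", "salamae", "salamae"]
    cols = snp_columns(motifs)        # [1, 4]
    X = encode_snps(motifs, cols)     # [[2, 1], [2, 4], [3, 1], [3, 4]]
    print(top_k_features(X, labels, k=1))  # [0]: column 1 separates the labels perfectly
```

In the toy data, the SNP at alignment column 1 splits the two labels exactly (1 bit of mutual information), while the SNP at column 4 carries no label information, so it is ranked last. The selected columns could then be passed to classifiers such as scikit-learn's `RandomForestClassifier` or `SVC`; the hyperparameters and validation scheme behind the reported 97.92% accuracy are not given in the abstract.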

Copyright © 2025






Journal Info

Abbrev

ESJ

Subject

Environmental Science

Description

Emerging Science Journal is not limited to a specific aspect of science and engineering but is instead devoted to a wide range of subfields in engineering and the sciences, and it encourages a broad spectrum of contributions in these fields. Articles of interdisciplinary nature are ...