Jurnal Informatika Universitas Pamulang
Vol 9 No 1 (2024): JURNAL INFORMATIKA UNIVERSITAS PAMULANG

Application of Traditional Machine Learning Techniques for the Classification of Human DNA Sequences: A Comparative Study of Random Forest and XGBoost

Gregorius Airlangga (Atma Jaya Catholic University of Indonesia)

Article Info

Publish Date
30 Mar 2024

Abstract

This study evaluates the performance of two ensemble machine learning models, Random Forest and XGBoost, in classifying human DNA sequences into seven functional classes. Using feature vectorization techniques, the research addresses the challenges of analyzing high-dimensional genomic data. Both models were trained and tested on a dataset of annotated human DNA sequences, with an emphasis on generalizability to new, unseen data. The Random Forest model achieved an accuracy of 87.98%, slightly outperforming the XGBoost model, which recorded an accuracy of 87.06%. These findings underscore the effectiveness of combining traditional machine learning techniques with careful data preprocessing for predictive modeling in genomics. The study contributes to the understanding of genomic function and suggests robust methodologies for future genetic research, with potential applications in personalized medicine. The implications of these results for improving classification accuracy and recommendations for integrating more complex algorithms are also discussed.
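The abstract does not name the feature vectorization technique used. As an illustration only, the sketch below assumes overlapping k-mer counting, a common way to turn variable-length DNA strings into fixed-length numeric vectors, together with the plain accuracy metric that the reported percentages correspond to. The function names `kmer_vocabulary`, `kmer_counts`, and `accuracy`, and the choice of `k`, are hypothetical and not taken from the paper.

```python
from collections import Counter
from itertools import product


def kmer_vocabulary(k):
    """Return all 4**k possible k-mers over the alphabet ACGT, in a fixed order."""
    return ["".join(p) for p in product("ACGT", repeat=k)]


def kmer_counts(sequence, k):
    """Count overlapping k-mers in one sequence and return a fixed-length vector.

    Assumption: k-mers containing symbols outside ACGT (for example N)
    are ignored rather than counted.
    """
    seq = sequence.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return [counts.get(kmer, 0) for kmer in kmer_vocabulary(k)]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true class labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


if __name__ == "__main__":
    # Toy sequence, not data from the study.
    vector = kmer_counts("ACGTAC", k=2)
    print(len(vector), sum(vector))
    print(accuracy([0, 1, 2, 2], [0, 1, 2, 1]))
```

Vectors of this kind could then be passed to a classifier such as scikit-learn's `RandomForestClassifier` or XGBoost's `XGBClassifier`. Whether the study used this exact pipeline is not stated in the abstract.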

Copyrights © 2024

Journal Info

Abbrev

informatika

Subject

Computer Science & IT

Description

Jurnal Informatika Universitas Pamulang is a periodical scientific journal that contains research results in the field of computer science from all aspects of theory, practice and application. Papers can be in the form of technical papers or surveys of recent developments research ...