Journal : Jurnal Informatika Universitas Pamulang

Application of Traditional Machine Learning Techniques for the Classification of Human DNA Sequences: A Comparative Study of Random Forest and XGBoost
Airlangga, Gregorius
Jurnal Informatika Universitas Pamulang Vol 9 No 1 (2024): JURNAL INFORMATIKA UNIVERSITAS PAMULANG
Publisher : Teknik Informatika Universitas Pamulang

DOI: 10.32493/informatika.v9i1.39353

Abstract

This study evaluates the performance of two traditional machine learning models, Random Forest and XGBoost, in classifying human DNA sequences into seven functional classes. Utilizing advanced feature vectorization techniques, this research addresses the challenges of analyzing high-dimensional genomic data. Both models were trained and tested on a dataset of annotated human DNA sequences, with an emphasis on generalizability to new, unseen data. Our results indicate that the Random Forest model achieved an accuracy of 87.98%, slightly outperforming the XGBoost model, which recorded an accuracy of 87.06%. These findings underscore the effectiveness of employing traditional machine learning techniques coupled with innovative data preprocessing for predictive modeling in genomics. The study not only enhances our understanding of genomic functionalities but also suggests robust methodologies for future genetic research and potential applications in personalized medicine. The implications of these results for improving classification accuracy and the recommendations for integrating more complex algorithms are also discussed.
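The abstract does not say which feature vectorization was used. As an illustration only, the sketch below assumes overlapping k-mer counting, one common way to turn a variable-length DNA sequence into a fixed-length numeric vector that a Random Forest or XGBoost classifier could consume. The function names `kmer_vocabulary` and `kmer_counts` and the choice of `k` are hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import product


def kmer_vocabulary(k: int) -> list[str]:
    """Return all 4**k possible k-mers over the alphabet A, C, G, T."""
    return ["".join(p) for p in product("ACGT", repeat=k)]


def kmer_counts(sequence: str, k: int = 3) -> list[int]:
    """Count overlapping k-mers and return a fixed-length count vector.

    Assumption: k-mer counting stands in for the unspecified
    "feature vectorization" mentioned in the abstract.
    """
    seq = sequence.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # k-mers with ambiguous bases (e.g. N) are not in the vocabulary
    # and are therefore left out of the vector.
    return [counts[kmer] for kmer in kmer_vocabulary(k)]


# Example: a short sequence mapped to a vector of 4**2 = 16 counts.
vector = kmer_counts("ATGCATGC", k=2)
```

Each sequence yields a vector of length 4**k regardless of its own length, which is what makes the representation usable with tree-based classifiers that expect a fixed number of features.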
A Hybrid Model for Human DNA Sequence Classification Using Convolutional Neural Networks and Random Forests
Airlangga, Gregorius
Jurnal Informatika Universitas Pamulang Vol 9 No 2 (2024): JURNAL INFORMATIKA UNIVERSITAS PAMULANG
Publisher : Teknik Informatika Universitas Pamulang

DOI: 10.32493/informatika.v9i2.39355

Abstract

Human DNA sequence classification is a fundamental task in genomics, essential for understanding genetic variations and their implications for disease susceptibility, personalized medicine, and evolutionary biology. This study proposes a novel hybrid model combining Convolutional Neural Networks (CNN) for feature extraction and Random Forest classifiers for final classification. The model was evaluated on a dataset of human DNA sequences, achieving an accuracy of 75.34%. Performance metrics, including precision, recall, and F1-scores across multiple classes, showed significant improvements over traditional models. The CNN component effectively captures local dependencies and patterns within the sequences, while the Random Forest classifier handles complex decision boundaries, resulting in enhanced classification accuracy. Comparative analysis demonstrated the superiority of our hybrid approach, with the CNN-LSTM model achieving only 59.47% accuracy, and other RNN-based models such as CNN-GRU and CNN-BiLSTM achieving similarly low accuracy. These results suggest that hybrid models can leverage the strengths of both deep learning and traditional machine learning techniques, offering a more effective tool for DNA sequence classification. Future work will optimize the model architecture and explore larger, more diverse datasets to validate our approach's generalizability and robustness.