Classifying DNA sequences with deep learning models is a promising route to advances in genomics and personalized medicine. This study evaluates several deep learning architectures for classifying human DNA sequences into functional categories: Convolutional Neural Networks (CNNs), Long Short-Term Memory networks (LSTMs), Gated Recurrent Units (GRUs), Bidirectional LSTMs (BiLSTMs), and hybrid models that combine CNNs with recurrent networks. We used a dataset of approximately 100,000 labeled sequences, balanced across seven classes so that model performance could be compared fairly. Each model was assessed on accuracy, precision, recall, F1 score, and Area Under the Receiver Operating Characteristic Curve (AUC-ROC). The CNN achieved the highest accuracy (74.86%) and the highest AUC-ROC (94.64%), indicating that it captures spatial patterns within sequences effectively. The LSTM and GRU models also performed well, particularly in balancing precision and recall, which suggests that they handle sequential dependencies effectively. The hybrid models, however, fell short of expectations and scored lower on the overall metrics, pointing to difficulties in integrating the components and managing model complexity. These findings suggest that CNNs excel at feature extraction, while sequence-based models such as LSTMs and GRUs contribute the ability to capture long-range dependencies, which is essential for comprehensive genomic analysis. The study underscores the need for better-optimized hybrid models and for further research into model robustness and generalizability.
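For readers who want to reproduce the five evaluation metrics on a multi-class problem such as the seven-class task described here, the following is a minimal sketch. It assumes macro averaging across classes and a one-vs-rest treatment of AUC-ROC; the abstract does not state which averaging scheme was used, so both choices are assumptions. The function names (`macro_metrics`, `binary_auc`, `macro_ovr_auc`) are hypothetical and are not taken from the study.

```python
from collections import Counter


def macro_metrics(y_true, y_pred, n_classes):
    """Accuracy plus macro-averaged precision, recall, and F1.

    Assumption: classes are integer labels 0..n_classes-1 and each
    class contributes equally to the average (macro averaging).
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed

    precisions, recalls, f1s = [], [], []
    for c in range(n_classes):
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)

    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n_classes,
        "recall": sum(recalls) / n_classes,
        "f1": sum(f1s) / n_classes,
    }


def binary_auc(labels, scores):
    """AUC-ROC as the probability that a random positive example
    receives a higher score than a random negative one (ties count 0.5).
    Quadratic in the number of examples; fine for illustration only."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(y_true, probs, n_classes):
    """Macro-averaged one-vs-rest AUC-ROC.

    probs[i][c] is the predicted probability of class c for example i.
    """
    aucs = []
    for c in range(n_classes):
        labels = [1 if t == c else 0 for t in y_true]
        scores = [row[c] for row in probs]
        aucs.append(binary_auc(labels, scores))
    return sum(aucs) / n_classes
```

In practice, a library such as scikit-learn would normally be used for these computations; the pure-Python version above is only meant to make the definitions explicit.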
Copyright © 2024