This study evaluates seven machine learning models for classifying human DNA sequences and discusses their potential applications in genomics. We compared Logistic Regression, Support Vector Machines (SVM), Random Forest, Decision Trees, Gradient Boosting, Naive Bayes, and XGBoost, assessing each with 5-fold stratified cross-validation (StratifiedKFold) so that every fold preserves the class distribution. Naive Bayes achieved near-perfect accuracy, precision, recall, and F1 scores, which suggests it is well suited to rapid, efficient genomic classification. Logistic Regression also performed well, remaining effective on multi-class classification of complex genetic data. In contrast, Decision Trees were prone to overfitting and SVM was computationally expensive, so both require careful parameter tuning and optimization in practical use. We discuss these limitations and propose strategies for improving model robustness and computational efficiency, including advanced regularization techniques and hybrid modeling approaches. These results can guide model selection for specific genomic tasks and inform future work on integrating machine learning with genomic science in support of personalized medicine and genetic research. They also motivate continued refinement of these models for genomic applications.
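To make the evaluation protocol concrete, the following is a minimal sketch of what a 5-fold stratified split does: it assigns samples to folds class by class, so each test fold keeps roughly the same class proportions as the full dataset. It is written in plain Python for illustration only. The function names `stratified_folds` and `train_test_splits` and the toy labels are assumptions introduced here, not the study's code or data, and the logic is a simplified stand-in for a library routine such as scikit-learn's StratifiedKFold.

```python
from collections import Counter, defaultdict


def stratified_folds(labels, n_splits=5):
    """Assign each sample index to a fold, round-robin within each class.

    Hypothetical helper for illustration; not a library implementation.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(n_splits)]
    for indices in by_class.values():
        # Dealing indices out one fold at a time keeps per-class counts
        # in any two folds within one sample of each other.
        for position, idx in enumerate(indices):
            folds[position % n_splits].append(idx)
    return folds


def train_test_splits(labels, n_splits=5):
    """Yield (train_indices, test_indices), using each fold once as the test set."""
    folds = stratified_folds(labels, n_splits)
    for k, test_idx in enumerate(folds):
        train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield sorted(train_idx), sorted(test_idx)


# Assumed toy labels: three imbalanced classes standing in for sequence classes.
labels = [0] * 10 + [1] * 20 + [2] * 5

for train_idx, test_idx in train_test_splits(labels):
    # Per-class counts in the held-out fold.
    print(Counter(labels[i] for i in test_idx))
```

In a real evaluation, each candidate model would be fit on the training indices and scored on the held-out indices for every fold, with accuracy, precision, recall, and F1 averaged across the five folds. Samples would typically also be shuffled before splitting, which this sketch omits.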
Copyright © 2024