Splice junction classification in DNA sequences is critical for understanding gene structure and RNA splicing. The task is to distinguish exon-intron (EI) boundaries, intron-exon (IE) boundaries, and sequences containing neither. Conventional neural network models can achieve high accuracy but usually cannot quantify the uncertainty of their predictions, which is essential for reliability in sensitive applications such as bioinformatics. This study addresses this limitation by incorporating Bayesian confidence quantification into DNA sequence classification using Monte Carlo Dropout (MCD). A baseline neural network was first implemented as a reference and achieved a test accuracy of 95.61%. MCD keeps dropout active at inference time and aggregates the outputs of multiple stochastic forward passes. Applying it raised the test accuracy to 96.03% and also provided an uncertainty estimate for each prediction. These uncertainty values made it possible to identify low-confidence predictions, improving the interpretability and reliability of the model. Experiments were conducted on a DNA sequence dataset in which the nucleotides (A, C, G, T) are binary-encoded and each sequence is labeled with its splice junction class. The results show that MCD is a robust approach for DNA sequence classification, offering both high predictive performance and actionable insight through uncertainty quantification. This research highlights the potential of Bayesian confidence quantification in genomic studies, particularly for tasks that demand high reliability and interpretability. The proposed approach bridges the gap between accurate prediction and robust uncertainty estimation, contributing to machine learning applications in bioinformatics and genetic research.
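The following is a minimal sketch of Monte Carlo Dropout inference. It is not the study's model. It assumes a toy one-hidden-layer network with random, untrained weights; `init_params` is a hypothetical stand-in for a trained network. It also assumes a short hypothetical binary input vector. The labels "EI", "IE", and "N" (neither) are used as class names. Dropout is applied at inference time, and T stochastic forward passes are averaged into a predictive mean. Uncertainty is measured with predictive entropy, which is an assumption because the abstract does not name the measure. The 0.5 threshold on normalized entropy is also illustrative only.

```python
import math
import random

# Hypothetical class labels for the three splice-junction categories.
CLASSES = ["EI", "IE", "N"]


def linear(x, W, b):
    """Dense layer: W holds one weight row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + bj for row, bj in zip(W, b)]


def relu(v):
    return [max(0.0, a) for a in v]


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(a - m) for a in z]
    total = sum(exps)
    return [e / total for e in exps]


def dropout(h, p, rng):
    """Inverted dropout: drop each unit with probability p, rescale survivors by 1/(1-p)."""
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in h]


def forward(x, params, p, rng):
    W1, b1, W2, b2 = params
    h = relu(linear(x, W1, b1))
    h = dropout(h, p, rng)  # dropout stays active at inference time (the "MC" in MCD)
    return softmax(linear(h, W2, b2))


def mc_dropout_predict(x, params, p=0.5, T=100, seed=0):
    """Average T stochastic forward passes; return mean class probabilities and predictive entropy."""
    if not 0.0 <= p < 1.0:
        raise ValueError("dropout rate p must be in [0, 1)")
    rng = random.Random(seed)
    samples = [forward(x, params, p, rng) for _ in range(T)]
    n_classes = len(samples[0])
    mean = [sum(s[k] for s in samples) / T for k in range(n_classes)]
    entropy = -sum(q * math.log(q) for q in mean if q > 0.0)
    return mean, entropy


def init_params(n_in, n_hidden, n_out, seed=42):
    """Hypothetical random weights standing in for a trained network."""
    rng = random.Random(seed)
    W1 = [[rng.gauss(0.0, 1.0) for _ in range(n_in)] for _ in range(n_hidden)]
    W2 = [[rng.gauss(0.0, 1.0) for _ in range(n_hidden)] for _ in range(n_out)]
    return W1, [0.0] * n_hidden, W2, [0.0] * n_out


params = init_params(n_in=8, n_hidden=16, n_out=len(CLASSES))
x = [1, 0, 0, 1, 0, 1, 1, 0]  # toy binary-encoded input (not real data)
mean, H = mc_dropout_predict(x, params, p=0.5, T=200)
pred = CLASSES[mean.index(max(mean))]
# Normalize entropy to [0, 1]; the 0.5 cut-off is an arbitrary illustrative threshold.
low_confidence = H / math.log(len(CLASSES)) > 0.5
print(pred, [round(q, 3) for q in mean], round(H, 3), low_confidence)
```

With `p=0.0` every pass is identical, so the predictive mean reduces to an ordinary deterministic forward pass. The spread across passes, and therefore the entropy, comes entirely from keeping dropout on at inference.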
Copyright © 2025