Accurate brain tumor diagnosis via Magnetic Resonance Imaging (MRI) is vital for effective neuro-oncological treatment. Although Convolutional Neural Networks (CNNs) are widely regarded as the benchmark for local texture extraction, they often struggle to model long-range global dependencies. In contrast, Vision Transformers (ViTs), particularly the Swin variant, excel at capturing global semantic context yet often fail to preserve the fine local granularity needed to precisely delineate tumor boundaries. To bridge this gap, we propose Bi-CA-UAE, a hybrid framework that integrates a Swin Transformer and EfficientNet-V2 through a novel Bidirectional Cross-Attention mechanism. Unlike static ensembles, our method enables dynamic information exchange between global and local feature maps before classification. Furthermore, we introduce an Uncertainty-Aware Gating Network that adaptively weights each branch by its prediction confidence, reducing false positives in ambiguous cases. Validated on a multi-class MRI dataset of 7,023 images, the model achieved 99.85% accuracy and an Expected Calibration Error (ECE) of 0.02, matching the strongest baseline (Swin Transformer) while demonstrating superior training stability and calibration. Whereas naive concatenation ensembles suffered from overfitting and performance degradation in later training stages, Bi-CA-UAE maintained robust peak performance. The model also attained perfect recall (1.00) for Glioma and a micro-average AUC of 1.00. t-SNE visualizations and reliability diagrams confirm that the model learns highly discriminative, well-calibrated features, positioning it as a trustworthy clinical decision support system.
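As a rough illustration of the two ideas named above, the following NumPy sketch pairs single-head bidirectional cross-attention between a "global" (transformer-like) and a "local" (CNN-like) feature map with a confidence-based gate over branch predictions. All function names, shapes, and the max-softmax confidence proxy are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(q_feats, kv_feats, d):
    # queries from one branch attend over keys/values of the other branch
    scores = q_feats @ kv_feats.T / np.sqrt(d)   # (n_q, n_kv)
    return softmax(scores, axis=-1) @ kv_feats   # (n_q, d)

def bidirectional_cross_attention(global_feats, local_feats):
    # each branch queries the other; residual connection keeps its own features
    d = global_feats.shape[-1]
    g2l = cross_attention(global_feats, local_feats, d)  # global attends to local
    l2g = cross_attention(local_feats, global_feats, d)  # local attends to global
    return global_feats + g2l, local_feats + l2g

def uncertainty_gate(logits_global, logits_local):
    # hypothetical gate: weight each branch by its max-softmax confidence
    p_g, p_l = softmax(logits_global), softmax(logits_local)
    conf_g, conf_l = p_g.max(), p_l.max()
    w = conf_g / (conf_g + conf_l)
    return w * p_g + (1.0 - w) * p_l  # fused class probabilities
```

In this toy version the gate is a fixed ratio of confidences; in a trained model the gating weights would be produced by a learned network, but the fused output remains a convex combination of the two branch distributions either way.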
Copyright © 2026