Three-dimensional reconstruction of CT and MRI images remains a persistent challenge in medical imaging: clinicians require high-fidelity volumes that preserve subtle anatomical detail while remaining computationally efficient. This study evaluates a transformer-based neural network against a conventional convolutional neural network (CNN) baseline to determine which architecture delivers superior reconstruction accuracy for clinical use. A standard deep learning pipeline, comprising data curation, intensity normalization, and augmentation, was constructed prior to model training. The comparison covered two representative architectures: a 3D U-Net serving as the CNN benchmark and a 3D Swin Transformer serving as the attention-based approach. Quantitative analysis showed that the transformer achieved a higher Peak Signal-to-Noise Ratio (35.8 dB vs. 33.1 dB), a better Structural Similarity Index Measure (0.942 vs. 0.911), and a better Dice coefficient (0.91 vs. 0.87), with negligible differences in inference time per volume. Visual inspection revealed sharper cortical folds and clearer lesion edges, which radiologists associated with higher diagnostic confidence. The transformer's ability to model global spatial dependencies and suppress noise artifacts yields accurate, clinically relevant reconstructions. These results show that transformer models can be both computationally efficient and more precise than CNN alternatives, supporting their deployment in hospital Picture Archiving and Communication Systems (PACS) and in future real-time diagnostic workflows. Taken together, these findings support the collective efforts of engineers and healthcare providers to leverage future algorithmic improvements that enhance patient care and imaging safety.
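The PSNR and Dice figures above follow standard definitions; as a minimal NumPy sketch (function and array names are illustrative, not from the study), PSNR is 10·log10(MAX²/MSE) and Dice is 2·|A∩B|/(|A|+|B|) for binary masks:

```python
import numpy as np

def psnr(ref: np.ndarray, recon: np.ndarray, data_range: float = 1.0) -> float:
    """Peak Signal-to-Noise Ratio: 10 * log10(MAX^2 / MSE)."""
    mse = np.mean((ref.astype(np.float64) - recon.astype(np.float64)) ** 2)
    return 10.0 * np.log10(data_range ** 2 / mse)

def dice(seg_a: np.ndarray, seg_b: np.ndarray) -> float:
    """Dice coefficient: 2 * |A intersect B| / (|A| + |B|) for binary masks."""
    a, b = seg_a.astype(bool), seg_b.astype(bool)
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())
```

SSIM is omitted here because it depends on windowed local statistics and is usually taken from an imaging library rather than reimplemented.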
Copyright © 2025