Detecting pneumonia from medical images remains a significant challenge, particularly in regions with limited access to healthcare professionals. This study proposes an explainable artificial intelligence (XAI) model that integrates a convolutional neural network (CNN) with a vision transformer (ViT) to improve the accuracy of pneumonia diagnosis from chest X-ray images, and that explains its predictions through gradient-weighted class activation mapping (Grad-CAM) visualization. The methodology comprises image preprocessing, local feature extraction via the CNN, and global spatial relationship modelling via the ViT. The model was trained on a preprocessed chest X-ray dataset and evaluated using standard performance metrics: accuracy, precision, recall, and F1 score. Experimental results show that the model achieved an accuracy of 96.5%, precision of 96%, recall of 96%, and F1 score of 94%. These results indicate that integrating CNN and ViT effectively enhances classification performance and provides a reliable tool for medical image analysis. Furthermore, Grad-CAM visualizations highlight the image regions that most influence the model’s predictions, thereby enhancing interpretability. Compared to conventional models, this approach offers improved transparency in AI-driven diagnostics. Consequently, the proposed model is a promising diagnostic tool, particularly in underserved or remote areas with limited medical infrastructure. This research also opens opportunities for developing transparent, XAI-based diagnostic systems.
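As an illustration of the evaluation metrics named above, the following minimal sketch computes accuracy, precision, recall, and F1 score for a binary pneumonia/normal classifier. The function name `binary_metrics`, the label encoding (1 = pneumonia, 0 = normal), and the toy labels are assumptions made for this example only; they are not taken from the study, and the study's own averaging scheme is not specified here.

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 for binary labels.

    Assumed encoding (hypothetical): 1 = pneumonia, 0 = normal.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count the four confusion-matrix cells.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels for illustration only; not data from the study.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
```

For a single binary class, F1 is the harmonic mean of that class's precision and recall; figures averaged across classes (for example, macro or weighted averages) can differ from this per-class value.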
Copyrights © 2025