COVID remains a highly infectious respiratory disease with global health implications. Traditional diagnostic methods such as RT-PCR, though widely used, are often constrained by high costs, limited accessibility, and delayed results. Radiological imaging, by contrast, has proven advantageous for identifying lung abnormalities, and chest X-rays are the preferred modality because they are non-invasive, cost-effective, and widely available. To address the limitations of RT-PCR, this study develops an efficient, automated diagnostic system based on chest X-ray imaging. The primary contribution is COV-TViT, a novel deep learning framework that integrates transfer learning with the Vision Transformer (ViT) architecture for the accurate detection of COVID pneumonitis. The proposed method is evaluated on the COVID-QU-Ex dataset, which comprises a balanced set of X-ray images from COVID-positive and healthy individuals. Methodologically, the system employs pre-trained convolutional neural networks (CNNs), specifically VGG16 and VGG19 (Visual Geometry Group), for transfer learning, followed by fine-tuning to enhance feature extraction. The ViT model, known for its self-attention mechanism, is then applied to capture complex spatial dependencies in the X-ray images, enabling robust classification. Experimental results show that COV-TViT achieves a classification accuracy of 98.96% and an F1 score of 96.21%, outperforming traditional CNN-based transfer learning models in several scenarios. These findings underscore the model’s potential for high-precision detection of COVID pneumonitis and show how self-attention can be used to extract features and learn representations for classification.
Overall, the proposed diagnostic system, COV-TViT, can be advantageous for the preliminary identification of COVID pneumonitis.
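
For readers comparing the two reported metrics, the minimal sketch below shows how accuracy and F1 score are conventionally derived from binary confusion-matrix counts, with COVID-positive taken as the positive class. The helper name `binary_metrics` and the counts in the usage line are illustrative assumptions; they are not taken from the study and do not reproduce its results.

```python
def binary_metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall, f1) from confusion-matrix counts.

    Hypothetical helper for illustration; COVID-positive is assumed to be
    the positive class. Not part of the COV-TViT implementation.
    """
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("confusion matrix is empty")

    # Accuracy counts every correct decision, positive or negative.
    accuracy = (tp + tn) / total

    # Precision and recall only look at the positive class; guard against
    # division by zero when no positives are predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0

    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return accuracy, precision, recall, f1


# Illustrative counts only (not results from the study).
acc, prec, rec, f1 = binary_metrics(tp=90, fp=5, fn=10, tn=95)
print(f"accuracy={acc:.4f} precision={prec:.4f} recall={rec:.4f} f1={f1:.4f}")
```

Because F1 ignores true negatives while accuracy rewards them, the two figures can diverge even on a balanced test set, which is why both are usually reported together.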