Lung diseases are a major source of morbidity and therefore demand diagnostic frameworks that combine high accuracy with operational efficiency. This study develops a Vision Transformer (ViT)-based classification model for lung X-ray images, applying transfer learning and fine-tuning to improve detection performance across five disease categories. Training converges stably and effectively, as reflected in the consistent decrease of the loss throughout learning. On an independent test set, the proposed approach achieves an accuracy of 0.958, indicating strong and balanced generalization. Confusion-matrix analysis shows that the ViT model recognizes subtle and complex radiographic patterns with low misclassification rates, achieving high recall for the major pathological classes in particular, which is critical for minimizing false negatives in clinical screening. Overall, this study demonstrates that transfer learning with fine-tuning on a Vision Transformer architecture yields competitive performance for multi-class lung X-ray classification when trained on a balanced dataset. These findings are consistent with prior evidence that ViTs effectively capture global contextual information in medical imaging tasks.
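The reported metrics (overall accuracy and per-class recall) can be derived directly from a confusion matrix. The sketch below illustrates this computation on a hypothetical 5-class matrix; the class names and all counts are assumptions chosen for illustration and are not the study's actual results.

```python
# Hypothetical class labels -- assumed for illustration, not taken from the study.
CLASSES = ["Normal", "COVID-19", "Pneumonia", "Tuberculosis", "Lung Opacity"]

def accuracy(cm):
    """Overall accuracy: correct predictions (the diagonal) over all samples."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total

def recall_per_class(cm):
    """Recall for each class: diagonal entry divided by its row (true-class) sum."""
    return [cm[i][i] / sum(cm[i]) for i in range(len(cm))]

# Illustrative confusion matrix (rows = true class, columns = predicted class).
# The counts are invented for this sketch on a balanced test set of 200 per class.
cm = [
    [192, 2, 3, 1, 2],
    [3, 190, 4, 2, 1],
    [2, 3, 191, 2, 2],
    [1, 2, 2, 193, 2],
    [2, 2, 2, 2, 192],
]

print(f"accuracy = {accuracy(cm):.3f}")
for name, r in zip(CLASSES, recall_per_class(cm)):
    print(f"recall[{name}] = {r:.3f}")
```

Low off-diagonal counts in every row correspond to the high per-class recall described above: few samples of any pathological class are mistaken for another class.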
Copyright © 2026