Lung cancer is the leading cause of cancer-related deaths worldwide, and early diagnosis is often hindered by morphological variation in histopathological images. The core problem is accurately and rapidly distinguishing cancer types such as adenocarcinoma and squamous cell carcinoma from benign tissue. This research takes histopathological images as input and produces a three-class classification: adenocarcinoma, squamous cell carcinoma, and benign tissue. Early detection of lung cancer can improve survival rates by up to 50%, but manual diagnosis by pathologists depends on subjective experience, causing error rates of up to 20% in ambiguous cases. In developing countries such as Indonesia, for example, a shortage of pathologists exacerbates treatment delays. This gap demands a reliable automated approach to support more timely clinical decisions. The proposed solution implements the Vision Transformer (ViT) in two architectures: ViT-B/16 (base model, 86 million parameters) and ViT-L/16 (large model, 304 million parameters). Histopathological images are normalized and embedded as 16×16-pixel patches, and features are then extracted with the self-attention mechanism. The models are trained with transfer learning from ImageNet-21k and fine-tuned on a lung cancer histopathology image dataset. The process includes splitting the data into training (70%), validation (15%), and test (15%) sets, as well as data augmentation to improve robustness. ViT-B/16 achieved a test accuracy of 98.40% with an F1-score of 0.984, while ViT-L/16 achieved 98.18% accuracy with an F1-score of 0.982. Both models detected benign tissue perfectly (precision 1.00). The average AUC-ROC reached 0.999 for ViT-B/16 and 0.998 for ViT-L/16, indicating very high discriminative power.
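The 16×16 patch-embedding step described above can be sketched as follows. This is a minimal NumPy illustration, not the trained model: the embedding dimension and the random projection matrix are stand-ins for the learned ViT weights.

```python
import numpy as np

def patch_embed(image, patch_size=16, embed_dim=8, rng=None):
    """Split an image into non-overlapping patches and linearly project
    each flattened patch to an embedding vector (ViT's first stage).
    The random projection here stands in for the learned weights."""
    rng = np.random.default_rng(0) if rng is None else rng
    h, w, c = image.shape
    assert h % patch_size == 0 and w % patch_size == 0
    # Rearrange (H, W, C) -> (num_patches, patch_size * patch_size * C)
    patches = (image
               .reshape(h // patch_size, patch_size, w // patch_size, patch_size, c)
               .transpose(0, 2, 1, 3, 4)
               .reshape(-1, patch_size * patch_size * c))
    proj = rng.standard_normal((patches.shape[1], embed_dim))
    return patches @ proj

# A 224x224 RGB histopathology image yields (224/16)^2 = 196 patch tokens,
# which the transformer encoder then processes with self-attention.
img = np.zeros((224, 224, 3))
tokens = patch_embed(img)
print(tokens.shape)  # (196, 8)
```

In the actual ViT-B/16 and ViT-L/16 models, the projection is a trained linear layer producing 768- and 1024-dimensional tokens respectively, with a class token and position embeddings added before the encoder.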
The main contribution of this research is a comprehensive comparison of two Vision Transformer scales for automated lung cancer diagnosis, showing that the smaller model (ViT-B/16) can match or exceed the larger model's performance at substantially lower computational cost.
Copyright © 2026