Lung cancer segmentation and classification from computed tomography (CT) images play a vital role in early diagnosis, prognosis assessment, and effective treatment planning. Despite significant progress in medical image analysis, accurate lung lesion analysis remains challenging because of overlapping anatomical structures, heterogeneous tissue intensity distributions, irregular and complex tumor shapes, and poorly defined lesion boundaries. These factors limit the reliability and generalization capability of conventional deep learning models on real-world clinical data. To address these challenges, this paper proposes a Hybrid Swarm-Driven Vision Transformer (HSViT) framework that combines swarm intelligence with transformer-based deep learning. The processing pipeline begins with Contrast Limited Adaptive Histogram Equalization (CLAHE), which enhances local contrast while suppressing noise amplification, thereby improving the visibility of subtle pulmonary nodules and lesion regions. A U-Net segmentation model optimized with the Coyote Optimization Algorithm (COA) then delineates lung lesions; COA, a swarm-based metaheuristic, adaptively tunes the U-Net parameters, yielding improved convergence and more precise boundary detection than gradient-based optimization alone. Following segmentation, discriminative lesion features are extracted and passed to the HSViT classifier, whose Dual-Stage Attention Fusion (DSAF) mechanism captures both fine-grained local spatial features and long-range global contextual dependencies. The framework achieves a Dice coefficient of 0.95, an overall classification accuracy of 98.7%, and a training loss of 0.04. These results highlight the strong potential of HSViT for reliable automated lung cancer diagnosis and for supporting clinical decision-making in real-world healthcare environments.
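To illustrate the preprocessing stage described above, the following minimal sketch applies CLAHE to a single CT slice using OpenCV. The Hounsfield-unit windowing bounds, clip limit, and tile grid size are illustrative assumptions, not the settings reported in this paper.

```python
import cv2
import numpy as np


def preprocess_ct_slice(slice_hu: np.ndarray) -> np.ndarray:
    """Enhance local contrast of a CT slice with CLAHE.

    Assumes the slice is given in Hounsfield units; the window range,
    clip limit, and tile grid size below are assumed example values.
    """
    # Window to a lung-friendly HU range and rescale to 8-bit.
    windowed = np.clip(slice_hu, -1000, 400)
    scaled = ((windowed + 1000) / 1400 * 255).astype(np.uint8)

    # CLAHE equalizes histograms over local tiles while clipping the
    # histogram to limit noise amplification in homogeneous regions.
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    return clahe.apply(scaled)
```

The enhanced slice can then be fed to the segmentation stage; in practice the clip limit and tile size would be tuned on the target dataset.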