Vision Language Transformer Framework for Efficient Cancer Diagnosis through Multimodal Integration
Gutam, Bala Gangadhara; Malchi, Sunil Kumar
Journal of Electronics, Electromedical Engineering, and Medical Informatics Vol 7 No 4 (2025): October
Publisher: Department of Electromedical Engineering, POLTEKKES KEMENKES SURABAYA

DOI: 10.35882/jeeemi.v7i4.1075

Abstract

Finding and treating cancer as early as possible helps patients achieve better outcomes. Patients who need imaging or biopsy tests sometimes struggle to access them, because these procedures are costly and have limited availability in clinical settings. Recent AI methods, particularly those based on deep learning, can address these problems and significantly enhance cancer detection, offering greater efficiency and scalability. In this context, large language models (LLMs) and vision-language models (VLMs) are considered leading approaches for interpreting multimodal data within AI-driven healthcare systems. Although LLMs are strong at working with unstructured clinical text, they have seldom been used for patient assessment beyond descriptive or summarization tasks. By combining images and descriptions, along with both structured and unstructured data, VLMs allow doctors and medical researchers to examine cancer symptoms from multiple angles. In this work, we study both LLMs and VLMs in cancer detection, analyzing their architectures, learning mechanisms, and performance on various datasets, and identifying directions for expanding multimodal AI in healthcare. Our results indicate that combining these data types improves diagnostic accuracy across different types of cancer. Our experiments on the MIMIC-III, MIMIC-IV, TCGA, and CAMELYON 16/17 datasets show that multimodal transformer models significantly improve the accuracy of diagnosing biopsy results. In particular, BioViL achieves an AUC-ROC of 0.92 for lung cancer detection, and a fine-tuned CLIP achieves a similar 0.91 for colon cancer detection.
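
For readers unfamiliar with the AUC-ROC figures quoted above, the following is a minimal sketch of how that metric can be computed from binary labels and model scores. It is not the authors' evaluation code: the function name `auc_roc` and the toy labels and scores are assumptions made up for illustration, and they have no connection to the datasets or models named in the abstract. The sketch uses the pairwise-ranking definition, in which AUC-ROC is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case.

```python
def auc_roc(labels, scores):
    """AUC-ROC via pairwise ranking (hypothetical helper, not from the paper).

    Counts the fraction of (positive, negative) pairs in which the positive
    case has the higher score; tied scores count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy, made-up data: 1 = malignant, 0 = benign (illustrative only).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
toy_auc = auc_roc(labels, scores)  # 8 of the 9 positive/negative pairs are ranked correctly
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, so scores of 0.92 and 0.91 indicate that the reported models rank positive cases above negative ones in most pairs. The quadratic loop is chosen for readability; a sort-based implementation would be preferable for large evaluation sets.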