Jurnal Masyarakat Informatika

Vol 17, No 1 (2026): May 2026 (Ongoing)

Fadhilah, Husni
Utama, Nugraha Priya

Publish Date
08 Jan 2026

This research presents a Transformer-based encoder-decoder model for medical image captioning that incorporates semantic medical knowledge through Concept Unique Identifiers (CUIs) from the Unified Medical Language System (UMLS). The proposed architecture uses a Swin Transformer as the visual encoder and GPT-2 as the language decoder, and integrates CUIs during both caption preprocessing and decoding. Experiments were conducted on the ROCOv2 dataset under two scenarios: a baseline using raw captions and an enhanced setting using CUI-enriched captions. Quantitative evaluation using BLEU, ROUGE, CIDEr, and BERT-based metrics shows that the CUI-integrated model outperforms several baselines, including CNN-LSTM, ViT-BioMedLM, and DeepSeek-VL, achieving a BLEU-1 of 0.371, ROUGE-L of 0.305, CIDEr of 0.275, and PubMedBERTScore-F1 of 0.893. Compared with the best-performing model before caption preprocessing (ViT-GPT2, with BLEU-1 = 0.309 and ROUGE-L = 0.218), these results are relative improvements of 20.1% in BLEU-1 and 39.9% in ROUGE-L. Qualitative assessment by expert radiologists further confirms improved diagnostic accuracy, descriptive completeness, and clinical relevance. This study introduces a novel integration of medical semantic knowledge into captioning models, offering a scalable solution for clinical decision support in resource-limited settings such as Indonesia.
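The 20.1% and 39.9% figures are relative gains over ViT-GPT2, not absolute score differences (the absolute differences are 0.062 BLEU-1 and 0.087 ROUGE-L). A short check using only the scores quoted in the abstract reproduces both percentages; the helper name `relative_improvement` is ours, for illustration.

```python
def relative_improvement(new: float, old: float) -> float:
    """Return the relative change from `old` to `new`, in percent."""
    return (new - old) / old * 100.0


# Scores quoted in the abstract: (CUI-integrated model, ViT-GPT2 before caption preprocessing)
reported = {
    "BLEU-1": (0.371, 0.309),
    "ROUGE-L": (0.305, 0.218),
}

for metric, (cui_model, vit_gpt2) in reported.items():
    # Prints the relative gain rounded to one decimal place, as in the abstract
    print(f"{metric}: {relative_improvement(cui_model, vit_gpt2):.1f}%")
```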
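The abstract does not specify how CUIs are attached to captions during preprocessing. As a rough illustration only, the sketch below assumes a simple scheme: find known medical terms in a caption, look up their concept identifiers, and append them after a separator token. The lexicon, the placeholder IDs (`CUI_A`, ...), the `[CUI]` separator, and the function `enrich_caption` are all hypothetical; a real pipeline would draw CUIs from UMLS (typically via a concept-extraction tool) rather than a hand-written dictionary, and this does not show how CUIs are used at decoding time.

```python
import re

# Hypothetical toy lexicon: surface term -> concept identifier.
# Placeholder IDs only, NOT real UMLS CUIs.
TOY_LEXICON = {
    "pleural effusion": "CUI_A",
    "chest": "CUI_B",
    "x-ray": "CUI_C",
    "effusion": "CUI_D",
}


def enrich_caption(caption: str, lexicon: dict[str, str]) -> str:
    """Append identifiers of lexicon terms found in `caption` (hypothetical format).

    Matching is case-insensitive and whole-word. Longer terms are tried first,
    and each match is blanked out, so a multi-word concept such as
    "pleural effusion" is not also counted as its sub-phrase "effusion".
    """
    text = caption.lower()
    found = []
    for term in sorted(lexicon, key=len, reverse=True):
        pattern = r"\b" + re.escape(term) + r"\b"
        if re.search(pattern, text):
            cui = lexicon[term]
            if cui not in found:
                found.append(cui)
            # Remove the matched span so shorter sub-phrases cannot re-match it
            text = re.sub(pattern, " ", text)
    if not found:
        return caption  # no known concepts: keep the raw caption unchanged
    # Sorted IDs give a deterministic suffix regardless of match order
    return f"{caption} [CUI] {' '.join(sorted(found))}"


print(enrich_caption("Chest X-ray showing a left pleural effusion.", TOY_LEXICON))
```

With this toy lexicon, the example caption gains the suffix `[CUI] CUI_A CUI_B CUI_C`; `CUI_D` is not added because "effusion" is consumed by the longer match. Longest-match-first is one common way to resolve overlapping terms, chosen here for simplicity.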

Journal Info

Jurnal Masyarakat Informatika

Abbrev

jmasif

Publisher

Universitas Diponegoro

Subject

Computer Science & IT

Description

JURNAL MASYARAKAT INFORMATIKA (JMASIF) is a journal published by the Department of Informatics, Universitas Diponegoro. It invites lecturers, researchers, students (Bachelor's, Master's, and Doctoral) as well as practitioners in the field of computer science and informatics to contribute to JMASIF in the ...

Transformer-Based Encoder-Decoder Model for Medical Image Captioning with Concept Embedding