Aulia Akbar, Rafy
Unknown Affiliation

Published: 1 document

Articles

Automated Chest X-Ray Captioning Using Pretrained Vision Transformer with LSTM and Multi-Head Attention
Aulia Akbar, Rafy; Putra, Ricky Eka; Yustanti, Wiyli
JIEET (Journal of Information Engineering and Educational Technology) Vol. 9 No. 1 (2025)
Publisher: Universitas Negeri Surabaya

DOI: 10.26740/jieet.v9n1.p1-10

Abstract

Radiology report generation is a complex and error-prone task, especially for radiologists with limited experience. To address this, this study develops an automated system that generates text-based radiology reports from chest X-ray images. The proposed approach combines computer vision and natural language processing in an encoder-decoder architecture. The encoder is a Vision Transformer (ViT) trained on the CheXpert dataset, which extracts visual features from the X-ray images after Gamma Correction is applied to improve image quality. In the decoder, word embeddings of the report text are processed by a Long Short-Term Memory (LSTM) network to capture word-order relationships and enriched with Multi-Head Attention (MHA) to attend to the most relevant parts of the text. The visual and textual features are then combined and passed to a dense layer to generate the text-based radiology report. Evaluation shows that the proposed model achieves a ROUGE-L score of 0.385, outperforming previous models, along with a competitive BLEU-1 score of 0.427. These results indicate that a pre-trained ViT combined with an LSTM-MHA decoder captures visual and textual semantic context effectively and improves the accuracy and efficiency of automated radiology reporting.
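
The encoder-decoder design described in the abstract can be sketched in Keras roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the vocabulary size, sequence length, feature width, layer sizes, and gamma value are placeholder assumptions, and the ViT visual features are treated as a pre-extracted input rather than computed here.

# Minimal sketch of the described encoder-decoder captioning model (TensorFlow/Keras).
# All sizes below are illustrative assumptions, not values from the paper.
import tensorflow as tf
from tensorflow.keras import layers

VOCAB_SIZE = 10000   # assumed report vocabulary size
MAX_LEN = 100        # assumed maximum report length in tokens
VIT_DIM = 768        # typical ViT-Base feature width (assumption)
EMBED_DIM = 256      # assumed embedding/LSTM width
NUM_HEADS = 4        # assumed number of attention heads

def gamma_correct(image, gamma=0.8):
    """Gamma correction applied to the X-ray before ViT feature extraction."""
    return tf.image.adjust_gamma(image, gamma=gamma)

# Encoder output: one pooled ViT feature vector per image (assumed pre-extracted).
vit_features = tf.keras.Input(shape=(VIT_DIM,), name="vit_features")

# Decoder input: report token ids (teacher forcing during training).
tokens = tf.keras.Input(shape=(MAX_LEN,), dtype="int32", name="report_tokens")

# Word embeddings -> LSTM captures word-order relationships.
x = layers.Embedding(VOCAB_SIZE, EMBED_DIM)(tokens)
x = layers.LSTM(EMBED_DIM, return_sequences=True)(x)

# Multi-Head Attention lets the decoder focus on important parts of the text.
x = layers.MultiHeadAttention(num_heads=NUM_HEADS, key_dim=EMBED_DIM // NUM_HEADS)(x, x)

# Project the visual features, broadcast over time steps, and fuse with the text features.
img = layers.Dense(EMBED_DIM, activation="relu")(vit_features)
img = layers.RepeatVector(MAX_LEN)(img)
fused = layers.Concatenate()([x, img])

# Dense output layer predicts the next report token at each position.
outputs = layers.Dense(VOCAB_SIZE, activation="softmax")(fused)

model = tf.keras.Model(inputs=[vit_features, tokens], outputs=outputs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()

In this sketch the visual vector is simply repeated across time steps and concatenated with the attended text features; the paper only states that visual and text features are combined before the dense layer, so the exact fusion mechanism is an assumption.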