Muhammad Rifki
Universitas Majalengka

Published: 1 document

Articles

DEVELOPMENT OF CNN-LSTM-BASED IMAGE CAPTIONING DATASET TO ENHANCE VISUAL ACCESSIBILITY FOR DISABILITIES
Muhammad Rifki; Ade Bastian; Ardi Mardiana
JITK (Jurnal Ilmu Pengetahuan dan Teknologi Komputer) Vol. 10 No. 4 (2025): JITK Issue May 2025
Publisher : LPPM Nusa Mandiri

DOI: 10.33480/jitk.v10i4.6657

Abstract

Visual accessibility in public spaces remains limited for individuals with visual impairments in Indonesia, despite technological advances such as image captioning. This study develops a custom dataset and a baseline CNN-LSTM image captioning model capable of describing sidewalk accessibility conditions in Indonesian. The methodology includes collecting 748 annotated images from various Indonesian cities, with captions manually crafted to reflect accessibility features. The model employs DenseNet201 as the CNN encoder and an LSTM as the decoder, with 70% of the data used for training and 30% for validation. Evaluation was conducted using the BLEU and CIDEr metrics. Results show a BLEU-4 score of 0.27 and a CIDEr score of 0.56, indicating moderate alignment between model-generated and reference captions. While the absence of an attention mechanism and the limited dataset size constrain overall performance, the model demonstrates the ability to identify key elements such as tactile paving, signage, and pedestrian barriers. This study contributes to assistive-technology development in a low-resource language context and provides foundational work for future research. Data expansion, incorporation of attention mechanisms, and transformer-based models are recommended to improve descriptive richness and accuracy.
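The BLEU-4 evaluation referenced in the abstract can be sketched with a minimal, stdlib-only sentence-level implementation. This is an illustrative sketch, not the paper's actual evaluation code: the whitespace tokenization and add-one smoothing (used so that a missing n-gram order does not zero the geometric mean) are assumptions on my part.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu4(candidate, references):
    """Sentence-level BLEU-4 with add-one smoothing (assumed, not from the paper)."""
    cand = candidate.split()
    refs = [r.split() for r in references]
    log_precision = 0.0
    for n in range(1, 5):
        cand_counts = ngrams(cand, n)
        # clip each candidate n-gram count by its maximum count in any reference
        max_ref = Counter()
        for r in refs:
            for gram, count in ngrams(r, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
        total = sum(cand_counts.values())
        # add-one smoothing keeps the log defined when an order has no matches
        log_precision += math.log((clipped + 1) / (total + 1)) / 4
    # brevity penalty against the closest-length reference
    closest = min(refs, key=lambda r: abs(len(r) - len(cand)))
    bp = 1.0 if len(cand) >= len(closest) else math.exp(1 - len(closest) / len(cand))
    return bp * math.exp(log_precision)

# A caption identical to its reference scores 1.0; a near-miss scores lower.
print(bleu4("jalur pemandu tersedia di trotoar",
            ["jalur pemandu terpasang di trotoar"]))
```

A moderate score such as the reported BLEU-4 of 0.27 reflects partial n-gram overlap of this kind: many unigrams and bigrams match the reference while longer phrases diverge.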