Visual Content Captioning and Audio Conversion using CNN-RNN with Attention Model
Aldy Agil Hermanto; Giat Karyono; Imam Tahyudin; Boby Sandityas Prahasto
Journal of Innovation Information Technology and Application (JINITA), Vol. 7, No. 1, June 2025
Publisher: Politeknik Negeri Cilacap

DOI: 10.35970/jinita.v7i1.2788

Abstract

This research develops an image captioning and audio conversion system based on Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN) with an integrated Attention Mechanism, aimed at improving accessibility for visually impaired individuals. The research design follows a systematic approach involving data collection, preprocessing, model development, training, evaluation, and implementation. The methodology uses a CNN for visual feature extraction, an RNN for language modeling, and an Attention Mechanism to enhance contextual relevance in caption generation; Google Text-to-Speech (gTTS) is integrated to convert the generated captions into audio. The results show that the model generates coherent and contextually relevant captions, as validated through qualitative assessment and quantitative measurement using the BLEU score. Training and validation loss decrease over 8 epochs without signs of overfitting, indicating stable model performance, and attention visualizations confirm that the model focuses on relevant image regions during caption generation. In conclusion, the proposed CNN-RNN architecture with Attention effectively generates descriptive captions and converts them into speech, showing strong potential for real-world accessibility applications.
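The paper's code is not reproduced here; as an illustration of the architecture the abstract describes, below is a minimal sketch of a Bahdanau-style attention decoder over CNN feature maps in TensorFlow/Keras. The class names (BahdanauAttention, Decoder), layer sizes, and the flattened 8×8 feature grid are illustrative assumptions, not the authors' implementation.

```python
import tensorflow as tf

class BahdanauAttention(tf.keras.layers.Layer):
    """Scores each spatial CNN feature against the current decoder state."""
    def __init__(self, units):
        super().__init__()
        self.W1 = tf.keras.layers.Dense(units)
        self.W2 = tf.keras.layers.Dense(units)
        self.V = tf.keras.layers.Dense(1)

    def call(self, features, hidden):
        # features: (batch, 64, embed_dim) -- e.g. a flattened 8x8 CNN grid
        # hidden:   (batch, units)         -- previous decoder state
        hidden_t = tf.expand_dims(hidden, 1)
        scores = self.V(tf.nn.tanh(self.W1(features) + self.W2(hidden_t)))
        weights = tf.nn.softmax(scores, axis=1)              # where to look
        context = tf.reduce_sum(weights * features, axis=1)  # weighted image summary
        return context, weights                              # weights drive attention plots

class Decoder(tf.keras.Model):
    """GRU decoder that re-attends over the image features at every word."""
    def __init__(self, vocab_size, embed_dim, units):
        super().__init__()
        self.embedding = tf.keras.layers.Embedding(vocab_size, embed_dim)
        self.gru = tf.keras.layers.GRU(units, return_state=True)
        self.attention = BahdanauAttention(units)
        self.fc = tf.keras.layers.Dense(vocab_size)

    def call(self, token, features, hidden):
        # token: (batch, 1) id of the previously generated word
        context, weights = self.attention(features, hidden)
        x = self.embedding(token)                            # (batch, 1, embed_dim)
        x = tf.concat([tf.expand_dims(context, 1), x], -1)   # prepend image context
        output, state = self.gru(x, initial_state=hidden)
        return self.fc(output), state, weights               # logits over the vocabulary
```

The returned attention weights are what make visualizations like those reported in the paper possible: reshaped back to the 8×8 grid, they show which image regions influenced each generated word.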
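For the two remaining steps the abstract names, a short hedged example: sentence-level BLEU via NLTK (the abstract does not specify which BLEU implementation was used) and caption-to-speech conversion with the gTTS package. The caption and reference strings are made up for illustration.

```python
from gtts import gTTS
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

caption = "a dog is running across a grassy field"   # hypothetical model output
reference = "a dog runs through the grass".split()   # hypothetical ground-truth caption

# Sentence-level BLEU against one reference; smoothing avoids zero scores
# on short captions that lack higher-order n-gram matches.
bleu = sentence_bleu([reference], caption.split(),
                     smoothing_function=SmoothingFunction().method1)
print(f"BLEU: {bleu:.3f}")

# Convert the generated caption to speech and save it as an MP3 file.
gTTS(text=caption, lang="en").save("caption.mp3")
```

In an accessibility application of the kind the paper targets, the synthesized audio would be played back to the user immediately rather than saved to disk.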