Hermanto, Aldy Agil
Unknown Affiliation


Penerapan CNN dan RNN untuk Pembuatan Deskripsi Konten Visual Menggunakan Deep Learning (Application of CNN and RNN for Generating Visual Content Descriptions Using Deep Learning)
Hermanto, Aldy Agil; Karyono, Giat; Tahyudin, Imam
Building of Informatics, Technology and Science (BITS) Vol 6 No 4 (2025): March 2025
Publisher : Forum Kerjasama Pendidikan Tinggi

DOI: 10.47065/bits.v6i4.6958

Abstract

Advances in image and sound processing have made information considerably more accessible, particularly for people with visual impairments. One such innovation is the image-to-speech system, which converts an image into spoken audio that its users can understand. Two problems remain: object recognition accuracy is low on images with high variability, such as poor lighting or complex backgrounds, and it is difficult to generate text descriptions suitable for conversion to audio. The proposed method extracts image features with an InceptionV3-based CNN and generates a descriptive word sequence with an RNN equipped with an attention mechanism. The dataset comprises 8,091 images and 40,455 captions (an average of five captions per image), prepared with text and image pre-processing techniques and then used to train the model with teacher forcing. Evaluation yields a BLEU score of 5.154827976372712e-153, which is effectively zero and indicates that the model fails to reproduce the reference captions. The audio produced from the generated text by Google Text-to-Speech is nevertheless quite clear. Proposed future work includes enlarging the dataset, applying regularization, and adjusting the model architecture to improve caption accuracy and the relevance of the audio to the image. With these improvements, the system is expected to provide more inclusive access to visual information for people with visual impairments.
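
The near-zero BLEU score reported in the abstract is easier to interpret with the metric written out. The following is a minimal sketch in plain Python, not the authors' evaluation code: it computes sentence-level BLEU as the brevity penalty times the geometric mean of the clipped 1-gram to 4-gram precisions. The function names (`ngrams`, `modified_precision`, `bleu`), the `floor` parameter, and the two example captions are hypothetical and chosen only for illustration. One plausible reading, which the abstract does not confirm, is that a score on the order of 1e-153 arises when some n-gram orders have no matches at all and the evaluation tool substitutes a tiny positive value for the zero precision instead of returning 0.

```python
import math
import sys
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(reference, candidate, n):
    """Clipped n-gram precision of the candidate against one reference."""
    cand_counts = Counter(ngrams(candidate, n))
    ref_counts = Counter(ngrams(reference, n))
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    # Each candidate n-gram is credited at most as often as it occurs
    # in the reference.
    clipped = sum(min(count, ref_counts[gram])
                  for gram, count in cand_counts.items())
    return clipped / total


def bleu(reference, candidate, max_n=4, floor=0.0):
    """Sentence-level BLEU with uniform weights over 1..max_n.

    `floor` is an assumption of this sketch: a zero precision is replaced
    by this value. With floor=0.0 a single zero precision gives BLEU = 0.
    """
    if not candidate:
        return 0.0
    precisions = [modified_precision(reference, candidate, n)
                  for n in range(1, max_n + 1)]
    precisions = [p if p > 0 else floor for p in precisions]
    if min(precisions) <= 0:
        return 0.0
    log_mean = sum(math.log(p) for p in precisions) / max_n
    c, r = len(candidate), len(reference)
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_mean)


# Hypothetical captions: three shared words, one shared bigram,
# no shared trigram or 4-gram.
reference = "a dog runs across the green grass".split()
candidate = "a dog sits on the wooden bench".split()

unsmoothed = bleu(reference, candidate)                        # 0.0
floored = bleu(reference, candidate, floor=sys.float_info.min)  # tiny but positive
print(unsmoothed, floored)
```

In this sketch the candidate shares individual words with the reference, yet the score collapses because the geometric mean is dominated by the n-gram orders with no matches. Read this way, a value such as the one in the abstract says that generated captions rarely share longer word sequences with the reference captions; it does not by itself quantify how many individual words were correct.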