Garuda - Garba Rujukan Digital

p-Index (2020 - 2025): 0.23

This author has published in the following journal:

Building of Informatics, Technology and Science

Hermanto, Aldy Agil

Unknown Affiliation

Author-ID : 8385155

Computer Science & IT

Published: 1 document

Articles

Penerapan CNN dan RNN untuk Pembuatan Deskripsi Konten Visual Menggunakan Deep Learning (Application of CNN and RNN for Generating Visual Content Descriptions Using Deep Learning)
Hermanto, Aldy Agil; Karyono, Giat; Tahyudin, Imam
Building of Informatics, Technology and Science (BITS) Vol 6 No 4 (2025): March 2025
Publisher : Forum Kerjasama Pendidikan Tinggi

DOI: 10.47065/bits.v6i4.6958

Advances in image and sound processing have made information considerably more accessible, particularly for people with visual impairments. One such innovation is the image-to-speech system, which converts an image into audio that its users can understand. The main problems are the low accuracy of object recognition in images with high variability, such as poor lighting or complex backgrounds, and the difficulty of generating text descriptions suitable for conversion to audio. The method extracts image features with an InceptionV3-based CNN and generates the descriptive text sequence with an RNN that uses an attention mechanism. The dataset consists of 8,091 images and 40,455 captions, which undergo image and text pre-processing before the model is trained with teacher forcing. The evaluation yields a very low BLEU score (5.154827976372712e-153), indicating that the model fails to reproduce the original captions. However, the audio produced from the generated text with Google Text-to-Speech is quite clear. Proposed future improvements include enlarging the dataset, applying regularization, and adjusting the model architecture to improve caption accuracy and the relevance of the audio to the image. With these improvements, the system is expected to provide more inclusive access to visual information for people with visual impairments.
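To make the BLEU metric mentioned in the abstract concrete, here is a minimal sketch of unsmoothed sentence-level BLEU in plain Python. It assumes uniform weights over 1- to 4-grams and whitespace tokenization. The function names (`ngrams`, `modified_precision`, `bleu`) and the example captions are hypothetical, and this is not the authors' evaluation code. It illustrates one general property of the metric: because BLEU is a geometric mean of n-gram precisions, a single n-gram order with no matches is enough to drive the score to zero (or to a vanishingly small value when a toolkit substitutes a tiny floor for zero).

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, references, n):
    """Clipped n-gram precision: each candidate n-gram is credited at most
    as many times as it occurs in any single reference."""
    cand_counts = ngrams(candidate, n)
    if not cand_counts:
        return 0.0
    max_ref = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


def bleu(candidate, references, max_n=4):
    """Unsmoothed sentence-level BLEU with uniform weights (assumption)."""
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    # Any zero precision makes the geometric mean zero.
    if min(precisions) == 0.0:
        return 0.0
    c = len(candidate)
    # Reference length closest to the candidate length (ties: shorter one).
    r = min((len(ref) for ref in references),
            key=lambda length: (abs(length - c), length))
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(sum(math.log(p) for p in precisions) / max_n)


if __name__ == "__main__":
    reference = "a dog runs across the green grass".split()
    partial = "a dog sits on the grass".split()
    print(modified_precision(partial, [reference], 1))  # unigram overlap exists
    print(bleu(partial, [reference]))                   # no 3-gram match
    print(bleu(reference, [reference]))                 # identical caption
```

In this toy case the partial caption shares four of its six words with the reference, yet scores zero because none of its 3-grams match. Smoothing or reporting BLEU-1 through BLEU-4 separately are common ways to get a more informative picture for short captions.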

Co-Authors: Giat Karyono, Imam Tahyudin
