Garuda - Garba Rujukan Digital

JOIV : International Journal on Informatics Visualization

Vol 9, No 1 (2025)

Indah, Komang Ayu Triana
Darma Putra, I Ketut Gede
Sudarma, Made
Hartati, Rukmi Sari

Publish Date
30 Jan 2025

Visual understanding is one of the core elements of computer vision and comprises image classification, object detection, and segmentation. Such systems apply a multilayer deep learning process to obtain a complex understanding of images and video and to convert images to text. This study therefore extracted video into frames and then applied Transformer and Inception V3 architectures to the image captioning process. The synchronization was based on a Multi-task Deep Learning method that combines a Convolutional Neural Network (CNN) in the image area, a Recurrent Neural Network (RNN) with Long Short-Term Memory (LSTM) in the sentence area, a Caption Content Network (CCN), and a Relational Network Context (RCN). A Transformer Encoder-Decoder architecture was used to label objects and determine the relationships between them. The results of the image-to-text conversion were evaluated by comparing the candidate translated text with one or more references. Accuracy and loss validation tables were used to provide graphical comparisons between the number of epochs and the losses. The test results showed a validation accuracy of 70.166% and a loss of 22.648%, and indicated that more epoch iterations led to greater validation accuracy.

Keywords: Visual Understanding, Transformer, Encoder, Decoder
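
The abstract says the generated text is compared with one or more references but does not name the metric. As a minimal sketch, assuming a BLEU-style clipped n-gram precision (an assumption, not necessarily the authors' method), the comparison could look like the following. The function names `ngrams` and `clipped_precision` are hypothetical and chosen for illustration only.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clipped_precision(candidate, references, n=1):
    """BLEU-style modified n-gram precision (illustrative assumption).

    Each candidate n-gram is credited at most as many times as it
    appears in the single reference where it is most frequent.
    """
    cand_counts = Counter(ngrams(candidate.lower().split(), n))
    if not cand_counts:
        return 0.0

    # Maximum count of each n-gram over all reference captions.
    max_ref = Counter()
    for ref in references:
        ref_counts = Counter(ngrams(ref.lower().split(), n))
        for gram, count in ref_counts.items():
            max_ref[gram] = max(max_ref[gram], count)

    clipped = sum(min(count, max_ref[gram])
                  for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


if __name__ == "__main__":
    refs = ["a dog runs on the grass", "the dog is running on grass"]
    print(clipped_precision("a dog runs on grass", refs, n=1))
    print(clipped_precision("a dog runs on grass", refs, n=2))
```

Clipping keeps a caption that repeats one common word from scoring well, which is why the count is capped by the best-matching reference instead of being summed over all references.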

Journal Info

JOIV : International Journal on Informatics Visualization

Abbrev

joiv

Publisher

Politeknik Negeri Padang

Subject

Computer Science & IT

Description

JOIV : International Journal on Informatics Visualization is an international peer-reviewed journal dedicated to the interchange of results of high-quality research in all aspects of Computer Science, Computer Engineering, Information Technology and Visualization. The journal publishes state-of-art ...

Article Info

Abstract

Multi Task Deep Learning with Transformer Encoder Decoder for Semantic Segmentation