JOIV : International Journal on Informatics Visualization
Vol 9, No 1 (2025)

Multi Task Deep Learning with Transformer Encoder Decoder for Semantic Segmentation

Indah, Komang Ayu Triana (Unknown)
Darma Putra, I Ketut Gede (Unknown)
Sudarma, Made (Unknown)
Hartati, Rukmi Sari (Unknown)



Article Info

Publish Date
30 Jan 2025

Abstract

Visual understanding is a core element of computer vision and comprises image classification, object detection, and segmentation. Such systems apply a multilayer process, based on deep learning methods, to understand complex images and video and to convert the images to text. This study therefore extracted video into frames and applied the Transformer and Inception V3 architectures to the image captioning process. The synchronization was based on a Multi-task Deep Learning method developed by combining a Convolutional Neural Network (CNN) in the image area, a Recurrent Neural Network (RNN) with Long Short-Term Memory (LSTM) in the sentence area, a Caption Content Network (CCN), and a Relational Network Context (RCN). Moreover, a Transformer Encoder-Decoder architecture was used for labeling and for determining the relationships between objects. The results of the image-to-text conversion were evaluated by comparing each candidate translated text with one or more references. Accuracy and loss validation tables were used to provide graphical comparisons between the number of epochs and the losses. The test results showed a validation accuracy of 70.166% and a loss of 22.648%, and indicated that more epoch iterations led to greater validation accuracy.

Keywords: Visual Understanding, Transformer, Encoder, Decoder
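
The abstract says that generated text is compared with one or more references but does not name the metric. As a rough illustration only, the sketch below assumes a BLEU-style clipped n-gram precision, which may differ from what the authors computed. The function names and the example captions are hypothetical and do not come from the paper.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clipped_precision(candidate, references, n=1):
    """Clipped n-gram precision of a candidate caption against references.

    Each candidate n-gram is credited at most as many times as it occurs
    in the single reference where it is most frequent.
    """
    cand_counts = Counter(ngrams(candidate.lower().split(), n))
    if not cand_counts:
        return 0.0

    # For every n-gram, keep the highest count seen in any one reference.
    max_ref_counts = Counter()
    for ref in references:
        ref_counts = Counter(ngrams(ref.lower().split(), n))
        for gram, count in ref_counts.items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)

    matched = sum(min(count, max_ref_counts[gram])
                  for gram, count in cand_counts.items())
    return matched / sum(cand_counts.values())


# Hypothetical captions, invented for illustration only.
candidate = "a man rides a bike"
references = ["a man is riding a bike", "a person rides a bicycle"]

unigram_score = clipped_precision(candidate, references, n=1)
bigram_score = clipped_precision(candidate, references, n=2)
print(unigram_score, bigram_score)
```

Clipping keeps a caption from scoring well by repeating a common word, and taking the maximum count over references rewards agreement with any one of them. A full BLEU score would also combine several n-gram orders and apply a brevity penalty, which this sketch omits.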

Copyrights © 2025






Journal Info

Abbrev

joiv

Publisher

Subject

Computer Science & IT

Description

JOIV : International Journal on Informatics Visualization is an international peer-reviewed journal dedicated to the interchange of results of high-quality research in all aspects of Computer Science, Computer Engineering, Information Technology and Visualization. The journal publishes state-of-art ...