IAES International Journal of Artificial Intelligence (IJ-AI)

Vol 14, No 3: June 2025

Wijaya, Bryan Christofer
Sugiarto, Hendrik Santoso

Publish Date
01 Jun 2025

Image captioning in the Indonesian language poses a significant challenge because of the complex interplay between visual and linguistic comprehension it requires and the scarcity of publicly available datasets. Despite considerable advances in image captioning, research specifically targeting Indonesian remains limited. In this paper, we propose a novel image captioning model that uses a transformer-based architecture for both the encoder and the decoder. Our model is trained and evaluated on a version of the Flickr30k dataset pre-translated into Indonesian. We compare several transformer-transformer configurations with convolutional neural network (CNN)-recurrent neural network (RNN) architectures. Our findings show the superior performance of a vision transformer (ViT) as the visual encoder combined with IndoBERT as the textual decoder. This architecture achieved a BLEU-4 score of 0.223 and a ROUGE-L score of 0.472.
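The ROUGE-L score reported above is based on the longest common subsequence (LCS) between a generated caption and its reference caption(s). The following Python sketch shows one way to compute a sentence-level ROUGE-L F-score; it is illustrative only and is not the authors' evaluation code. The helper names lcs_length and rouge_l are hypothetical, and the whitespace tokenization, lowercasing, the weighting beta = 1.2, and taking the best precision and best recall across multiple references are all assumptions, since the abstract does not specify them.

```python
from typing import List


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence of two token lists (dynamic programming)."""
    # prev[j] holds the LCS length of the tokens of `a` seen so far and b[:j]
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            if x == y:
                curr[j] = prev[j - 1] + 1  # extend the match diagonally
            else:
                curr[j] = max(prev[j], curr[j - 1])  # skip a token in a or b
        prev = curr
    return prev[len(b)]


def rouge_l(candidate: str, references: List[str], beta: float = 1.2) -> float:
    """Sentence-level ROUGE-L F-score of a candidate caption (hypothetical helper).

    Assumptions: lowercased whitespace tokenization, and the best precision and
    best recall taken separately over all references before combining them.
    """
    cand = candidate.lower().split()
    if not cand or not references:
        return 0.0
    best_p, best_r = 0.0, 0.0
    for ref in references:
        ref_toks = ref.lower().split()
        if not ref_toks:
            continue
        lcs = lcs_length(cand, ref_toks)
        best_p = max(best_p, lcs / len(cand))      # precision: LCS / candidate length
        best_r = max(best_r, lcs / len(ref_toks))  # recall: LCS / reference length
    if best_p == 0.0 or best_r == 0.0:
        return 0.0
    # F_lcs = (1 + beta^2) * P * R / (R + beta^2 * P)
    return ((1 + beta ** 2) * best_p * best_r) / (best_r + beta ** 2 * best_p)
```

For example, with the made-up caption pair rouge_l("seorang pria mengendarai sepeda di jalan", ["seorang pria sedang mengendarai sepeda di jalan raya"]), the LCS covers all 6 candidate tokens, giving precision 1.0, recall 0.75, and a score of about 0.836. A corpus-level figure such as the 0.472 above would typically be an average of per-caption scores, but the exact aggregation used in the paper is not stated here.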

Journal Info

IAES International Journal of Artificial Intelligence (IJ-AI)

Abbrev

IJAI

Publisher

Institute of Advanced Engineering and Science

Subject

Computer Science & IT Engineering

Description

IAES International Journal of Artificial Intelligence (IJ-AI) publishes articles in the field of artificial intelligence (AI). The scope covers all areas of artificial intelligence and their applications in the following topics: neural networks; fuzzy logic; simulated biological evolution algorithms (like ...

Transformer+transformer architecture for image captioning in Indonesian language
