Nayoan, Royan Abida N.
Unknown Affiliation


A study on attention-based deep learning architecture model for image captioning
Fudholi, Dhomas Hatta; Al-Faruq, Umar Abdul Aziz; Nayoan, Royan Abida N.; Zahra, Annisa
IAES International Journal of Artificial Intelligence (IJ-AI) Vol 13, No 1: March 2024
Publisher : Institute of Advanced Engineering and Science

DOI: 10.11591/ijai.v13.i1.pp23-34

Abstract

Image captioning has been widely studied because of its role in visual scene understanding. Automatic visual scene understanding is useful for remote monitoring systems and for visually impaired people. Attention-based models, including transformers, are the current state-of-the-art architectures for developing image captioning models. This study examines work on the development of image captioning models, especially models built on the attention mechanism. The collected works are analyzed in terms of architecture, dataset, and evaluation metrics. A general flow of image captioning model development is also presented. The literature search was carried out on Google Scholar. The study covers 36 works, including image captioning development specifically in Indonesian, which is included to give one perspective on image captioning development in a low-resource language. Studies using transformer models generally achieve higher evaluation metric scores. In our findings, the highest evaluation scores on the consensus-based image description evaluation (CIDEr) c5 and c40 metrics are 138.5 and 140.5, respectively. This study provides a baseline for future development of image captioning models and presents the general concept of the image captioning development process, including a picture of development in a low-resource language.