Chauhan, Harshil
Unknown Affiliation

Articles


A comprehensive survey on automatic image captioning-deep learning techniques, datasets and evaluation parameters
Authors: Chauhan, Harshil; Thacker, Chintan
International Journal of Electrical and Computer Engineering (IJECE) Vol 15, No 3: June 2025
Publisher: Institute of Advanced Engineering and Science

DOI: 10.11591/ijece.v15i3.pp3257-3266

Abstract

Automatic image captioning lies at the intersection of computer vision and natural language processing and aims to generate descriptive text from visual input. This survey covers the evolution of image caption generation and its current state of the art, focusing on deep learning techniques, benchmark datasets, and evaluation parameters. We begin by tracing the progression from early approaches to contemporary deep learning methodologies, emphasizing encoder-decoder and transformer-based models. We then systematically review the datasets that have been instrumental in training and benchmarking image captioning models, including MSCOCO, Flickr30k, Flickr8k, and PASCAL 1k, discussing their image counts, scene types, and sources. We also examine the evaluation metrics used to assess model performance, such as bilingual evaluation understudy (BLEU), metric for evaluation of translation with explicit ordering (METEOR), recall-oriented understudy for gisting evaluation (ROUGE), and consensus-based image description evaluation (CIDEr), analyzing their domains, bases, and measurement criteria. Through this survey, we aim to provide a detailed understanding of the current landscape, identify challenges, and propose future research directions in automatic image captioning.