Garuda - Garba Rujukan Digital

p-Index (2021 - 2026): 0.23

This author published in these journals:

IAES International Journal of Artificial Intelligence (IJ-AI)

Aggrwal, Mayank

Unknown Affiliation

Author-ID : 10004706

Computer Science & IT Engineering

Published : 1 Document

Articles

Transformer-based Hindi image description and storytelling using enhanced attention and FastText embeddings
Sharma, Anjali; Aggrwal, Mayank; Khanna, Jitin
IAES International Journal of Artificial Intelligence (IJ-AI) Vol 15, No 2: April 2026
Publisher : Institute of Advanced Engineering and Science

DOI: 10.11591/ijai.v15.i2.pp1771-1782

This work presents a novel image description generation framework that combines a Transformer-based encoder-decoder architecture with a custom squeeze-and-excitation (SE) attention block integrated into an EfficientNet feature extractor. The decoder uses FastText embeddings specifically trained for Hindi and is evaluated on the Microsoft common objects in context (MS-COCO) dataset. To improve the captioning process, the model incorporates a generative pre-trained transformer (GPT) module to generate narrative descriptions based on the initial captions and applies multiple similarity metrics to assess output quality. The proposed system significantly outperforms existing methods, achieving high bilingual evaluation understudy (BLEU) scores (BLEU-1 to BLEU-4: 83.24, 73.17, 64.56, and 58.22), a consensus-based image description evaluation (CIDEr) score of 81.41, an F1 score of 90.29, and a metric for evaluation of translation with explicit ordering (METEOR) score of 81.18, indicating strong caption accuracy. Furthermore, the model achieves low error rates, with a word error rate (WER) of 15% and a character error rate (CER) of 11%. This work highlights the challenges of applying large-scale datasets like MS-COCO to resource-limited languages and demonstrates the effectiveness of integrating FastText embeddings with transformer-based models for Hindi image captioning.

Co-Authors : Sharma, Anjali; Khanna, Jitin
