Garuda - Garba Rujukan Digital

Article Per Year (5 Year)

p-Index From 2021 - 2026

0.23

P-Index

This Author published in this journals

All Journal ComEngApp : Computer Engineering and Applications Journal

Rexcharles Enyinna Donatus

Air Force Instrtute of Technology, Kaduna

Author-ID : 10195175

Computer Science & IT Engineering

Published : 1 Documents Claim Missing Document

Claim Missing Document

Articles

A Comprehensive Survey of Audio-Visual Fusion with Attention Mechanisms: Trends, Challenges, and Future Directions Rexcharles Enyinna Donatus
Computer Engineering and Applications Journal Vol. 15 No. 2 (2026)
Publisher : Universitas Sriwijaya

Show Abstract | Download Original | Original Source | Check in Google Scholar | DOI: 10.18495/comengapp.v15i2.1332

Advances in multimodal deep learning have driven growing interest in attention mechanisms that enhance audio and visual integration for tasks such as emotion recognition, event localization, and human computer interaction. This comprehensive survey synthesizes recent progress in attention based fusion methods and highlights the evolution from early fusion strategies to more advanced architectures, including self-attention, cross modal attention, co attention, and hierarchical attention. Transformer based models, in particular, now play a central role in state of the art audio visual systems because they capture long range temporal and semantic relationships across modalities. This survey examines how these mechanisms improve contextual understanding and task performance, while also identifying persistent challenges related to interpretability, robustness to noisy or missing modalities, modality imbalance, and computational efficiency. Limitations associated with dataset bias and the lack of standardized evaluation metrics are also discussed. Finally, the survey presents future research directions, including the development of cross modal transformer architectures, hierarchical attention models, and comprehensive attention diagnostics frameworks to support trustworthy and effective multimodal artificial intelligence systems

Co-Authors

Title

Found 1 Documents
Search

Abstract

Title Search

Found 1 Documents Search

Abstract

Title

Found 1 Documents
Search