ComEngApp : Computer Engineering and Applications Journal
Vol. 15 No. 2 (2026)

A Comprehensive Survey of Audio-Visual Fusion with Attention Mechanisms: Trends, Challenges, and Future Directions

Rexcharles Enyinna Donatus (Air Force Institute of Technology, Kaduna)



Article Info

Publish Date
01 Jun 2026

Abstract

Advances in multimodal deep learning have driven growing interest in attention mechanisms that enhance audio-visual integration for tasks such as emotion recognition, event localization, and human-computer interaction. This comprehensive survey synthesizes recent progress in attention-based fusion methods and traces the evolution from early fusion strategies to more advanced architectures, including self-attention, cross-modal attention, co-attention, and hierarchical attention. Transformer-based models, in particular, now play a central role in state-of-the-art audio-visual systems because they capture long-range temporal and semantic relationships across modalities. This survey examines how these mechanisms improve contextual understanding and task performance, while also identifying persistent challenges related to interpretability, robustness to noisy or missing modalities, modality imbalance, and computational efficiency. Limitations associated with dataset bias and the lack of standardized evaluation metrics are also discussed. Finally, the survey presents future research directions, including the development of cross-modal transformer architectures, hierarchical attention models, and comprehensive attention diagnostic frameworks to support trustworthy and effective multimodal artificial intelligence systems.
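To make the distinction between self-attention and cross-modal attention concrete, the following is a minimal sketch of generic scaled dot-product attention in plain Python. It is not the method of any specific system covered by the survey. The feature values are hypothetical toy numbers, and the learned query/key/value projections and multiple heads that real transformer layers use are deliberately omitted. The same function gives self-attention when queries, keys, and values all come from one modality (audio attending to audio), and cross-modal attention when audio frames act as queries over visual keys and values.

```python
import math


def softmax(scores):
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Return one attended vector per query, plus the attention weights.

    queries: list of d-dimensional vectors (e.g. audio frames)
    keys, values: lists of d-dimensional vectors (e.g. visual frames)
    Learned projections and multi-head splitting are omitted (simplifying assumption).
    """
    d = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        # Attended output: attention-weighted sum of the value vectors.
        out = [sum(wj * v[i] for wj, v in zip(w, values)) for i in range(len(values[0]))]
        outputs.append(out)
        weights.append(w)
    return outputs, weights


# Hypothetical toy features: 2 audio frames and 3 video frames, each 4-dimensional.
audio = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]
video = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0], [0.5, 0.5, 0.5, 0.5]]

# Self-attention: audio frames attend to other audio frames.
self_out, self_w = scaled_dot_product_attention(audio, audio, audio)

# Cross-modal attention: audio queries attend to visual keys and values.
cross_out, cross_w = scaled_dot_product_attention(audio, video, video)
```

In this toy case, the first audio frame places its largest weight on the first video frame, whose features match it most closely. The second audio frame behaves the same way with respect to the second video frame. Each row of weights sums to one, which is what allows attention maps to be inspected as a rough, though not guaranteed faithful, indicator of which visual frames informed each audio frame. This is the kind of evidence that the interpretability and attention diagnostics discussion relies on.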

Copyright © 2026






Journal Info

Abbrev

comengapp

Subject

Computer Science & IT Engineering

Description

ComEngApp-Journal (a collaboration between the University of Sriwijaya, Kirklareli University and IAES) is an international forum for scientists and engineers involved in all aspects of computer engineering and technology to publish high-quality, refereed papers. This Journal is an open access journal ...