Garuda - Garba Rujukan Digital


p-Index (2021-2026): 0.408

This author has published in the following journals:

International Journal of Electrical and Computer Engineering
Scientific Journal of Computer Science

Donatus, Rexcharles Enyinna

Unknown Affiliation

Author-ID: 10068621

Subject areas: Computer Science & IT; Electrical & Electronics Engineering

Published: 2 documents


Articles


A Structured Survey of Attention Mechanisms in Audio-Visual Fusion: Architectures, Challenges, and Evaluation Frameworks
Donatus, Rexcharles Enyinna; Awodele, Oludele; Oguike, Osondu Everestus; Sambo-Magaji, Amina
Scientific Journal of Computer Science, Vol. 2 No. 2 (2026): December (article in process)
Publisher: PT. Teknologi Futuristik Indonesia

DOI: 10.64539/sjcs.v2i2.2026.438

Audio-visual fusion plays an important role in multimodal artificial intelligence, particularly in applications such as speech processing, emotion recognition, and video understanding, where combining information from sound and vision improves performance and contextual understanding. Recent developments are driven by attention mechanisms and transformer-based models, which enable more flexible and context-aware interaction within and across modalities than conventional fusion approaches. Despite these advances, challenges remain, including sensitivity to noisy or missing modalities, modality imbalance, limited interpretability, and high computational cost. This paper presents a structured survey of attention mechanisms in audio-visual fusion, with emphasis on architectural design and evaluation practices across multiple application domains. A structured survey methodology inspired by PRISMA principles is used to identify and select relevant studies, followed by a comparative analysis of model architectures, training strategies, and evaluation methods. The findings show that transformer-based and attention-centered architectures have become increasingly prominent and achieve strong performance across tasks. However, these approaches involve trade-offs among robustness, interpretability, and computational efficiency, and they remain sensitive to noise and modality imbalance. Evaluation practices are also inconsistent, with limited use of standardized and robustness-focused metrics. The survey provides an attention-centered taxonomy of audio-visual fusion methods and synthesizes current approaches and evaluation strategies. It identifies key challenges and outlines directions for improving robustness, interpretability, and efficiency in practical deployment.

Co-Authors: Adetifa, Abolanle; Awodele, Oludele; Oguike, Osondu Everestus; Sambo-Magaji, Amina; Udekwe, Daniel
