A Multi-Branch BiLSTM with Multi-Head Self-Attention for Suspicious Sound Recognition
Shehu Mohammed Yusuf; Hamza Saidu; Sani Saleh Saminu
Journal of Computing Theories and Applications Vol. 3 No. 4 (2026): JCTA 3(4) 2026
Publisher: Universitas Dian Nuswantoro

DOI: 10.62411/jcta.15777

Abstract

Suspicious urban sound recognition is a critical component of intelligent public safety and urban monitoring systems, enabling the automated identification of anomalous acoustic events such as gunshots, sirens, and other security-sensitive sounds. However, existing deep learning approaches often struggle to simultaneously capture long-range temporal dependencies and global contextual relationships, particularly under noisy and acoustically complex urban conditions. This limitation can reduce reliability in safety-critical scenarios where missed detections carry significant risk. To address these challenges, this study proposes a Multi-Branch Bidirectional Long Short-Term Memory (BiLSTM) framework with Multi-Head Self-Attention (MHSA) for enhanced sequential and contextual feature modeling. Mel-frequency cepstral coefficients (MFCCs) are extracted from a curated subset of the UrbanSound8K dataset, comprising five suspicious sound classes, and used as input to the proposed architecture. The multi-branch design enables complementary temporal representations, while the self-attention mechanism provides lightweight contextual weighting of BiLSTM outputs. Experimental results demonstrate that the proposed model achieves a test accuracy of 95.59%, outperforming conventional Dense and LSTM-based baseline models under identical experimental settings. An ablation study further confirms the contribution of multi-branch integration and attention-based enhancement to overall performance. Class-wise evaluation reveals consistently high recall across all sound categories, particularly for safety-critical classes such as gunshots and sirens. These findings indicate that the proposed framework provides robust and reliable performance, making it suitable for real-time smart city surveillance and public safety applications.
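
The attention step described in the abstract, contextual weighting of a sequence of BiLSTM outputs, can be illustrated with a minimal NumPy sketch of standard scaled dot-product multi-head self-attention. This is not the authors' implementation: the abstract does not state the number of heads, the hidden size, or the sequence length, so the values below (4 heads, 64 dimensions, 40 frames) are assumptions chosen only for illustration. The input `h` is random data standing in for BiLSTM outputs, and the random projection matrices stand in for weights that would normally be learned.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_self_attention(h, num_heads, rng):
    """Apply multi-head self-attention to a (T, d) sequence of vectors.

    Returns the attended sequence of shape (T, d) and the attention
    weights of shape (num_heads, T, T).
    """
    T, d = h.shape
    assert d % num_heads == 0, "d must be divisible by num_heads"
    d_k = d // num_heads

    # Assumption: random projections stand in for learned weight matrices.
    w_q, w_k, w_v, w_o = (
        rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(4)
    )

    def split_heads(x):
        # (T, d) -> (num_heads, T, d_k)
        return x.reshape(T, num_heads, d_k).transpose(1, 0, 2)

    q = split_heads(h @ w_q)
    k = split_heads(h @ w_k)
    v = split_heads(h @ w_v)

    # Each frame attends to every other frame within each head.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_k)  # (num_heads, T, T)
    weights = softmax(scores, axis=-1)
    context = weights @ v                             # (num_heads, T, d_k)

    # Concatenate the heads and apply the output projection.
    merged = context.transpose(1, 0, 2).reshape(T, d)
    return merged @ w_o, weights


rng = np.random.default_rng(0)
h = rng.standard_normal((40, 64))  # hypothetical stand-in for BiLSTM outputs
out, weights = multi_head_self_attention(h, num_heads=4, rng=rng)
```

Each row of `weights[i]` sums to 1 and records how strongly one frame attends to every other frame in head `i`, which is the sense in which attention reweights the recurrent outputs by global context.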