Conformer-Performer: An Efficient Architecture for Voice Activity Detection
Authors : Apriliyanto, Echa; Waluyo, Anita Fira
Journal of Scientific Research, Education, and Technology (JSRET), Vol. 4 No. 4 (2025)
Publisher : Kirana Publisher (KNPub)

DOI: 10.58526/jsret.v4i4.979

Abstract

Voice Activity Detection (VAD) is a crucial pre-processing step for speech technologies, yet standard Conformer architectures suffer from quadratic computational complexity. This study introduces the Conformer-Performer, a novel architecture that replaces standard multi-head self-attention with the Fast Attention Via positive Orthogonal Random features (FAVOR+) mechanism to achieve linear complexity. The objective was to develop an efficient VAD model that maintains high accuracy suitable for resource-constrained applications. The model was trained on the multilingual FLEURS dataset using a teacher-student approach and extensive data augmentation. Experimental results demonstrate that the Conformer-Performer achieves an F1-score of 98.29%, which is highly competitive with the standard Conformer's 98.41%, while achieving a 7.8-fold reduction in peak GPU memory usage and a 3.46-fold speedup in CPU inference time. Furthermore, the proposed model significantly outperforms the SileroVAD baseline. These findings confirm that the Conformer-Performer offers a compelling balance of accuracy and efficiency, making it highly suitable for real-time, on-device speech processing.
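
The abstract's central idea, replacing softmax self-attention with FAVOR+ to move from quadratic to linear cost in sequence length, can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names, the single-head, non-causal setting, the feature count, and the omission of the numerical stabilisation used in production code are all assumptions made for illustration. The sketch maps queries and keys through positive random features built from orthogonal Gaussian directions, then computes attention without ever forming the n x n attention matrix.

```python
import numpy as np

def orthogonal_gaussian(m, d, rng):
    """Draw m directions in R^d, orthogonal within each block of d rows."""
    blocks = []
    for _ in range(-(-m // d)):  # ceil(m / d) blocks
        q, _ = np.linalg.qr(rng.standard_normal((d, d)))
        blocks.append(q.T)  # rows of q.T are orthonormal
    w = np.concatenate(blocks, axis=0)[:m]
    # Rescale rows so their lengths match those of i.i.d. Gaussian vectors.
    norms = np.linalg.norm(rng.standard_normal((m, d)), axis=1, keepdims=True)
    return w * norms

def positive_features(x, w):
    """phi(x) = exp(w.x - |x|^2 / 2) / sqrt(m); every entry is positive."""
    m = w.shape[0]
    sq = 0.5 * np.sum(x * x, axis=-1, keepdims=True)
    # Illustrative only: no max-subtraction, so keep inputs small.
    return np.exp(x @ w.T - sq) / np.sqrt(m)

def favor_attention(q, k, v, w):
    """Linear-cost approximation of softmax(q k^T / sqrt(d)) v."""
    d = q.shape[-1]
    scale = d ** -0.25  # splits the 1/sqrt(d) factor between q and k
    qp = positive_features(q * scale, w)  # (n, m)
    kp = positive_features(k * scale, w)  # (n, m)
    kv = kp.T @ v                         # (m, d_v), cost O(n m d_v)
    z = qp @ kp.sum(axis=0)               # (n,), row normalisers
    return (qp @ kv) / z[:, None]

def softmax_attention(q, k, v):
    """Exact quadratic-cost reference."""
    s = q @ k.T / np.sqrt(q.shape[-1])
    s = s - s.max(axis=-1, keepdims=True)
    a = np.exp(s)
    return (a / a.sum(axis=-1, keepdims=True)) @ v

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d, m = 64, 16, 1024  # hypothetical sizes, not from the paper
    q, k = 0.5 * rng.standard_normal((2, n, d))
    v = rng.standard_normal((n, d))
    w = orthogonal_gaussian(m, d, rng)
    err = np.abs(favor_attention(q, k, v, w) - softmax_attention(q, k, v))
    print("mean abs error:", err.mean())
```

Because the keys and values are summarised once into an m x d_v matrix, time and memory grow linearly with the number of frames n rather than with n squared, which is consistent with the memory and speed gains the abstract reports for long inputs.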