Journal of Scientific Research, Education, and Technology
Vol. 4 No. 4 (2025)

Conformer-Performer: An Efficient Architecture for Voice Activity Detection

Apriliyanto, Echa
Waluyo, Anita Fira



Article Info

Publish Date
17 Dec 2025

Abstract

Voice Activity Detection (VAD) is a crucial pre-processing step for speech technologies, yet the self-attention in standard Conformer architectures has a computational cost that grows quadratically with input sequence length. This study introduces the Conformer-Performer, an architecture that replaces standard multi-head self-attention with the Fast Attention Via positive Orthogonal Random features (FAVOR+) mechanism to achieve linear complexity. The objective was to develop an efficient VAD model that maintains high accuracy and is suitable for resource-constrained applications. The model was trained on the multilingual FLEURS dataset using a teacher-student approach and extensive data augmentation. Experimental results show that the Conformer-Performer achieves an F1-score of 98.29%, close to the standard Conformer's 98.41%, while reducing peak GPU memory usage 7.8-fold and speeding up CPU inference 3.46-fold. The proposed model also significantly outperforms the SileroVAD baseline. These findings indicate that the Conformer-Performer offers a compelling balance of accuracy and efficiency, making it well suited to real-time, on-device speech processing.
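The linear-complexity claim rests on a reordering of the attention product, which can be illustrated with a minimal sketch. This is not the authors' implementation: it is pure Python, single-head, and uses independent Gaussian projections rather than the orthogonal ones that FAVOR+ specifies, purely for brevity. The function names (`positive_features`, `softmax_attention`, `linear_attention`) and the parameter `omegas` are hypothetical and chosen for illustration only.

```python
import math
import random


def positive_features(x, omegas):
    """Positive random features: phi(x)_i = exp(w_i . x - |x|^2 / 2) / sqrt(m)."""
    half_sq_norm = sum(v * v for v in x) / 2.0
    m = len(omegas)
    return [
        math.exp(sum(w_j * x_j for w_j, x_j in zip(w, x)) - half_sq_norm) / math.sqrt(m)
        for w in omegas
    ]


def softmax_attention(Q, K, V):
    """Exact softmax attention: scores every query against every key (quadratic in L)."""
    d = len(Q[0])
    out = []
    for q in Q:
        scores = [math.exp(sum(a * b for a, b in zip(q, k)) / math.sqrt(d)) for k in K]
        total = sum(scores)
        out.append([sum(s * v[j] for s, v in zip(scores, V)) / total
                    for j in range(len(V[0]))])
    return out


def linear_attention(Q, K, V, omegas):
    """Random-feature attention computed as phi(Q) (phi(K)^T V): linear in L."""
    d = len(Q[0])
    scale = d ** -0.25  # splits the 1/sqrt(d) factor between queries and keys
    m, dv = len(omegas), len(V[0])
    kv = [[0.0] * dv for _ in range(m)]  # sum over keys of phi(k) v^T, shape m x dv
    k_sum = [0.0] * m                    # sum over keys of phi(k), used to normalize
    for k, v in zip(K, V):
        phi_k = positive_features([x * scale for x in k], omegas)
        for i in range(m):
            k_sum[i] += phi_k[i]
            for j in range(dv):
                kv[i][j] += phi_k[i] * v[j]
    out = []
    for q in Q:
        phi_q = positive_features([x * scale for x in q], omegas)
        denom = sum(a * b for a, b in zip(phi_q, k_sum))
        out.append([sum(phi_q[i] * kv[i][j] for i in range(m)) / denom
                    for j in range(dv)])
    return out


if __name__ == "__main__":
    random.seed(0)
    L, d, dv, m = 8, 4, 2, 256  # assumed toy sizes, not taken from the paper
    Q = [[random.gauss(0, 0.2) for _ in range(d)] for _ in range(L)]
    K = [[random.gauss(0, 0.2) for _ in range(d)] for _ in range(L)]
    V = [[random.uniform(-1, 1) for _ in range(dv)] for _ in range(L)]
    omegas = [[random.gauss(0, 1) for _ in range(d)] for _ in range(m)]
    exact = softmax_attention(Q, K, V)
    approx = linear_attention(Q, K, V, omegas)
    print(max(abs(a - b) for r1, r2 in zip(exact, approx) for a, b in zip(r1, r2)))
```

In this sketch the keys and values are summarized once into an m x d_v matrix, so each query costs a fixed amount of work that does not depend on sequence length, whereas the exact version compares every query with every key. The approximation error shrinks as the number of random features m grows.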

Copyrights © 2025






Journal Info

Abbrev

jrest

Subject

Computer Science & IT; Economics, Econometrics & Finance; Education; Engineering; Social Sciences

Description

Focus and Scope: JSRET (Journal of Scientific Research, Education, and Technology) encourages scientific and technological research, particularly with regard to Indonesia, though not limited by authorship or regional coverage of current issues. Scientists, instructors, senior researchers, project ...