Journal of Scientific Research, Education, and Technology

Vol. 4 No. 4 (2025)

Apriliyanto, Echa
Waluyo, Anita Fira

Publish Date
17 Dec 2025

Voice Activity Detection (VAD) is a crucial pre-processing step for speech technologies, yet standard Conformer architectures suffer from quadratic computational complexity. This study introduces the Conformer-Performer, a novel architecture that replaces standard multi-head self-attention with the Fast Attention Via positive Orthogonal Random features (FAVOR+) mechanism to achieve linear complexity. The objective was to develop an efficient VAD model that maintains high accuracy suitable for resource-constrained applications. The model was trained on the multilingual FLEURS dataset using a teacher-student approach and extensive data augmentation. Experimental results demonstrate that the Conformer-Performer achieves an F1-score of 98.29%, which is highly competitive with the standard Conformer's 98.41%, while achieving a 7.8-fold reduction in peak GPU memory usage and a 3.46-fold speedup in CPU inference time. Furthermore, the proposed model significantly outperforms the SileroVAD baseline. These findings confirm that the Conformer-Performer offers a compelling balance of accuracy and efficiency, making it highly suitable for real-time, on-device speech processing.
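The linear-complexity claim can be illustrated with a small sketch of positive random-feature attention in the spirit of FAVOR+. This is not the authors' implementation: the function names, the feature count m = 64, the toy dimensions, and the pure-Python list representation are all assumptions chosen for readability. The key point is that keys and values are summarized once in an m-by-d table, so the cost grows with sequence length L as O(L·m·d) rather than O(L²·d).

```python
import math
import random


def orthogonal_gaussian_rows(m, d, rng):
    """Draw m random directions in R^d, orthogonal within each block of d rows."""
    rows = []
    while len(rows) < m:
        block = []
        for _ in range(d):
            v = [rng.gauss(0.0, 1.0) for _ in range(d)]
            # Gram-Schmidt: remove components along earlier rows of this block.
            for u in block:
                dot = sum(a * b for a, b in zip(v, u))
                v = [a - dot * b for a, b in zip(v, u)]
            norm = math.sqrt(sum(a * a for a in v))
            block.append([a / norm for a in v])
        # Rescale each unit row to the length of a fresh Gaussian vector.
        for u in block:
            length = math.sqrt(sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(d)))
            rows.append([length * a for a in u])
    return rows[:m]


def positive_features(x, omega):
    """phi(x)_i = exp(w_i . x - |x|^2 / 2) / sqrt(m); strictly positive."""
    half_sq = 0.5 * sum(a * a for a in x)
    root_m = math.sqrt(len(omega))
    return [
        math.exp(sum(w * a for w, a in zip(row, x)) - half_sq) / root_m
        for row in omega
    ]


def favor_attention(Q, K, V, omega):
    """Approximate softmax attention without forming the L x L score matrix."""
    d, dv, m = len(Q[0]), len(V[0]), len(omega)
    scale = d ** -0.25  # splits the usual 1/sqrt(d) between queries and keys
    q_feat = [positive_features([scale * a for a in q], omega) for q in Q]
    k_feat = [positive_features([scale * a for a in k], omega) for k in K]
    # One pass over the keys: kv[i][j] = sum_t phi(k_t)_i * v_t[j].
    kv = [[0.0] * dv for _ in range(m)]
    k_sum = [0.0] * m
    for kf, v in zip(k_feat, V):
        for i in range(m):
            k_sum[i] += kf[i]
            for j in range(dv):
                kv[i][j] += kf[i] * v[j]
    out = []
    for qf in q_feat:
        denom = sum(a * b for a, b in zip(qf, k_sum))
        out.append(
            [sum(qf[i] * kv[i][j] for i in range(m)) / denom for j in range(dv)]
        )
    return out


def softmax_attention(Q, K, V):
    """Exact quadratic-cost attention, kept only as a reference."""
    d = len(Q[0])
    out = []
    for q in Q:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in K]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append(
            [sum(w * v[j] for w, v in zip(weights, V)) / total
             for j in range(len(V[0]))]
        )
    return out


# Toy data (hypothetical sizes): 6 frames, 4-dimensional heads, 64 features.
rng = random.Random(0)
Q = [[rng.gauss(0.0, 0.5) for _ in range(4)] for _ in range(6)]
K = [[rng.gauss(0.0, 0.5) for _ in range(4)] for _ in range(6)]
V = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(6)]
omega = orthogonal_gaussian_rows(64, 4, rng)
approx = favor_attention(Q, K, V, omega)
exact = softmax_attention(Q, K, V)
```

Comparing approx with exact on such toy inputs gives a feel for the approximation error, which generally shrinks as the number of random features grows. Because every feature is positive, each output row remains a weighted average of the value vectors, as in exact softmax attention.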

Journal Info

Abbrev

jrest

Publisher

Kirana Publisher

Subject

Computer Science & IT; Economics, Econometrics & Finance; Education; Engineering; Social Sciences

Description

Focus and scope: JSRET (Journal of Scientific Research, Education, and Technology) encourages scientific and technological research, particularly with regard to Indonesia, though not limited to Indonesian authorship or regional coverage of current issues. Scientists, instructors, senior researchers, project ...

Conformer-Performer: An Efficient Architecture for Voice Activity Detection