Maskur, Maimunah
Unknown Affiliation

Published: 1 document
Articles

The Impact of Data Augmentation Techniques on Improving Speech Recognition Performance for English in Indonesian Children Based on Wav2Vec 2.0
Maskur, Maimunah; Zahra, Amalia
IJCCS (Indonesian Journal of Computing and Cybernetics Systems) Vol 19, No 2 (2025): April
Publisher: IndoCEISS in collaboration with Universitas Gadjah Mada, Indonesia.

DOI: 10.22146/ijccs.104646

Abstract

Early childhood education is a crucial phase in shaping children's character and language skills. This study develops an Automatic Speech Recognition (ASR) model to recognize the speech of Indonesian children speaking English. The process begins with collecting and processing a dataset of children's speech recordings, which is then expanded using data augmentation techniques to increase the variety of pronunciations. The pre-trained Wav2Vec 2.0 ASR model is fine-tuned on both the original and the augmented datasets. Evaluation using Word Error Rate (WER) and Character Error Rate (CER) shows a clear accuracy improvement, with WER decreasing from 53% to 45% and CER from 33% to 27%, which corresponds to a relative WER reduction of approximately 15%. Further analysis reveals pronunciation errors in phonemes such as /ð/, /θ/, /r/, /v/, /z/, and /ʃ/, which are uncommon in the Indonesian language. These errors appear as substitutions, omissions, or additions in words like "three," "that," "rabbit," "very," and "zebra." The findings highlight the need for targeted phoneme training, audio-based approaches with ASR feedback, and the listen-and-repeat technique in English language instruction for children.

Keywords: Early childhood education, Automatic Speech Recognition, Augmentation, Character Error Rate, Word Error Rate
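
For readers unfamiliar with the two metrics, the following is a minimal sketch of how WER and CER are commonly computed from an edit distance, and how a relative reduction such as the one reported for WER can be derived. It is not the authors' evaluation code: the function names and the example transcript are hypothetical, and the abstract does not state which tooling or text normalization was used.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences: the minimum number of
    substitutions, deletions, and insertions needed to turn ref into hyp."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word Error Rate: word-level edit distance over reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character Error Rate: character-level edit distance over reference
    length (spaces counted as characters in this sketch)."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


def relative_reduction(before, after):
    """Relative error reduction, e.g. (0.53 - 0.45) / 0.53."""
    return (before - after) / before


if __name__ == "__main__":
    # Hypothetical transcript pair: /θ/ in "three" realized as /t/.
    reference = "three very big zebra"
    hypothesis = "tree very big zebra"
    print(f"WER: {wer(reference, hypothesis):.2f}")
    print(f"CER: {cer(reference, hypothesis):.2f}")

    # Figures reported in the abstract.
    print(f"Relative WER reduction: {relative_reduction(0.53, 0.45):.1%}")
    print(f"Relative CER reduction: {relative_reduction(0.33, 0.27):.1%}")
```

With the abstract's figures, the relative WER reduction works out to about 15.1% and the relative CER reduction to about 18.2%, so the "approximately 15%" figure matches the WER change.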