Speech recognition systems have reached a word error rate (WER) of 11.85% on English words. Large speech datasets have helped popularize machine learning for speech recognition because they support good generalization. Inspired by Baidu's Deep Speech, this paper implements its architecture to pursue the same goal for Indonesian words. We use several dataset variations, grouped by source: speech recorded in clean environments, speech recorded in noisy environments, and synthesized speech from Apple and Bing. The main problem is that the dataset variations affect the resulting WER depending on their size. Larger dataset variations maintain good generalization for the model, but they also introduce greater ambiguity in the language model.
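For readers unfamiliar with the metric, WER is conventionally the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the recognizer output, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the evaluation code used in this work; the function name `word_error_rate` and the Indonesian example sentence are illustrative assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative helper (hypothetical name); assumes whitespace tokenization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one of four reference words is dropped.
wer = word_error_rate("saya pergi ke pasar", "saya pergi pasar")
```

Here one deletion against four reference words gives a WER of 0.25, that is, 25%. Because insertions are counted, WER can exceed 100% when the hypothesis is much longer than the reference.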
Copyrights © 2018