Deep Neural Network (DNN)-based acoustic models offer significantly higher accuracy than traditional Hidden Markov Model (HMM)-Gaussian Mixture Model (GMM) approaches. This research evaluates three popular DNN variants for acoustic modeling in Indonesian speech recognition: the Time-Delay Neural Network (TDNN), Long Short-Term Memory (LSTM), and a hybrid TDNN-LSTM. Acoustic models were trained on more than 92 hours of speech from the KDW-BPPT-50K-ASR1 dataset, and experiments were conducted to compare their performance. The hybrid TDNN-LSTM model achieved the best performance, with a Word Error Rate (WER) of 9.67%, compared with 10.6% for LSTM and 12.16% for TDNN. This finding indicates that the hybrid model improves the accuracy of Indonesian speech recognition over TDNN or LSTM used alone. These results contribute to the development of more accurate and efficient speech recognition systems.
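For readers unfamiliar with the metric, the sketch below shows how WER is commonly computed: the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the recognizer output, divided by the number of reference words. It is a minimal illustration only. The function names and the Indonesian example sentence are hypothetical, and the scoring tool actually used in this research is not stated in the abstract.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which new_wer improves on baseline_wer."""
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical example: one of four reference words is dropped.
example_wer = word_error_rate("saya pergi ke pasar", "saya pergi pasar")

# Relative improvements implied by the WER figures reported above.
hybrid_vs_tdnn = relative_reduction(12.16, 9.67)
hybrid_vs_lstm = relative_reduction(10.6, 9.67)
```

Applied to the reported figures, the hybrid model's WER works out to roughly 20.5% lower than TDNN and roughly 8.8% lower than LSTM in relative terms.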
Copyrights © 2025