End-to-end speech recognition based on bidirectional RNNs is still constrained by high latency. Models built end to end also still make spelling errors, and they are very sensitive to dialects and to the type of equipment used to record the speakers. This work examines, through the gradient and the loss, how a network learns acoustic features when it is built on a unidirectional GRU with CTC, which has a lower computational cost than a bidirectional RNN with CTC. No language model is used to assist the acoustic model in mapping the acoustic signal. The data are audio recordings of the translation of the Quran in dialect and in Bahasa Indonesia, from which acoustic features are extracted using MFCC. Batch Normalization is applied to the GRU network to avoid covariate shift between the layers of the network. The network consists of a three-layer MLP with ReLU activation followed by one unidirectional GRU layer. The GRU output is then processed by an SLP with softmax activation, whose output is the input to CTC. The network is optimized with the Adam optimizer, and the best model tested reaches a WER of 90.611%. The network suffers from vanishing gradients, which slows its learning to recognize the acoustic signal. The use of a unidirectional GRU base also has no great significance in the delay layer for exposing temporal information.
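The last two stages described above, softmax outputs passed to CTC and a result reported as WER, can be illustrated with a minimal sketch in plain Python. It assumes greedy (best-path) decoding with the blank symbol at index 0 and a standard word-level edit distance; the abstract does not state which decoding strategy or scoring tool was used, so the function names `ctc_greedy_decode` and `word_error_rate` and these choices are illustrative only.

```python
from itertools import groupby


def ctc_greedy_decode(frame_probs, blank=0):
    """Collapse per-frame softmax outputs into a label sequence (best path)."""
    # Pick the most probable label index in every frame.
    best_path = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    # Merge consecutive repeats, then drop the blank symbol.
    collapsed = [label for label, _ in groupby(best_path)]
    return [label for label in collapsed if label != blank]


def word_error_rate(reference, hypothesis):
    """WER = word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference words seen so far
    # (one fewer than in the current row) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


# Toy softmax outputs over a hypothetical 3-symbol alphabet: [blank, 1, 2].
probs = [
    [0.1, 0.8, 0.1],
    [0.1, 0.7, 0.2],
    [0.9, 0.05, 0.05],
    [0.2, 0.7, 0.1],
    [0.1, 0.2, 0.7],
]
labels = ctc_greedy_decode(probs)
wer = word_error_rate("a b c", "a x c d")
```

In the toy input, the blank frame between the two runs of label 1 is what lets CTC emit the same label twice in a row. Because insertions count as errors, WER is not bounded by 100%.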
                        
                        
                        
                        
                            
Copyright © 2021