The purpose of this study was to compare the performance of two models on the task of classifying Qur'an verses based on audio similarity. The first, Model B, uses MFCC features with the MaLSTM architecture; the second, Model C, extends Model B with additional delta features. The stages of this study consisted of determining the dataset, determining the parameters, preprocessing, training, and testing. The dataset was obtained from the local dataset at https://sahabatibadah.com/fasih/. The analysis was based on 172,895 samples of Al-Qur'an recitation audio from Juz 30, covering 37 surahs with a total of 564 verses. The audio was recorded through the Qara'a application and collected from 500 of its users. Recordings from three of the 500 users were used as training data for the speech recognition models, while recordings from one user were used as testing data. Training used DeepSpeech, which is supported by TensorFlow, with 30% of the samples held out as a validation set. Based on the results, Model B with MFCC features is the best model for recognizing and classifying audio-based Qur'an verses: the comparison of Model B and Model C shows that adding delta features had a negative impact on model performance. MFCC features alone are therefore recommended for audio-based Qur'an verse recognition and classification, especially with an LSTM model architecture.
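As a concrete illustration of the two feature configurations being compared, the sketch below extracts MFCCs and, optionally, their delta (first-derivative) features. It assumes the librosa library, a 16 kHz sample rate, and 13 MFCC coefficients; none of these specifics are stated in the study, so they are illustrative assumptions only.

```python
# Minimal sketch of the Model B vs. Model C feature configurations.
# librosa, the sample rate, and n_mfcc=13 are assumptions, not values
# taken from the paper.
import numpy as np
import librosa

def extract_features(path, n_mfcc=13, use_delta=False):
    """Return an (n_frames, n_features) matrix of MFCCs, optionally
    stacked with their delta (first-derivative) features."""
    y, sr = librosa.load(path, sr=16000)               # assumed 16 kHz sample rate
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    if use_delta:
        delta = librosa.feature.delta(mfcc)            # frame-to-frame MFCC change
        mfcc = np.vstack([mfcc, delta])                # Model C: MFCC + delta
    return mfcc.T                                      # Model B: MFCC only

# model_b_input = extract_features("verse.wav")                  # MFCC only
# model_c_input = extract_features("verse.wav", use_delta=True)  # MFCC + delta
```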
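The MaLSTM (Manhattan LSTM) architecture scores the similarity of two input sequences as the exponential of the negative L1 (Manhattan) distance between their final LSTM states, yielding a value in (0, 1]. The Keras sketch below shows this siamese setup; the layer size, padded input shape, loss, and optimizer are assumptions, not values reported in the study.

```python
# Minimal MaLSTM (Manhattan LSTM) sketch: two recitations are scored
# as similar when exp(-L1 distance) between their LSTM encodings is
# near 1. Sizes below are illustrative assumptions.
import tensorflow as tf
from tensorflow.keras import layers, Model

n_frames, n_features = 200, 13        # assumed padded length and MFCC dimension

shared_lstm = layers.LSTM(50)         # one LSTM shared by both branches

left_in = layers.Input(shape=(n_frames, n_features))
right_in = layers.Input(shape=(n_frames, n_features))
left_vec = shared_lstm(left_in)
right_vec = shared_lstm(right_in)

# Manhattan similarity: exp(-sum |h_left - h_right|)
malstm_similarity = layers.Lambda(
    lambda t: tf.exp(-tf.reduce_sum(tf.abs(t[0] - t[1]), axis=1, keepdims=True))
)([left_vec, right_vec])

model = Model(inputs=[left_in, right_in], outputs=malstm_similarity)
model.compile(loss="mean_squared_error", optimizer="adam")
```

With paired training data, the 30% validation split described above would correspond to passing `validation_split=0.3` to `model.fit`.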