The rules of mad recitation in the Qur’an are a crucial aspect of tajwīd, governing the lengthening of vowel sounds in ways that affect both meaning and recitational accuracy. Despite their importance, no reliable automatic system currently exists for classifying mad rules from voice input. This study proposes a deep learning approach based on a hybrid Convolutional Neural Network–Long Short-Term Memory (CNN-LSTM) model to automatically classify mad rules in Qur’anic recitation. The research follows the CRISP-DM methodology, covering the data understanding, data preparation, modeling, and evaluation stages. Acoustic features were extracted from 3,816 annotated audio segments of Surah Al-Fātiḥah, combining Mel-Frequency Cepstral Coefficients (MFCC), Chroma, Spectral Contrast, and Root Mean Square (RMS) energy to represent phonetic and prosodic attributes. The CNN layers captured spatial characteristics of the spectrum, while the LSTM layers modeled temporal dependencies in the audio. Experimental results show that the combination of all four features achieved an accuracy of 97.21%, a precision of 95.28%, a recall of 95.22%, and an F1-score of 95.25%. These findings indicate that multi-feature integration enhances model robustness and interpretability. The proposed CNN-LSTM framework demonstrates potential for practical deployment in voice-based tajwīd learning tools and contributes to the broader field of Qur’anic speech recognition by offering a systematic, ethically grounded, and data-driven approach to mad classification.
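As a minimal illustrative sketch (not the authors' implementation), the snippet below shows how the four acoustic features named above could be extracted per audio segment with librosa and passed to a small Conv1D-LSTM classifier in Keras. The helper names `extract_features` and `build_cnn_lstm`, the layer sizes, and all hyperparameters are assumptions for illustration only; the paper's actual architecture and feature configuration may differ.

```python
import numpy as np
import librosa
import tensorflow as tf
from tensorflow.keras import layers, models


def extract_features(path, sr=22050, n_mfcc=20):
    """Extract MFCC, Chroma, Spectral Contrast, and RMS for one audio segment.

    Returns an array of shape (n_frames, n_features); defaults (hop length,
    number of coefficients) are illustrative assumptions.
    """
    y, sr = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)        # (n_mfcc, T)
    chroma = librosa.feature.chroma_stft(y=y, sr=sr)              # (12, T)
    contrast = librosa.feature.spectral_contrast(y=y, sr=sr)      # (7, T)
    rms = librosa.feature.rms(y=y)                                # (1, T)
    # Stack along the feature axis so each time frame has one combined vector.
    return np.vstack([mfcc, chroma, contrast, rms]).T             # (T, n_features)


def build_cnn_lstm(n_frames, n_features, n_classes):
    """Hypothetical CNN-LSTM: Conv1D blocks over the feature axis per frame,
    followed by an LSTM over time and a softmax classifier for mad classes."""
    model = models.Sequential([
        layers.Input(shape=(n_frames, n_features)),
        layers.Conv1D(64, kernel_size=3, padding="same", activation="relu"),
        layers.MaxPooling1D(pool_size=2),
        layers.Conv1D(128, kernel_size=3, padding="same", activation="relu"),
        layers.MaxPooling1D(pool_size=2),
        layers.LSTM(64),
        layers.Dropout(0.3),
        layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

In such a pipeline, segments would typically be padded or truncated to a fixed number of frames before training, and accuracy, precision, recall, and F1-score would be computed on a held-out test split, as reported in the abstract.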