Speaker recognition aims to identify who is speaking from their voice and is widely used in security, personalization, and archival search. A related, culturally significant task is recognizing Qur'ān reciters from their recitations. The Qur'ān is the central religious text of Islam and is recited according to codified pronunciation and melodic rules (tajwīd and maqām). Distinguishing reciters can support digital archiving, educational feedback, and retrieval of stylistically similar recitations. We present a controlled comparison of deep learning approaches for Qur'ān reciter recognition, contrasting feature-based pipelines with end-to-end waveform models under a unified protocol. Using ṣūrah Al-Tawbah recitations from 12 reciters (18,540 clips; fixed 2 s segments), an X-Vector architecture with Mel-Frequency Cepstral Coefficients (MFCCs) attains perfect test performance (accuracy/precision/recall/F1 = 100%). Convolutional Neural Network (CNN) and Bidirectional LSTM (BLSTM) baselines achieve near-optimal results (99.96% accuracy and F1), while an end-to-end X-Vector trained on raw waveforms reaches 98.77% accuracy (F1 = 0.9877). These findings indicate that explicit spectral features remain advantageous for short segments requiring fine acoustic discrimination, although end-to-end learning is competitive and simplifies preprocessing. We release the curated dataset with standardized splits and training scripts to enable reproducible benchmarking. Overall, feature-informed X-Vectors constitute a strong reference for short-segment reciter identification, and our results motivate hybrid/self-supervised front ends, tajwīd-aware analysis, and real-time, on-device deployment.
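The front end described above (fixed 2 s segmentation followed by MFCC extraction) can be sketched as follows. This is a minimal illustration, not the paper's released code: the sample rate (16 kHz), frame length (25 ms), hop (10 ms), filter count, and coefficient count are assumed typical defaults, and the input waveform here is synthetic noise standing in for a recitation clip.

```python
import numpy as np
from scipy.fftpack import dct

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular mel-spaced filterbank over the rfft bins."""
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):            # rising slope
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):           # falling slope
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def mfcc(signal, sr=16000, n_mfcc=13, frame_len=400, hop=160,
         n_fft=512, n_filters=26):
    """MFCCs for one segment: pre-emphasis, framing, power spectrum,
    mel filterbank, log, DCT-II. Parameter values are assumptions."""
    sig = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    n_frames = 1 + (len(sig) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = sig[idx] * np.hamming(frame_len)
    power = (np.abs(np.fft.rfft(frames, n_fft)) ** 2) / n_fft
    fb = mel_filterbank(n_filters, n_fft, sr)
    log_mel = np.log(power @ fb.T + 1e-10)
    return dct(log_mel, type=2, axis=1, norm="ortho")[:, :n_mfcc]

# Split a waveform into fixed 2 s segments, as in the paper's protocol,
# then extract per-segment MFCC matrices (frames x coefficients).
sr = 16000
wave = np.random.randn(sr * 6).astype(np.float32)  # stand-in for a recitation
seg_len = 2 * sr
segments = [wave[i:i + seg_len]
            for i in range(0, len(wave) - seg_len + 1, seg_len)]
feats = [mfcc(s, sr) for s in segments]
print(len(segments), feats[0].shape)  # 3 segments, each (198, 13)
```

Each (frames × 13) MFCC matrix would then be fed to the X-Vector network, which pools frame-level activations into a fixed-length segment embedding before classification.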
Copyright © 2026