This study investigates End-to-End Automatic Speech Recognition (E2E ASR) for Qur'an recitation under low-resource conditions using the Whisper model. It follows the CRISP-DM methodology, starting with defining the research gap and preparing a curated dataset of 200 verses from Juz 30. These verses were chosen for their short and consistent structure, which allows efficient experimentation. Audio and transcription pairs were verified and cleaned to ensure alignment and quality. Modeling was done with Whisper in Google Colaboratory, leveraging its pre-trained architecture to reduce training time and computing costs. Evaluation used the Character Error Rate (CER) metric to measure transcription accuracy. Whisper achieved an average CER of 0.142, corresponding to a transcription accuracy of about 85%. However, the average processing time per verse was 11 seconds, almost double the time a human reciter needs. Although Whisper provides strong accuracy for Arabic transcription, its runtime efficiency remains a challenge for real-time applications. This research contributes a reproducible pipeline, a validated dataset, and performance benchmarks for future studies of Qur'anic ASR under computational constraints.
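For readers unfamiliar with the metric, CER is conventionally the character-level edit distance between the reference and the hypothesis, divided by the reference length. The following is a minimal sketch of that standard definition, not the study's own evaluation code: the function names `levenshtein` and `cer` are hypothetical, and any text normalization (for example, the handling of Arabic diacritics) is assumed to happen before these functions are called.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character insertions, deletions, and substitutions
    needed to turn `hyp` into `ref` (standard dynamic-programming form)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis prefix
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character Error Rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference transcription must be non-empty")
    return levenshtein(ref, hyp) / len(ref)


def mean_cer(pairs) -> float:
    """Unweighted average CER over (reference, hypothesis) pairs.
    Assumption: the reported average is a plain per-verse mean."""
    scores = [cer(ref, hyp) for ref, hyp in pairs]
    return sum(scores) / len(scores)
```

Under this definition, a CER of 0.142 means roughly 14 character edits per 100 reference characters, so the character-level accuracy is 1 − 0.142 = 0.858.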
Copyrights © 2025