Garuda - Garba Rujukan Digital

Mohamed Ali

Computer Science and Systems, University of Washington, Tacoma

Author-ID : 7212335

Biochemistry, Genetics & Molecular Biology Engineering

Published : 2 Documents

Articles

Arabic Diacritic-Aware Text-Audio Segmentation and Alignment Model (DASAM)
Adel Sabour; Abdeltawab Hendawi; Mohamed Ali
Elkawnie: Journal of Islamic Science and Technology Vol 10, No 1 (2024)
Publisher : Universitas Islam Negeri Ar-Raniry Banda Aceh

DOI: 10.22373/ekw.v10i1.23637

Abstract: This paper introduces the Diacritic-Aware Segmentation and Alignment Model for Arabic (DASAM). Diacritics are vital for pronunciation and meaning in Arabic but are often ignored by current speech recognition systems. DASAM is designed to segment unseen audio at the word level and align each segment with diacritic-marked Arabic text. The approach first applies linguistic analysis based on intonation rules, then uses Dynamic Time Warping (DTW) to match the reference audio of each word to its position in the unseen sentence audio. The model outputs a list of words with their start and end times in the recording. Tested on the Qur’an dataset, DASAM outperforms Google Speech-to-Text (STT) in predicting word timings: it achieves text-audio alignment accuracy of 0.959 and 0.957 for word start and end times, respectively, compared to Google STT’s 0.870 and 0.849. Additionally, DASAM employs advanced signal processing techniques and demonstrates robustness across a range of audio variations. These results establish that DASAM constitutes a fundamental building block for speech-to-text conversion and linguistic research in Arabic, particularly for applications involving diacritics.
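
The DTW matching step described in the abstract can be illustrated with a minimal sketch of subsequence DTW, which locates a short reference sequence inside a longer one. This is not the authors' implementation: the function names `subsequence_dtw` and `frames_to_seconds`, the use of one scalar feature per frame, the absolute-difference cost, and the 10 ms hop are all assumptions made for illustration. A real system would compare multi-dimensional acoustic feature vectors per frame.

```python
import math


def subsequence_dtw(query, sequence):
    """Find the span of `sequence` that best matches `query` (subsequence DTW).

    Both inputs are lists of scalar features, one per frame (an assumption
    made to keep the sketch short). Returns (start, end, cost), where
    `start` is inclusive and `end` is exclusive, both in frames.
    """
    n, m = len(query), len(sequence)
    if n == 0 or m == 0:
        raise ValueError("query and sequence must be non-empty")

    # D[i][j]: best cost of aligning query[:i] with a span ending at sequence[j-1].
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0] = [0.0] * (m + 1)  # free start: the match may begin at any frame

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(query[i - 1] - sequence[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])

    # Free end: pick the cheapest column in the last row.
    end = min(range(1, m + 1), key=lambda j: D[n][j])

    # Backtrack to the first query frame to recover the start of the span.
    i, j = n, end
    while i > 1:
        candidates = [
            (D[i - 1][j - 1], i - 1, j - 1),  # diagonal step
            (D[i - 1][j], i - 1, j),          # step in query only
            (D[i][j - 1], i, j - 1),          # step in sequence only
        ]
        _, i, j = min(candidates)
    start = j - 1  # column j corresponds to sequence[j - 1]

    return start, end, D[n][end]


def frames_to_seconds(frame_index, hop_seconds=0.01):
    """Convert a frame index to seconds, assuming a fixed hop between frames."""
    return frame_index * hop_seconds


if __name__ == "__main__":
    word = [1.0, 5.0, 1.0]                           # reference word (toy features)
    sentence = [0.0, 0.0, 1.0, 5.0, 1.0, 0.0, 0.0]   # unseen sentence (toy features)
    start, end, cost = subsequence_dtw(word, sentence)
    print(start, end, cost)
    print(frames_to_seconds(start), frames_to_seconds(end))
```

Running the search once per reference word, in text order, would yield a list of words with start and end times like the output the abstract describes. The free start and free end are what distinguish this variant from classic DTW, which forces both sequences to align end to end.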

Arabic Diacritic-Aware Text-Audio Segmentation and Alignment Model (DASAM)
Adel Sabour; Abdeltawab Hendawi; Mohamed Ali
Elkawnie Vol. 10 No. 1 (2024)
Publisher : Faculty of Science and Technology Universitas Islam Negeri Ar-Raniry

DOI: 10.22373/ekw.v10i1.23637

