EL Mohajir, Badr Eddine
Unknown Affiliation

Published: 1 document
Articles


Arabic text diacritization using transformers: a comparative study
Zubeiri, Iman; Souri, Adnan; El Mohajir, Badr Eddine
IAES International Journal of Artificial Intelligence (IJ-AI) Vol 14, No 1: February 2025
Publisher : Institute of Advanced Engineering and Science

DOI: 10.11591/ijai.v14.i1.pp702-711

Abstract

The Arabic language presents challenges for natural language processing (NLP) tasks. One such challenge is diacritization, which involves adding diacritical marks to Arabic text to enhance readability and disambiguation. Diacritics play a crucial role in determining the correct pronunciation, meaning, and grammatical structure of words and sentences. However, Arabic texts are often written without diacritics, making NLP tasks more complex. This study investigates the efficacy of advanced machine learning models in automatic Arabic text diacritization, with a concentrated focus on the Arabic bidirectional encoder representations from transformers (AraBERT) and bidirectional long short-term memory (Bi-LSTM) models. AraBERT, a bidirectional encoder representation from transformers (BERT) derivative, leverages the transformer architecture to exploit contextual subtleties and discern linguistic patterns within a substantial corpus. Our comprehensive evaluation benchmarks the performance of these models, revealing that AraBERT significantly outperforms the Bi-LSTM with a diacritic error rate (DER) of only 0.81% and an accuracy rate of 98.15%, against the Bi-LSTM's DER of 1.02% and accuracy of 93.88%. The study also explores various optimization strategies to amplify model performance, setting a precedent for future research to enhance Arabic diacritization and contribute to the advancement of Arabic NLP.
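
The abstract reports results as a diacritic error rate (DER) but does not spell out how it is computed. The sketch below illustrates one common reading, assumed here rather than taken from the paper: DER is the fraction of Arabic letters whose predicted diacritics differ from the reference. The function names (`split_diacritics`, `strip_diacritics`, `diacritic_error_rate`) are hypothetical, and the study's own scoring rules (for example, how case endings or undiacritized letters are treated) may differ.

```python
# Minimal sketch of a diacritic error rate (DER) computation.
# Assumption: DER = letters with wrong diacritics / total Arabic letters.
# This is NOT the paper's evaluation code; names here are illustrative.

# Core Arabic diacritics: tanween, short vowels, shadda, sukun (U+064B..U+0652).
DIACRITICS = {chr(c) for c in range(0x064B, 0x0653)}


def split_diacritics(text):
    """Return a list of (base_char, diacritics) pairs for the given text."""
    pairs = []
    for ch in text:
        if ch in DIACRITICS:
            if pairs:  # attach the mark to the preceding base character
                base, marks = pairs[-1]
                pairs[-1] = (base, marks + ch)
            # a leading mark with no base character is ignored
        else:
            pairs.append((ch, ""))
    return pairs


def strip_diacritics(text):
    """Remove diacritics, giving the undiacritized form a model would see."""
    return "".join(ch for ch in text if ch not in DIACRITICS)


def diacritic_error_rate(reference, predicted):
    """Fraction of Arabic letters whose diacritics differ from the reference."""
    ref = split_diacritics(reference)
    hyp = split_diacritics(predicted)
    if [b for b, _ in ref] != [b for b, _ in hyp]:
        raise ValueError("reference and prediction must share the same base text")
    # Score only characters in the basic Arabic letter range (approximate).
    positions = [
        (r, h) for (b, r), (_, h) in zip(ref, hyp) if "\u0621" <= b <= "\u064A"
    ]
    if not positions:
        return 0.0
    # Compare marks as sorted lists so shadda/vowel ordering does not matter.
    errors = sum(1 for r, h in positions if sorted(r) != sorted(h))
    return errors / len(positions)


if __name__ == "__main__":
    reference = "\u0643\u064E\u062A\u064E\u0628\u064E"  # kataba, fully diacritized
    predicted = "\u0643\u064E\u062A\u0650\u0628\u064E"  # kasra instead of fatha on the 2nd letter
    print(f"{diacritic_error_rate(reference, predicted):.2%}")  # prints 33.33%
```

Under this definition, a DER of 0.81% would correspond to fewer than one wrongly diacritized letter per hundred letters scored.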