Handwritten Arabic text recognition (HATR) presents unique challenges due to complex character shapes, contextual variations, cursive connections, and the presence of diacritical marks. This study introduces AHAD (Arabic Handwritten Alphabet with Diacritics), a novel benchmark dataset of 71,061 handwritten Arabic character images annotated with five primary vowel diacritics: Fathah, Kasrah, Dammah, Shaddah, and Sukoon. The dataset covers 492 distinct classes, each combining character identity, contextual form, and diacritic. Leveraging this dataset, we propose an incremental learning framework based on Convolutional Neural Networks (CNNs) for fine-grained recognition of handwritten Arabic characters together with their corresponding diacritics. The model was first trained on a 114-class dataset of non-diacritized handwritten Arabic characters in all contextual forms, and then fine-tuned in two phases on the AHAD dataset. The two-phase strategy combines output layer expansion, learning rate adjustment, and gradual unfreezing of deeper layers to enhance knowledge retention and prevent catastrophic forgetting. The proposed method achieved a validation accuracy of 92.96% and a test accuracy of 93.26%. Our findings demonstrate the effectiveness of incremental learning for diacritic-aware Arabic handwriting recognition and establish AHAD as a strong baseline for future research in this field.
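The two-phase strategy can be illustrated with a minimal, framework-free sketch. It is not the authors' implementation: the function names (`expand_output_layer`, `phase_plan`), the layer names, the learning rates, the number of unfrozen layers, and the choice to keep the original 114 output rows unchanged while appending new ones are all assumptions made for illustration. Only the class counts (114 and 492) come from the abstract.

```python
import random

OLD_CLASSES = 114  # classes in the initial non-diacritized dataset (from the abstract)
NEW_CLASSES = 492  # classes in AHAD (from the abstract)


def expand_output_layer(weights, biases, new_size, scale=0.01, seed=0):
    """Grow an output layer to `new_size` units (hypothetical helper).

    `weights` is a list of rows, one row per class. Existing rows are copied
    unchanged (an assumption about how knowledge is retained); new rows get
    small random values and zero biases.
    """
    if new_size < len(weights):
        raise ValueError("new_size must be at least the current number of classes")
    rng = random.Random(seed)
    n_features = len(weights[0])
    new_weights = [row[:] for row in weights]
    new_biases = list(biases)
    for _ in range(new_size - len(weights)):
        new_weights.append([rng.gauss(0.0, scale) for _ in range(n_features)])
        new_biases.append(0.0)
    return new_weights, new_biases


def phase_plan(feature_layers, phase, unfreeze_last=2):
    """Return the learning rate and trainable layers for a fine-tuning phase.

    Phase 1 trains only the expanded head; phase 2 also unfreezes the last
    `unfreeze_last` feature layers at a lower learning rate. All numeric
    values here are illustrative assumptions.
    """
    if phase == 1:
        trainable = {"head"}
        lr = 1e-3
    elif phase == 2:
        trainable = set(feature_layers[-unfreeze_last:]) | {"head"}
        lr = 1e-4
    else:
        raise ValueError("phase must be 1 or 2")
    ordered = [name for name in feature_layers + ["head"] if name in trainable]
    return {"lr": lr, "trainable": ordered}


if __name__ == "__main__":
    # Toy head with 8 input features instead of a real CNN feature vector.
    old_w = [[0.1] * 8 for _ in range(OLD_CLASSES)]
    old_b = [0.0] * OLD_CLASSES
    new_w, new_b = expand_output_layer(old_w, old_b, NEW_CLASSES)
    print(len(new_w), len(new_b))
    layers = ["conv1", "conv2", "conv3", "conv4"]  # hypothetical layer names
    print(phase_plan(layers, 1))
    print(phase_plan(layers, 2))
```

In a real training setup, the plan returned by `phase_plan` would be mapped onto the framework's own mechanism for freezing parameters and setting the optimizer's learning rate.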
Copyright © 2025