Alrekabee, Mohammed
Unknown Affiliation

Published: 1 Document
Articles


Arabic NLP: A Survey of Pre-Processing and Representation Techniques
Alrekabee, Mohammed
Journal of Computer Networks, Architecture and High Performance Computing, Vol. 7 No. 4 (2025): Articles Research, October 2025
Publisher : Information Technology and Science (ITScience)


Abstract

The rapid growth of Arabic Natural Language Processing (NLP) has underscored the vital role of upstream tasks that prepare raw text for modeling. This review systematically examines the key steps in Arabic text pre-processing and representation learning, highlighting their impact on downstream NLP performance. We discuss the unique linguistic challenges posed by Arabic, such as rich morphology, orthographic ambiguity, dialectal diversity, and code-switching phenomena. The survey covers traditional rule-based and statistical methods as well as modern deep learning approaches, including subword tokenization and contextual embeddings. Special attention is given to how pre-trained language models such as AraBERT and MARBERT interact with pre-processing pipelines, often redefining the balance between explicit text normalization and implicit representation learning. Furthermore, we analyze existing tools, benchmarks, and evaluation metrics, and identify persistent gaps such as dialect adaptation and Romanized Arabic (Arabizi) processing. By mapping current practices and open issues, this review aims to guide researchers and practitioners toward more robust, adaptive, and linguistically aware Arabic NLP pipelines, ensuring that the data fed into models is as clean, consistent, and semantically meaningful as possible.
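To make the "explicit text normalization" the abstract refers to concrete, below is a minimal, self-contained sketch of common Arabic pre-processing steps: stripping diacritics (tashkeel) and the tatweel elongation character, and unifying alef, alef maqsura, and ta marbuta variants. This sketch is not taken from the surveyed paper; the specific regular expressions and normalization choices (for example, mapping ta marbuta to ha) are illustrative assumptions, and real pipelines differ in which of these lossy steps they apply before subword tokenization.

```python
import re

# Arabic diacritics (tashkeel) and related combining marks, plus the superscript alef.
DIACRITICS = re.compile(r"[\u0610-\u061A\u064B-\u065F\u0670]")
TATWEEL = "\u0640"  # kashida / elongation character


def normalize_arabic(text: str) -> str:
    """Apply common, lossy Arabic normalization steps (illustrative only)."""
    text = DIACRITICS.sub("", text)                          # drop short vowels and other marks
    text = text.replace(TATWEEL, "")                         # drop elongation character
    text = re.sub("[\u0622\u0623\u0625]", "\u0627", text)    # alef variants -> bare alef
    text = text.replace("\u0649", "\u064A")                  # alef maqsura -> ya
    text = text.replace("\u0629", "\u0647")                  # ta marbuta -> ha (a common but debatable choice)
    return text


if __name__ == "__main__":
    sample = "ذَهَبَ الطِّفْلُ إِلى المَدْرَسَةِ"
    print(normalize_arabic(sample))
```

In practice, the normalized text would then be passed to a subword tokenizer tied to a pre-trained model such as AraBERT or MARBERT; whether and how much of this normalization to apply depends on how the chosen model was pre-processed during pre-training, which is exactly the pipeline interaction the survey discusses.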