IAES International Journal of Artificial Intelligence (IJ-AI)
Vol 13, No 3: September 2024

BERT-based models for classifying multi-dialect Arabic texts

Fouadi, Hassan
El Moubtahij, Hicham
Lamtougui, Hicham
Yahyaouy, Ali



Article Info

Publish Date
01 Sep 2024

Abstract

The field of natural language processing (NLP) is developing rapidly, driven by ongoing innovation and research. Despite this progress, many varieties of dialectal Arabic (DA) are still low-resource languages, which makes it difficult for NLP systems to process DA data. One way to address this problem is to train NLP models on social media datasets that contain DA texts. The open-access social media datasets described in this paper can therefore serve as a valuable resource for developers and researchers working on DA processing. To build our multi-dialect corpus, we gathered data from several datasets covering different varieties of DA. These datasets are used for three classification tasks: sentiment classification, topic classification, and dialect identification. Our study contributes to the automated classification of Arabic dialect texts. We investigate and evaluate various machine learning and deep learning techniques, with a particular focus on BERT-based models. Our experimental results show that DarijaBERT and DziriBERT, which were pre-trained on closely related dialects (Moroccan and Algerian Arabic, respectively), outperform both traditional machine learning methods and earlier, more general pre-trained models trained on multiple dialects or languages.

Copyright © 2024






Journal Info

Abbrev

IJAI

Subject

Computer Science & IT Engineering

Description

IAES International Journal of Artificial Intelligence (IJ-AI) publishes articles in the field of artificial intelligence (AI). The scope covers all areas of artificial intelligence and their applications in the following topics: neural networks; fuzzy logic; simulated biological evolution algorithms (like ...