A three-phase model to keyword detection in Arabic corpora
Namly, Driss; Bouzoubaa, Karim; Tachicart, Ridouane
Indonesian Journal of Electrical Engineering and Computer Science Vol 37, No 1: January 2025
Publisher : Institute of Advanced Engineering and Science

DOI: 10.11591/ijeecs.v37.i1.pp206-213

Abstract

The rapid growth of Arabic text data in recent years has created demand for keyword detection techniques tailored to the Arabic language. This study addresses the need for tools that quickly and accurately identify keywords within a collection of Arabic documents, particularly when a corpus of multiple documents is analyzed. We present a new corpus designed for keyword detection in Arabic texts, along with an approach that integrates three candidate keyword lists: a frequency-based list, a vector space model list, and a machine learning-based list. This hybrid methodology draws on the strengths of each technique to identify keywords more comprehensively. We conducted experiments to assess the performance and computational efficiency of the proposed pipeline. The results show robust performance across a variety of domains, with F1-scores consistently above 91%. Overall, the study contributes to automated keyword detection in Arabic and supports improved information retrieval and text analysis.
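
The idea of integrating three candidate keyword lists can be illustrated with a small sketch. It is not the authors' pipeline: the function names, the TF-IDF weighting, the majority-vote merge rule, and the hard-coded `ml_list` (standing in for a trained model's output) are all assumptions made for illustration, and the toy tokens are placeholders for preprocessed Arabic text.

```python
import math
from collections import Counter

def frequency_candidates(docs, k):
    """Top-k tokens by raw count over the whole corpus."""
    counts = Counter(tok for doc in docs for tok in doc)
    return [tok for tok, _ in counts.most_common(k)]

def tfidf_candidates(docs, k):
    """Top-k tokens by TF-IDF weight summed over documents (a simple vector space view)."""
    n_docs = len(docs)
    doc_freq = Counter(tok for doc in docs for tok in set(doc))
    scores = Counter()
    for doc in docs:
        for tok, count in Counter(doc).items():
            idf = math.log(n_docs / doc_freq[tok]) + 1.0
            scores[tok] += (count / len(doc)) * idf
    return [tok for tok, _ in scores.most_common(k)]

def merge_candidates(candidate_lists, min_votes=2):
    """Keep tokens proposed by at least min_votes lists (an assumed voting rule)."""
    votes = Counter(tok for lst in candidate_lists for tok in set(lst))
    return sorted(tok for tok, v in votes.items() if v >= min_votes)

# Toy tokenized corpus; real input would be preprocessed Arabic documents.
docs = [["economy", "bank", "economy", "the"],
        ["bank", "loan", "the", "economy"],
        ["the", "match", "goal", "the"]]
ml_list = ["economy", "loan"]  # hypothetical output of a trained classifier
keywords = merge_candidates(
    [frequency_candidates(docs, 3), tfidf_candidates(docs, 3), ml_list]
)
```

A voting merge is only one way to combine the lists; a weighted score or a union followed by re-ranking would fit the same description.
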

An innovative Arabic light stemmer developed using a hybrid approach
Namly, Driss; Bouzoubaa, Karim
International Journal of Electrical and Computer Engineering (IJECE) Vol 15, No 2: April 2025
Publisher : Institute of Advanced Engineering and Science

DOI: 10.11591/ijece.v15i2.pp2356-2363

Abstract

Our study introduces a light stemming tool tailored to the challenges of Arabic morphology. In line with the language's templatic and concatenative structures, our stemmer combines clitic stripping, lexicon lookup, and statistical disambiguation to ensure accurate stemming. First, we rely on our clitic rules lexicon to detect all potential combinations of clitics for each input entry. Next, we use an extensive lexicon of over 7 million stems to verify the potential stems. Lastly, we employ a statistical model to determine the most likely stem based on the sentence's context. Experimental results demonstrate the effectiveness of the proposed stemmer in comparison with existing ones. Across different datasets, our stemmer achieves higher accuracy and F1 scores, highlighting its efficiency in Arabic stemming tasks.
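
The three steps named in this abstract (clitic stripping, lexicon verification, statistical selection) can be sketched roughly as follows. This is a toy illustration, not the authors' tool: the clitic inventories, the four-entry `STEM_COUNTS` lexicon with made-up counts, and the function names are assumptions, and a plain frequency lookup stands in for the sentence-context model the abstract describes.

```python
# Toy clitic inventories (assumed; a real clitic rules lexicon is far larger).
PROCLITICS = ["", "و", "ب", "ال", "وال", "بال"]
ENCLITICS = ["", "ه", "ها", "هم"]

# Toy stem lexicon with made-up corpus counts (hypothetical data).
STEM_COUNTS = {"كتاب": 120, "بيت": 50, "وعد": 40, "عد": 15}

def candidate_stems(word, min_len=2):
    """Step 1: strip every proclitic/enclitic combination that matches the word."""
    stems = set()
    for pro in PROCLITICS:
        for enc in ENCLITICS:
            core_len = len(word) - len(pro) - len(enc)
            if core_len >= min_len and word.startswith(pro) and word.endswith(enc):
                stems.add(word[len(pro):len(pro) + core_len])
    return stems

def light_stem(word):
    """Steps 2 and 3: keep candidates found in the lexicon, then pick the most frequent."""
    valid = [s for s in candidate_stems(word) if s in STEM_COUNTS]
    if not valid:
        return word  # no verified stem: leave the word unchanged
    # Frequency is a stand-in here for a context-aware statistical model.
    return max(valid, key=STEM_COUNTS.get)

stems = [light_stem(w) for w in ["والكتاب", "كتابها", "بيته", "وعد"]]
```

The last input shows why the disambiguation step matters: both the full word and the form with the leading conjunction removed are in the toy lexicon, so the choice falls to the statistical component.
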