Indonesian Journal of Electrical Engineering and Computer Science
Vol 37, No 1: January 2025

A three-phase model to keyword detection in Arabic corpora

Namly, Driss
Bouzoubaa, Karim
Tachicart, Ridouane



Article Info

Publish Date
01 Jan 2025

Abstract

The rapid growth of Arabic text data in recent years has created a demand for keyword detection techniques tailored to the Arabic language. This study addresses the need for tools that identify keywords quickly and accurately in Arabic text, particularly across the multiple documents of a corpus. To this end, we present a new corpus designed for keyword detection in Arabic texts, together with an approach that integrates three distinct candidate keyword lists: a frequency-based list, a vector space model list, and a machine learning-based list. This hybrid methodology combines the strengths of each technique to make keyword identification more comprehensive. We conducted extensive experiments to assess the performance and computational efficiency of the proposed pipeline. The results show robust performance across a variety of domains, with F1-scores consistently above 91%. Overall, this study advances automated keyword detection in Arabic and supports improved information retrieval and text analysis.
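
To make the idea of integrating three candidate keyword lists concrete, the following is a minimal Python sketch. It is not the authors' pipeline: the abstract does not say how the lists are combined, so the majority-vote rule, the function names `merge_candidates` and `f1_score`, the `min_votes` parameter, and the toy terms are all assumptions made for illustration. The F1 computation is the standard set-based definition.

```python
from collections import Counter


def merge_candidates(candidate_lists, min_votes=2):
    """Keep terms proposed by at least `min_votes` of the candidate lists.

    Assumption: simple voting. The abstract does not specify the real rule.
    """
    votes = Counter()
    for candidates in candidate_lists:
        votes.update(set(candidates))  # each list votes at most once per term
    return {term for term, count in votes.items() if count >= min_votes}


def f1_score(predicted, gold):
    """Set-based F1: harmonic mean of precision and recall."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Toy, made-up candidate lists standing in for the three phases.
frequency_list = ["اقتصاد", "بنك", "سوق"]       # frequency-based
vector_space_list = ["اقتصاد", "سوق", "مدينة"]  # vector space model
ml_list = ["اقتصاد", "بنك", "تضخم"]             # machine learning-based

keywords = merge_candidates([frequency_list, vector_space_list, ml_list])
gold_keywords = {"اقتصاد", "بنك", "تضخم"}        # hypothetical reference set
print(sorted(keywords), f1_score(keywords, gold_keywords))
```

Raising `min_votes` to 3 would keep only terms that all three lists agree on, trading recall for precision. Lowering it to 1 takes the union of the lists.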

Copyrights © 2025