Fadl Mutaher Ba-Alwi
Sana'a University

Published: 1 document
Articles

Automatic Extraction of Malay Compound Nouns Using a Hybrid of Statistical and Machine Learning Methods
Muneer A. S. Hazaa; Nazlia Omar; Fadl Mutaher Ba-Alwi; Mohammed Albared
International Journal of Electrical and Computer Engineering (IJECE) Vol 6, No 3: June 2016
Publisher : Institute of Advanced Engineering and Science

DOI: 10.11591/ijece.v6i3.pp925-935

Abstract

Identifying compound nouns is important for a wide range of natural language processing applications, such as machine translation and information retrieval. Extracting compound nouns requires deep or shallow syntactic preprocessing tools and large corpora. This paper investigates several methods for extracting noun compounds from Malay text corpora. First, we present empirical results for sixteen statistical association measures applied to the extraction of Malay <N+N> compound nouns. Second, we introduce the possibility of integrating multiple association measures. Third, this work provides a standard data set intended as a common platform for evaluating research on identifying compound nouns in the Malay language. The data set contains 7,235 unique N-N candidates, of which 2,970 are N-N compound noun collocations. The extraction algorithms are evaluated against this reference data set. The experimental results show that a group of association measures (t-test, Piatetsky-Shapiro (PS), C_value, FGM, and the rank combination method) outperforms the other association measures for <N+N> collocations in the Malay corpus. Finally, we describe several classification methods for combining the scores of the basic association measures and evaluate them. The evaluation shows that classification algorithms significantly outperform individual association measures. The results are quite satisfactory in terms of precision, recall, and F-score.
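
To make the idea of scoring and combining association measures concrete, here is a minimal Python sketch, not the authors' implementation. It assumes the textbook definitions of the t-score and the Piatetsky-Shapiro measure for a bigram, and it assumes that "rank combination" means ordering candidates by their summed rank across measures; the abstract does not specify the paper's exact formulas. The function names and the toy frequency counts are hypothetical and chosen only for illustration.

```python
import math

def t_score(f_xy, f_x, f_y, n):
    """Textbook t-score: (observed - expected) / sqrt(observed)."""
    expected = f_x * f_y / n  # expected bigram count under independence
    return (f_xy - expected) / math.sqrt(f_xy)

def piatetsky_shapiro(f_xy, f_x, f_y, n):
    """Piatetsky-Shapiro measure: P(xy) - P(x) * P(y)."""
    return f_xy / n - (f_x / n) * (f_y / n)

def rank_combination(candidates, measures):
    """Order candidates by summed rank across measures (assumed scheme)."""
    total_rank = {name: 0 for name in candidates}
    for measure in measures:
        # Rank 1 goes to the candidate with the highest score.
        ordered = sorted(candidates,
                         key=lambda c: measure(*candidates[c]),
                         reverse=True)
        for rank, name in enumerate(ordered, start=1):
            total_rank[name] += rank
    # A lower summed rank means a stronger candidate overall.
    return sorted(candidates, key=lambda c: total_rank[c])

# Invented toy counts: (bigram freq, first-noun freq, second-noun freq,
# corpus size in bigrams). These are not taken from the paper's data set.
candidates = {
    "kereta api": (30, 40, 35, 10000),
    "air mata": (20, 200, 50, 10000),
    "orang buku": (2, 500, 300, 10000),
}

ranking = rank_combination(candidates, [t_score, piatetsky_shapiro])
print(ranking)
```

In such a setup, candidates near the top of the combined ranking would be proposed as compound nouns and compared against a reference list to compute precision, recall, and F-score.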