IAES International Journal of Artificial Intelligence (IJ-AI)
Vol 11, No 2: June 2022

Language lexicons for Hindi-English multilingual text processing

Mohd Zeeshan Ansari (Jamia Millia Islamia)
Tanvir Ahmad (Jamia Millia Islamia)
Mirza Mohd Sufyan Beg (Aligarh Muslim University)
Noaima Bari (Jamia Millia Islamia)

Article Info

Publish Date
01 Jun 2022

Abstract

Language identification (LI) in textual documents is the process of automatically detecting the language of a document based on its content. Present language identification techniques presume that a document contains text in exactly one of a fixed set of languages. However, this presumption fails for multilingual documents, which include content in more than one language. Because standard corpora for Hindi-English mixed-language processing tasks are unavailable, we propose language lexicons, a novel kind of lexical database that augments several bilingual language processing tasks. These lexicons are built by learning classifiers over English and transliterated Hindi vocabulary. The resulting lexicons possess condensed quantitative characteristics that reflect their linguistic strength with respect to Hindi and English. On evaluating the lexicons, we observe that words of the same language tend to cluster together and are separable across language classes. Compared with existing works, the proposed lexicon models exhibit better classifier performance.
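As an illustration of the idea of building a lexicon by learning a classifier over English and transliterated Hindi vocabulary, the following is a minimal sketch only. The abstract does not state which features or classifier the authors use, so this example assumes character bigrams and a small naive Bayes model as stand-ins. The tiny word lists, the class name `NgramNaiveBayes`, and the helper `build_lexicon` are hypothetical and chosen for illustration. Storing a per-word probability for each language is one possible reading of "condensed quantitative characteristics", not the paper's definition.

```python
import math
from collections import Counter

# Toy vocabularies (illustrative only, not taken from the paper).
ENGLISH = ["the", "water", "school", "friend", "house", "thing", "where", "should"]
HINDI = ["pani", "ghar", "dost", "kahan", "nahi", "accha", "kya", "hai", "bahut", "mera"]


def char_ngrams(word, n=2):
    """Return character n-grams of a word, with start/end markers."""
    padded = f"^{word.lower()}$"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NgramNaiveBayes:
    """Multinomial naive Bayes over character n-grams (assumed model)."""

    def __init__(self, n=2, alpha=1.0):
        self.n = n          # n-gram length
        self.alpha = alpha  # additive smoothing constant
        self.counts = {}
        self.totals = {}
        self.priors = {}
        self.vocab = set()

    def fit(self, words, labels):
        label_counts = Counter(labels)
        self.counts = {label: Counter() for label in label_counts}
        for word, label in zip(words, labels):
            grams = char_ngrams(word, self.n)
            self.counts[label].update(grams)
            self.vocab.update(grams)
        self.totals = {l: sum(c.values()) for l, c in self.counts.items()}
        self.priors = {l: k / len(labels) for l, k in label_counts.items()}
        return self

    def predict_proba(self, word):
        """Posterior probability of each language for a single word."""
        v = len(self.vocab)
        scores = {}
        for label, counter in self.counts.items():
            score = math.log(self.priors[label])
            denom = self.totals[label] + self.alpha * v
            for gram in char_ngrams(word, self.n):
                score += math.log((counter[gram] + self.alpha) / denom)
            scores[label] = score
        # Normalise log scores into probabilities in a numerically stable way.
        top = max(scores.values())
        exp = {l: math.exp(s - top) for l, s in scores.items()}
        z = sum(exp.values())
        return {l: e / z for l, e in exp.items()}


def build_lexicon(model, words):
    """Map each word to its per-language scores (the hypothetical lexicon)."""
    return {w: model.predict_proba(w) for w in words}


words = ENGLISH + HINDI
labels = ["en"] * len(ENGLISH) + ["hi"] * len(HINDI)
model = NgramNaiveBayes(n=2).fit(words, labels)
lexicon = build_lexicon(model, words)

for word in ("the", "pani"):
    print(word, lexicon[word])
```

Each lexicon entry here holds a score per language, so a downstream task could look up a word and read off how strongly it leans toward Hindi or English. A realistic version would need a far larger vocabulary and held-out evaluation.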

Copyrights © 2022

Journal Info

Abbrev

IJAI

Subject

Computer Science & IT Engineering

Description

IAES International Journal of Artificial Intelligence (IJ-AI) publishes articles in the field of artificial intelligence (AI). The scope covers all areas of artificial intelligence and their applications in the following topics: neural networks; fuzzy logic; simulated biological evolution algorithms (like ...