Mirza Mohd Sufyan Beg
Aligarh Muslim University

Published: 2 Documents
Articles


Language lexicons for Hindi-English multilingual text processing Mohd Zeeshan Ansari; Tanvir Ahmad; Mirza Mohd Sufyan Beg; Noaima Bari
IAES International Journal of Artificial Intelligence (IJ-AI) Vol 11, No 2: June 2022
Publisher : Institute of Advanced Engineering and Science

DOI: 10.11591/ijai.v11.i2.pp641-648

Abstract

Language identification (LI) in textual documents is the process of automatically detecting the language of a document based on its content. Existing language identification techniques presume that a document contains text in one of a fixed set of languages. However, this presumption does not hold for multilingual documents, which contain content in more than one language. Owing to the unavailability of standard corpora for Hindi-English mixed-lingual language processing tasks, we propose language lexicons, a novel kind of lexical database that supports several bilingual language processing tasks. These lexicons are built by training classifiers over English and transliterated Hindi vocabulary. The designed lexicons possess condensed quantitative characteristics that reflect their linguistic strength with respect to Hindi and English. On evaluating the lexicons, we observe that words of the same language tend to cluster together and are separable across language classes. Compared with existing works, the proposed lexicon models exhibit better classifier performance.
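The abstract does not name the classifiers used to build the lexicons, so what follows is only a minimal sketch under stated assumptions: a multinomial naive Bayes model over character bigrams, a common baseline for word-level language identification and not necessarily the authors' method. The class `NaiveBayesWordLID`, the helper `build_lexicon`, and the toy training words are all hypothetical; the resulting "lexicon" simply maps each word to an estimated probability of being Hindi.

```python
import math
from collections import Counter


def char_ngrams(word, n=2):
    """Character n-grams of a word padded with ^ (start) and $ (end) markers."""
    padded = "^" + word.lower() + "$"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NaiveBayesWordLID:
    """Hypothetical word-level language identifier: multinomial naive Bayes
    over character n-grams with add-one (Laplace) smoothing."""

    def __init__(self, n=2):
        self.n = n
        self.counts = {}   # label -> Counter of n-gram counts
        self.totals = {}   # label -> total number of n-gram tokens
        self.priors = {}   # label -> log prior probability
        self.vocab = set()

    def fit(self, words_by_label):
        num_words = sum(len(ws) for ws in words_by_label.values())
        for label, words in words_by_label.items():
            c = Counter(g for w in words for g in char_ngrams(w, self.n))
            self.counts[label] = c
            self.totals[label] = sum(c.values())
            self.priors[label] = math.log(len(words) / num_words)
            self.vocab.update(c)
        return self

    def log_scores(self, word):
        """Unnormalized log P(label) + sum of log P(n-gram | label)."""
        v = len(self.vocab)
        scores = {}
        for label, c in self.counts.items():
            denom = self.totals[label] + v
            scores[label] = self.priors[label] + sum(
                math.log((c[g] + 1) / denom) for g in char_ngrams(word, self.n)
            )
        return scores

    def proba(self, word):
        """Posterior over labels, normalized stably via the max log score."""
        scores = self.log_scores(word)
        m = max(scores.values())
        exp = {k: math.exp(s - m) for k, s in scores.items()}
        z = sum(exp.values())
        return {k: e / z for k, e in exp.items()}


def build_lexicon(model, words, label="hi"):
    """Hypothetical lexicon: word -> estimated probability of `label`."""
    return {w: model.proba(w)[label] for w in words}


if __name__ == "__main__":
    # Toy, made-up vocabulary; real lexicons would need far more data.
    train = {
        "en": ["the", "this", "that", "with", "thing"],
        "hi": ["hai", "nahi", "kya", "aap", "mera"],
    }
    model = NaiveBayesWordLID(n=2).fit(train)
    print(build_lexicon(model, ["the", "nahi"]))
```

With such a tiny training set, scores for unseen Romanized Hindi words that share character sequences with English (for example, `theek`, which begins with `th`) are unreliable.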

Hindi to English transliteration using multilayer gated recurrent units Mohd Zeeshan Ansari; Tanvir Ahmad; Mirza Mohd Sufyan Beg; Faiyaz Ahmad
Indonesian Journal of Electrical Engineering and Computer Science Vol 27, No 2: August 2022
Publisher : Institute of Advanced Engineering and Science

DOI: 10.11591/ijeecs.v27.i2.pp1083-1090

Abstract

Transliteration is the task of converting text from a source script to a target script while the language of the text remains the same. In this work, we perform the less-explored task of Devanagari-to-Roman Hindi transliteration, as well as its back-transliteration. The neural transliteration model is based on a sequence-to-sequence neural network composed of two major components: an encoder that transforms source-script words into a meaningful representation, and a decoder that generates the target-script words from that representation. We use gated recurrent units (GRUs) to design a multilayer encoder and decoder network. Among the several models evaluated, the multilayer model shows the best performance in terms of character error rate (CER) and word error rate (WER). The method generates satisfactory predictions in Hindi-English bilingual machine transliteration, with a WER of 64.8% and a CER of 20.1%, a significant improvement over existing methods.
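The abstract reports CER and WER without giving formulas. A common definition, assumed here rather than taken from the paper, is the Levenshtein edit distance between prediction and reference divided by the reference length, counted over characters for CER and over whitespace-separated words for WER. The function names below are illustrative, and the sample word pairs are made up, not drawn from the paper's data.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (unit-cost insert/delete/substitute)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete r
                curr[j - 1] + 1,         # insert h
                prev[j - 1] + (r != h),  # substitute (free if equal)
            )
        prev = curr
    return prev[-1]


def character_error_rate(refs, hyps):
    """Total character edits divided by total reference characters."""
    edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return edits / sum(len(r) for r in refs)


def word_error_rate(refs, hyps):
    """Total word edits divided by total reference words."""
    edits = sum(edit_distance(r.split(), h.split()) for r, h in zip(refs, hyps))
    return edits / sum(len(r.split()) for r in refs)


if __name__ == "__main__":
    refs = ["paani", "dost", "kya"]   # made-up Romanized Hindi references
    hyps = ["pani", "dost", "kyaa"]   # made-up model outputs
    print(character_error_rate(refs, hyps))  # 2 edits / 12 chars
    print(word_error_rate(refs, hyps))       # 2 wrong words / 3 words
```

When every reference is a single word, as in word-level transliteration, WER defined this way reduces to the fraction of words not reproduced exactly, so one wrong character makes the whole word an error; this is why WER is typically much higher than CER on the same predictions. For Devanagari output, Python's `len` counts Unicode code points, so a vowel sign such as `ा` counts as a separate character.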