Cross-Language Information Retrieval (CLIR) is an essential element of multilingual information access, enabling users to obtain relevant information even when the query language differs from the language of the documents. This paper proposes a translation framework for CLIR in Tamil and Malayalam, two Dravidian languages widely spoken in South India. CLIR for these languages faces several challenges: linguistic differences, translation equivalence, source-to-target language mapping, semantic equivalence, and the limited datasets and tools available for research in this domain. The proposed methodology addresses some of these issues by training a Long Short-Term Memory (LSTM) based encoder-decoder translation model on a parallel corpus. The study uses two bilingual parallel corpora comprising 373 sentence pairs each. The model's accuracy is evaluated by comparing its translations against reference translations using the Bilingual Evaluation Understudy (BLEU) score. Furthermore, the BLEU scores obtained from the proposed LSTM-based encoder-decoder model are compared with those from Google Translate. The findings reveal that the LSTM model attains an average BLEU score of 0.933, whereas Google Translate achieves a score of 0.813. Finally, the study conducts a comparative analysis with selected CLIR models for different languages to evaluate the overall performance of the proposed approach.
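To make the evaluation metric concrete, the following is a minimal sketch of a sentence-level BLEU computation: clipped n-gram precision for n = 1 to 4, combined by a geometric mean and scaled by a brevity penalty. The abstract does not state the BLEU configuration used in the study (n-gram order, smoothing, tokenization), so the choices below are assumptions, and the function names `ngrams` and `sentence_bleu` are hypothetical, introduced only for illustration.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, reference, max_n=4):
    """Unsmoothed sentence-level BLEU against a single reference.

    Assumption: uniform weights over 1..max_n grams and no smoothing,
    so any n-gram order with zero overlap yields a score of 0.0.
    """
    if not candidate:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        ref_counts = ngrams(reference, n)
        total = sum(cand_counts.values())
        if total == 0:
            return 0.0  # candidate is shorter than n tokens
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if overlap == 0:
            return 0.0
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty: penalize candidates shorter than the reference.
    c_len, r_len = len(candidate), len(reference)
    bp = 1.0 if c_len > r_len else math.exp(1 - r_len / c_len)
    return bp * math.exp(sum(log_precisions) / max_n)


if __name__ == "__main__":
    # Placeholder tokens, not data from the study.
    reference = "a b c d e".split()
    candidate = "a b c d".split()
    print(sentence_bleu(candidate, reference))
```

A corpus-level score such as the averages reported above would typically be obtained by averaging or pooling over all sentence pairs; which of the two the study uses is not specified here.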
Copyright © 2024