Learning to Translate Kannada and English Queries for Mixed Script Information Retrieval

Authors

  • B. S. Sowmya Lakshmi, Department of Machine Learning, B. M. S. College of Engineering, Bangalore, Karnataka, India
  • B. R. Shambhavi, Department of Information Science and Engineering, B. M. S. College of Engineering, Bangalore, Karnataka, India

DOI:

https://doi.org/10.31577/cai_2021_3_628

Keywords:

Code mixing, mixed script queries, cross language information retrieval, machine translation

Abstract

With the growing number of languages available on the Web, cross language information retrieval has become an active topic in natural language processing and information retrieval. People now routinely combine words from two or more languages in spoken and written discourse. Speakers also intermix languages and scripts in digital media when querying, blogging, and posting on social media platforms. Writing the words of an utterance that come from two different languages, each in its native script, is known as mixed scripting. In the present work, we attempt to translate mixed script queries in Kannada and English into monolingual queries. We propose three translation approaches, based on a bilingual dictionary, word embeddings, and Google Translate. The proposed method outperforms the conventional dictionary-based approach when word embeddings are combined with translations learnt from Google Translate and the dictionary.
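
To make the notion of a mixed script query concrete, the following is a minimal sketch of the dictionary-based baseline only, not the authors' implementation. It assumes whitespace tokenization and detects Kannada tokens by the Unicode Kannada block (U+0C80 to U+0CFF). The names `TOY_DICTIONARY`, `is_kannada`, and `to_english_query` are hypothetical, and the three dictionary entries are illustrative rather than taken from the paper.

```python
# Toy illustration only: the dictionary entries and function names are
# hypothetical and are not taken from the paper.
KANNADA_BLOCK = range(0x0C80, 0x0D00)  # Unicode Kannada block U+0C80..U+0CFF

TOY_DICTIONARY = {
    "ನೀರು": "water",
    "ಮನೆ": "house",
    "ಹಾಲು": "milk",
}


def is_kannada(token):
    """Return True if any character of the token lies in the Kannada block."""
    return any(ord(ch) in KANNADA_BLOCK for ch in token)


def to_english_query(query, dictionary=TOY_DICTIONARY):
    """Translate Kannada-script tokens; keep all other tokens unchanged.

    Kannada tokens missing from the dictionary are passed through as-is.
    """
    translated = []
    for token in query.split():
        if is_kannada(token):
            translated.append(dictionary.get(token, token))
        else:
            translated.append(token)
    return " ".join(translated)


print(to_english_query("ನೀರು purifier price"))  # prints "water purifier price"
```

A lookup of this kind leaves out-of-dictionary Kannada words untranslated, which is the sort of gap the abstract's word embedding and Google Translate approaches are meant to address.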

Published

2021-11-30

How to Cite

Sowmya Lakshmi, B. S., & Shambhavi, B. R. (2021). Learning to Translate Kannada and English Queries for Mixed Script Information Retrieval. Computing and Informatics, 40(3), 628–647. https://doi.org/10.31577/cai_2021_3_628