Transliteration as Alignment vs. Transliteration as Generation for Crosslingual Information Retrieval

Anil Kumar Singh, Sethuramalingam Subramaniam, Taraka Rama
Language Technologies Research Centre
[anil@research, sethu@research, taraka@students]
Crosslingual Information Retrieval (CLIR) usually requires query translation. Because queries often contain named entities, query translation in turn requires a good transliteration system when the writing systems differ. Transliteration can be treated as a problem of either generation or alignment. For IR, since a word list can be extracted from the corpus being searched, it should be treated as an alignment problem. The shift from generation to alignment can lead to higher transliteration accuracy and significant improvements in CLIR results. Relative to generation, we achieved an increase in CLIR Mean Average Precision of 22.66% for English to Hindi and 29.08% for English to Marathi.
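The distinction between generation and alignment can be illustrated with a minimal sketch. It is not the system described in this paper: it assumes romanized strings as stand-ins for target-script words, uses plain Levenshtein edit distance as a placeholder similarity measure, and the names `edit_distance` and `align_to_wordlist` are hypothetical. The point it shows is only that a generated string need not occur in the corpus, whereas alignment always returns a word that does.

```python
from typing import Iterable


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, used here as a placeholder similarity measure."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def align_to_wordlist(query_term: str, wordlist: Iterable[str]) -> str:
    """Return the corpus word closest to the query term.

    Ties are broken alphabetically so the result is deterministic.
    """
    return min(wordlist, key=lambda w: (edit_distance(query_term, w), w))


# Toy word list standing in for the vocabulary extracted from the corpus.
corpus_words = ["dilli", "mumbai", "kolkata"]

# A generation approach would output some string for "delhi" that may or may
# not occur in the corpus; alignment instead picks an attested corpus word.
best = align_to_wordlist("delhi", corpus_words)
print(best)
```

Because the search space is restricted to words that actually occur in the collection, every aligned term is guaranteed to be retrievable, which is the property the alignment view relies on.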