25/07/2020

Combining contextualized and non-contextualized query translations to improve CLIR

Suraj Nair, Petra Galuscakova, Douglas W. Oard

Keywords: CLIR, machine translation

Abstract: In cross-language information retrieval using probabilistic structured queries (PSQ), translation probabilities from statistical machine translation act as a bridge between the query and document vocabularies. These translation probabilities are typically estimated word by word from a sentence-aligned corpus, without taking context into account. Neural methods, by contrast, can learn to translate using the context around each word, and this can serve as a basis for estimating context-dependent translation probabilities. However, sparsity limits the accuracy of context-specific translation probabilities for rare words, which can be important in retrieval applications. This paper presents evidence that combining such context-dependent translation probabilities with context-independent translation probabilities learned from the same parallel corpus can improve the effectiveness of cross-language ranked retrieval.
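
To make the idea in the abstract concrete, here is a minimal Python sketch of how two translation distributions for a single query term might be combined and then used in a PSQ-style expected term frequency. The abstract does not say how the two sets of probabilities are combined, so the linear interpolation, the weight `lam`, the function names, and the toy probabilities below are all assumptions made for illustration, not the authors' method or data.

```python
from collections import Counter


def interpolate(p_context, p_static, lam=0.5):
    """Linearly mix two translation distributions for one query term.

    Assumption: linear interpolation with weight `lam` on the
    context-dependent table; the abstract does not specify the combination.
    """
    words = set(p_context) | set(p_static)
    return {
        w: lam * p_context.get(w, 0.0) + (1.0 - lam) * p_static.get(w, 0.0)
        for w in words
    }


def psq_term_frequency(translation_probs, doc_tokens):
    """Expected term frequency of a query term in a document under PSQ:
    the sum over document-language words f of p(f | e) * tf(f, doc)."""
    counts = Counter(doc_tokens)
    return sum(p * counts[w] for w, p in translation_probs.items())


# Hypothetical translation probabilities for the English query term "bank".
p_context = {"banque": 0.9, "rive": 0.1}                # context-dependent
p_static = {"banque": 0.6, "rive": 0.3, "banc": 0.1}    # context-independent

combined = interpolate(p_context, p_static, lam=0.5)
doc = ["la", "banque", "centrale", "banque"]
tf = psq_term_frequency(combined, doc)
print(combined, tf)
```

The mixture lets the context-independent table contribute candidates (here "banc") that the context-dependent estimate missed, which is one plausible way to soften the sparsity problem for rare words. The resulting expected term frequency would then be passed to a ranking function such as BM25 in place of a raw count.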

The video of this talk cannot be embedded. You can watch it here:
https://dl.acm.org/doi/10.1145/3397271.3401270#sec-supp
The talk and the corresponding paper were published at the SIGIR 2020 virtual conference.
