19/04/2021

A large-scale evaluation of neural machine transliteration for indic languages

Anoop Kunchukuttan, Siddharth Jain, Rahul Kejriwal

Keywords:

Abstract: We take up the task of large-scale evaluation of neural machine transliteration between English and Indic languages, with a focus on multilingual transliteration to utilize orthographic similarity between Indian languages. We create a corpus of 600K word pairs mined from parallel translation corpora and monolingual corpora, which is the largest transliteration corpora for Indian languages mined from public sources. We perform a detailed analysis of multilingual transliteration and propose an improved multilingual training recipe for Indic languages. We analyze various factors affecting transliteration quality like language family, transliteration direction and word origin.

 0
 0
 0
 0
This is an embedded video. Talk and the respective paper are published at EACL 2021 virtual conference. If you are one of the authors of the paper and want to manage your upload, see the question "My papertalk has been externally embedded..." in the FAQ section.

Comments

Post Comment
no comments yet
code of conduct: tbd Characters remaining: 140

Similar Papers