A large-scale evaluation of neural machine transliteration for Indic languages

19/04/2021

Anoop Kunchukuttan, Siddharth Jain, Rahul Kejriwal

Abstract: We undertake a large-scale evaluation of neural machine transliteration between English and Indic languages, with a focus on multilingual transliteration that exploits the orthographic similarity between Indian languages. We create a corpus of 600K word pairs mined from parallel translation corpora and monolingual corpora, which is the largest transliteration corpus for Indian languages mined from public sources. We perform a detailed analysis of multilingual transliteration and propose an improved multilingual training recipe for Indic languages. We analyze various factors affecting transliteration quality, such as language family, transliteration direction, and word origin.

The talk and the corresponding paper were published at the EACL 2021 virtual conference.

Similar Papers

16/11/2020

CCAligned: A Massive Collection of Cross-Lingual Web-Document Pairs

Ahmed El-Kishky, Vishrav Chaudhary, Francisco Guzmán, Philipp Koehn

Keywords: cross-lingual alignment, mining sentences, cross-lingual nlp, cross-lingual representations

11:47

04/07/2020

Character-Level Translation with Self-attention

Yingqiang Gao, Nikola I. Nikolov, Yuhuang Hu, Richard H.R. Hahnloser

Keywords: Character-Level Translation, bilingual translation, self-attention models, transformer model

8:03

08/12/2020

Harnessing Cross-lingual Features to Improve Cognate Detection for Low-resource Languages

Diptesh Kanojia, Raj Dabre, Shubham Dewangan, Pushpak Bhattacharyya, Gholamreza Haffari, Malhar Kulkarni

12:49

08/12/2020

Semantic Structural Decomposition for Neural Machine Translation

Elior Sulem, Omri Abend, Ari Rappoport

9:54

26/04/2020

Neural Machine Translation with Universal Visual Representation

Zhuosheng Zhang, Kehai Chen, Rui Wang, Masao Utiyama, Eiichiro Sumita, Zuchao Li, Hai Zhao

Keywords: Neural Machine Translation, Visual Representation, Multimodal Machine Translation, Language Representation

4:50

04/07/2020

Unsupervised Multilingual Sentence Embeddings for Parallel Corpus Mining

Ivana Kvapilíková, Mikel Artetxe, Gorka Labaka, Eneko Agirre, Ondřej Bojar

Keywords: Unsupervised Embeddings, Parallel Mining, multilingual embeddings, parallel tasks

11:30

04/07/2020

Neural CRF Model for Sentence Alignment in Text Simplification

Chao Jiang, Mounica Maddela, Wuwei Lan, Yang Zhong, Wei Xu

Keywords: Sentence Alignment, Text Simplification, monolingual task, automatic evaluation

11:55

02/02/2021

A Unified Pretraining Framework for Passage Ranking and Expansion

Ming Yan, Chenliang Li, Bin Bi, Wei Wang, Songfang Huang

16:33

04/07/2020

GLUECoS: An Evaluation Benchmark for Code-Switched NLP

Simran Khanuja, Sandipan Dandapat, Anirudh Srinivasan, Sunayana Sitaram, Monojit Choudhury

Keywords: Code-Switched NLP, cross-lingual tasks, NLP tasks, Language Identification

12:08

08/12/2020

Incremental Neural Lexical Coherence Modeling

Sungho Jeon, Michael Strube

9:08

04/07/2020

Parallel Sentence Mining by Constrained Decoding

Pinzhen Chen, Nikolay Bogoychev, Kenneth Heafield, Faheem Kirefu

Keywords: Parallel Mining, decoding, Constrained Decoding, neural translation

6:22

04/07/2020

Multimodal Quality Estimation for Machine Translation

Shu Okabe, Frédéric Blain, Lucia Specia

Keywords: Multimodal Estimation, Machine Translation, Quality Estimation, Quality QE

7:41

04/07/2020

SimulSpeech: End-to-End Simultaneous Speech to Text Translation

Yi Ren, Jinglin Liu, Xu Tan, Chen Zhang, Tao Qin, Zhou Zhao, Tie-Yan Liu

Keywords: simultaneous translation, simultaneous recognition, ASR, NMT

5:51

04/07/2020

Neural Syntactic Preordering for Controlled Paraphrase Generation

Tanya Goyal, Greg Durrett

Keywords: Controlled Generation, Paraphrasing sentences, machine translation, Neural Preordering

11:37

04/07/2020

Unsupervised Cross-lingual Representation Learning at Scale

Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer, Veselin Stoyanov

Keywords: cross-lingual tasks, XNLI, MLQA, NER

12:15

04/07/2020

Fine-Grained Analysis of Cross-Linguistic Syntactic Divergences

Dmitry Nikolaev, Ofir Arviv, Taelin Karidi, Neta Kenneth, Veronika Mitnik, Lilja Maria Saeboe, Omri Abend

Keywords: Fine-Grained Divergences, cross-lingual transfer, full automation, cross-lingual parser

12:05

19/08/2021

Automatically Paraphrasing via Sentence Reconstruction and Round-trip Translation

Zilu Guo, Zhongqiang Huang, Kenny Q. Zhu, Guandan Chen, Kaibo Zhang, Boxing Chen, Fei Huang

Keywords: Natural Language Processing, Machine Translation, Natural Language Generation, NLP Applications and Tools

13:53

16/11/2020

X-SRL: A Parallel Cross-Lingual Semantic Role Labeling Dataset

Angel Daza, Anette Frank

Keywords: generalization learning, multilingual learning, high-quality translation, srl

9:24

16/11/2020

Cross-Thought for Sentence Encoder Pre-training

Shuohang Wang, Yuwei Fang, Siqi Sun, Zhe Gan, Yu Cheng, Jingjing Liu, Jing Jiang

Keywords: pre-training encoder, large-scale tasks, question answering, predicting words

12:06

04/07/2020

Phonetic and Visual Priors for Decipherment of Informal Romanization

Maria Ryskina, Matthew R. Gormley, Taylor Berg-Kirkpatrick

Keywords: Decipherment Romanization, Informal romanization, idiosyncratic process, noisy-channel model

11:47

08/12/2020

CogniVal in Action: An Interface for Customizable Cognitive Word Embedding Evaluation

Nora Hollenstein, Adrian van der Lek, Ce Zhang

4:03

05/12/2020

Fairseq S2T: Fast speech-to-text modeling with fairseq

Changhan Wang, Yun Tang, Xutai Ma, Anne Wu, Dmytro Okhonko, Juan Pino

8:51

19/08/2021

Exemplification Modeling: Can You Give Me an Example, Please?

Edoardo Barba, Luigi Procopio, Caterina Lacerra, Tommaso Pasini, Roberto Navigli

Keywords: Natural Language Processing, Natural Language Semantics, Resources and Evaluation

14:47

19/08/2021

MultiMirror: Neural Cross-lingual Word Alignment for Multilingual Word Sense Disambiguation

Luigi Procopio, Edoardo Barba, Federico Martelli, Roberto Navigli

Keywords: Natural Language Processing, Natural Language Semantics, Resources and Evaluation

12:25

02/02/2021

UWSpeech: Speech to Speech Translation for Unwritten Languages

Chen Zhang, Xu Tan, Yi Ren, Tao Qin, Kejun Zhang, Tie-Yan Liu

15:14

16/11/2020

Multilingual AMR-to-Text Generation

Angela Fan, Claire Gardent

Keywords: multilingual generation, cross-lingual embeddings, pretraining, multilingual models

12:06

02/02/2021

Consecutive Decoding for Speech-to-text Translation

Qianqian Dong, Mingxuan Wang, Hao Zhou, Shuang Xu, Bo Xu, Lei Li

14:20

04/07/2020

Investigating the effect of auxiliary objectives for the automated grading of learner English speech transcriptions

Hannah Craighead, Andrew Caines, Paula Buttery, Helen Yannakoudakis

Keywords: automated transcriptions, automatically speech, multi-task learning, inductive transfer

11:37

03/05/2021

Filtered Inner Product Projection for Crosslingual Embedding Alignment

Vin Sachidananda, Ziyi Yang, Chenguang Zhu

Keywords: multilingual representations, natural language processing, word embeddings

5:22

04/07/2020

Multilingual Universal Sentence Encoder for Semantic Retrieval

Yinfei Yang, Daniel Cer, Amin Ahmad, Mandy Guo, Jax Law, Noah Constant, Gustavo Hernandez Abrego, Steve Yuan, Chris Tar, Yun-hsuan Sung, Brian Strope, Ray Kurzweil

Keywords: Semantic Retrieval, translation tasks, monolingual retrieval, translation retrieval

12:02

19/08/2021

Phonovisual Biases in Language: is the Lexicon Tied to the Visual World?

Andrea Gregor de Varda, Carlo Strapparava

Keywords: Computer Vision, Language and Vision, Phonology, Morphology, and Word Segmentation, Psycholinguistics

15:15

04/07/2020

Automatic Machine Translation Evaluation using Source Language Inputs and Cross-lingual Language Model

Kosuke Takahashi, Katsuhito Sudoh, Satoshi Nakamura

Keywords: Automatic Evaluation, machine translation, Cross-lingual Model, regression model

7:17

16/11/2020

XGLUE: A New Benchmark Dataset for Cross-lingual Pre-training, Understanding and Generation

Yaobo Liang, Nan Duan, Yeyun Gong, Ning Wu, Fenfei Guo, Weizhen Qi, Ming Gong, Linjun Shou, Daxin Jiang, Guihong Cao, Xiaodong Fan, Ruofei Zhang, Rahul Agrawal, Edward Cui, Sining Wei, Taroon Bharti, Ying Qiao, Jiun-Hung Chen, Winnie Wu, Shuguang Liu, Fan Yang, Daniel Campos, Rangan Majumder, Ming Zhou

Keywords: large-scale models, cross-lingual tasks, natural tasks, cross-lingual pre-training

10:06

08/12/2020

A Multilingual Reading Comprehension System for more than 100 Languages

Anthony Ferritto, Sara Rosenthal, Mihaela Bornea, Kazi Hasan, Rishav Chakravarti, Salim Roukos, Radu Florian, Avi Sil

3:26

01/07/2020

The ADAPT System Description for the STAPLE 2020 English-to-Portuguese Translation Task

Rejwanul Haque, Yasmin Moslem, Andy Way

10:22

08/12/2020

AlphaMWE: Construction of Multilingual Parallel Corpora with MWE Annotations

Lifeng Han, Gareth Jones, Alan Smeaton

14:26

08/12/2020

hinglishNorm - A Corpus of Hindi-English Code Mixed Sentences for Text Normalization

Piyush Makhija, Ankit Kumar, Anuj Gupta

14:55

08/12/2020

Delexicalized Paraphrase Generation

Boya Yu, Konstantine Arkoudas, Wael Hamza

15:06

04/07/2020

A Multi-Perspective Architecture for Semantic Code Search

Rajarshi Haldar, Lingfei Wu, JinJun Xiong, Julia Hockenmaier

Keywords: Semantic Search, code matching, monolingual matching, cross-lingual task

6:45

04/07/2020

Semantic Parsing for English as a Second Language

Yuanyuan Zhao, Weiwei Sun, Junjie Cao, Xiaojun Wan

Keywords: semantic parsing, second acquisition, Semantic Parsing, ESL

11:04