Translation Artifacts in Cross-lingual Transfer Learning

Abstract: Both human and machine translation play a central role in cross-lingual transfer learning: many multilingual datasets have been created through professional translation services, and using machine translation to translate either the test set or the training set is a widely used transfer technique. In this paper, we show that this translation process can introduce subtle artifacts that have a notable impact on existing cross-lingual models. For instance, in natural language inference, translating the premise and the hypothesis independently can reduce the lexical overlap between them, which current models are highly sensitive to. We show that some previous findings in cross-lingual transfer learning need to be reconsidered in the light of this phenomenon. Based on the insights gained, we also improve the state of the art on XNLI for the translate-test and zero-shot approaches by 4.3 and 2.8 points, respectively.
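
The lexical-overlap effect mentioned in the abstract can be illustrated with a minimal sketch. The overlap measure below (the fraction of distinct hypothesis tokens that also occur in the premise), the crude tokenizer, and the example sentences are all assumptions made for illustration; they are not taken from the paper and may differ from how the authors quantify overlap.

```python
import re


def tokenize(text):
    """Lowercase and split on non-word characters (a deliberately crude tokenizer)."""
    return re.findall(r"\w+", text.lower())


def lexical_overlap(premise, hypothesis):
    """Fraction of distinct hypothesis tokens that also occur in the premise."""
    premise_tokens = set(tokenize(premise))
    hypothesis_tokens = set(tokenize(hypothesis))
    if not hypothesis_tokens:
        return 0.0
    return len(hypothesis_tokens & premise_tokens) / len(hypothesis_tokens)


# Hypothetical sentences: the same hypothesis rendered with consistent wording
# versus a rendering where independent translation chose different synonyms.
premise = "The doctor examined the patient"
consistent_hypothesis = "The doctor examined someone"
independent_hypothesis = "A physician checked someone"

print(lexical_overlap(premise, consistent_hypothesis))
print(lexical_overlap(premise, independent_hypothesis))
```

In this toy pair the meaning of the hypothesis is unchanged, yet the second rendering shares far fewer surface tokens with the premise, which is the kind of shift a model sensitive to overlap could react to.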

04/07/2020

Mikel Artetxe, Gorka Labaka, Eneko Agirre

Similar Papers

MMPE: A Multi-Modal Interface using Handwriting, Touch Reordering, and Speech Commands for Post-Editing Machine Translation

Nico Herbig, Santanu Pal, Tim Düwel, Kalliopi Meladaki, Mahsa Monshizadeh, Vladislav Hnatovskiy, Antonio Krüger, Josef van Genabith

Keywords: Post-Editing Translation, Post-Editing, translation, PE MT

Unsupervised Word Translation with Adversarial Autoencoder

Tasnim Mohiuddin, Shafiq Joty

Keywords: Unsupervised Translation, machine translation, transfer learning, word task

Cross-lingual visual pre-training for multimodal machine translation

Ozan Caglayan, Menekse Kuyu, Mustafa Sercan Amac, Pranava Madhyastha, Erkut Erdem, Aykut Erdem, Lucia Specia

Robust Neural Machine Translation with ASR Errors

Haiyang Xue, Yang Feng, Shuhao Gu, Wei Chen

Touch editing: A flexible one-time interaction approach for translation

Qian Wang, Jiajun Zhang, Lemao Liu, Guoping Huang, Chengqing Zong

MMPE: A Multi-Modal Interface for Post-Editing Machine Translation

Nico Herbig, Tim Düwel, Santanu Pal, Kalliopi Meladaki, Mahsa Monshizadeh, Antonio Krüger, Josef van Genabith

Keywords: Post-Editing Translation, machine translation, MT, translators

Dynamic Data Selection and Weighting for Iterative Back-Translation

Zi-Yi Dou, Antonios Anastasopoulos, Graham Neubig

Keywords: neural translation, neural nmt, nmt, domain adaptation

LAReQA: Language-Agnostic Answer Retrieval from a Multilingual Pool

Uma Roy, Noah Constant, Rami Al-Rfou, Aditya Barua, Aaron Phillips, Yinfei Yang

Keywords: language-agnostic retrieval, cross-lingual tasks, cross-lingual retrieval, alignment

Jointly Learning to Align and Summarize for Neural Cross-Lingual Summarization

Yue Cao, Hui Liu, Xiaojun Wan

Keywords: Neural Summarization, Cross-lingual summarization, cross-lingual training, pipeline methods

On Negative Interference in Multilingual Models: Findings and A Meta-Learning Treatment

Zirui Wang, Zachary C. Lipton, Yulia Tsvetkov

Keywords: multilingual models, meta-learning algorithm, multilingual representations, negative interference

Grounding inductive biases in natural images: invariance stems from variations in data

Diane Bouchacourt, Mark Ibrahim, Ari Morcos

Keywords: machine learning, transformers

Dual learning: Theoretical study and an algorithmic extension

Zhibing Zhao, Yingce Xia, Tao Qin, Lirong Xia, Tie-Yan Liu

A Systematic Study of Inner-Attention-Based Sentence Representations in Multilingual Neural Machine Translation

Raúl Vázquez, Alessandro Raganato, Mathias Creutz, Jörg Tiedemann

Keywords: Multilingual Translation, Neural translation, transfer learning, translation

Partially-Aligned Data-to-Text Generation with Distant Supervision

Zihao Fu, Bei Shi, Wai Lam, Lidong Bing, Zhiyuan Liu

Keywords: data-to-text task, generation task, dataset problem, over-generation problem

Why Overfitting Isn't Always Bad: Retrofitting Cross-Lingual Word Embeddings to Dictionaries

Mozhi Zhang, Yoshinari Fujinuma, Michael J. Paul, Jordan Boyd-Graber

Keywords: Dictionaries, BLI, generalization, downstream tasks

Selecting Backtranslated Data from Multiple Sources for Improved Neural Machine Translation

Xabier Soto, Dimitar Shterionov, Alberto Poncelas, Andy Way

Keywords: Neural Translation, Machine MT, new systems, MT systems

Refining Language Models with Compositional Explanations

Huihan Yao, Ying Chen, Qinyuan Ye, Xisen Jin, Xiang Ren

Keywords: machine learning, fairness, language

Simulated multiple reference training improves low-resource machine translation

Huda Khayrallah, Brian Thompson, Matt Post, Philipp Koehn

Keywords: machine mt, mt, simulated training, simulated

Linguistic Features for Readability Assessment

Tovly Deutsch, Masoud Jasbi, Stuart Shieber

Towards Semantics-Enhanced Pre-Training: Can Lexicon Definitions Help Learning Sentence Meanings?

Xuancheng Ren, Xu Sun, Houfeng Wang, Qun Liu

GLUECoS: An Evaluation Benchmark for Code-Switched NLP

Simran Khanuja, Sandipan Dandapat, Anirudh Srinivasan, Sunayana Sitaram, Monojit Choudhury
