Machine translationese: Effects of algorithmic bias on linguistic complexity in machine translation

Abstract: Recent studies in the field of Machine Translation (MT) and Natural Language Processing (NLP) have shown that existing models amplify biases observed in the training data. The amplification of biases in language technology has mainly been examined with respect to specific phenomena, such as gender bias. In this work, we go beyond the study of gender in MT and investigate how bias amplification might affect language in a broader sense. We hypothesize that the ‘algorithmic bias’, i.e. an exacerbation of frequently observed patterns in combination with a loss of less frequent ones, not only exacerbates societal biases present in current datasets but could also lead to an artificially impoverished language: ‘machine translationese’. We assess the linguistic richness (on a lexical and morphological level) of translations created by different data-driven MT paradigms – phrase-based statistical (PB-SMT) and neural MT (NMT). Our experiments show that there is a loss of lexical and morphological richness in the translations produced by all investigated MT paradigms for two language pairs (EN-FR and EN-ES).
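
The abstract does not name the metrics used to assess lexical richness. As a rough illustration only, one common measure is the type-token ratio (TTR): the number of distinct word forms divided by the total number of words. The sketch below assumes naive whitespace tokenization, and the function name and toy sentences are hypothetical, not taken from the paper.

```python
def type_token_ratio(text: str) -> float:
    """Distinct word forms divided by total word count (0.0 for empty input)."""
    # Assumption: lowercase + whitespace split stands in for a real tokenizer.
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


# Hypothetical toy sentences, not data from the paper.
reference = "the small cat sat while a tiny kitten slept"
mt_output = "the small cat sat while the small cat slept"

ttr_reference = type_token_ratio(reference)
ttr_mt_output = type_token_ratio(mt_output)

# A lower ratio means more repetition, i.e. less lexical variety.
print(f"reference TTR: {ttr_reference:.3f}")
print(f"MT output TTR: {ttr_mt_output:.3f}")
```

TTR is sensitive to text length, so a comparison of this kind is only meaningful between texts of similar size.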

08/12/2020

Eva Vanmassenhove, Dimitar Shterionov, Matthew Gwilliam

Similar Papers

ContraCAT: Contrastive Coreference Analytical Templates for Machine Translation

Dario Stojanovski, Benno Krojer, Denis Peskov, Alexander Fraser

Towards Robustifying NLI Models Against Lexical Dataset Biases

Xiang Zhou, Mohit Bansal

Keywords: Natural Inference, data augmentation, Robustifying Models, deep models

Refining Language Models with Compositional Explanations

Huihan Yao, Ying Chen, Qinyuan Ye, Xisen Jin, Xiang Ren

Keywords: machine learning, fairness, language

On Negative Interference in Multilingual Models: Findings and A Meta-Learning Treatment

Zirui Wang, Zachary C. Lipton, Yulia Tsvetkov

Keywords: multilingual models, meta-learning algorithm, multilingual representations, negative interference

Translationese as a Language in "Multilingual" NMT

Parker Riley, Isaac Caswell, Markus Freitag, David Grangier

Keywords: Translationese, Machine translation, zero-shot translation, Multilingual NMT

Avoiding the Hypothesis-Only Bias in Natural Language Inference via Ensemble Adversarial Training

Joe Stacey, Pasquale Minervini, Haim Dubossarsky, Sebastian Riedel, Tim Rocktäschel

Keywords: neural networks, adversarial training, sentence representations, nli models

Double-Hard Debias: Tailoring Word Embeddings for Gender Bias Mitigation

Tianlu Wang, Xi Victoria Lin, Nazneen Fatema Rajani, Bryan McCann, Vicente Ordonez, Caiming Xiong

Keywords: Tailoring Embeddings, Gender Mitigation, Double-Hard Debias, downstream models

On the Linguistic Representational Power of Neural Machine Translation Models

Yonatan Belinkov, Nadir Durrani, Fahim Dalvi, Hassan Sajjad, James Glass

Keywords: Linguistic Models, natural processing, artificial intelligence, translating languages

Adversarial Filters of Dataset Biases

Ronan Le Bras, Swabha Swayamdipta, Chandra Bhagavatula, Rowan Zellers, Matthew Peters, Ashish Sabharwal, Yejin Choi

Keywords: Deep Learning - Algorithms

Detecting Word Sense Disambiguation Biases in Machine Translation for Model-Agnostic Adversarial Attacks

Denis Emelin, Ivan Titov, Rico Sennrich

Keywords: word disambiguation, nmt, prediction errors, adversarial strategy

Training effective neural CLIR by bridging the translation gap

Hamed Bonab, Sheikh Muhammad Sarwar, James Allan

Keywords: cross-lingual word embedding, cross-lingual information retrieval, neural clir, translation gap

Emergent Communication Pretraining for Few-Shot Machine Translation

Yaoyiran Li, Edoardo Maria Ponti, Ivan Vulić, Anna Korhonen

LAReQA: Language-Agnostic Answer Retrieval from a Multilingual Pool

Uma Roy, Noah Constant, Rami Al-Rfou, Aditya Barua, Aaron Phillips, Yinfei Yang

Keywords: language-agnostic retrieval, cross-lingual tasks, cross-lingual retrieval, alignment

An exploratory study on multilingual quality estimation

Shuo Sun, Marina Fomicheva, Frédéric Blain, Vishrav Chaudhary, Ahmed El-Kishky, Adithya Renduchintala, Francisco Guzmán, Lucia Specia

Adversarial Meta Sampling for Multilingual Low-Resource Speech Recognition

Yubei Xiao, Ke Gong, Pan Zhou, Guolin Zheng, Xiaodan Liang, Liang Lin

Uncertainty-Aware Semantic Augmentation for Neural Machine Translation

Xiangpeng Wei, Heng Yu, Yue Hu, Rongxiang Weng, Luxi Xing, Weihua Luo

Keywords: sequence-to-sequence task, nmt, inference, translation tasks

Cross-Lingual Unsupervised Sentiment Classification with Multi-View Transfer Learning

Hongliang Fei, Ping Li

Keywords: Cross-Lingual Classification, sentiment classification, unsupervised system, classification

Elastic weight consolidation for better bias inoculation

James Thorne, Andreas Vlachos

LIREx: Augmenting Language Inference with Relevant Explanations

Xinyan Zhao, V.G. Vinod Vydiswaran

Reducing Gender Bias in Neural Machine Translation as a Domain Adaptation Problem

Danielle Saunders, Bill Byrne

Keywords: Reducing Bias, Neural Translation, Domain Problem, NLP tasks

Gender Bias in Multilingual Embeddings and Cross-Lingual Transfer

Jieyu Zhao, Subhabrata Mukherjee, Saghar Hosseini, Kai-Wei Chang, Ahmed Hassan Awadallah
