Boosting Neural Machine Translation with Similar Translations

04/07/2020

Jitao XU, Josep Crego, Jean Senellart

Keywords: Boosting Translation, Neural Translation, data methods, human translator

Abstract: This paper explores data augmentation methods for training Neural Machine Translation (NMT) models to make use of similar translations, in a way comparable to how a human translator employs fuzzy matches. In particular, we show how to simply present the neural model with information from both the source and target sides of the fuzzy matches. We also extend the notion of similarity to include semantically related translations retrieved using distributed sentence representations. We show that translations based on fuzzy matching provide the model with "copy" information, while translations based on embedding similarities tend to extend the translation "context". Results indicate that the effects of both kinds of similar sentences add up to further boost accuracy, combine naturally with model fine-tuning, and provide dynamic adaptation for unseen translation pairs. Tests on multiple data sets and domains show consistent accuracy improvements. To foster research around these techniques, we also release an open-source toolkit with an efficient and flexible fuzzy-match implementation.
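The following minimal Python sketch makes the fuzzy-match idea above concrete. It rests on several assumptions that do not come from this page. It uses a token-level fuzzy-match score of 1 - ED(s, s') / max(|s|, |s'|), a common definition. It applies a hypothetical acceptance threshold of 0.5. It appends the target side of the best match to the input after a hypothetical separator token `<fm>`. The match's source side is used only for scoring, so the paper's exact input format, including how it encodes source-side information, is not reproduced. The `cosine` and `best_embedding_match` helpers stand in for embedding-based retrieval over precomputed sentence vectors, and the encoder that produces those vectors is left open.

```python
import math
from typing import List, Optional, Sequence, Tuple

# A translation memory: list of (source tokens, target tokens) pairs.
Memory = List[Tuple[List[str], List[str]]]
Match = Tuple[float, List[str], List[str]]


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Token-level Levenshtein distance (unit cost for insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr[j] = min(prev[j] + 1,         # delete tok_a
                          curr[j - 1] + 1,     # insert tok_b
                          prev[j - 1] + cost)  # match or substitute
        prev = curr
    return prev[len(b)]


def fuzzy_score(a: Sequence[str], b: Sequence[str]) -> float:
    """Assumed fuzzy-match score: 1 - ED(a, b) / max(|a|, |b|), in [0, 1]."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def best_fuzzy_match(source: Sequence[str], memory: Memory,
                     threshold: float = 0.5) -> Optional[Match]:
    """Linear scan: highest-scoring entry at or above the (hypothetical) threshold."""
    best: Optional[Match] = None
    for tm_src, tm_tgt in memory:
        score = fuzzy_score(source, tm_src)
        if score >= threshold and (best is None or score > best[0]):
            best = (score, tm_src, tm_tgt)
    return best


def augment_source(source: Sequence[str], memory: Memory,
                   sep: str = "<fm>") -> List[str]:
    """Append the best match's target side after a separator (hypothetical format)."""
    match = best_fuzzy_match(source, memory)
    if match is None:
        return list(source)  # no usable match: input is left unchanged
    return list(source) + [sep] + list(match[2])


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def best_embedding_match(query: Sequence[float],
                         memory_vecs: Sequence[Sequence[float]]) -> int:
    """Index of the closest precomputed sentence vector (memory_vecs must be non-empty)."""
    return max(range(len(memory_vecs)), key=lambda i: cosine(query, memory_vecs[i]))


if __name__ == "__main__":
    memory: Memory = [
        ("the cat sat on the mat".split(), "le chat est sur le tapis".split()),
        ("stock prices fell sharply today".split(), "les cours baissent".split()),
    ]
    print(" ".join(augment_source("the cat sat on the sofa".split(), memory)))
```

Running the script prints `the cat sat on the sofa <fm> le chat est sur le tapis`. The first memory entry differs from the input by a single token, giving a score of 5/6, and the second scores 0. The linear scan over the whole memory is kept only for clarity. A realistic translation memory would need an index to keep retrieval fast.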

The talk and the paper were published at the ACL 2020 virtual conference.

Similar Papers

16/11/2020

Dynamic Context Selection for Document-level Neural Machine Translation via Reinforcement Learning

Xiaomian Kang, Yang Zhao, Jiajun Zhang, Chengqing Zong

Keywords: document-level translation, translations, document-level model, selection module

04/07/2020

A Systematic Study of Inner-Attention-Based Sentence Representations in Multilingual Neural Machine Translation

Raúl Vázquez, Alessandro Raganato, Mathias Creutz, Jörg Tiedemann

Keywords: Multilingual Translation, Neural translation, transfer learning, translation

05/12/2020

Touch editing: A flexible one-time interaction approach for translation

Qian Wang, Jiajun Zhang, Lemao Liu, Guoping Huang, Chengqing Zong

19/08/2021

On Guaranteed Optimal Robust Explanations for NLP Models

Emanuele La Malfa, Rhiannon Michelmore, Agnieszka M. Zbrzezny, Nicola Paoletti, Marta Kwiatkowska

Keywords: Machine Learning, Adversarial Machine Learning, Explainable/Interpretable Machine Learning, Sentiment Analysis and Text Mining

01/07/2020

Re-translation versus Streaming for Simultaneous Translation

Naveen Arivazhagan, Colin Cherry, Wolfgang Macherey, George Foster

16/11/2020

Dynamic Data Selection and Weighting for Iterative Back-Translation

Zi-Yi Dou, Antonios Anastasopoulos, Graham Neubig

Keywords: neural translation, neural nmt, nmt, domain adaptation

05/01/2021

Cross-Domain Latent Modulation for Variational Transfer Learning

Jinyong Hou, Jeremiah D. Deng, Stephen Cranefield, Xuejie Ding

19/04/2021

Enriching non-autoregressive transformer with syntactic and semantic structures for neural machine translation

Ye Liu, Yao Wan, Jianguo Zhang, Wenting Zhao, Philip Yu

04/07/2020

A Multi-Perspective Architecture for Semantic Code Search

Rajarshi Haldar, Lingfei Wu, JinJun Xiong, Julia Hockenmaier

Keywords: Semantic Search, code matching, monolingual matching, cross-lingual task

04/07/2020

Multimodal Quality Estimation for Machine Translation

Shu Okabe, Frédéric Blain, Lucia Specia

Keywords: Multimodal Estimation, Machine Translation, Quality Estimation, Quality QE

04/07/2020

Extractive Summarization as Text Matching

Ming Zhong, Pengfei Liu, Yiran Chen, Danqing Wang, Xipeng Qiu, Xuanjing Huang

Keywords: Extractive Summarization, Text Matching, extractive task, semantic problem

16/11/2020

Neural Mask Generator: Learning to Generate Adaptive Word Maskings for Language Model Adaptation

Minki Kang, Moonsu Han, Sung Ju Hwang

Keywords: self-supervised pre-training, question answering, task, reinforcement learning

26/04/2020

Compositional languages emerge in a neural iterated learning model

Yi Ren, Shangmin Guo, Matthieu Labeau, Shay B. Cohen, Simon Kirby

Keywords: Compositionality, Multi-agent, Emergent language, Iterated learning

04/07/2020

Learning Source Phrase Representations for Neural Machine Translation

Hongfei Xu, Josef van Genabith, Deyi Xiong, Qiuhui Liu, Jingyi Zhang

Keywords: Neural Translation, WMT tasks, Learning Representations, Transformer model

03/05/2021

Filtered Inner Product Projection for Crosslingual Embedding Alignment

Vin Sachidananda, Ziyi Yang, Chenguang Zhu

Keywords: multilingual representations, natural language processing, word embeddings

04/07/2020

Estimating the influence of auxiliary tasks for multi-task learning of sequence tagging tasks

Fynn Schröder, Chris Biemann

Keywords: multi-task tasks, MTL, TL, MTL setups

16/11/2020

LAReQA: Language-Agnostic Answer Retrieval from a Multilingual Pool

Uma Roy, Noah Constant, Rami Al-Rfou, Aditya Barua, Aaron Phillips, Yinfei Yang

Keywords: language-agnostic retrieval, cross-lingual tasks, cross-lingual retrieval, alignment

16/11/2020

Translation Artifacts in Cross-lingual Transfer Learning

Mikel Artetxe, Gorka Labaka, Eneko Agirre

Keywords: human translation, cross-lingual learning, natural inference, machine translation

18/11/2020

Dual learning: Theoretical study and an algorithmic extension

Zhibing Zhao, Yingce Xia, Tao Qin, Lirong Xia, Tie-Yan Liu

16/11/2020

Local Additivity Based Data Augmentation for Semi-supervised NER

Jiaao Chen, Zhenghui Wang, Ran Tian, Zichao Yang, Diyi Yang

Keywords: named recognition, deep understanding, semi-supervised ner, entity learning

04/07/2020

Character-Level Translation with Self-attention

Yingqiang Gao, Nikola I. Nikolov, Yuhuang Hu, Richard H.R. Hahnloser

Keywords: Character-Level Translation, bilingual translation, self-attention models, transformer model

04/07/2020

Automatic Machine Translation Evaluation using Source Language Inputs and Cross-lingual Language Model

Kosuke Takahashi, Katsuhito Sudoh, Satoshi Nakamura

Keywords: Automatic Evaluation, machine translation, Cross-lingual Model, regression model

05/12/2020

Self-supervised learning for pairwise data refinement

Gustavo Hernandez Abrego, Bowen Liang, Wei Wang, Zarana Parekh, Yinfei Yang, Yunhsuan Sung

16/11/2020

Partially-Aligned Data-to-Text Generation with Distant Supervision

Zihao Fu, Bei Shi, Wai Lam, Lidong Bing, Zhiyuan Liu

Keywords: data-to-text task, generation task, dataset problem, over-generation problem

08/12/2020

A Mixture-of-Experts Model for Learning Multi-Facet Entity Embeddings

Rana Alshaikh, Zied Bouraoui, Shelan Jeawak, Steven Schockaert

16/11/2020

A Bilingual Generative Transformer for Semantic Sentence Embedding

John Wieting, Graham Neubig, Taylor Berg-Kirkpatrick

Keywords: source separation, semantic encoding, data distributions, unsupervised evaluations

05/01/2021

Scale Equivariance Improves Siamese Tracking

Ivan Sosnovik, Artem Moskalev, Arnold W.M. Smeulders

03/05/2021

Structured Prediction as Translation between Augmented Natural Languages

Giovanni Paolini, Ben Athiwaratkun, Jason Krone, Jie Ma, Alessandro Achille, Rishita Anubhai, Cicero Nogueira dos Santos, Bing Xiang, Stefano Soatto

Keywords: sequence to sequence, structured prediction, language models, transfer learning, few-shot learning, multi-task learning, generative modeling

16/11/2020

Cross-Thought for Sentence Encoder Pre-training

Shuohang Wang, Yuwei Fang, Siqi Sun, Zhe Gan, Yu Cheng, Jingjing Liu, Jing Jiang

Keywords: pre-training encoder, large-scale tasks, question answering, predicting words

12/07/2020

Educating Text Autoencoders: Latent Representation Guidance via Denoising

Tianxiao Shen, Jonas Mueller, Regina Barzilay, Tommi Jaakkola

Keywords: Deep Learning - Generative Models and Autoencoders

02/02/2021

Joint Semantic Analysis with Document-Level Cross-Task Coherence Rewards

Rahul Aralikatte, Mostafa Abdou, Heather C Lent, Daniel Hershcovich, Anders Søgaard

12/07/2020

Sequence Generation with Mixed Representations

Lijun Wu, Shufang Xie, Yingce Xia, Yang Fan, Jian-Huang Lai, Tao Qin, Tie-Yan Liu

Keywords: Sequential, Network, and Time-Series Modeling

19/08/2021

Automatically Paraphrasing via Sentence Reconstruction and Round-trip Translation

Zilu Guo, Zhongqiang Huang, Kenny Q. Zhu, Guandan Chen, Kaibo Zhang, Boxing Chen, Fei Huang

Keywords: Natural Language Processing, Machine Translation, Natural Language Generation, NLP Applications and Tools

16/11/2020

Improving Text Generation with Student-Forcing Optimal Transport

Jianqiao Li, Chunyuan Li, Guoyin Wang, Hao Fu, Yuhchen Lin, Liqun Chen, Yizhe Zhang, Chenyang Tao, Ruiyi Zhang, Wenlin Wang, Dinghan Shen, Qian Yang, Lawrence Carin

Keywords: testing, ot learning, machine translation, text summarization

02/02/2021

A Simple and Effective Self-Supervised Contrastive Learning Framework for Aspect Detection

Tian Shi, Liuqing Li, Ping Wang, Chandan K. Reddy

16/11/2020

Sequence-Level Mixed Sample Data Augmentation

Demi Guo, Yoon Kim, Alexander Rush

Keywords: sequence-to-sequence problems, scan, semantic parsing, neural networks

03/05/2021

On Learning Universal Representations Across Languages

Xiangpeng Wei, Rongxiang Weng, Yue Hu, Luxi Xing, Heng Yu, Weihua Luo

Keywords: hierarchical contrastive learning, cross-lingual pretraining, universal representation learning

19/04/2021

Interpretability for morphological inflection: From character-level predictions to subword-level rules

Tatyana Ruzsics, Olga Sozinova, Ximena Gutierrez-Vasques, Tanja Samardzic

04/07/2020

A Retrieve-and-Rewrite Initialization Method for Unsupervised Machine Translation

Shuo Ren, Yu Wu, Shujie Liu, Ming Zhou, Shuai Ma

Keywords: Unsupervised Translation, translation, Retrieve-and-Rewrite Method, translation models

25/07/2020

Unsupervised semantic hashing with pairwise reconstruction

Casper Hansen, Christian Hansen, Jakob Grue Simonsen, Stephen Alstrup, Christina Lioma

Keywords: semantic hashing, variational, pairwise reconstruction
