Abstract:
Non-autoregressive machine translation models significantly speed up decoding by allowing for parallel prediction of the entire target sequence. However, modeling word order is more challenging due to the lack of autoregressive factors in the model. This difficulty is compounded during training with cross entropy loss, which can heavily penalize small shifts in word order. In this paper, we propose aligned cross entropy (AXE) as an alternative loss function for training non-autoregressive models. AXE uses a differentiable dynamic program to assign loss based on the best possible monotonic alignment between target tokens and model predictions. AXE-based training of conditional masked language models (CMLMs) improves performance by 3 and 5 BLEU points on WMT 16 EN-RO and WMT 14 EN-DE, respectively. It also significantly outperforms state-of-the-art non-autoregressive models on a range of translation benchmarks.
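
To give a concrete sense of the idea, the sketch below scores the cheapest monotonic alignment between a target sequence and per-position model predictions with a hard-min dynamic program. It is an illustrative simplification, not the paper's exact recurrence: AXE itself is formulated as a differentiable (soft) program, and the function name `axe_like_loss`, the `eps_id` "blank" penalty for skipped predictions, and the specific transition costs here are assumptions for illustration.

```python
import numpy as np

def axe_like_loss(log_probs, target, eps_id):
    """Cost of the cheapest monotonic alignment between targets and predictions.

    log_probs: (m, V) array of per-position log-probabilities from the model.
    target:    length-n list of target token ids.
    eps_id:    id of a special 'blank' token used to penalize skipped predictions
               (an assumption of this sketch).
    """
    m, _ = log_probs.shape
    n = len(target)
    INF = float("inf")
    # A[i, j] = best cost of aligning the first i target tokens
    # with the first j model predictions.
    A = np.full((n + 1, m + 1), INF)
    A[0, 0] = 0.0
    # First row: all predictions so far are skipped (penalized toward the blank token).
    for j in range(1, m + 1):
        A[0, j] = A[0, j - 1] - log_probs[j - 1, eps_id]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Align target i with prediction j.
            align = A[i - 1, j - 1] - log_probs[j - 1, target[i - 1]]
            # Skip prediction j (it aligns to no target token).
            skip_pred = A[i, j - 1] - log_probs[j - 1, eps_id]
            # Skip target i (it shares prediction j with a neighboring target).
            skip_target = A[i - 1, j] - log_probs[j - 1, target[i - 1]]
            A[i, j] = min(align, skip_pred, skip_target)
    return A[n, m]
```

Because the alignment is monotonic, a prediction that is correct but shifted by one position incurs only a small skip penalty rather than the full per-token cross entropy that a position-wise loss would assign.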