Deep Encoder, Shallow Decoder: Reevaluating Non-autoregressive Machine Translation

03/05/2021

Deep Encoder, Shallow Decoder: Reevaluating Non-autoregressive Machine Translation

Jungo Kasai, Nikolaos Pappas, Hao Peng, James Cross, Noah Smith

Keywords: Machine Translation, Sequence Modeling, Natural Language Processing

Abstract Paper Similar Papers

Abstract: Much recent effort has been invested in non-autoregressive neural machine translation, which appears to be an efficient alternative to state-of-the-art autoregressive machine translation on modern GPUs. In contrast to the latter, where generation is sequential, the former allows generation to be parallelized across target token positions. Some of the latest non-autoregressive models have achieved impressive translation quality-speed tradeoffs compared to autoregressive baselines. In this work, we reexamine this tradeoff and argue that autoregressive baselines can be substantially sped up without loss in accuracy. Specifically, we study autoregressive models with encoders and decoders of varied depths. Our extensive experiments show that given a sufficiently deep encoder, a single-layer autoregressive decoder can substantially outperform strong non-autoregressive models with comparable inference speed. We show that the speed disadvantage for autoregressive baselines compared to non-autoregressive methods has been overestimated in three aspects: suboptimal layer allocation, insufficient speed measurement, and lack of knowledge distillation. Our results establish a new protocol for future research toward fast, accurate machine translation. Our code is available at https://github.com/jungokasai/deep-shallow.

0

0

0

0

Share

This is an embedded video. Talk and the respective paper are published at ICLR 2021 virtual conference. If you are one of the authors of the paper and want to manage your upload, see the question "My papertalk has been externally embedded..." in the FAQ section.

Comments

Post Comment

no comments yet

Similar Papers

19/04/2021

Enriching non-autoregressive transformer with syntactic and semantic structures for neural machine translation

Ye Liu, Yao Wan, Jianguo Zhang and
Wenting Zhao, Philip Yu

Keywords Paper

0

0

0

0

10:18

12/07/2020

Variable-Bitrate Neural Compression via Bayesian Arithmetic Coding

Yibo Yang, Robert Bamler, Stephan Mandt

Keywords Paper

Deep Learning - General

0

0

0

0

15:08

02/02/2021

Guiding Non-Autoregressive Neural Machine Translation Decoding with Reordering Information

Qiu Ran, Yankai Lin, Peng Li, Jie Zhou

Keywords Paper

0

0

0

0

14:24

06/12/2020

Cascaded Text Generation with Markov Transformers

Yuntian Deng, Alexander Rush

Keywords Paper

0

0

0

0

3:30

02/02/2021

Lexically Constrained Neural Machine Translation with Explicit Alignment Guidance

Guanhua Chen, Yun Chen, Victor O.K. Li

Keywords Paper

0

0

0

0

15:33

19/08/2021

Improving Stylized Neural Machine Translation with Iterative Dual Knowledge Transfer

Xuanxuan Wu, Jian Liu, Xinjie Li and
Jinan Xu, Yufeng Chen, Yujie Zhang, Hui Huang

Keywords Paper

Natural Language Processing, Machine Translation, Natural Language Generation

0

0

0

0

12:35

26/04/2020

A Probabilistic Formulation of Unsupervised Text Style Transfer

Junxian He, Xinyi Wang, Graham Neubig, Taylor Berg-Kirkpatrick

Keywords Paper

unsupervised text style transfer, deep latent sequence model

0

0

0

0

5:02

14/09/2020

Squeezing Correlated Neurons for Resource-Efficient Deep Neural Networks

Elbruz Ozen, Alex Orailoglu

Keywords Paper

deep learning, information redundancy, pruning

0

0

0

0

14:48

02/02/2021

Finding Sparse Structures for Domain Specific Neural Machine Translation

Jianze Liang, Chengqi Zhao, Mingxuan Wang and
Xipeng Qiu, Lei Li

Keywords Paper

0

0

0

0

14:45

06/12/2020

FleXOR: Trainable Fractional Quantization

Dongsoo Lee, Se Jung Kwon, Byeongwook Kim and
Yongkweon Jeon, Baeseong Park, Jeongin Yun

Keywords Paper

0

0

0

0

3:12

01/07/2020

Improving Autoregressive NMT with Non-Autoregressive Model

Long Zhou, Jiajun Zhang, Chengqing Zong

Keywords Paper

0

0

0

0

8:21

04/07/2020

Variational Neural Machine Translation with Normalizing Flows

Hendra Setiawan, Matthias Sperber, Udhyakumar Nallasamy, Matthias Paulik

Keywords Paper

Variational Translation, Variational VNMT, Variational, generation translations

0

0

0

0

7:09

06/12/2021

GradInit: Learning to Initialize Neural Networks for Stable and Efficient Training

Chen Zhu, Renkun Ni, Zheng Xu and
Kezhi Kong, W. Ronny Huang, Tom Goldstein

Keywords Paper

deep learning, transformers, vision

0

0

0

0

13:17

06/12/2021

Redesigning the Transformer Architecture with Insights from Multi-particle Dynamical Systems

Subhabrata Dutta, Tanya Gautam, Soumen Chakrabarti, Tanmoy Chakraborty

Keywords Paper

deep learning, transformers

0

0

0

0

11:54

30/11/2020

Fast and Differentiable Message Passing on Pairwise Markov Random Fields

Zhiwei Xu, Thalaiyasingam Ajanthan, Richard Hartley

Keywords Paper

0

0

0

0

9:41

04/07/2020

Learning to Recover from Multi-Modality Errors for Non-Autoregressive Neural Machine Translation

Qiu Ran, Yankai Lin, Peng Li, Jie Zhou

Keywords Paper

Non-Autoregressive Translation, Non-Autoregressive , inference process, multi-modality problem

0

0

0

0

8:34

15/06/2020

Blended, precise semantic program embeddings

Ke Wang, Zhendong Su

Keywords Paper

Static and Dynamic Program Features, Attention Network, Semantic Program Embedding

0

0

0

0

15:39

18/07/2021

You Only Sample (Almost) Once: Linear Cost Self-Attention Via Bernoulli Sampling

Zhanpeng Zeng, Yunyang Xiong, Sathya Ravi and
Shailesh Acharya, Glenn Fung, Vikas Singh

Keywords Paper

Applications, Natural Language Processing

0

0

0

0

5:16

18/07/2021

Communication-Efficient Distributed Optimization with Quantized Preconditioners

Foivos Alimisis, Peter Davies, Dan Alistarh

Keywords Paper

Optimization, Distributed and Parallel Optimization

0

0

0

0

5:33

14/06/2020

F-BRS: Rethinking Backpropagating Refinement for Interactive Segmentation

Konstantin Sofiiuk, Ilia Petrov, Olga Barinova, Anton Konushin

Keywords Paper

interactive segmentation, interactive, instance segmentation, segmentation, backpropagating refinement, refinement

0

0

0

0

4:56

08/12/2020

Meet Changes with Constancy: Learning Invariance in Multi-Source Translation

Jianfeng Liu, Ling Luo, Xiang Ao and
Yan Song, Haoran Xu, Jian Ye

Keywords Paper

0

0

0

0

13:35

05/04/2021

CODE: Compiler-based Neuron-aware Ensemble training

Ettore M. G. Trainiti, Thanapon Noraset, David Demeter and
Doug Downey, Simone Campanoni Campanoni

Keywords Paper

Neuroscience and Cognitive Science -> Memory, Neuroscience and Cognitive Science -> Plasticity and Adaptation

0

0

0

0

18:48

02/02/2021

TRQ: Ternary Neural Networks With Residual Quantization

Yue Li, Wenrui Ding, Chunlei Liu and
Baochang Zhang, Guodong Guo

Keywords Paper

0

0

0

0

15:21

26/08/2020

Improving Maximum Likelihood Training for Text Generation with Density Ratio Estimation

Yuxuan Song, Ning Miao, Hao Zhou and
Lantao Yu, Mingxuan Wang, Lei Li

Keywords Paper

0

0

0

0

12:32

02/02/2021

An Efficient Transformer Decoder with Compressed Sub-layers

Yanyang Li, Ye Lin, Tong Xiao, Jingbo Zhu

Keywords Paper

0

0

0

0

19:56

06/12/2021

Learning and Generalization in RNNs

Abhishek Panigrahi, Navin Goyal

Keywords Paper

deep learning, optimization

0

0

0

0

15:51

02/02/2021

Adversarial Turing Patterns from Cellular Automata

Nurislam Tursynbek, Ilya Vilkoviskiy, Maria Sindeeva, Ivan Oseledets

Keywords Paper

0

0

0

0

14:50

08/12/2020

Emergent Communication Pretraining for Few-Shot Machine Translation

Yaoyiran Li, Edoardo Maria Ponti, Ivan Vulić, Anna Korhonen

Keywords Paper

0

0

0

0

14:42

02/02/2021

Any-Precision Deep Neural Networks

Haichao Yu, Haoxiang Li, Humphrey Shi and
Thomas S. Huang, Gang Hua

Keywords Paper

0

0

0

0

14:26

06/12/2020

Approximate Cross-Validation with Low-Rank Data in High Dimensions

Will Stephenson, Madeleine Udell, Tamara Broderick

Keywords Paper

0

0

0

0

3:02

26/04/2020

Chameleon: Adaptive Code Optimization for Expedited Deep Neural Network Compilation

Byung Hoon Ahn, Prannoy Pilligundla, Amir Yazdanbakhsh, Hadi Esmaeilzadeh

Keywords Paper

Reinforcement Learning, Learning to Optimize, Combinatorial Optimization, Compilers, Code Optimization, Neural Networks, ML for Systems, Learning for Systems

0

0

0

0

4:55

11/08/2020

A computational approach to packet classification

Alon Rashelbach, Ori Rottenstreich, Mark Silberstein

Keywords Paper

Neural Networks, Virtual Switches, Packet Classification

0

0

0

0

16:56

04/07/2020

Multi-Task Neural Model for Agglutinative Language Translation

Yirong Pan, Xiao Li, Yating Yang, Rui Dong

Keywords Paper

Agglutinative Translation, agglutinative task, bi-directional translation, agglutinative stemming

0

0

0

0

11:49

02/02/2021

SARG: A Novel Semi Autoregressive Generator for Multi-turn Incomplete Utterance Restoration

Mengzuo Huang, Feng Li, Wuhe Zou, Weidong Zhang

Keywords Paper

0

0

0

0

14:50

06/12/2021

Differentiable Synthesis of Program Architectures

Guofeng Cui, He Zhu

Keywords Paper

optimization, machine learning, interpretability

0

0

0

0

13:31

03/05/2021

Fast and Complete: Enabling Complete Neural Network Verification with Rapid and Massively Parallel Incomplete Verifiers

Kaidi Xu, Huan Zhang, Shiqi Wang and
Yihan Wang, Suman Jana, Xue Lin, Cho-Jui Hsieh

Keywords Paper

branch and bound, neural network verification

0

0

0

0

5:08

03/05/2021

Meta Back-Translation

Hieu Pham, Xinyi Wang, Yiming Yang, Graham Neubig

Keywords Paper

back translation, machine translation, meta learning

0

0

0

0

5:07

30/11/2020

Lossless Image Compression Using a Multi-Scale Progressive Statistical Model

Honglei Zhang, Francesco Cricri, Hamed R. Tavakoli and
Nannan Zou, Emre Aksu, Miska M. Hannuksela

Keywords Paper

0

0

0

0

9:33

26/04/2020

Mirror-Generative Neural Machine Translation

Zaixiang Zheng, Hao Zhou, Shujian Huang and
Lei Li, Xin-Yu Dai, Jiajun Chen

Keywords Paper

neural machine translation, generative model, mirror

0

0

0

0

14:18

03/05/2021

Share or Not? Learning to Schedule Language-Specific Capacity for Multilingual Translation

Biao Zhang, Ankur Bapna, Rico Sennrich, Orhan Firat

Keywords Paper

multilingual transformer, multilingual translation, language-specific modeling, conditional computation

0

0

0

0

15:04