Multi-split reversible transformers can enhance neural machine translation

19/04/2021

Multi-split reversible transformers can enhance neural machine translation

Yuekai Zhao, Shuchang Zhou, Zhihua Zhang

Keywords:

Abstract Paper Similar Papers

Abstract: Large-scale transformers have been shown the state-of-the-art on neural machine translation. However, training these increasingly wider and deeper models could be tremendously memory intensive. We reduce the memory burden by employing the idea of reversible networks that a layer’s input can be reconstructed from its output. We design three types of multi-split based reversible transformers. We also devise a corresponding backpropagation algorithm, which does not need to store activations for most layers. Furthermore, we present two fine-tuning techniques: splits shuffle and self ensemble, to boost translation accuracy. Specifically, our best models surpass the vanilla transformer by at least 1.4 BLEU points in three datasets. Our large-scale reversible models achieve 30.0 BLEU in WMT’14 En-De and 43.5 BLEU in WMT’14 En-Fr, beating several very strong baselines with less than half of the training memory.

0

0

0

0

Share

This is an embedded video. Talk and the respective paper are published at EACL 2021 virtual conference. If you are one of the authors of the paper and want to manage your upload, see the question "My papertalk has been externally embedded..." in the FAQ section.

Comments

Post Comment

no comments yet

Similar Papers

04/07/2020

HAT: Hardware-Aware Transformers for Efficient Natural Language Processing

Hanrui Wang, Zhanghao Wu, Zhijian Liu and
Han Cai, Ligeng Zhu, Chuang Gan, Song Han

Keywords Paper

Natural Processing, Natural tasks, low-latency inference, machine tasks

0

0

0

0

11:26

03/05/2021

Random Feature Attention

Hao Peng, Nikolaos Pappas, Dani Yogatama and
Roy Schwartz, Noah Smith, Lingpeng Kong

Keywords Paper

machine translation, transformers, language modeling, Attention

0

0

0

0

10:20

06/12/2021

Searching for Efficient Transformers for Language Modeling

David So, Wojciech Mańke, Hanxiao Liu and
Zihang Dai, Noam Shazeer, Quoc V Le

Keywords Paper

transformers, language

0

0

0

0

13:29

26/04/2020

Poly-encoders: Architectures and Pre-training Strategies for Fast and Accurate Multi-sentence Scoring

Samuel Humeau, Kurt Shuster, Marie-Anne Lachaux, Jason Weston

Keywords Paper

0

0

0

0

5:22

12/07/2020

Stabilizing Transformers for Reinforcement Learning

Emilio Parisotto, Francis Song, Jack Rae and
Razvan Pascanu, Caglar Gulcehre, Siddhant Jayakumar, Max Jaderberg, Raphael Lopez Kaufman, Aidan Clark, Seb Noury, Matthew Botvinick, Nicolas Heess, Raia Hadsell

Keywords Paper

Reinforcement Learning - Deep RL

0

0

0

0

14:20

26/04/2020

Reformer: The Efficient Transformer

Nikita Kitaev, Lukasz Kaiser, Anselm Levskaya

Keywords Paper

attention, locality sensitive hashing, reversible layers

0

0

0

0

14:23

06/12/2020

Data Diversification: A Simple Strategy For Neural Machine Translation

Xuan-Phi Nguyen, Shafiq Joty, Kui Wu, Ai Ti Aw

Keywords Paper

0

0

0

0

3:23

12/07/2020

Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention

Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, Francois Fleuret

Keywords Paper

Deep Learning - Algorithms

0

0

0

0

13:32

04/07/2020

Multiscale Collaborative Deep Models for Neural Machine Translation

Xiangpeng Wei, Heng Yu, Yue Hu and
Yue Zhang, Rongxiang Weng, Weihua Luo

Keywords Paper

Neural Translation, training models, IWSLT tasks, WMT14 task

0

0

0

0

10:42

03/05/2021

GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding

Dmitry Lepikhin, HyoukJoong Lee, Yuanzhong Xu and
Dehao Chen, Orhan Firat, Yanping Huang, Maxim Krikun, Noam Shazeer, Zhifeng Chen

Keywords Paper

0

0

0

0

5:07

07/09/2020

Paying more Attention to Snapshots of Iterative Pruning: Improving Model Compression via Ensemble Distillation

Duong Le, Nhan Vo, Nam Thoai

Keywords Paper

network pruning, knowledge distillation, ensemble learning

0

0

0

0

8:30

11/08/2020

A computational approach to packet classification

Alon Rashelbach, Ori Rottenstreich, Mark Silberstein

Keywords Paper

Neural Networks, Virtual Switches, Packet Classification

0

0

0

0

16:56

06/12/2021

Adder Attention for Vision Transformer

Han Shu, Jiahao Wang, Hanting Chen and
Lin Li, Yujiu Yang, Yunhe Wang

Keywords Paper

deep learning, transformers, vision

0

0

0

0

4:03

18/07/2021

Learn-to-Share: A Hardware-friendly Transfer Learning Framework Exploiting Computation and Parameter Sharing

Cheng Fu, Hanxian Huang, Xinyun Chen and
Yuandong Tian, Jishen Zhao

Keywords Paper

Applications, Natural Language Processing

0

0

0

0

16:03

03/05/2021

Learning N:M Fine-grained Structured Sparse Neural Networks From Scratch

Aojun Zhou, Yukun Ma, Junnan Zhu and
Jianbo Liu, Zhijie Zhang, Kun Yuan, Wenxiu Sun, Hongsheng Li

Keywords Paper

sparsity, efficient training and inference.

0

0

0

0

5:09

16/11/2020

Vector-Vector-Matrix Architecture: A Novel Hardware-Aware Framework for Low-Latency Inference in NLP Applications

Matthew Khoury, Rumen Dangovski, Longwu Ou and
Preslav Nakov, Yichen Shen, Li Jing

Keywords Paper

natural applications, neural translation, neural nmt, neural

0

0

0

0

11:54

06/12/2021

TransGAN: Two Pure Transformers Can Make One Strong GAN, and That Can Scale Up

Yifan Jiang, Shiyu Chang, Zhangyang Wang

Keywords Paper

machine learning, transformers, vision, generative model

0

0

0

0

3:44

06/12/2020

ShiftAddNet: A Hardware-Inspired Deep Network

Haoran You, Xiaohan Chen, Yongan Zhang and
Chaojian Li, Sicheng Li, Zihao Liu, Zhangyang Wang, Yingyan Lin

Keywords Paper

0

0

0

0

3:25

06/12/2021

Container: Context Aggregation Networks

peng gao, Jiasen Lu, hongsheng Li and
Roozbeh Mottaghi, Aniruddha Kembhavi

Keywords Paper

deep learning, self-supervised learning, transformers, vision, language

0

0

0

0

8:50

06/12/2021

Stable, Fast and Accurate: Kernelized Attention with Relative Positional Encoding

Shengjie Luo, Shanda Li, Tianle Cai and
Di He, Dinglan Peng, Shuxin Zheng, Guolin Ke, Liwei Wang, Tie-Yan Liu

Keywords Paper

optimization, machine learning, transformers, vision

0

0

0

0

10:07

16/11/2020

Shallow-to-Deep Training for Neural Machine Translation

Bei Li, Ziyang Wang, Hui Liu and
Yufan Jiang, Quan Du, Tong Xiao, Huizhen Wang, Jingbo Zhu

Keywords Paper

neural systems, wmt tasks, deep encoders, deep encoder

0

0

0

0

12:13

02/02/2021

Learning Light-Weight Translation Models from Deep Transformer

Bei Li, Ziyang Wang, Hui Liu and
Quan Du, Tong Xiao, Chunliang Zhang, Jingbo Zhu

Keywords Paper

0

0

0

0

17:36

05/04/2021

Pipelined Backpropagation at Scale: Training Large Models without Batches

Atli Kosson, Vitaliy Chiley, Abhi Venigalla and
Joel Hestness, Urs Koster

Keywords Paper

0

0

0

0

4:14

05/04/2021

Pipelined Backpropagation at Scale: Training Large Models without Batches

Atli Kosson, Vitaliy Chiley, Abhi Venigalla and
Joel Hestness, Urs Koster

Keywords Paper

0

0

0

0

18:00

08/12/2020

E.T.: Entity-Transformers. Coreference augmented Neural Language Model for richer mention representations via Entity-Transformer blocks

Nikolaos Stylianou, Ioannis Vlahavas

Keywords Paper

0

0

0

0

8:49

02/02/2021

DPFPS: Dynamic and Progressive Filter Pruning for Compressing Convolutional Neural Networks from Scratch

Xiaofeng Ruan, Yufan Liu, Bing Li and
Chunfeng Yuan, Weiming Hu

Keywords Paper

0

0

0

0

14:38

26/04/2020

ALBERT: A Lite BERT for Self-supervised Learning of Language Representations

Zhenzhong Lan, Mingda Chen, Sebastian Goodman and
Kevin Gimpel, Piyush Sharma, Radu Soricut

Keywords Paper

Natural Language Processing, BERT, Representation Learning

0

0

0

0

4:59

05/04/2021

PipeMare: Asynchronous Pipeline Parallel DNN Training

Bowen Yang, Jian Zhang, Jonathan Li and
Christopher Re, Christopher Aberger, Christopher De Sa

Keywords Paper

0

0

0

0

16:57

06/12/2020

O(n) Connections are Expressive Enough: Universal Approximability of Sparse Transformers

Chulhee Yun, Yin-Wen Chang, Srinadh Bhojanapalli and
Ankit Singh Rawat, Sashank Reddi, Sanjiv Kumar

Keywords Paper

0

0

0

0

3:23

26/04/2020

Chameleon: Adaptive Code Optimization for Expedited Deep Neural Network Compilation

Byung Hoon Ahn, Prannoy Pilligundla, Amir Yazdanbakhsh, Hadi Esmaeilzadeh

Keywords Paper

Reinforcement Learning, Learning to Optimize, Combinatorial Optimization, Compilers, Code Optimization, Neural Networks, ML for Systems, Learning for Systems

0

0

0

0

4:55

26/04/2020

Reducing Transformer Depth on Demand with Structured Dropout

Angela Fan, Edouard Grave, Armand Joulin

Keywords Paper

reduction, regularization, pruning, dropout, transformer

0

0

0

0

5:01

06/12/2021

Chasing Sparsity in Vision Transformers: An End-to-End Exploration

Tianlong Chen, Yu Cheng, Zhe Gan and
Lu Yuan, Lei Zhang, Zhangyang Wang

Keywords Paper

reinforcement learning and planning, transformers

0

0

0

0

11:29

03/05/2021

HyperGrid Transformers: Towards A Single Model for Multiple Tasks

Yi Tay, Zhe Zhao, Dara Bahri and
Donald Metzler, DA-CHENG Juan

Keywords Paper

Transformers, Multi-Task Learning

0

0

0

0

5:14

06/12/2021

XCiT: Cross-Covariance Image Transformers

Alaaeldin Ali, Hugo Touvron, Mathilde Caron and
Piotr Bojanowski, Matthijs Douze, Armand Joulin, Ivan Laptev, Natalia Neverova, Gabriel Synnaeve, Jakob Verbeek, Herve Jegou

Keywords Paper

deep learning, machine learning, transformers, vision, language

0

0

0

0

13:15

06/12/2020

MomentumRNN: Integrating Momentum into Recurrent Neural Networks

Tan Nguyen, Richard Baraniuk, Andrea Bertozzi and
Stanley Osher, Bao Wang

Keywords Paper

0

0

0

0

3:09

02/02/2021

Provable Benefits of Overparameterization in Model Compression: From Double Descent to Pruning Neural Networks

Xiangyu Chang, Yingcong Li, Samet Oymak, Christos Thrampoulidis

Keywords Paper

0

0

0

0

18:14

02/02/2021

Joint-Label Learning by Dual Augmentation for Time Series Classification

Qianli Ma, Zhenjing Zheng, Jiawei Zheng and
Sen Li, Wanqing Zhuang, Garrison W. Cottrell

Keywords Paper

0

0

0

0

15:59

06/12/2021

Long-Short Transformer: Efficient Transformers for Language and Vision

Chen Zhu, Wei Ping, Chaowei Xiao and
Mohammad Shoeybi, Tom Goldstein, Anima Anandkumar, Bryan Catanzaro

Keywords Paper

machine learning, transformers

0

0

0

0

11:44

03/05/2021

A Block Minifloat Representation for Training Deep Neural Networks

Sean Fox, Seyedramin Rasoulinezhad, Julian Faraone and
david boland, Philip Leong

Keywords Paper

0

0

0

0

5:15

06/12/2021

Augmented Shortcuts for Vision Transformers

Yehui Tang, Kai Han, Chang Xu and
An Xiao, Yiping Deng, Chao Xu, Yunhe Wang

Keywords Paper

transformers, vision

0

0

0

0

7:28