An Efficient Transformer Decoder with Compressed Sub-layers

02/02/2021

Yanyang Li, Ye Lin, Tong Xiao, Jingbo Zhu

Abstract: The large attention-based encoder-decoder network (Transformer) has recently become prevalent due to its effectiveness, but the high computational complexity of its decoder makes it inefficient. By examining the mathematical formulation of the decoder, we show that, under some mild conditions, the architecture can be simplified by compressing its sub-layers, the basic building blocks of the Transformer, thereby achieving higher parallelism. We therefore propose the Compressed Attention Network, whose decoder layer consists of only one sub-layer instead of three. Extensive experiments on 14 WMT machine translation tasks show that our model is 1.42x faster, with performance on par with a strong baseline. This strong baseline is itself already 2x faster than the widely used standard baseline, without loss in performance.

The video of this talk is available here:

https://slideslive.com/38948349

The talk and the respective paper are published at the AAAI 2021 virtual conference.

Similar Papers

03/05/2021

Deep Encoder, Shallow Decoder: Reevaluating Non-autoregressive Machine Translation

Jungo Kasai, Nikolaos Pappas, Hao Peng, James Cross, Noah Smith

Keywords: Machine Translation, Sequence Modeling, Natural Language Processing

5:04

06/12/2021

Long-Short Transformer: Efficient Transformers for Language and Vision

Chen Zhu, Wei Ping, Chaowei Xiao, Mohammad Shoeybi, Tom Goldstein, Anima Anandkumar, Bryan Catanzaro

Keywords: machine learning, transformers

11:44

18/07/2021

Multiplying Matrices Without Multiplying

Davis Blalock, John Guttag

Keywords: Optimization, Convex Optimization, Algorithms, Sparsity and Compressed Sensing; Applications, Information Retrieval; Applications, Signal Processing, Algorithms, Others

5:27

15/06/2020

NVTraverse: In NVRAM data structures, the destination is more important than the journey

Michal Friedman, Naama Ben-David, Yuanhao Wei, Guy E. Blelloch, Erez Petrank

Keywords: Non-blocking, Lock-free, Concurrent Data Structures, Non-volatile Memory

16:56

11/08/2020

A computational approach to packet classification

Alon Rashelbach, Ori Rottenstreich, Mark Silberstein

Keywords: Neural Networks, Virtual Switches, Packet Classification

16:56

05/04/2021

TT-Rec: Tensor Train Compression for Deep Learning Recommendation Models

Chunxing Yin, Bilge Acun, Carole-Jean Wu, Xing Liu

5:15

05/04/2021

TT-Rec: Tensor Train Compression for Deep Learning Recommendation Models

Chunxing Yin, Bilge Acun, Carole-Jean Wu, Xing Liu

23:05

19/08/2021

DEHB: Evolutionary Hyperband for Scalable, Robust and Efficient Hyperparameter Optimization

Noor Awad, Neeratyoy Mallik, Frank Hutter

Keywords: Machine Learning, Evolutionary Learning

15:09

15/11/2020

Dynamic Dispatch of Context-Sensitive Optimizations

Gabriel Poesia, Fernando Magno Quintão Pereira

Keywords: Dynamic dispatch, Compiler, Context-sensitive optimization

9:10

26/04/2020

Reducing Transformer Depth on Demand with Structured Dropout

Angela Fan, Edouard Grave, Armand Joulin

Keywords: reduction, regularization, pruning, dropout, transformer

5:01

12/08/2020

Symbolic execution with SymCC: Don't interpret, compile!

Sebastian Poeplau, Aurélien Francillon

11:08

06/12/2020

BanditPAM: Almost Linear Time k-Medoids Clustering via Multi-Armed Bandits

Mo Tiwari, Martin Zhang, James J Mayclin, Sebastian Thrun, Chris Piech, Ilan Shomorony

3:16

06/12/2020

Fast Transformers with Clustered Attention

Apoorv Vyas, Angelos Katharopoulos, François Fleuret

3:22

19/04/2021

Multi-split reversible transformers can enhance neural machine translation

Yuekai Zhao, Shuchang Zhou, Zhihua Zhang

12:00

06/12/2020

GPU-Accelerated Primal Learning for Extremely Fast Large-Scale Classification

John Halloran, David M Rocke

3:33

15/11/2020

Assertion-Based Optimization of Quantum Programs

Thomas Häner, Torsten Hoefler, Matthias Troyer

Keywords: quantum circuit optimization, quantum computing

15:22

06/12/2021

GradInit: Learning to Initialize Neural Networks for Stable and Efficient Training

Chen Zhu, Renkun Ni, Zheng Xu, Kezhi Kong, W. Ronny Huang, Tom Goldstein

Keywords: deep learning, transformers, vision

13:17

14/09/2020

Squeezing Correlated Neurons for Resource-Efficient Deep Neural Networks

Elbruz Ozen, Alex Orailoglu

Keywords: deep learning, information redundancy, pruning

14:48

06/12/2020

Approximate Cross-Validation with Low-Rank Data in High Dimensions

Will Stephenson, Madeleine Udell, Tamara Broderick

3:02

03/05/2021

A Block Minifloat Representation for Training Deep Neural Networks

Sean Fox, Seyedramin Rasoulinezhad, Julian Faraone, David Boland, Philip Leong

5:15

03/05/2021

NAS-Bench-ASR: Reproducible Neural Architecture Search for Speech Recognition

Abhinav Mehrotra, Alberto Gil Couto Pimentel Ramos, Sourav Bhattacharya, Łukasz Dudziak, Ravichander Vipperla, Thomas C Chau, Mohamed Abdelfattah, Samin Ishtiaq, Nic Lane

Keywords: Benchmark, NAS, ASR

4:50

18/07/2021

A Riemannian Block Coordinate Descent Method for Computing the Projection Robust Wasserstein Distance

Minhui Huang, Shiqian Ma, Lifeng Lai

Keywords: Algorithms, Optimal Transport

5:14

12/07/2020

Stabilizing Transformers for Reinforcement Learning

Emilio Parisotto, Francis Song, Jack Rae, Razvan Pascanu, Caglar Gulcehre, Siddhant Jayakumar, Max Jaderberg, Raphael Lopez Kaufman, Aidan Clark, Seb Noury, Matthew Botvinick, Nicolas Heess, Raia Hadsell

Keywords: Reinforcement Learning - Deep RL

14:20

06/12/2020

ShiftAddNet: A Hardware-Inspired Deep Network

Haoran You, Xiaohan Chen, Yongan Zhang, Chaojian Li, Sicheng Li, Zihao Liu, Zhangyang Wang, Yingyan Lin

3:25

05/04/2021

Scaling Polyhedral Neural Network Verification on GPUs

Christoph Müller, François Serre, Gagandeep Singh, Markus Püschel, Martin Vechev

3:37

05/04/2021

Scaling Polyhedral Neural Network Verification on GPUs

Christoph Müller, François Serre, Gagandeep Singh, Markus Püschel, Martin Vechev

22:07

15/11/2020

Foundations of Empirical Memory Consistency Testing

Jake Kirkham, Tyler Sorensen, Esin Tureci, Margaret Martonosi

Keywords: autotuning, conformance testing, memory consistency, GPUs, OpenCL

14:58

06/12/2021

Sparse is Enough in Scaling Transformers

Sebastian Jaszczur, Aakanksha Chowdhery, Afroz Mohiuddin, Łukasz Kaiser, Wojciech Gajewski, Henryk Michalewski, Jonni Kanerva

Keywords: machine learning, transformers

8:28

15/11/2020

Fast Linear Programming through Transprecision Computing on Small and Sparse Data

Tobias Grosser, Theodoros Theodoridis, Maximilian Falkenstein, Arjun Pitchanathan, Michael Kruse, Manuel Rigger, Zhendong Su, Torsten Hoefler

Keywords: Presburger Arithmetic, Transprecision, Linear Programming, Simplex

13:35

06/12/2020

Efficient Clustering Based On A Unified View Of $K$-means And Ratio-cut

Shenfei Pei, Feiping Nie, Rong Wang, Xuelong Li

3:16

06/12/2020

Nimble: Lightweight and Parallel GPU Task Scheduling for Deep Learning

Woosuk Kwon, Gyeong-In Yu, Eunji Jeong, Byung-Gon Chun

3:23

26/04/2020

CLN2INV: Learning Loop Invariants with Continuous Logic Networks

Gabriel Ryan, Justin Wong, Jianan Yao, Ronghui Gu, Suman Jana

Keywords: loop invariants, deep learning, logic learning

5:12

06/12/2020

Big Bird: Transformers for Longer Sequences

Manzil Zaheer, Guru Guruganesh, Kumar Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed

3:17

15/06/2020

Learning fast and precise numerical analysis

Jingxuan He, Gagandeep Singh, Markus Püschel, Martin Vechev

Keywords: Abstract interpretation, Performance optimization, Machine learning, Numerical domains

14:20

05/04/2021

Wavelet: Efficient DNN Training with Tick-Tock Scheduling

Guanhua Wang, Kehan Wang, Kenan Jiang, Xiangjun Li, Ion Stoica

5:22

05/04/2021

Wavelet: Efficient DNN Training with Tick-Tock Scheduling

Guanhua Wang, Kehan Wang, Kenan Jiang, Xiangjun Li, Ion Stoica

17:49

14/09/2020

Online Binary Incomplete Multi-view Clustering

Longqi Yang, Liangliang Zhang, Yuhua Tang

3:04

18/07/2021

Differentiable Dynamic Quantization with Mixed Precision and Adaptive Resolution

Zhaoyang Zhang, Wenqi Shao, Jinwei Gu, Xiaogang Wang, Ping Luo

Keywords: Applications, Computer Vision

5:00

06/12/2021

Fast Multi-Resolution Transformer Fine-tuning for Extreme Multi-label Text Classification

Jiong Zhang, Wei-Cheng Chang, Hsiang-Fu Yu, Inderjit Dhillon

Keywords: machine learning, transformers

14:25

14/06/2020

ABCNet: Real-Time Scene Text Spotting With Adaptive Bezier-Curve Network

Yuliang Liu, Hao Chen, Chunhua Shen, Tong He, Lianwen Jin, Liangwei Wang

Keywords: bezier curve, scene text, end-to-end, detection, recognition, arbitrarily shaped, one stage, align, sampling, deep neural network

5:01