Balancing Cost and Benefit with Tied-Multi Transformers

01/07/2020

Raj Dabre, Raphael Rubino, Atsushi Fujita

Abstract: We propose a novel procedure for training multiple Transformers with tied parameters, which compresses multiple models into one and enables the dynamic choice of the number of encoder and decoder layers during decoding. In training an encoder-decoder model, the output of the last layer of the N-layer encoder is typically fed to the M-layer decoder, and the output of the last decoder layer is used to compute the loss. Instead, our method computes a single loss consisting of N×M losses, where each loss is computed from the output of one of the M decoder layers connected to one of the N encoder layers. Such a model subsumes N×M models with different numbers of encoder and decoder layers, and can be used for decoding with fewer than the maximum number of encoder and decoder layers. Given our flexible tied model, we also address a priori selection of the number of encoder and decoder layers for faster decoding, and explore recurrent stacking of layers and knowledge distillation for model compression. We present a cost-benefit analysis of applying the proposed approaches to neural machine translation and show that they reduce decoding costs while preserving translation quality.
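
The loss construction described in the abstract can be illustrated with a minimal sketch. Everything below is hypothetical and chosen only for illustration: the toy scalar "layers", the name `tied_multi_loss`, and in particular the choice to average the N×M losses (the abstract says only that they form a single loss, not how they are combined). It is not the authors' implementation.

```python
from typing import Callable, Dict, List, Tuple

EncLayer = Callable[[float], float]          # hidden state -> new hidden state
DecLayer = Callable[[float, float], float]   # (decoder state, encoder output) -> new state


def tied_multi_loss(
    x: float,
    y: float,
    enc_layers: List[EncLayer],
    dec_layers: List[DecLayer],
    loss_fn: Callable[[float, float], float],
) -> Tuple[float, Dict[Tuple[int, int], float]]:
    """Compute one loss per (encoder depth n, decoder depth m) pair and combine them."""
    losses: Dict[Tuple[int, int], float] = {}
    h = x
    for n, enc in enumerate(enc_layers, start=1):
        h = enc(h)            # encoder output after n layers
        d = 0.0               # toy initial decoder state
        for m, dec in enumerate(dec_layers, start=1):
            d = dec(d, h)     # decoder output after m layers, reading the n-layer encoder output
            losses[(n, m)] = loss_fn(d, y)
    # Assumption: the N*M losses are averaged into a single training loss.
    total = sum(losses.values()) / len(losses)
    return total, losses


# Toy setup: N = 3 encoder layers, M = 2 decoder layers, squared-error loss.
enc_layers = [lambda h: h + 1.0] * 3
dec_layers = [lambda d, h: d + h] * 2
total, losses = tied_multi_loss(0.0, 6.0, enc_layers, dec_layers, lambda p, t: (p - t) ** 2)

print(len(losses))     # 6, one loss per (n, m) pair
print(losses[(3, 2)])  # 0.0, the full-depth pair matches the target in this toy setup
```

Because every (n, m) pair shares the same layer functions, any shallower sub-stack can be used on its own at decoding time, which is the sense in which one tied model subsumes N×M models.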

The talk and the corresponding paper were published at the ACL Workshops virtual conference.

Similar Papers

18/07/2021

Thinking Like Transformers

Gail Weiss, Yoav Goldberg, Eran Yahav

Keywords: Deep Learning, Others

5:15

03/05/2021

Parameter Efficient Multimodal Transformers for Video Representation Learning

Sangho Lee, Youngjae Yu, Gunhee Kim, Thomas Breuel, Jan Kautz, Yale Song

Keywords: Self-supervised learning, audio-visual representation learning, video representation learning

5:02

14/09/2020

Incremental training of a recurrent neural network exploiting a multi-scale dynamic memory

Antonio Carta, Alessandro Sperduti, Davide Bacciu

Keywords: recurrent neural networks, linear dynamical systems, incremental learning

15:12

18/07/2021

Align, then memorise: the dynamics of learning with feedback alignment

Maria Refinetti, Stéphane d'Ascoli, Ruben Ohana, Sebastian Goldt

Keywords: Theory, Models of Learning and Generalization

4:38

06/12/2021

Post-Training Quantization for Vision Transformer

Zhenhua Liu, Yunhe Wang, Kai Han, Wei Zhang, Siwei Ma, Wen Gao

Keywords: deep learning, transformers, vision

5:52

03/05/2021

Random Feature Attention

Hao Peng, Nikolaos Pappas, Dani Yogatama, Roy Schwartz, Noah Smith, Lingpeng Kong

Keywords: machine translation, transformers, language modeling, Attention

10:20

06/12/2020

Bayesian Bits: Unifying Quantization and Pruning

Mart van Baalen, Christos Louizos, Markus Nagel, Rana Ali Amjad, Ying Wang, Tijmen Blankevoort, Max Welling

3:15

19/08/2021

Progressive Open-Domain Response Generation with Multiple Controllable Attributes

Haiqin Yang, Xiaoyuan Yao, Yiqun Duan, Jianping Shen, Jie Zhong, Kun Zhang

Keywords: Machine Learning, Learning Generative Models, Dialogue

14:43

30/11/2020

Horizontal Flipping Assisted Disentangled Feature Learning for Semi-Supervised Person Re-Identification

Gehan Hao, Yang Yang, Xue Zhou, Guanan Wang, Zhen Lei

5:09

06/12/2021

Combiner: Full Attention Transformer with Sparse Computation Cost

Hongyu Ren, Hanjun Dai, Zihang Dai, Mengjiao Yang, Jure Leskovec, Dale Schuurmans, Bo Dai

Keywords: transformers

14:31

22/11/2021

Hardware-Aware Mixed-Precision Neural Networks using In-Train Quantization

Manoj Rohit Vemparala, Nael Fasfous, Lukas Frickenstein, Alexander Frickenstein, Anmol Singh, Driton Salihu, Christian Unger, Naveen Shankar Nagaraja, WALTER STECHELE

Keywords: Quantization, Inference, Neural Network Compression, Mixed Precision, Hardware Aware Networks

2:58

26/04/2020

Mixed Precision DNNs: All you need is a good parametrization

Stefan Uhlich, Lukas Mauch, Fabien Cardinaux, Kazuki Yoshiyama, Javier Alonso Garcia, Stephen Tiedemann, Thomas Kemp, Akira Nakamura

Keywords: Deep Neural Network Compression, Quantization, Straight through gradients

5:11

14/06/2020

Automatic Neural Network Compression by Sparsity-Quantization Joint Learning: A Constrained Optimization-Based Approach

Haichuan Yang, Shupeng Gui, Yuhao Zhu, Ji Liu

Keywords: model compression, pruning, quantization, structured projection

1:01

03/05/2021

HyperGrid Transformers: Towards A Single Model for Multiple Tasks

Yi Tay, Zhe Zhao, Dara Bahri, Donald Metzler, DA-CHENG Juan

Keywords: Transformers, Multi-Task Learning

5:14

14/06/2020

APQ: Joint Search for Network Architecture, Pruning and Quantization Policy

Tianzhe Wang, Kuan Wang, Han Cai, Ji Lin, Zhijian Liu, Hanrui Wang, Yujun Lin, Song Han

Keywords: efficiency, model compression, joint design, neural architecture search, channel pruning, mixed-precision quantization

1:00

02/02/2021

Lexically Constrained Neural Machine Translation with Explicit Alignment Guidance

Guanhua Chen, Yun Chen, Victor O.K. Li

15:33

06/12/2021

A Framework to Learn with Interpretation

Jayneel Parekh, Pavlo Mozharovskyi, Florence d'Alché-Buc

Keywords: deep learning, interpretability

14:05

06/12/2021

Stable, Fast and Accurate: Kernelized Attention with Relative Positional Encoding

Shengjie Luo, Shanda Li, Tianle Cai, Di He, Dinglan Peng, Shuxin Zheng, Guolin Ke, Liwei Wang, Tie-Yan Liu

Keywords: optimization, machine learning, transformers, vision

10:07

13/04/2021

Neural function modules with sparse arguments: A dynamic approach to integrating information across layers

Alex Lamb, Anirudh Goyal, Agnieszka Słowik, Michael Mozer, Philippe Beaudoin, Yoshua Bengio

3:01

06/12/2020

O(n) Connections are Expressive Enough: Universal Approximability of Sparse Transformers

Chulhee Yun, Yin-Wen Chang, Srinadh Bhojanapalli, Ankit Singh Rawat, Sashank Reddi, Sanjiv Kumar

3:23

06/12/2021

ViTAE: Vision Transformer Advanced by Exploring Intrinsic Inductive Bias

Yufei Xu, Qiming ZHANG, Jing Zhang, Dacheng Tao

Keywords: machine learning, transformers, vision

10:16

06/12/2021

Amortized Synthesis of Constrained Configurations Using a Differentiable Surrogate

Xingyuan Sun, Tianju Xue, Szymon Rusinkiewicz, Ryan Adams

Keywords: deep learning, optimization

12:41

04/07/2020

Lipschitz Constrained Parameter Initialization for Deep Transformers

Hongfei Xu, Qiuhui Liu, Josef van Genabith, Deyi Xiong, Jingyi Zhang

Keywords: Lipschitz Initialization, Deep Transformers, Transformer model, layer normalization

4:54

18/07/2021

Learn-to-Share: A Hardware-friendly Transfer Learning Framework Exploiting Computation and Parameter Sharing

Cheng Fu, Hanxian Huang, Xinyun Chen, Yuandong Tian, Jishen Zhao

Keywords: Applications, Natural Language Processing

16:03

02/02/2021

DPFPS: Dynamic and Progressive Filter Pruning for Compressing Convolutional Neural Networks from Scratch

Xiaofeng Ruan, Yufan Liu, Bing Li, Chunfeng Yuan, Weiming Hu

14:38

06/12/2021

Redesigning the Transformer Architecture with Insights from Multi-particle Dynamical Systems

Subhabrata Dutta, Tanya Gautam, Soumen Chakrabarti, Tanmoy Chakraborty

Keywords: deep learning, transformers

11:54

22/11/2021

Adaptive End-to-End Budgeted Network Learning via Inverse Scale Space

Zuyuan Zhong, Chen Liu, Yanwei Fu

Keywords: deep learning, network architecture, growing network, budgeted network learning, pruning

2:58

26/04/2020

Learned step size quantization

Steven K. Esser, Jeffrey L. McKinstry, Deepika Bablani, Rathinakumar Appuswamy, Dharmendra S. Modha

Keywords: deep learning, low precision, classification, quantization

4:40

04/07/2020

Improving Transformer Models by Reordering their Sublayers

Ofir Press, Noah A. Smith, Omer Levy

Keywords: task-specific reorderings, Transformer Models, Multilayer networks, randomly transformers

12:29

06/12/2021

Only Train Once: A One-Shot Neural Network Training And Pruning Framework

Tianyi Chen, Bo Ji, Tianyu Ding, Biyi Fang, Guanyi Wang, Zhihui Zhu, Luming Liang, Yixin Shi, Sheng Yi, Xiao Tu

Keywords: deep learning, optimization, reinforcement learning and planning

12:53

12/07/2020

Variable-Bitrate Neural Compression via Bayesian Arithmetic Coding

Yibo Yang, Robert Bamler, Stephan Mandt

Keywords: Deep Learning - General

15:08

04/07/2020

Multi-Domain Neural Machine Translation with Word-Level Adaptive Layer-wise Domain Mixing

Haoming Jiang, Chen Liang, Chong Wang, Tuo Zhao

Keywords: knowledge transfer, domain sharing, NMT tasks, Multi-Domain Translation

11:21

18/07/2021

OmniNet: Omnidirectional Representations from Transformers

Yi Tay, Mostafa Dehghani, Vamsi Aribandi, Jai Gupta, Philip Pham, Zhen Qin, Dara Bahri, Da-Cheng Juan, Don Metzler

Keywords: Deep Learning, Predictive Models, Algorithms, Representation Learning, Neuroscience and Cognitive Science, Problem Solving, Architectures

17:00

06/12/2020

FleXOR: Trainable Fractional Quantization

Dongsoo Lee, Se Jung Kwon, Byeongwook Kim, Yongkweon Jeon, Baeseong Park, Jeongin Yun

3:12

20/08/2020

Sparcl: A Language for Partially-Invertible Computation

Kazutaka Matsuda, Meng Wang

Keywords: linear types, reversible computation

14:33

19/04/2021

Adv-OLM: Generating textual adversaries via OLM

Vijit Malik, Ashwani Bhat, Ashutosh Modi

7:04

26/04/2020

DivideMix: Learning with Noisy Labels as Semi-supervised Learning

Junnan Li, Richard Socher, Steven C.H. Hoi

Keywords: label noise, semi-supervised learning

5:00

03/08/2020

Batch norm with entropic regularization turns deterministic autoencoders into generative models

Amur Ghose, Abdullah Rashwan, Pascal Poupart

8:18

26/04/2020

Are Transformers universal approximators of sequence-to-sequence functions?

Chulhee Yun, Srinadh Bhojanapalli, Ankit Singh Rawat, Sashank Reddi, Sanjiv Kumar

Keywords: Transformer, universal approximation, contextual mapping, expressive power, permutation equivariance

4:55

15/06/2020

Optimizing homomorphic evaluation circuits by program synthesis and term rewriting

DongKwon Lee, Woosuk Lee, Hakjoo Oh, Kwangkeun Yi

Keywords: Term Rewriting, Program Synthesis, Homomorphic Encryption Circuit

15:40