Learning Light-Weight Translation Models from Deep Transformer

02/02/2021

Learning Light-Weight Translation Models from Deep Transformer

Bei Li, Ziyang Wang, Hui Liu, Quan Du, Tong Xiao, Chunliang Zhang, Jingbo Zhu

Keywords:

Abstract Paper Similar Papers

Abstract: Recently, deep models have shown tremendous improvements in neural machine translation (NMT). However, systems of this kind are computationally expensive and memory intensive. In this paper, we take a natural step towards learning strong but light-weight NMT systems. We proposed a novel group-permutation based knowledge distillation approach to compressing the deep Transformer model into a shallow model. The experimental results on several benchmarks validate the effectiveness of our method. Our compressed model is 8 times shallower than the deep model, with almost no loss in BLEU. To further enhance the teacher model, we present a Skipping Sub-Layer method to randomly omit sub-layers to introduce perturbation into training, which achieves a BLEU score of 30.63 on English-German newstest2014. The code is publicly available at https://github.com/libeineu/GPKD.

The video of this talk cannot be embedded. You can watch it here:

https://slideslive.com/38949243

(Link will open in new window)

0

0

0

0

Share

This is an embedded video. Talk and the respective paper are published at AAAI 2021 virtual conference. If you are one of the authors of the paper and want to manage your upload, see the question "My papertalk has been externally embedded..." in the FAQ section.

Comments

Post Comment

no comments yet

Similar Papers

16/11/2020

Shallow-to-Deep Training for Neural Machine Translation

Bei Li, Ziyang Wang, Hui Liu and
Yufan Jiang, Quan Du, Tong Xiao, Huizhen Wang, Jingbo Zhu

Keywords Paper

neural systems, wmt tasks, deep encoders, deep encoder

0

0

0

0

12:13

06/12/2020

Data Diversification: A Simple Strategy For Neural Machine Translation

Xuan-Phi Nguyen, Shafiq Joty, Kui Wu, Ai Ti Aw

Keywords Paper

0

0

0

0

3:23

04/07/2020

Multiscale Collaborative Deep Models for Neural Machine Translation

Xiangpeng Wei, Heng Yu, Yue Hu and
Yue Zhang, Rongxiang Weng, Weihua Luo

Keywords Paper

Neural Translation, training models, IWSLT tasks, WMT14 task

0

0

0

0

10:42

08/12/2020

Dynamic Curriculum Learning for Low-Resource Neural Machine Translation

Chen Xu, Bojie Hu, Yufan Jiang and
Kai Feng, Zeyang Wang, Shen Huang, Qi Ju, Tong Xiao, Jingbo Zhu

Keywords Paper

0

0

0

0

13:28

03/05/2021

Dataset Meta-Learning from Kernel Ridge-Regression

Timothy Nguyen, Zhourong Chen, Jaehoon Lee

Keywords Paper

dataset corruption, infinite-width networks, neural kernels, kernel-ridge regression, dataset compression, dataset distillation, meta-learning

0

0

0

0

4:59

06/12/2020

Unsupervised Data Augmentation for Consistency Training

Qizhe Xie, Zihang Dai, Eduard Hovy and
Thang Luong, Quoc V Le

Keywords Paper

0

0

0

0

3:29

06/12/2021

R-Drop: Regularized Dropout for Neural Networks

xiaobo liang, Lijun Wu, Juntao Li and
Yue Wang, Qi Meng, Tao Qin, Wei Chen, Min Zhang, Tie-Yan Liu

Keywords Paper

deep learning, machine learning, transformers, vision, language

0

0

0

0

13:02

04/07/2020

Norm-Based Curriculum Learning for Neural Machine Translation

Xuebo Liu, Houtim Lai, Derek F. Wong, Lidia S. Chao

Keywords Paper

Neural Translation, norm-based difficulty, WMT'14 tasks, Norm-Based Learning

0

0

0

0

11:36

18/07/2021

Dataset Condensation with Differentiable Siamese Augmentation

Bo Zhao, Hakan Bilen

Keywords Paper

Algorithms, Multitask, Transfer, and Meta Learning

0

0

0

0

5:02

04/07/2020

Uncertainty-Aware Curriculum Learning for Neural Machine Translation

Yikai Zhou, Baosong Yang, Derek F. Wong and
Yu Wan, Lidia S. Chao

Keywords Paper

Neural Translation, assessment difficulty, translation tasks, Uncertainty-Aware Learning

0

0

0

0

8:20

04/07/2020

Lexically Constrained Neural Machine Translation with Levenshtein Transformer

Raymond Hendy Susanto, Shamil Chollampatt, Liling Tan

Keywords Paper

Lexically Translation, neural translation, Levenshtein Transformer, beam decoding

0

0

0

0

7:05

02/02/2021

Incremental Embedding Learning via Zero-Shot Translation

Kun Wei, Cheng Deng, Xu Yang, Maosen Li

Keywords Paper

0

0

0

0

13:42

02/02/2021

Joint-Label Learning by Dual Augmentation for Time Series Classification

Qianli Ma, Zhenjing Zheng, Jiawei Zheng and
Sen Li, Wanqing Zhuang, Garrison W. Cottrell

Keywords Paper

0

0

0

0

15:59

18/07/2021

Self-supervised and Supervised Joint Training for Resource-rich Machine Translation

Yong Cheng, Wei Wang, Lu Jiang, Wolfgang Macherey

Keywords Paper

Applications, Natural Language Processing

0

0

0

0

5:21

11/08/2020

A computational approach to packet classification

Alon Rashelbach, Ori Rottenstreich, Mark Silberstein

Keywords Paper

Neural Networks, Virtual Switches, Packet Classification

0

0

0

0

16:56

03/05/2021

Neural Pruning via Growing Regularization

Huan Wang, Can Qin, Yulun Zhang, Yun Fu

Keywords Paper

deep neural network pruning, regularization, Hessian matrix, model compression

0

0

0

0

6:15

26/04/2020

Training Recurrent Neural Networks Online by Learning Explicit State Variables

Somjit Nath, Vincent Liu, Alan Chan and
Xin Li, Adam White, Martha White

Keywords Paper

Recurrent Neural Network, Partial Observability, Online Prediction, Incremental Learning

0

0

0

0

5:06

06/12/2021

GradInit: Learning to Initialize Neural Networks for Stable and Efficient Training

Chen Zhu, Renkun Ni, Zheng Xu and
Kezhi Kong, W. Ronny Huang, Tom Goldstein

Keywords Paper

deep learning, transformers, vision

0

0

0

0

13:17

03/05/2021

Learning N:M Fine-grained Structured Sparse Neural Networks From Scratch

Aojun Zhou, Yukun Ma, Junnan Zhu and
Jianbo Liu, Zhijie Zhang, Kun Yuan, Wenxiu Sun, Hongsheng Li

Keywords Paper

sparsity, efficient training and inference.

0

0

0

0

5:09

04/07/2020

Harvesting and Refining Question-Answer Pairs for Unsupervised QA

Zhongli Li, Wenhui Wang, Li Dong and
Furu Wei, Ke Xu

Keywords Paper

Unsupervised QA, Question Answering, Question QA, QA

0

0

0

0

10:28

18/07/2021

Opening the Blackbox: Accelerating Neural Differential Equations by Regularizing Internal Solver Heuristics

Avik Pal, Yingbo Ma, Viral Shah, Christopher Rackauckas

Keywords Paper

Deep Learning

0

0

0

0

5:11

06/12/2021

MEST: Accurate and Fast Memory-Economic Sparse Training Framework on the Edge

Geng Yuan, Xiaolong Ma, Wei Niu and
Zhengang Li, Zhenglun Kong, Ning Liu, Yifan Gong, Zheng Zhan, Chaoyang He, Qing Jin, Siyue Wang, Minghai Qin, Bin Ren, Yanzhi Wang, Sijia Liu, Xue Lin

Keywords Paper

deep learning, reinforcement learning and planning

0

0

0

0

15:00

07/09/2020

Meta-RetinaNet for Few-shot Object Detection

Shaoqi Li, Wenfeng Song, Shuai Li and
Aimin Hao, Hong Qin

Keywords Paper

Few shot, object detection, meta-learning, Meta-RetinaNet, Balanced Loss, coefficient vector

0

0

0

0

8:51

06/12/2020

DeepI2I: Enabling Deep Hierarchical Image-to-Image Translation by Transferring from GANs

yaxing wang, Lu Yu, Joost van de Weijer

Keywords Paper

Algorithms -> Online Learning, Optimization -> Stochastic Optimization

0

0

0

0

3:23

06/12/2020

ExpandNets: Linear Over-parameterization to Train Compact Convolutional Networks

Shuxuan Guo, Jose M. Alvarez, Mathieu Salzmann

Keywords Paper

0

0

0

0

3:20

06/12/2021

FlexMatch: Boosting Semi-Supervised Learning with Curriculum Pseudo Labeling

Bowen Zhang, Yidong Wang, Wenxin Hou and
HAO WU, Jindong Wang, Manabu Okumura, Takahiro Shinozaki

Keywords Paper

semi-supervised learning

0

0

0

0

7:33

30/11/2020

Data-Efficient Ranking Distillation for Image Retrieval

Zakaria Laskar, Juho Kannala

Keywords Paper

0

0

0

0

7:58

03/05/2021

Meta-Learning with Neural Tangent Kernels

Yufan Zhou, Zhenyi Wang, Jiayi Xian and
Changyou Chen, Jinhui Xu

Keywords Paper

neural tangent kernel, meta-learning

0

0

0

0

3:54

04/07/2020

AdvAug: Robust Adversarial Augmentation for Neural Machine Translation

Yong Cheng, Lu Jiang, Wolfgang Macherey, Jacob Eisenstein

Keywords Paper

Robust Augmentation, Neural Translation, Neural NMT, Neural

0

0

0

0

12:16

03/05/2021

MixKD: Towards Efficient Distillation of Large-scale Language Models

Kevin Liang, Weituo Hao, Dinghan Shen and
Yufan Zhou, Weizhu Chen, Changyou Chen, Lawrence Carin

Keywords Paper

Representation Learning, Natural Language Processing

0

0

0

0

3:52

19/04/2021

Multi-split reversible transformers can enhance neural machine translation

Yuekai Zhao, Shuchang Zhou, Zhihua Zhang

Keywords Paper

0

0

0

0

12:00

16/11/2020

Language Model Prior for Low-Resource Neural Machine Translation

Christos Baziotis, Barry Haddow, Alexandra Birch

Keywords Paper

neural translation, neural tm, knowledge distillation, training time

0

0

0

0

11:16

12/07/2020

Training Neural Networks for and by Interpolation

Leonard Berrada, M. Pawan Kumar, Andrew Zisserman

Keywords Paper

Deep Learning - General

0

0

0

0

16:12

18/07/2021

Training Data Subset Selection for Regression with Controlled Generalization Error

Durga S, Rishabh Iyer, Ganesh Ramakrishnan, Abir De

Keywords Paper

, Algorithms, Online Learning, Algorithms, Supervised Learning

0

0

0

0

4:15

22/11/2021

FFNB: Forgetting-Free Neural Blocks for Deep Continual Learning

Hichem Sahbi, Haoming Zhan

Keywords Paper

Continual and incremental learning, lifelong learning, catastrophic interference, catastrophic forgetting, dynamic neural networks, visual recognition

0

0

0

0

3:05

12/07/2020

Improving Transformer Optimization Through Better Initialization

Xiao Shi Huang, Felipe Perez, Jimmy Ba, Maksims Volkovs

Keywords Paper

Sequential, Network, and Time-Series Modeling

0

0

0

0

14:52

16/11/2020

Autoregressive Knowledge Distillation through Imitation Learning

Alexander Lin, Jeremy Wohlwend, Howard Chen, Tao Lei

Keywords Paper

natural tasks, knowledge distillation, exposure problem, prototypical tasks

0

0

0

0

12:43

02/02/2021

GLISTER: Generalization based Data Subset Selection for Efficient and Robust Learning

Krishnateja Killamsetty, Durga Sivasubramanian, Ganesh Ramakrishnan, Rishabh Iyer

Keywords Paper

0

0

0

0

19:14

18/07/2021

A Theory of Label Propagation for Subpopulation Shift

Tianle Cai, Ruiqi Gao, Jason Lee, Qi Lei

Keywords Paper

Theory, Statistical Learning Theory

0

1

0

0

5:08

26/04/2020

Reducing Transformer Depth on Demand with Structured Dropout

Angela Fan, Edouard Grave, Armand Joulin

Keywords Paper

reduction, regularization, pruning, dropout, transformer

0

0

0

0

5:01