Extremely small BERT models from mixed-vocabulary training

19/04/2021

Extremely small BERT models from mixed-vocabulary training

Sanqiang Zhao, Raghav Gupta, Yang Song, Denny Zhou

Keywords:

Abstract Paper Similar Papers

Abstract: Pretrained language models like BERT have achieved good results on NLP tasks, but are impractical on resource-limited devices due to memory footprint. A large fraction of this footprint comes from the input embeddings with large input vocabulary and embedding dimensions. Existing knowledge distillation methods used for model compression cannot be directly applied to train student models with reduced vocabulary sizes. To this end, we propose a distillation method to align the teacher and student embeddings via mixed-vocabulary training. Our method compresses BERT-LARGE to a task-agnostic model with smaller vocabulary and hidden dimensions, which is an order of magnitude smaller than other distilled BERT models and offers a better size-accuracy trade-off on language understanding benchmarks as well as a practical dialogue task.

0

0

0

0

Share

This is an embedded video. Talk and the respective paper are published at EACL 2021 virtual conference. If you are one of the authors of the paper and want to manage your upload, see the question "My papertalk has been externally embedded..." in the FAQ section.

Comments

Post Comment

no comments yet

Similar Papers

16/11/2020

TernaryBERT: Distillation-aware Ultra-low Bit BERT

Wei Zhang, Lu Hou, Yichun Yin and
Lifeng Shang, Xiao Chen, Xin Jiang, Qun Liu

Keywords Paper

natural tasks, training process, transformer-based models, bert

0

0

0

0

8:41

19/10/2020

TwinBERT: Distilling knowledge to twin-structured compressed BERT models for large-scale retrieval

Wenhao Lu, Jian Jiao, Ruofei Zhang

Keywords Paper

knowledge distillation, semantic embedding, sponsored search, bert, information retrieval, deep neural network, deep learning

0

0

0

0

10:20

02/02/2021

Learning to Augment for Data-scarce Domain BERT Knowledge Distillation

Lingyun Feng, Minghui Qiu, Yaliang Li and
Hai-Tao Zheng, Ying Shen

Keywords Paper

0

0

0

0

17:11

03/05/2021

MixKD: Towards Efficient Distillation of Large-scale Language Models

Kevin Liang, Weituo Hao, Dinghan Shen and
Yufan Zhou, Weizhu Chen, Changyou Chen, Lawrence Carin

Keywords Paper

Representation Learning, Natural Language Processing

0

0

0

0

3:52

06/12/2020

DynaBERT: Dynamic BERT with Adaptive Width and Depth

Lu Hou, Zhiqi Huang, Lifeng Shang and
Xin Jiang, Xiao Chen, Qun Liu

Keywords Paper

0

0

0

0

2:59

16/11/2020

Adversarial Self-Supervised Data-Free Distillation for Text Classification

Xinyin Ma, Yongliang Shen, Gongfan Fang and
Chen Chen, Chenghao Jia, Weiming Lu

Keywords Paper

nlp tasks, nlp, compressing models, text generation

0

0

0

0

9:36

16/11/2020

Why Skip If You Can Combine: A Simple Knowledge Distillation Technique for Intermediate Layers

Yimeng Wu, Peyman Passban, Mehdi Rezagholizadeh, Qun Liu

Keywords Paper

knowledge distillation, neural models, knowledge kd, kd

0

0

0

0

6:41

06/12/2020

MPNet: Masked and Permuted Pre-training for Language Understanding

Kaitao Song, Xu Tan, Tao Qin and
Jianfeng Lu, Tie-Yan Liu

Keywords Paper

0

0

0

0

3:23

06/12/2020

ConvBERT: Improving BERT with Span-based Dynamic Convolution

Zi-Hang Jiang, Weihao Yu, Daquan Zhou and
Yunpeng Chen, Jiashi Feng, Shuicheng Yan

Keywords Paper

0

0

0

0

3:20

08/12/2020

TableGPT: Few-shot Table-to-Text Generation with Table Structure Reconstruction and Content Matching

Heng Gong, Yawei Sun, Xiaocheng Feng and
Bing Qin, Wei Bi, Xiaojiang Liu, Ting Liu

Keywords Paper

0

0

0

0

8:45

05/12/2020

Towards non-task-specific distillation of BERT via sentence representation approximation

Bowen Wu, Huan Zhang, MengYuan Li and
Zongsheng Wang, Qihang Feng, Junhong Huang, Baoxun Wang

Keywords Paper

0

0

0

0

10:51

06/12/2020

Modular Meta-Learning with Shrinkage

Yutian Chen, Abe Friesen, Feryal Behbahani and
Arnaud Doucet, David Budden, Matthew Hoffman, Nando de Freitas

Keywords Paper

0

0

0

0

3:21

12/07/2020

Train Big, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers

Zhuohan Li, Eric Wallace, Sheng Shen and
Kevin Lin, Kurt Keutzer, Dan Klein, Joseph Gonzalez

Keywords Paper

Applications - Language, Speech and Dialog

0

0

0

0

15:21

04/07/2020

Multi-source Meta Transfer for Low Resource Multiple-Choice Question Answering

Ming Yan, Hao Zhang, Di Jin, Joey Tianyi Zhou

Keywords Paper

Multi-source Transfer, Low Answering, Multiple-choice answering, machine comprehension

0

0

0

0

7:40

16/11/2020

Combining Self-Training and Self-Supervised Learning for Unsupervised Disfluency Detection

Shaolei Wang, Zhongyuan Wang, Wanxiang Che, Ting Liu

Keywords Paper

disfluency detection, self-supervised techniques, unsupervised paradigm, noisy training

0

0

0

0

9:57

19/04/2021

How fast can BERT learn simple natural language inference?

Yi-Chung Lin, Keh-Yih Su

Keywords Paper

0

0

0

0

6:59

02/02/2021

Meta-Curriculum Learning for Domain Adaptation in Neural Machine Translation

Runzhe Zhan, Xuebo Liu, Derek F. Wong, Lidia S. Chao

Keywords Paper

0

0

0

0

14:20

06/12/2021

Grad2Task: Improved Few-shot Text Classification Using Gradients for Task Representation

Jixuan Wang, Kuan-Chieh Wang, Frank Rudzicz, Michael Brudno

Keywords Paper

machine learning, transformers, meta learning, language, transfer learning

0

0

0

0

14:45

30/11/2020

Data-Efficient Ranking Distillation for Image Retrieval

Zakaria Laskar, Juho Kannala

Keywords Paper

0

0

0

0

7:58

01/07/2020

Linguistic Features for Readability Assessment

Tovly Deutsch, Masoud Jasbi, Stuart Shieber

Keywords Paper

0

0

0

0

12:06

16/11/2020

Syntactic Structure Distillation Pretraining for Bidirectional Encoders

Adhiguna Kuncoro, Lingpeng Kong, Daniel Fried and
Dani Yogatama, Laura Rimell, Chris Dyer, Phil Blunsom

Keywords Paper

bert pretraining, structured tasks, natural understanding, textual learners

0

0

0

0

12:23

19/04/2021

Better neural machine translation by extracting linguistic information from BERT

Hassan S. Shavarani, Anoop Sarkar

Keywords Paper

0

0

0

0

12:15

08/12/2020

Less is Better: A cognitively inspired unsupervised model for language segmentation

Jinbiao Yang, Stefan L. Frank, Antal van den Bosch

Keywords Paper

0

0

0

0

10:27

25/07/2020

A pairwise probe for understanding BERT fine-tuning on machine reading comprehension

Jie Cai, Zhengzhou Zhu, Ping Nie, Qian Liu

Keywords Paper

machine reading comprehension, pairwise, fine-tune, BERT

0

0

0

0

6:38

02/02/2021

Effective Slot Filling via Weakly-Supervised Dual-Model Learning

Jue Wang, Ke Chen, Lidan Shou and
Sai Wu, Gang Chen

Keywords Paper

0

0

0

0

18:02

25/04/2020

An Interaction Design for Machine Teaching to Develop AI Tutors

Daniel Weitekamp, Erik Harpstead, Ken Koedinger

Keywords Paper

{simulated learners, interaction design, programming-by-demonstration, machine teaching, intelligent tutoring systems

0

0

0

0

14:56

03/05/2021

Pre-training Text-to-Text Transformers for Concept-centric Common Sense

Wangchunshu Zhou, Dong-Ho Lee, Ravi Kiran Selvam and
Seyeon Lee, Xiang Ren

Keywords Paper

Self-supervised Learning, Commonsense Reasoning, Language Model Pre-training

0

0

0

0

4:56

06/12/2020

TaylorGAN: Neighbor-Augmented Policy Update Towards Sample-Efficient Natural Language Generation

Chun-Hsing Lin, Siang-Ruei Wu, Hung-yi Lee, Yun-Nung (Vivian) Chen

Keywords Paper

0

0

0

0

3:16

04/07/2020

Intermediate-Task Transfer Learning with Pretrained Language Models: When and Why Does It Work?

Yada Pruksachatkun, Jason Phang, Haokun Liu and
Phu Mon Htut, Xiaoyi Zhang, Richard Yuanzhe Pang, Clara Vania, Katharina Kann, Samuel R. Bowman

Keywords Paper

Intermediate-Task Learning, natural tasks, data-rich task, intermediate-task training

0

0

0

0

14:47

14/06/2020

Heterogeneous Knowledge Distillation Using Information Flow Modeling

Nikolaos Passalis, Maria Tzelepi, Anastasios Tefas

Keywords Paper

neural network distillation, lightweight learning, information flow

0

0

0

0

1:00

08/12/2020

Pre-trained Language Model Based Active Learning for Sentence Matching

Guirong Bai, Shizhu He, Kang Liu and
Jun Zhao, Zaiqing Nie

Keywords Paper

0

0

0

0

6:34

02/02/2021

Token-Aware Virtual Adversarial Training in Natural Language Understanding

Linyang Li, Xipeng Qiu

Keywords Paper

0

0

0

0

12:49

03/05/2021

Neural Topic Model via Optimal Transport

He Zhao, Dinh Phung, Viet Huynh and
Trung Le, Wray Buntine

Keywords Paper

optimal transport, document analysis, topic modelling

0

0

0

1

9:29

06/12/2021

Distilling Object Detectors with Feature Richness

Du Zhixing, Rui Zhang, Ming Chang and
xishan zhang, Shaoli Liu, Tianshi Chen, Yunji Chen

Keywords Paper

vision

0

0

0

0

8:42

02/02/2021

Memory-Augmented Image Captioning

Zhengcong Fei

Keywords Paper

0

0

0

0

16:31

19/08/2021

Automatic Mixed-Precision Quantization Search of BERT

Changsheng Zhao, Ting Hua, Yilin Shen and
Qian Lou, Hongxia Jin

Keywords Paper

Machine Learning, Deep Learning, NLP Applications and Tools, Text Classification

0

0

0

0

12:12

03/05/2021

ChipNet: Budget-Aware Pruning with Heaviside Continuous Approximations

Rishabh Tiwari, Udbhav Bamba, Arnav Chavan, Deepak Gupta

Keywords Paper

Budget constraints, Budget-Aware Pruning, Structured Pruning, Sparsity Learning

0

0

0

0

6:01

06/12/2020

Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks

Patrick Lewis, Ethan Perez, Aleksandra Piktus and
Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Scott Yih, Tim Rocktäschel, Sebastian Riedel, Douwe Kiela

Keywords Paper

0

0

0

0

3:25

06/12/2021

Mosaicking to Distill: Knowledge Distillation from Out-of-Domain Data

Gongfan Fang, Yifan Bao, Jie Song and
Xinchao Wang, Donglin Xie, Chengchao Shen, Mingli Song

Keywords Paper

machine learning, vision, privacy

0

0

0

0

5:35

04/07/2020

Perturbed Masking: Parameter-free Probing for Analyzing and Interpreting BERT

Zhiyong Wu, Yun Chen, Ben Kao, Qun Liu

Keywords Paper

Analyzing BERT, linguistic tasks, dependency parsing, probing tasks

0

0

0

0

11:00