Learning to Augment for Data-scarce Domain BERT Knowledge Distillation

02/02/2021

Learning to Augment for Data-scarce Domain BERT Knowledge Distillation

Lingyun Feng, Minghui Qiu, Yaliang Li, Hai-Tao Zheng, Ying Shen

Keywords:

Abstract Paper Similar Papers

Abstract: Despite pre-trained language models such as BERT have achieved appealing performance in a wide range of Natural Language Processing (NLP) tasks, they are computationally expensive to be deployed in real-time applications. A typical method is to adopt knowledge distillation to compress these large pre-trained models (teacher models) to small student models. However, for a target domain with scarce training data, the teacher can hardly pass useful knowledge to the student, which yields performance degradation for the student models. To tackle this problem, we propose a method to learn to augment data for BERT Knowledge Distillation in target domains with scarce labeled data, by learning a cross-domain manipulation scheme that automatically augments the target domain with the help of resource-rich source domains. Specifically, the proposed method generates samples acquired from a stationary distribution near the target data and adopts a reinforced controller to automatically refine the augmentation strategy according to the performance of the student. Extensive experiments demonstrate that the proposed method significantly outperforms state-of-the-art baselines on different NLP tasks, and for the data-scarce domains, the compressed student models even perform better than the original large teacher model, with much fewer parameters (only ~13.3%) when only a few labeled examples available.

The video of this talk cannot be embedded. You can watch it here:

https://slideslive.com/38948470

(Link will open in new window)

0

0

0

0

Share

This is an embedded video. Talk and the respective paper are published at AAAI 2021 virtual conference. If you are one of the authors of the paper and want to manage your upload, see the question "My papertalk has been externally embedded..." in the FAQ section.

Comments

Post Comment

no comments yet

Similar Papers

03/05/2021

MixKD: Towards Efficient Distillation of Large-scale Language Models

Kevin Liang, Weituo Hao, Dinghan Shen and
Yufan Zhou, Weizhu Chen, Changyou Chen, Lawrence Carin

Keywords Paper

Representation Learning, Natural Language Processing

0

0

0

0

3:52

19/04/2021

Annealing knowledge distillation

Aref Jafari, Mehdi Rezagholizadeh, Pranav Sharma, Ali Ghodsi

Keywords Paper

0

0

0

0

12:38

05/01/2021

Effectiveness of Arbitrary Transfer Sets for Data-Free Knowledge Distillation

Gaurav Kumar Nayak, Konda Reddy Mopuri, Anirban Chakraborty

Keywords Paper

0

0

0

0

5:00

06/12/2020

MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers

Wenhui Wang, Furu Wei, Li Dong and
Hangbo Bao, Nan Yang, Ming Zhou

Keywords Paper

0

0

0

0

3:21

30/11/2020

Compensating for the Lack of Extra Training Data by Learning Extra Representation

Hyeonseong Jeon, Siho Han, Sangwon Lee, Simon S. Woo

Keywords Paper

0

0

0

0

9:19

03/05/2021

SEED: Self-supervised Distillation For Visual Representation

Jacob Zhiyuan Fang, Jianfeng Wang, Lijuan Wang and
Lei Zhang, 'YZ' Yezhou Yang, Zicheng Liu

Keywords Paper

Representation Learning, Self Supervised Learning, Knowledge Distillation

0

0

0

0

5:09

16/11/2020

BERT-EMD: Many-to-Many Layer Mapping for BERT Compression with Earth Mover's Distance

Jianquan Li, Xiaokang Liu, Honghong Zhao and
Ruifeng Xu, Min Yang, Yaohong Jin

Keywords Paper

natural tasks, nlp tasks, matching, many-to-many mapping

0

0

0

0

11:58

16/11/2020

Adversarial Self-Supervised Data-Free Distillation for Text Classification

Xinyin Ma, Yongliang Shen, Gongfan Fang and
Chen Chen, Chenghao Jia, Weiming Lu

Keywords Paper

nlp tasks, nlp, compressing models, text generation

0

0

0

0

9:36

02/02/2021

Reinforced Multi-Teacher Selection for Knowledge Distillation

Fei Yuan, Linjun Shou, Jian Pei and
Wutao Lin, Ming Gong, Yan Fu, Daxin Jiang

Keywords Paper

0

0

0

0

14:18

06/12/2021

Mosaicking to Distill: Knowledge Distillation from Out-of-Domain Data

Gongfan Fang, Yifan Bao, Jie Song and
Xinchao Wang, Donglin Xie, Chengchao Shen, Mingli Song

Keywords Paper

machine learning, vision, privacy

0

0

0

0

5:35

14/06/2020

Heterogeneous Knowledge Distillation Using Information Flow Modeling

Nikolaos Passalis, Maria Tzelepi, Anastasios Tefas

Keywords Paper

neural network distillation, lightweight learning, information flow

0

0

0

0

1:00

08/12/2020

Collective Wisdom: Improving Low-resource Neural Machine Translation using Adaptive Knowledge Distillation

Fahimeh Saleh, Wray Buntine, Gholamreza Haffari

Keywords Paper

0

0

0

0

9:03

30/11/2020

Data-Efficient Ranking Distillation for Image Retrieval

Zakaria Laskar, Juho Kannala

Keywords Paper

0

0

0

0

7:58

03/05/2021

Knowledge Distillation as Semiparametric Inference

Tri Dao, Govinda Kamath, Vasilis Syrgkanis, Lester Mackey

Keywords Paper

generalization bounds, knowledge distillation, model compression, loss correction, orthogonal machine learning, cross-fitting, semiparametric inference

0

0

0

0

5:10

19/08/2021

Self-boosting for Feature Distillation

Yulong Pei, Yanyun Qu, Junping Zhang

Keywords Paper

Computer Vision, 2D and 3D Computer Vision, Recognition

0

0

0

0

12:57

18/07/2021

REPAINT: Knowledge Transfer in Deep Reinforcement Learning

Yunzhe Tao, Sahika Genc, Jonathan Chung and
TAO SUN, Sunil Mallya

Keywords Paper

Algorithms, Ranking and Preference Learning, Algorithms, Regression; Applications, Health; Theory, Learning Theory; Theory, Regularization, Algorithms, Multitask, Transfer, and Meta Learning

0

0

0

0

5:04

22/11/2021

Class-Balanced Distillation for Long-Tailed Visual Recognition

Ahmet Iscen, Andre Araujo, Boqing Gong, Cordelia Schmid

Keywords Paper

Long tailed recognition, dataset imbalance

0

0

0

0

3:02

18/07/2021

Zero-Shot Knowledge Distillation from a Decision-Based Black-Box Model

Zi Wang

Keywords Paper

Deep Learning

0

0

0

0

5:08

14/06/2020

Neural Networks Are More Productive Teachers Than Human Raters: Active Mixup for Data-Efficient Knowledge Distillation From a Blackbox Model

Dongdong Wang, Yandong Li, Liqiang Wang, Boqing Gong

Keywords Paper

blackbox knowledge distillation, data-efficient learning, active learning, mixup

0

0

0

0

4:59

04/07/2020

Single-/Multi-Source Cross-Lingual NER via Teacher-Student Learning on Unlabeled Data in Target Language

Qianhui Wu, Zijia Lin, Börje Karlsson and
Jian-Guang Lou, Biqing Huang

Keywords Paper

Single-/Multi-Source NER, named problem, cross-lingual NER, single-source NER

0

0

0

0

10:54

03/05/2021

Contrastive Learning with Adversarial Perturbations for Conditional Text Generation

Seanie Lee, Dong Bok Lee, Sung Ju Hwang

Keywords Paper

contrastive learning, conditional text generation

0

0

0

0

4:51

05/01/2021

Real-Time Uncertainty Estimation in Computer Vision via Uncertainty-Aware Distribution Distillation

Yichen Shen, Zhilu Zhang, Mert R. Sabuncu, Lin Sun

Keywords Paper

0

0

0

0

5:01

14/06/2020

Knowledge As Priors: Cross-Modal Knowledge Generalization for Datasets Without Superior Knowledge

Long Zhao, Xi Peng, Yuxiao Chen and
Mubbasir Kapadia, Dimitris N. Metaxas

Keywords Paper

knowledge distillation, transfer learning, meta-learning, 3d hand pose estimation

0

0

0

0

1:01

06/12/2021

Dynamic Distillation Network for Cross-Domain Few-Shot Recognition with Unlabeled Data

Ashraful Islam, Chun-Fu (Richard) Chen, Rameswar Panda and
Leonid Karlinsky, Rogerio Feris, Richard J. Radke

Keywords Paper

machine learning, meta learning, few shot learning

0

0

0

0

10:10

14/06/2020

Revisiting Knowledge Distillation via Label Smoothing Regularization

Li Yuan, Francis EH Tay, Guilin Li and
Tao Wang, Jiashi Feng

Keywords Paper

knowledge distillation, label smoothing regularization

0

0

0

0

4:21

06/12/2021

Comprehensive Knowledge Distillation with Causal Intervention

Xiang Deng, Zhongfei Zhang

Keywords Paper

representation learning, causality

0

0

0

0

12:24

22/11/2021

Object Re-identification Using Teacher-Like and Light Students

Yi Xie, Hanxiao Wu, Fei Shen and
Jianqing Zhu, Huanqiang Zeng

Keywords Paper

object re-identification, knowledge distillation, pruning, re-parameterization

0

0

0

0

3:19

03/05/2021

Undistillable: Making A Nasty Teacher That CANNOT teach students

Haoyu Ma, Tianlong Chen, Ting-Kuei Hu and
Chenyu You, Xiaohui Xie, Zhangyang Wang

Keywords Paper

avoid knowledge leaking, knowledge distillation

0

0

0

0

9:38

06/12/2021

Memory Efficient Meta-Learning with Large Images

John Bronskill, Daniela Massiceti, Massimiliano Patacchiola and
Katja Hofmann, Sebastian Nowozin, Richard Turner

Keywords Paper

optimization, machine learning, vision, meta learning, transfer learning, few shot learning

0

0

0

0

6:39

04/07/2020

Norm-Based Curriculum Learning for Neural Machine Translation

Xuebo Liu, Houtim Lai, Derek F. Wong, Lidia S. Chao

Keywords Paper

Neural Translation, norm-based difficulty, WMT'14 tasks, Norm-Based Learning

0

0

0

0

11:36

06/12/2021

Unsupervised Representation Transfer for Small Networks: I Believe I Can Distill On-the-Fly

Hee Min Choi, Hyoa Kang, Dokwan Oh

Keywords Paper

self-supervised learning, representation learning

0

0

0

0

3:35

06/12/2021

MixACM: Mixup-Based Robustness Transfer via Distillation of Activated Channel Maps

Awais Muhammad, Fengwei Zhou, Chuanlong Xie and
Jiawei Li, Sung-Ho Bae, Zhenguo Li

Keywords Paper

deep learning, optimization, robustness, adversarial robustness and security

0

0

0

0

12:51

13/04/2021

Understanding robustness in teacher-student setting: A new perspective

Zhuolin Yang, Zhaoxi Chen, Tiffany Cai and
Xinyun Chen, Bo Li, Yuandong Tian

Keywords Paper

0

0

0

0

3:03

02/02/2021

Automatic Curriculum Learning With Over-repetition Penalty for Dialogue Policy Learning

Yangyang Zhao, Zhenyu Wang, Zhenhua Huang

Keywords Paper

0

0

0

0

15:41

16/11/2020

Autoregressive Knowledge Distillation through Imitation Learning

Alexander Lin, Jeremy Wohlwend, Howard Chen, Tao Lei

Keywords Paper

natural tasks, knowledge distillation, exposure problem, prototypical tasks

0

0

0

0

12:43

19/08/2021

Object Detection in Densely Packed Scenes via Semi-Supervised Learning with Dual Consistency

Chao Ye, Huaidong Zhang, Xuemiao Xu and
Weiwei Cai, Jing Qin, Kup-Sze Choi

Keywords Paper

Computer Vision, Recognition, Deep Learning, Semi-Supervised Learning

0

0

0

0

10:19

16/11/2020

Contrastive Distillation on Intermediate Representations for Language Model Compression

Siqi Sun, Zhe Gan, Yuwei Fang and
Yu Cheng, Shuohang Wang, Jingjing Liu

Keywords Paper

contrastive distillation, compress models, pre-training stages, existing methods

0

0

0

0

8:19

03/05/2021

Knowledge distillation via softmax regression representation learning

Jing Yang, Brais Martinez, Adrian Bulat, Georgios Tzimiropoulos

Keywords Paper

0

0

0

0

4:56

06/12/2021

Exponential Separation between Two Learning Models and Adversarial Robustness

Grzegorz Gluch, Ruediger Urbanke

Keywords Paper

theory, robustness, adversarial robustness and security

0

0

0

0

15:11

26/04/2020

Contrastive Representation Distillation

Yonglong Tian, Dilip Krishnan, Phillip Isola

Keywords Paper

Knowledge Distillation, Representation Learning, Contrastive Learning, Mutual Information

0

0

0

0

4:55