Reinforced Multi-Teacher Selection for Knowledge Distillation

02/02/2021

Reinforced Multi-Teacher Selection for Knowledge Distillation

Fei Yuan, Linjun Shou, Jian Pei, Wutao Lin, Ming Gong, Yan Fu, Daxin Jiang

Keywords:

Abstract Paper Similar Papers

Abstract: In natural language processing (NLP) tasks, slow inference speed and huge footprints in GPU usage remain the bottleneck of applying pre-trained deep models in production. As a popular method for model compression, knowledge distillation transfers knowledge from one or multiple large (teacher) models to a small (student) model. When multiple teacher models are available in distillation, the state-of-the-art methods assign a fixed weight to a teacher model in the whole distillation. Furthermore, most of the existing methods allocate an equal weight to every teacher model. In this paper, we observe that, due to the complexity of training examples and the differences in student model capability, learning differentially from teacher models can lead to better performance of student models distilled. We systematically develop a reinforced method to dynamically assign weights to teacher models for different training instances and optimize the performance of student model. Our extensive experimental results on several NLP tasks clearly verify the feasibility and effectiveness of our approach.

The video of this talk cannot be embedded. You can watch it here:

https://slideslive.com/38949352

(Link will open in new window)

0

0

0

0

Share

This is an embedded video. Talk and the respective paper are published at AAAI 2021 virtual conference. If you are one of the authors of the paper and want to manage your upload, see the question "My papertalk has been externally embedded..." in the FAQ section.

Comments

Post Comment

no comments yet

Similar Papers

08/12/2020

Collective Wisdom: Improving Low-resource Neural Machine Translation using Adaptive Knowledge Distillation

Fahimeh Saleh, Wray Buntine, Gholamreza Haffari

Keywords Paper

0

0

0

0

9:03

02/02/2021

Show, Attend and Distill: Knowledge Distillation via Attention-based Feature Matching

Mingi Ji, Byeongho Heo, Sungrae Park

Keywords Paper

0

0

0

0

14:18

02/02/2021

Progressive Network Grafting for Few-Shot Knowledge Distillation

Chengchao Shen, Xinchao Wang, Youtan Yin and
Jie Song, Sihui Luo, Mingli Song

Keywords Paper

0

0

0

0

9:23

19/04/2021

Annealing knowledge distillation

Aref Jafari, Mehdi Rezagholizadeh, Pranav Sharma, Ali Ghodsi

Keywords Paper

0

0

0

0

12:38

16/11/2020

Contrastive Distillation on Intermediate Representations for Language Model Compression

Siqi Sun, Zhe Gan, Yuwei Fang and
Yu Cheng, Shuohang Wang, Jingjing Liu

Keywords Paper

contrastive distillation, compress models, pre-training stages, existing methods

0

0

0

0

8:19

19/08/2021

Graph Consistency Based Mean-Teaching for Unsupervised Domain Adaptive Person Re-Identification

Xiaobin Liu, Shiliang Zhang

Keywords Paper

Computer Vision, Recognition, Applications of Unsupervised Learning

0

0

0

0

11:05

06/12/2021

Comprehensive Knowledge Distillation with Causal Intervention

Xiang Deng, Zhongfei Zhang

Keywords Paper

representation learning, causality

0

0

0

0

12:24

22/11/2021

Class-Balanced Distillation for Long-Tailed Visual Recognition

Ahmet Iscen, Andre Araujo, Boqing Gong, Cordelia Schmid

Keywords Paper

Long tailed recognition, dataset imbalance

0

0

0

0

3:02

06/12/2021

Learning Student-Friendly Teacher Networks for Knowledge Distillation

Dae Young Park, Moon-Hyun Cha, changwook jeong and
Daesin Kim, Bohyung Han

Keywords Paper

deep learning, transfer learning

0

0

0

0

13:41

23/08/2020

Multimodal learning with incomplete modalities by knowledge distillation

Qi Wang, Liang Zhan, Paul Thompson, Jiayu Zhou

Keywords Paper

knowledge distillation, multimodal learning, incomplete modalities

0

0

0

0

17:53

06/12/2020

Agree to Disagree: Adaptive Ensemble Knowledge Distillation in Gradient Space

Shangchen Du, Shan You, Xiaojie Li and
Jianlong Wu, Fei Wang, Chen Qian, Changshui Zhang

Keywords Paper

0

0

0

0

3:22

02/02/2021

Learning to Augment for Data-scarce Domain BERT Knowledge Distillation

Lingyun Feng, Minghui Qiu, Yaliang Li and
Hai-Tao Zheng, Ying Shen

Keywords Paper

0

0

0

0

17:11

04/07/2020

XtremeDistil: Multi-stage Distillation for Massive Multilingual Models

Subhabrata Mukherjee, Ahmed Hassan Awadallah

Keywords Paper

natural tasks, knowledge distillation, multilingual Recognition, multilingual NER

0

0

0

0

10:58

03/05/2021

SEED: Self-supervised Distillation For Visual Representation

Jacob Zhiyuan Fang, Jianfeng Wang, Lijuan Wang and
Lei Zhang, 'YZ' Yezhou Yang, Zicheng Liu

Keywords Paper

Representation Learning, Self Supervised Learning, Knowledge Distillation

0

0

0

0

5:09

06/12/2021

Learning curves of generic features maps for realistic datasets with a teacher-student model

Bruno Loureiro, Cedric Gerbelot, Hugo Cui and
Sebastian Goldt, Florent Krzakala, Marc Mezard, Lenka Zdeborová

Keywords Paper

deep learning, machine learning, kernel methods

0

0

0

0

12:59

02/02/2021

Learning to Reweight with Deep Interactions

Yang Fan, Yingce Xia, Lijun Wu and
Shufang Xie, Weiqing Liu, Jiang Bian, Tao Qin, Xiang-Yang Li

Keywords Paper

0

0

0

0

14:06

06/12/2020

MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers

Wenhui Wang, Furu Wei, Li Dong and
Hangbo Bao, Nan Yang, Ming Zhou

Keywords Paper

0

0

0

0

3:21

06/12/2021

Dynamic Distillation Network for Cross-Domain Few-Shot Recognition with Unlabeled Data

Ashraful Islam, Chun-Fu (Richard) Chen, Rameswar Panda and
Leonid Karlinsky, Rogerio Feris, Richard J. Radke

Keywords Paper

machine learning, meta learning, few shot learning

0

0

0

0

10:10

03/05/2021

Knowledge distillation via softmax regression representation learning

Jing Yang, Brais Martinez, Adrian Bulat, Georgios Tzimiropoulos

Keywords Paper

0

0

0

0

4:56

14/06/2020

Heterogeneous Knowledge Distillation Using Information Flow Modeling

Nikolaos Passalis, Maria Tzelepi, Anastasios Tefas

Keywords Paper

neural network distillation, lightweight learning, information flow

0

0

0

0

1:00

05/01/2021

Effectiveness of Arbitrary Transfer Sets for Data-Free Knowledge Distillation

Gaurav Kumar Nayak, Konda Reddy Mopuri, Anirban Chakraborty

Keywords Paper

0

0

0

0

5:00

03/05/2021

MixKD: Towards Efficient Distillation of Large-scale Language Models

Kevin Liang, Weituo Hao, Dinghan Shen and
Yufan Zhou, Weizhu Chen, Changyou Chen, Lawrence Carin

Keywords Paper

Representation Learning, Natural Language Processing

0

0

0

0

3:52

30/11/2020

Data-Efficient Ranking Distillation for Image Retrieval

Zakaria Laskar, Juho Kannala

Keywords Paper

0

0

0

0

7:58

06/12/2021

Exponential Separation between Two Learning Models and Adversarial Robustness

Grzegorz Gluch, Ruediger Urbanke

Keywords Paper

theory, robustness, adversarial robustness and security

0

0

0

0

15:11

06/12/2020

Knowledge Distillation in Wide Neural Networks: Risk Bound, Data Efficiency and Imperfect Teacher

Guangda Ji, Zhanxing Zhu

Keywords Paper

0

0

0

0

3:19

14/06/2020

Distilling Cross-Task Knowledge via Relationship Matching

Han-Jia Ye, Su Lu, De-Chuan Zhan

Keywords Paper

knowledge distillation, model reuse, knowledge transfer, cross-task learning, embedding learning

0

0

0

0

4:54

03/05/2021

Knowledge Distillation as Semiparametric Inference

Tri Dao, Govinda Kamath, Vasilis Syrgkanis, Lester Mackey

Keywords Paper

generalization bounds, knowledge distillation, model compression, loss correction, orthogonal machine learning, cross-fitting, semiparametric inference

0

0

0

0

5:10

02/02/2021

ALP-KD: Attention-Based Layer Projection for Knowledge Distillation

Peyman Passban, Yimeng Wu, Mehdi Rezagholizadeh, Qun Liu

Keywords Paper

0

0

0

0

18:53

14/06/2020

Few Sample Knowledge Distillation for Efficient Network Compression

Tianhong Li, Jianguo Li, Zhuang Liu, Changshui Zhang

Keywords Paper

efficient network compression, few samples, knowledge distillation

0

0

0

0

1:01

26/04/2020

Contrastive Representation Distillation

Yonglong Tian, Dilip Krishnan, Phillip Isola

Keywords Paper

Knowledge Distillation, Representation Learning, Contrastive Learning, Mutual Information

0

0

0

0

4:55

12/07/2020

Student Specialization in Deep Rectified Networks With Finite Width and Input Dimension

Yuandong Tian

Keywords Paper

Deep Learning - Theory

0

0

0

0

15:31

12/07/2020

Teaching with Limited Information on the Learner's Behaviour

Ferdinando Cicalese, Francisco Sergio de Freitas Filho, Eduardo Laber, Marco Molinaro

Keywords Paper

Learning Theory

0

0

0

0

15:07

22/11/2021

PDF-Distil: including Prediction Disagreements in Feature-based Distillation for object detection

Heng ZHANG, Elisa Fromont, Sébastien Lefèvre, Bruno AVIGNON

Keywords Paper

knowledge distillation: object detection

0

0

0

0

2:57

03/05/2021

Rethinking Soft Labels for Knowledge Distillation: A Bias–Variance Tradeoff Perspective

Helong Zhou, Liangchen Song, Jiajie Chen and
Ye Zhou, Guoli Wang, Junsong Yuan, Qian Zhang

Keywords Paper

teacher-student model, soft labels, Knowledge distillation

0

0

0

0

2:20

19/08/2021

Self-boosting for Feature Distillation

Yulong Pei, Yanyun Qu, Junping Zhang

Keywords Paper

Computer Vision, 2D and 3D Computer Vision, Recognition

0

0

0

0

12:57

14/06/2020

Revisiting Knowledge Distillation via Label Smoothing Regularization

Li Yuan, Francis EH Tay, Guilin Li and
Tao Wang, Jiashi Feng

Keywords Paper

knowledge distillation, label smoothing regularization

0

0

0

0

4:21

06/12/2021

Mosaicking to Distill: Knowledge Distillation from Out-of-Domain Data

Gongfan Fang, Yifan Bao, Jie Song and
Xinchao Wang, Donglin Xie, Chengchao Shen, Mingli Song

Keywords Paper

machine learning, vision, privacy

0

0

0

0

5:35

06/12/2021

Spatial Ensemble: a Novel Model Smoothing Mechanism for Student-Teacher Framework

Tengteng Huang, Yifan Sun, Xun Wang and
Haotian Yao, Chi Zhang

Keywords Paper

0

0

0

0

10:07

02/02/2021

Multi-View Feature Representation for Dialogue Generation with Bidirectional Distillation

Shaoxiong Feng, Xuancheng Ren, Kan Li, Xu Sun

Keywords Paper

0

0

0

0

19:13

22/11/2021

Object Re-identification Using Teacher-Like and Light Students

Yi Xie, Hanxiao Wu, Fei Shen and
Jianqing Zhu, Huanqiang Zeng

Keywords Paper

object re-identification, knowledge distillation, pruning, re-parameterization

0

0

0

0

3:19