Contrastive Distillation on Intermediate Representations for Language Model Compression

16/11/2020

Siqi Sun, Zhe Gan, Yuwei Fang, Yu Cheng, Shuohang Wang, Jingjing Liu

Keywords: contrastive distillation, compress models, pre-training stages, existing methods

Abstract: Existing language model compression methods mostly use a simple L_2 loss to distill knowledge from the intermediate representations of a large BERT model into a smaller one. Although widely used, this objective by design assumes that all dimensions of the hidden representations are independent, and so fails to capture important structural knowledge in the intermediate layers of the teacher network. To achieve better distillation efficacy, we propose Contrastive Distillation on Intermediate Representations (CoDIR), a principled knowledge distillation framework in which the student is trained to distill knowledge through the intermediate layers of the teacher via a contrastive objective. By learning to distinguish a positive sample from a large set of negative samples, CoDIR facilitates the student's exploitation of the rich information in the teacher's hidden layers. CoDIR can be readily applied to compress large-scale language models in both the pre-training and fine-tuning stages, and achieves superb performance on the GLUE benchmark, outperforming state-of-the-art compression methods.
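The contrast the abstract draws between an L_2 objective and a contrastive objective can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the exact loss, how intermediate layers are pooled, or how negatives are sampled. The code below assumes a generic InfoNCE-style loss over cosine similarities, and every name in it (`l2_loss`, `contrastive_loss`, `temperature`, the toy vectors) is hypothetical.

```python
import math

def l2_loss(student, teacher):
    """Mean squared error; each dimension contributes an independent term."""
    return sum((s - t) ** 2 for s, t in zip(student, teacher)) / len(student)

def cosine(u, v):
    """Cosine similarity of two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def contrastive_loss(student, teacher_pos, teacher_negs, temperature=0.1):
    """InfoNCE-style loss (an assumption, not the paper's stated objective).

    Returns the negative log-probability that the student representation
    picks the positive teacher representation out of positive + negatives.
    """
    logits = [cosine(student, teacher_pos) / temperature]
    logits += [cosine(student, neg) / temperature for neg in teacher_negs]
    # Numerically stable log-sum-exp over all candidates.
    m = max(logits)
    log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_denom - logits[0]

# Toy 2-dimensional "hidden representations" (illustrative values only).
teacher_pos = [0.9, 0.1]                  # teacher output for the same input
teacher_negs = [[0.0, 1.0], [-1.0, 0.0]]  # teacher outputs for other inputs

aligned = contrastive_loss([1.0, 0.0], teacher_pos, teacher_negs)
misaligned = contrastive_loss([0.0, 1.0], teacher_pos, teacher_negs)
print(aligned, misaligned)
```

In this toy setting, a student vector that points towards the positive teacher vector receives a lower loss than one that points towards a negative. Unlike the L_2 term, the contrastive loss depends on how the student representation relates to the whole set of teacher representations, not on each dimension separately.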

This is an embedded video. The talk and the corresponding paper were published at the EMNLP 2020 virtual conference.

Similar Papers

08/12/2020

Collective Wisdom: Improving Low-resource Neural Machine Translation using Adaptive Knowledge Distillation

Fahimeh Saleh, Wray Buntine, Gholamreza Haffari

9:03

02/02/2021

Reinforced Multi-Teacher Selection for Knowledge Distillation

Fei Yuan, Linjun Shou, Jian Pei, Wutao Lin, Ming Gong, Yan Fu, Daxin Jiang

14:18

14/06/2020

Distilling Cross-Task Knowledge via Relationship Matching

Han-Jia Ye, Su Lu, De-Chuan Zhan

Keywords: knowledge distillation, model reuse, knowledge transfer, cross-task learning, embedding learning

4:54

22/11/2021

Class-Balanced Distillation for Long-Tailed Visual Recognition

Ahmet Iscen, Andre Araujo, Boqing Gong, Cordelia Schmid

Keywords: Long tailed recognition, dataset imbalance

3:02

06/12/2021

Comprehensive Knowledge Distillation with Causal Intervention

Xiang Deng, Zhongfei Zhang

Keywords: representation learning, causality

12:24

04/07/2020

Single-/Multi-Source Cross-Lingual NER via Teacher-Student Learning on Unlabeled Data in Target Language

Qianhui Wu, Zijia Lin, Börje Karlsson, Jian-Guang Lou, Biqing Huang

Keywords: Single-/Multi-Source NER, named problem, cross-lingual NER, single-source NER

10:54

06/12/2020

MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers

Wenhui Wang, Furu Wei, Li Dong, Hangbo Bao, Nan Yang, Ming Zhou

3:21

22/11/2021

PDF-Distil: including Prediction Disagreements in Feature-based Distillation for object detection

Heng ZHANG, Elisa Fromont, Sébastien Lefèvre, Bruno AVIGNON

Keywords: knowledge distillation, object detection

2:57

04/07/2020

XtremeDistil: Multi-stage Distillation for Massive Multilingual Models

Subhabrata Mukherjee, Ahmed Hassan Awadallah

Keywords: natural tasks, knowledge distillation, multilingual Recognition, multilingual NER

10:58

03/05/2021

Knowledge distillation via softmax regression representation learning

Jing Yang, Brais Martinez, Adrian Bulat, Georgios Tzimiropoulos

4:56

03/05/2021

Understanding and Improving Lexical Choice in Non-Autoregressive Translation

Liam Ding, Longyue Wang, Xuebo Liu, Derek Wong, Dacheng Tao, Zhaopeng Tu

11:37

02/02/2021

ALP-KD: Attention-Based Layer Projection for Knowledge Distillation

Peyman Passban, Yimeng Wu, Mehdi Rezagholizadeh, Qun Liu

18:53

06/12/2021

Mosaicking to Distill: Knowledge Distillation from Out-of-Domain Data

Gongfan Fang, Yifan Bao, Jie Song, Xinchao Wang, Donglin Xie, Chengchao Shen, Mingli Song

Keywords: machine learning, vision, privacy

5:35

02/02/2021

Progressive Network Grafting for Few-Shot Knowledge Distillation

Chengchao Shen, Xinchao Wang, Youtan Yin, Jie Song, Sihui Luo, Mingli Song

9:23

22/11/2021

Beyond Classification: Knowledge Distillation using Multi-Object Impressions

Gaurav Kumar Nayak, Monish K Keswani, Sharan Seshadri, Anirban Chakraborty

Keywords: Knowledge Distillation (KD), zero-shot, data-free, object detection, data privacy, multi-object impressions, pseudo-data, pseudo-targets, synthetic data, Faster RCNN

3:06

30/11/2020

Fully Supervised and Guided Distillation for One-Stage Detectors

Deyu Wang, Dongchao Wen, Junjie Liu, Wei Tao, Tse-Wei Chen, Kinya Osa, Masami Kato

7:14

02/02/2021

Show, Attend and Distill: Knowledge Distillation via Attention-based Feature Matching

Mingi Ji, Byeongho Heo, Sungrae Park

14:18

19/04/2021

Annealing knowledge distillation

Aref Jafari, Mehdi Rezagholizadeh, Pranav Sharma, Ali Ghodsi

12:38

06/12/2021

VidLanKD: Improving Language Understanding via Video-Distilled Knowledge Transfer

Zineng Tang, Jaemin Cho, Hao Tan, Mohit Bansal

Keywords: language

10:13

06/12/2021

Dynamic Distillation Network for Cross-Domain Few-Shot Recognition with Unlabeled Data

Ashraful Islam, Chun-Fu (Richard) Chen, Rameswar Panda, Leonid Karlinsky, Rogerio Feris, Richard J. Radke

Keywords: machine learning, meta learning, few shot learning

10:10

05/01/2021

Enhancing Diversity in Teacher-Student Networks via Asymmetric Branches for Unsupervised Person Re-Identification

Hao Chen, Benoit Lagadec, Francois Bremond

5:01

02/02/2021

Learning to Augment for Data-scarce Domain BERT Knowledge Distillation

Lingyun Feng, Minghui Qiu, Yaliang Li, Hai-Tao Zheng, Ying Shen

17:11

23/08/2020

Multimodal learning with incomplete modalities by knowledge distillation

Qi Wang, Liang Zhan, Paul Thompson, Jiayu Zhou

Keywords: knowledge distillation, multimodal learning, incomplete modalities

17:53

14/06/2020

Revisiting Knowledge Distillation via Label Smoothing Regularization

Li Yuan, Francis EH Tay, Guilin Li, Tao Wang, Jiashi Feng

Keywords: knowledge distillation, label smoothing regularization

4:21

05/01/2021

Data-Free Knowledge Distillation for Object Detection

Akshay Chawla, Hongxu Yin, Pavlo Molchanov, Jose Alvarez

4:36

19/08/2021

Self-boosting for Feature Distillation

Yulong Pei, Yanyun Qu, Junping Zhang

Keywords: Computer Vision, 2D and 3D Computer Vision, Recognition

12:57

19/08/2021

Perturb, Predict & Paraphrase: Semi-Supervised Learning using Noisy Student for Image Captioning

Arjit Jain, Pranay Reddy Samala, Preethi Jyothi, Deepak Mittal, Maneesh Singh

Keywords: Computer Vision, Language and Vision, Semi-Supervised Learning

10:06

16/11/2020

Adversarial Self-Supervised Data-Free Distillation for Text Classification

Xinyin Ma, Yongliang Shen, Gongfan Fang, Chen Chen, Chenghao Jia, Weiming Lu

Keywords: nlp tasks, nlp, compressing models, text generation

9:36

03/05/2021

Contrastive Learning with Adversarial Perturbations for Conditional Text Generation

Seanie Lee, Dong Bok Lee, Sung Ju Hwang

Keywords: contrastive learning, conditional text generation

4:51

22/11/2021

Object Re-identification Using Teacher-Like and Light Students

Yi Xie, Hanxiao Wu, Fei Shen, Jianqing Zhu, Huanqiang Zeng

Keywords: object re-identification, knowledge distillation, pruning, re-parameterization

3:19

22/11/2021

Self-supervised Knowledge Distillation for Few-shot Learning

Jathushan Rajasegaran, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Mubarak Shah

Keywords: Self-supervision, Knowledge Distillation, Few-shot Learning

2:49

02/02/2021

Learning to Reweight with Deep Interactions

Yang Fan, Yingce Xia, Lijun Wu, Shufang Xie, Weiqing Liu, Jiang Bian, Tao Qin, Xiang-Yang Li

14:06

03/05/2021

SEED: Self-supervised Distillation For Visual Representation

Jacob Zhiyuan Fang, Jianfeng Wang, Lijuan Wang, Lei Zhang, 'YZ' Yezhou Yang, Zicheng Liu

Keywords: Representation Learning, Self Supervised Learning, Knowledge Distillation

5:09

14/06/2020

Neural Networks Are More Productive Teachers Than Human Raters: Active Mixup for Data-Efficient Knowledge Distillation From a Blackbox Model

Dongdong Wang, Yandong Li, Liqiang Wang, Boqing Gong

Keywords: blackbox knowledge distillation, data-efficient learning, active learning, mixup

4:59

03/05/2021

MixKD: Towards Efficient Distillation of Large-scale Language Models

Kevin Liang, Weituo Hao, Dinghan Shen, Yufan Zhou, Weizhu Chen, Changyou Chen, Lawrence Carin

Keywords: Representation Learning, Natural Language Processing

3:52

14/06/2020

Distilling Image Dehazing With Heterogeneous Task Imitation

Ming Hong, Yuan Xie, Cuihua Li, Yanyun Qu

Keywords: knowledge-distill, image dehazing, heterogeneous task imitation

0:57

03/05/2021

Undistillable: Making A Nasty Teacher That CANNOT teach students

Haoyu Ma, Tianlong Chen, Ting-Kuei Hu, Chenyu You, Xiaohui Xie, Zhangyang Wang

Keywords: avoid knowledge leaking, knowledge distillation

9:38

06/12/2021

Unsupervised Representation Transfer for Small Networks: I Believe I Can Distill On-the-Fly

Hee Min Choi, Hyoa Kang, Dokwan Oh

Keywords: self-supervised learning, representation learning

3:35

06/12/2020

Knowledge Distillation in Wide Neural Networks: Risk Bound, Data Efficiency and Imperfect Teacher

Guangda Ji, Zhanxing Zhu

3:19

16/11/2020

Vokenization: Improving Language Understanding with Contextualized, Visual-Grounded Supervision

Hao Tan, Mohit Bansal

Keywords: speaking, writing, text-only self-supervision, pure-language tasks

11:59