19/08/2021

Self-boosting for Feature Distillation

Yulong Pei, Yanyun Qu, Junping Zhang

Keywords: Computer Vision, 2D and 3D Computer Vision, Recognition

Abstract: Knowledge distillation is a simple but effective method for model compression, in which a small network (Student) learns from a well-trained large network (Teacher) to achieve better performance. However, when the difference in model size between Student and Teacher is large, the capacity gap leads to poor Student performance. Existing methods focus on seeking simplified or more effective knowledge from Teacher to narrow the Teacher-Student gap, whereas we address this problem through Student's self-boosting. Specifically, we propose a novel distillation method named Self-boosting Feature Distillation (SFD), which eases the Teacher-Student gap via feature integration and self-distillation of Student. Three different modules are designed for feature integration to enhance the discriminability of Student's features, which in theory improves the order of convergence. Moreover, an easy-to-operate self-distillation strategy is proposed to stabilize the training process and improve Student's performance without additional forward propagation or memory consumption. Extensive experiments on multiple benchmarks and networks show that our method significantly outperforms existing methods.
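For readers unfamiliar with the setup the abstract builds on, the following is a minimal PyTorch sketch of standard logit-based knowledge distillation (Teacher soft targets plus hard-label supervision). It is not the paper's SFD method; the function name kd_loss, the temperature T, and the weighting alpha are illustrative assumptions.

    import torch
    import torch.nn.functional as F

    def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
        # Soften both output distributions with temperature T; the KL term
        # pulls Student's predictions toward Teacher's soft targets.
        soft = F.kl_div(
            F.log_softmax(student_logits / T, dim=1),
            F.softmax(teacher_logits / T, dim=1),
            reduction="batchmean",
        ) * (T * T)
        # Hard-label cross-entropy keeps Student anchored to ground truth.
        hard = F.cross_entropy(student_logits, labels)
        return alpha * soft + (1.0 - alpha) * hard

SFD replaces part of this Teacher-driven supervision with feature integration and self-distillation on the Student side, which is what lets it avoid the extra forward propagation and memory cost mentioned above.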

