Semi-Online Knowledge Distillation

22/11/2021

Semi-Online Knowledge Distillation

Zhiqiang Liu, Yanxia Liu, Chengkai Huang

Keywords: Knowledge Distillation, Model Compression

Abstract Paper Code Similar Papers

Abstract: Knowledge distillation is an effective and stable method for model compression via knowledge transfer. Conventional knowledge distillation (KD) is to transfer knowledge from a large and well pre-trained teacher network to a small student network, which is a one-way process. Recently, deep mutual learning (DML) has been proposed to help student networks learn collaboratively and simultaneously. However, to the best of our knowledge, KD and DML have never been jointly explored in a unified framework to solve the knowledge distillation problem. In this paper, we investigate that the teacher model supports more trustworthy supervision signals in KD, while the student captures more similar behaviors from the teacher in DML. Based on these observations, we first propose to combine KD with DML in a unified framework. Furthermore, we propose a Semi-Online Knowledge Distillation (SOKD) method that effectively improves the performance of the student and the teacher. In this method, we introduce the peer-teaching training fashion in DML in order to alleviate the student's imitation difficulty, and we also leverage the supervision signals provided by the well-trained teacher in KD. Besides, we also show our framework can be easily extended to feature-based distillation methods for improved learning accuracy. Extensive experiments on CIFAR-100 and ImageNet datasets demonstrate the proposed method achieves state-of-the-art performance.

0

0

0

0

Share

This is an embedded video. Talk and the respective paper are published at BMVC 2021 virtual conference. If you are one of the authors of the paper and want to manage your upload, see the question "My papertalk has been externally embedded..." in the FAQ section.

Comments

Post Comment

no comments yet

Similar Papers

06/12/2021

Learning Student-Friendly Teacher Networks for Knowledge Distillation

Dae Young Park, Moon-Hyun Cha, changwook jeong and
Daesin Kim, Bohyung Han

Keywords Paper

deep learning, transfer learning

0

0

0

0

13:41

14/06/2020

Online Knowledge Distillation via Collaborative Learning

Qiushan Guo, Xinjiang Wang, Yichao Wu and
Zhipeng Yu, Ding Liang, Xiaolin Hu, Ping Luo

Keywords Paper

knowledge distillation, collaborative learning, transfer learning, deep neural network

0

0

0

0

4:37

16/11/2020

Amalgamating Knowledge from Two Teachers for Task-oriented Dialogue System with Adversarial Training

Wanwei He, Min Yang, Rui Yan and
Chengming Li, Ying Shen, Ruifeng Xu

Keywords Paper

task completion, generating responses, task-oriented dialogue, task-oriented systems

0

0

0

0

9:15

19/08/2021

Hierarchical Self-supervised Augmented Knowledge Distillation

Chuanguang Yang, Zhulin An, Linhang Cai, Yongjun Xu

Keywords Paper

Computer Vision, Recognition

0

0

0

0

13:30

03/05/2021

Knowledge distillation via softmax regression representation learning

Jing Yang, Brais Martinez, Adrian Bulat, Georgios Tzimiropoulos

Keywords Paper

0

0

0

0

4:56

06/12/2021

Unsupervised Representation Transfer for Small Networks: I Believe I Can Distill On-the-Fly

Hee Min Choi, Hyoa Kang, Dokwan Oh

Keywords Paper

self-supervised learning, representation learning

0

0

0

0

3:35

02/02/2021

Data-Free Knowledge Distillation with Soft Targeted Transfer Set Synthesis

Zi Wang

Keywords Paper

0

0

0

0

14:19

30/11/2020

Introspective Learning by Distilling Knowledge from Online Self-explanation

Jindong Gu, Zhiliang Wu, Volker Tresp

Keywords Paper

0

0

0

0

10:18

02/02/2021

Cross-Layer Distillation with Semantic Calibration

Defang Chen, Jian-Ping Mei, Yuan Zhang and
Can Wang, Zhe Wang, Yan Feng, Chun Chen

Keywords Paper

0

0

0

0

17:05

19/08/2021

Self-boosting for Feature Distillation

Yulong Pei, Yanyun Qu, Junping Zhang

Keywords Paper

Computer Vision, 2D and 3D Computer Vision, Recognition

0

0

0

0

12:57

03/05/2021

SEED: Self-supervised Distillation For Visual Representation

Jacob Zhiyuan Fang, Jianfeng Wang, Lijuan Wang and
Lei Zhang, 'YZ' Yezhou Yang, Zicheng Liu

Keywords Paper

Representation Learning, Self Supervised Learning, Knowledge Distillation

0

0

0

0

5:09

02/02/2021

Stochastic Precision Ensemble: Self-Knowledge Distillation for Quantized Deep Neural Networks

Yoonho Boo, Sungho Shin, Jungwook Choi, Wonyong Sung

Keywords Paper

0

0

0

0

19:03

02/02/2021

Learning to Reweight with Deep Interactions

Yang Fan, Yingce Xia, Lijun Wu and
Shufang Xie, Weiqing Liu, Jiang Bian, Tao Qin, Xiang-Yang Li

Keywords Paper

0

0

0

0

14:06

06/12/2021

Iterative Teacher-Aware Learning

Luyao Yuan, Dongruo Zhou, Junhong Shen and
Jingdong Gao, Jeffrey L Chen, Quanquan Gu, Ying Nian Wu, Song-Chun Zhu

Keywords Paper

theory, optimization, reinforcement learning and planning, machine learning

0

0

0

0

6:40

19/04/2021

Annealing knowledge distillation

Aref Jafari, Mehdi Rezagholizadeh, Pranav Sharma, Ali Ghodsi

Keywords Paper

0

0

0

0

12:38

22/11/2021

Deep Knowledge Distillation using Trainable Dense Attention

Bharat Sau, Soumya Roy, Vinay P Namboodiri, Raghu Sesha Iyengar

Keywords Paper

Knowledge Distillation, Network Compression, Visual Attention

0

0

0

0

3:04

02/02/2021

Peer Collaborative Learning for Online Knowledge Distillation

Guile Wu, Shaogang Gong

Keywords Paper

0

0

0

0

17:46

06/12/2021

Spatial Ensemble: a Novel Model Smoothing Mechanism for Student-Teacher Framework

Tengteng Huang, Yifan Sun, Xun Wang and
Haotian Yao, Chi Zhang

Keywords Paper

0

0

0

0

10:07

06/12/2021

Mosaicking to Distill: Knowledge Distillation from Out-of-Domain Data

Gongfan Fang, Yifan Bao, Jie Song and
Xinchao Wang, Donglin Xie, Chengchao Shen, Mingli Song

Keywords Paper

machine learning, vision, privacy

0

0

0

0

5:35

14/06/2020

Heterogeneous Knowledge Distillation Using Information Flow Modeling

Nikolaos Passalis, Maria Tzelepi, Anastasios Tefas

Keywords Paper

neural network distillation, lightweight learning, information flow

0

0

0

0

1:00

06/12/2021

Comprehensive Knowledge Distillation with Causal Intervention

Xiang Deng, Zhongfei Zhang

Keywords Paper

representation learning, causality

0

0

0

0

12:24

02/02/2021

Multi-View Feature Representation for Dialogue Generation with Bidirectional Distillation

Shaoxiong Feng, Xuancheng Ren, Kan Li, Xu Sun

Keywords Paper

0

0

0

0

19:13

14/06/2020

Search to Distill: Pearls Are Everywhere but Not the Eyes

Yu Liu, Xuhui Jia, Mingxing Tan and
Raviteja Vemulapalli, Yukun Zhu, Bradley Green, Xiaogang Wang

Keywords Paper

neural architecture search, knowledge distillation, nas, neural architecture

0

0

0

0

4:26

02/02/2021

ALP-KD: Attention-Based Layer Projection for Knowledge Distillation

Peyman Passban, Yimeng Wu, Mehdi Rezagholizadeh, Qun Liu

Keywords Paper

0

0

0

0

18:53

02/02/2021

Progressive Network Grafting for Few-Shot Knowledge Distillation

Chengchao Shen, Xinchao Wang, Youtan Yin and
Jie Song, Sihui Luo, Mingli Song

Keywords Paper

0

0

0

0

9:23

02/02/2021

Robust Knowledge Transfer via Hybrid Forward on the Teacher-Student Model

Liangchen Song, Jialian Wu, Ming Yang and
Qian Zhang, Yuan Li, Junsong Yuan

Keywords Paper

0

0

0

0

16:09

14/06/2020

Revisiting Knowledge Distillation via Label Smoothing Regularization

Li Yuan, Francis EH Tay, Guilin Li and
Tao Wang, Jiashi Feng

Keywords Paper

knowledge distillation, label smoothing regularization

0

0

0

0

4:21

22/11/2021

Adaptive Distillation: Aggregating Knowledge from Multiple Paths for Efficient Distillation

Sumanth Chennupati, Mohammad Mahdi Kamani, Zhongwei Cheng, Lin Chen

Keywords Paper

Knowledge Distillation, Multitask Learning, Model Compression, Adaptive Distillation, Efficient Training

0

0

0

0

3:07

30/11/2020

Online Knowledge Distillation via Multi-branch Diversity Enhancement

Zheng Li, Ying Huang, Defang Chen and
Tianren Luo, Ning Cai, Zhigeng Pan

Keywords Paper

0

0

0

0

5:36

06/12/2020

MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers

Wenhui Wang, Furu Wei, Li Dong and
Hangbo Bao, Nan Yang, Ming Zhou

Keywords Paper

0

0

0

0

3:21

13/04/2021

RankDistil: Knowledge distillation for ranking

Sashank Reddi, Rama Kumar Pasumarthi, Aditya Menon and
Ankit Singh Rawat, Felix Yu, Seungyeon Kim, Andreas Veit, Sanjiv Kumar

Keywords Paper

0

0

0

0

2:58

19/08/2021

Graph Consistency Based Mean-Teaching for Unsupervised Domain Adaptive Person Re-Identification

Xiaobin Liu, Shiliang Zhang

Keywords Paper

Computer Vision, Recognition, Applications of Unsupervised Learning

0

0

0

0

11:05

02/02/2021

Show, Attend and Distill: Knowledge Distillation via Attention-based Feature Matching

Mingi Ji, Byeongho Heo, Sungrae Park

Keywords Paper

0

0

0

0

14:18

30/11/2020

Fully Supervised and Guided Distillation for One-Stage Detectors

Deyu Wang, Dongchao Wen, Junjie Liu and
Wei Tao, Tse-Wei Chen, Kinya Osa, Masami Kato

Keywords Paper

0

0

0

0

7:14

18/07/2021

Zero-Shot Knowledge Distillation from a Decision-Based Black-Box Model

Zi Wang

Keywords Paper

Deep Learning

0

0

0

0

5:08

06/12/2021

Adversarial Teacher-Student Representation Learning for Domain Generalization

Fu-En Yang, Yuan-Chia Cheng, Zu-Yun Shiau, Yu-Chiang Frank Wang

Keywords Paper

machine learning, vision, domain adaptation, representation learning

0

0

0

0

8:14

14/06/2020

Distilling Cross-Task Knowledge via Relationship Matching

Han-Jia Ye, Su Lu, De-Chuan Zhan

Keywords Paper

knowledge distillation, model reuse, knowledge transfer, cross-task learning, embedding learning

0

0

0

0

4:54

03/05/2021

Knowledge Distillation as Semiparametric Inference

Tri Dao, Govinda Kamath, Vasilis Syrgkanis, Lester Mackey

Keywords Paper

generalization bounds, knowledge distillation, model compression, loss correction, orthogonal machine learning, cross-fitting, semiparametric inference

0

0

0

0

5:10

19/10/2020

Ensembled CTR prediction via knowledge distillation

Jieming Zhu, Jinyang Liu, Weiqi Li and
Jincai Lai, Xiuqiang He, Liang Chen, Zibin Zheng

Keywords Paper

model ensemble, knowledge distillation, ctr prediction, online advertising, recommender systems

0

0

0

0

9:26

14/06/2020

Neural Networks Are More Productive Teachers Than Human Raters: Active Mixup for Data-Efficient Knowledge Distillation From a Blackbox Model

Dongdong Wang, Yandong Li, Liqiang Wang, Boqing Gong

Keywords Paper

blackbox knowledge distillation, data-efficient learning, active learning, mixup

0

0

0

0

4:59