Search to Distill: Pearls Are Everywhere but Not the Eyes

14/06/2020

Yu Liu, Xuhui Jia, Mingxing Tan, Raviteja Vemulapalli, Yukun Zhu, Bradley Green, Xiaogang Wang

Keywords: neural architecture search, knowledge distillation, nas, neural architecture

Abstract: Standard Knowledge Distillation (KD) approaches distill the knowledge of a cumbersome teacher model into the parameters of a student model with a pre-defined architecture. However, the knowledge of a neural network, which is represented by the network's output distribution conditioned on its input, depends not only on its parameters but also on its architecture. Hence, a more generalized approach for KD is to distill the teacher's knowledge into both the parameters and architecture of the student. To achieve this, we present a new Architecture-aware Knowledge Distillation (AKD) approach that finds student models (pearls for the teacher) that are best for distilling the given teacher model. In particular, we leverage Neural Architecture Search (NAS), equipped with our KD-guided reward, to search for the best student architectures for a given teacher. Experimental results show our proposed AKD consistently outperforms the conventional NAS plus KD approach, and achieves state-of-the-art results on the ImageNet classification task under various latency settings. Furthermore, the best AKD student architecture for the ImageNet classification task also transfers well to other tasks such as million level face recognition and ensemble learning.
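The abstract describes a network's knowledge as its output distribution conditioned on the input. As a rough illustration of how a student can be scored against a teacher on that basis, here is a minimal sketch of the widely used temperature-scaled distillation loss (the KL divergence between softened teacher and student outputs). This is the generic formulation, not necessarily the exact objective or the KD-guided reward used in the paper; the function names, the logits, and the temperature value are all assumptions made for the example.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn raw logits into a probability distribution, softened by a temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened outputs, scaled by T^2.

    Hypothetical helper for illustration only: a lower value means the
    student's output distribution is closer to the teacher's.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


# Made-up logits for a single input and three classes.
teacher = [4.0, 1.0, 0.5]
close_student = [3.5, 1.2, 0.4]
far_student = [0.5, 4.0, 1.0]

print(kd_loss(close_student, teacher) < kd_loss(far_student, teacher))
```

In an architecture search, a score of this kind could in principle be one ingredient for comparing candidate students, but how the paper actually builds its reward is not stated on this page.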

The talk and the corresponding paper were published at the CVPR 2020 virtual conference.

Similar Papers

06/12/2021

Learning Student-Friendly Teacher Networks for Knowledge Distillation

Dae Young Park, Moon-Hyun Cha, Changwook Jeong, Daesin Kim, Bohyung Han

Keywords: deep learning, transfer learning

13:41

06/12/2020

Black-Box Ripper: Copying black-box models using generative evolutionary algorithms

Antonio Barbalau, Adrian Cosma, Radu Tudor Ionescu, Marius Popescu

3:18

30/11/2020

Fully Supervised and Guided Distillation for One-Stage Detectors

Deyu Wang, Dongchao Wen, Junjie Liu, Wei Tao, Tse-Wei Chen, Kinya Osa, Masami Kato

7:14

22/11/2021

Object Re-identification Using Teacher-Like and Light Students

Yi Xie, Hanxiao Wu, Fei Shen, Jianqing Zhu, Huanqiang Zeng

Keywords: object re-identification, knowledge distillation, pruning, re-parameterization

3:19

22/11/2021

Semi-Online Knowledge Distillation

Zhiqiang Liu, Yanxia Liu, Chengkai Huang

Keywords: Knowledge Distillation, Model Compression

3:00

22/11/2021

Teacher-Class Network: A Neural Network Compression Mechanism

Shaiq Munir Malik, Fnu Mohbat, Muhammad Umair Haider, Muhammad Musab Rasheed, Murtaza Taj

Keywords: model compression, knowledge distillation, teacher-student network

3:17

03/05/2021

Knowledge distillation via softmax regression representation learning

Jing Yang, Brais Martinez, Adrian Bulat, Georgios Tzimiropoulos

4:56

02/02/2021

Stochastic Precision Ensemble: Self-Knowledge Distillation for Quantized Deep Neural Networks

Yoonho Boo, Sungho Shin, Jungwook Choi, Wonyong Sung

19:03

06/12/2020

MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers

Wenhui Wang, Furu Wei, Li Dong, Hangbo Bao, Nan Yang, Ming Zhou

3:21

06/12/2021

Unsupervised Representation Transfer for Small Networks: I Believe I Can Distill On-the-Fly

Hee Min Choi, Hyoa Kang, Dokwan Oh

Keywords: self-supervised learning, representation learning

3:35

18/07/2021

AlphaNet: Improved Training of Supernets with Alpha-Divergence

Dilin Wang, Chengyue Gong, Meng Li, Qiang Liu, Vikas Chandra

Keywords: Deep Learning, Architectures

16:14

14/06/2020

Distilling Image Dehazing With Heterogeneous Task Imitation

Ming Hong, Yuan Xie, Cuihua Li, Yanyun Qu

Keywords: knowledge-distill, image dehazing, heterogeneous task imitation

0:57

13/04/2021

Understanding robustness in teacher-student setting: A new perspective

Zhuolin Yang, Zhaoxi Chen, Tiffany Cai, Xinyun Chen, Bo Li, Yuandong Tian

3:03

18/07/2021

Zero-Shot Knowledge Distillation from a Decision-Based Black-Box Model

Zi Wang

Keywords: Deep Learning

5:08

02/02/2021

Teacher Guided Neural Architecture Search for Face Recognition

Xiaobo Wang

13:54

19/04/2021

Annealing knowledge distillation

Aref Jafari, Mehdi Rezagholizadeh, Pranav Sharma, Ali Ghodsi

12:38

16/11/2020

Amalgamating Knowledge from Two Teachers for Task-oriented Dialogue System with Adversarial Training

Wanwei He, Min Yang, Rui Yan, Chengming Li, Ying Shen, Ruifeng Xu

Keywords: task completion, generating responses, task-oriented dialogue, task-oriented systems

9:15

02/02/2021

ALP-KD: Attention-Based Layer Projection for Knowledge Distillation

Peyman Passban, Yimeng Wu, Mehdi Rezagholizadeh, Qun Liu

18:53

22/11/2021

Deep Knowledge Distillation using Trainable Dense Attention

Bharat Sau, Soumya Roy, Vinay P Namboodiri, Raghu Sesha Iyengar

Keywords: Knowledge Distillation, Network Compression, Visual Attention

3:04

14/06/2020

Online Knowledge Distillation via Collaborative Learning

Qiushan Guo, Xinjiang Wang, Yichao Wu, Zhipeng Yu, Ding Liang, Xiaolin Hu, Ping Luo

Keywords: knowledge distillation, collaborative learning, transfer learning, deep neural network

4:37

06/12/2021

Mosaicking to Distill: Knowledge Distillation from Out-of-Domain Data

Gongfan Fang, Yifan Bao, Jie Song, Xinchao Wang, Donglin Xie, Chengchao Shen, Mingli Song

Keywords: machine learning, vision, privacy

5:35

02/02/2021

Learning to Reweight with Deep Interactions

Yang Fan, Yingce Xia, Lijun Wu, Shufang Xie, Weiqing Liu, Jiang Bian, Tao Qin, Xiang-Yang Li

14:06

02/02/2021

Data-Free Knowledge Distillation with Soft Targeted Transfer Set Synthesis

Zi Wang

14:19

30/11/2020

Introspective Learning by Distilling Knowledge from Online Self-explanation

Jindong Gu, Zhiliang Wu, Volker Tresp

10:18

03/05/2021

SEED: Self-supervised Distillation For Visual Representation

Jacob Zhiyuan Fang, Jianfeng Wang, Lijuan Wang, Lei Zhang, 'YZ' Yezhou Yang, Zicheng Liu

Keywords: Representation Learning, Self Supervised Learning, Knowledge Distillation

5:09

14/06/2020

Heterogeneous Knowledge Distillation Using Information Flow Modeling

Nikolaos Passalis, Maria Tzelepi, Anastasios Tefas

Keywords: neural network distillation, lightweight learning, information flow

1:00

22/11/2021

Adaptive Distillation: Aggregating Knowledge from Multiple Paths for Efficient Distillation

Sumanth Chennupati, Mohammad Mahdi Kamani, Zhongwei Cheng, Lin Chen

Keywords: Knowledge Distillation, Multitask Learning, Model Compression, Adaptive Distillation, Efficient Training

3:07

14/06/2020

Neural Networks Are More Productive Teachers Than Human Raters: Active Mixup for Data-Efficient Knowledge Distillation From a Blackbox Model

Dongdong Wang, Yandong Li, Liqiang Wang, Boqing Gong

Keywords: blackbox knowledge distillation, data-efficient learning, active learning, mixup

4:59

19/08/2021

Hierarchical Self-supervised Augmented Knowledge Distillation

Chuanguang Yang, Zhulin An, Linhang Cai, Yongjun Xu

Keywords: Computer Vision, Recognition

13:30

16/11/2020

BERT-EMD: Many-to-Many Layer Mapping for BERT Compression with Earth Mover's Distance

Jianquan Li, Xiaokang Liu, Honghong Zhao, Ruifeng Xu, Min Yang, Yaohong Jin

Keywords: natural tasks, nlp tasks, matching, many-to-many mapping

11:58

06/12/2021

Spatial Ensemble: a Novel Model Smoothing Mechanism for Student-Teacher Framework

Tengteng Huang, Yifan Sun, Xun Wang, Haotian Yao, Chi Zhang

10:07

02/02/2021

Robust Knowledge Transfer via Hybrid Forward on the Teacher-Student Model

Liangchen Song, Jialian Wu, Ming Yang, Qian Zhang, Yuan Li, Junsong Yuan

16:09

06/12/2021

Learning curves of generic features maps for realistic datasets with a teacher-student model

Bruno Loureiro, Cedric Gerbelot, Hugo Cui, Sebastian Goldt, Florent Krzakala, Marc Mezard, Lenka Zdeborová

Keywords: deep learning, machine learning, kernel methods

12:59

02/02/2021

Cross-Layer Distillation with Semantic Calibration

Defang Chen, Jian-Ping Mei, Yuan Zhang, Can Wang, Zhe Wang, Yan Feng, Chun Chen

17:05

14/06/2020

Distilling Cross-Task Knowledge via Relationship Matching

Han-Jia Ye, Su Lu, De-Chuan Zhan

Keywords: knowledge distillation, model reuse, knowledge transfer, cross-task learning, embedding learning

4:54

14/06/2020

Inter-Region Affinity Distillation for Road Marking Segmentation

Yuenan Hou, Zheng Ma, Chunxiao Liu, Tak-Wai Hui, Chen Change Loy

Keywords: road marking segmentation, knowledge distillation, representation learning, structural knowledge, affinity distillation, moment pooling, inter-region similarity, lightweight models, feature embeddings, graph matching

1:00

03/05/2021

Undistillable: Making A Nasty Teacher That CANNOT teach students

Haoyu Ma, Tianlong Chen, Ting-Kuei Hu, Chenyu You, Xiaohui Xie, Zhangyang Wang

Keywords: avoid knowledge leaking, knowledge distillation

9:38

08/12/2020

Collective Wisdom: Improving Low-resource Neural Machine Translation using Adaptive Knowledge Distillation

Fahimeh Saleh, Wray Buntine, Gholamreza Haffari

9:03

06/12/2021

Dynamic Distillation Network for Cross-Domain Few-Shot Recognition with Unlabeled Data

Ashraful Islam, Chun-Fu (Richard) Chen, Rameswar Panda, Leonid Karlinsky, Rogerio Feris, Richard J. Radke

Keywords: machine learning, meta learning, few shot learning

10:10

13/04/2021

RankDistil: Knowledge distillation for ranking

Sashank Reddi, Rama Kumar Pasumarthi, Aditya Menon, Ankit Singh Rawat, Felix Yu, Seungyeon Kim, Andreas Veit, Sanjiv Kumar

2:58