22/11/2021

Regression as a Simple Yet Effective Tool for Self-supervised Knowledge Distillation

K L Navaneet, Soroush Abbasi Koohpayegani, Ajinkya B Tejankar, Hamed Pirsiavash

Keywords: knowledge distillation, self-supervised distillation, regression

Abstract: Feature regression is a simple way to distill large neural network models into lighter ones. In this work we show that, with simple changes to the network architecture, regression can outperform more complex state-of-the-art approaches for knowledge distillation from self-supervised models. Surprisingly, adding a multi-layer perceptron head to the CNN backbone is beneficial even if it is used only during distillation and discarded for the downstream task. Deeper non-linear projections can thus be used to mimic the teacher more accurately without changing the inference-time architecture or cost. We use independent projection heads to distill multiple teacher networks simultaneously. Additionally, we find that feeding the same weakly augmented image to both the teacher and student networks is crucial for distillation. Experiments on the large-scale ImageNet dataset demonstrate the efficacy of the proposed changes in various self-supervised distillation settings.
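As a rough illustration of the idea described in the abstract, the PyTorch-style sketch below pairs a student backbone with a throwaway MLP projection head and regresses the frozen teacher's features, feeding the same weakly augmented image to both networks. The class name RegressionDistiller, the hidden width, the head depth, and the use of an MSE loss are illustrative assumptions, not details taken from the paper; both backbones are assumed to return flattened feature vectors.

import torch
import torch.nn as nn
import torch.nn.functional as F

class RegressionDistiller(nn.Module):
    """Student backbone + MLP projection head regressing frozen teacher features.
    The head is used only during distillation and discarded afterwards."""

    def __init__(self, student_backbone, teacher_backbone,
                 student_dim=512, teacher_dim=2048, hidden_dim=4096):
        super().__init__()
        self.student = student_backbone          # e.g. a ResNet-18 trunk (assumption)
        self.teacher = teacher_backbone          # frozen self-supervised teacher
        for p in self.teacher.parameters():
            p.requires_grad = False
        # Deeper non-linear projection, used only during distillation.
        self.head = nn.Sequential(
            nn.Linear(student_dim, hidden_dim),
            nn.BatchNorm1d(hidden_dim),
            nn.ReLU(inplace=True),
            nn.Linear(hidden_dim, teacher_dim),
        )

    def forward(self, x):
        # The SAME weakly augmented image goes to both teacher and student.
        with torch.no_grad():
            target = self.teacher(x)
        pred = self.head(self.student(x))
        # Simple feature regression: minimise the distance to the teacher features
        # (MSE is an assumption here; any regression loss could be substituted).
        return F.mse_loss(pred, target)

At inference the projection head is simply dropped and only self.student is kept, so the downstream architecture and runtime are unchanged. Multiple teachers could be handled by attaching one such head per teacher and summing the per-teacher regression losses.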

The talk and the corresponding paper were published at the BMVC 2021 virtual conference.
