22/11/2021

Multi-bit Adaptive Distillation for Binary Neural Networks

Ying Nie, Kai Han, Yunhe Wang

Keywords: binary, distillation, 1bit

Abstract: Binary neural networks (BNNs) represent weights and activations with 1-bit values, which yields extremely low memory costs and computational complexity, but they usually suffer from severe accuracy degradation. Knowledge distillation is an effective way to improve the performance of a BNN by inheriting knowledge from a higher-bit network. However, given the accuracy gap and bit gap between a 1-bit network and the various higher-bit networks, it is unclear which higher-bit network is the most suitable teacher for a particular BNN. Therefore, we propose a novel multi-bit adaptive distillation (MAD) method to maximally integrate the advantages of teacher networks of various bit-widths (e.g. 2-bit, 4-bit, 8-bit and 32-bit). In practice, both the intermediate features and the output logits of the teachers are utilized to improve the performance of the BNN. Moreover, an adaptive knowledge adjusting scheme is explored to dynamically adjust the contribution of different teachers during distillation. Comprehensive experiments on the CIFAR-10/100 and ImageNet datasets with various network architectures demonstrate the superiority of MAD over many state-of-the-art binarization methods. For instance, without introducing any extra inference computation, our binarized ResNet-18 achieves a 1.5% improvement over the BirealNet binarization method on ImageNet.
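
The abstract describes combining logit and feature distillation from several higher-bit teachers, with each teacher's contribution adjusted adaptively. The following is a minimal PyTorch-style sketch of how such a multi-teacher loss could be assembled; the function name `mad_loss`, the cosine-similarity-based teacher weighting, and all hyper-parameters are illustrative assumptions, not the authors' actual formulation.

```python
import torch
import torch.nn.functional as F

def mad_loss(student_logits, student_feat, teacher_logits, teacher_feats,
             labels, T=4.0, alpha=0.5):
    """Combine a hard-label loss with logit and feature distillation from
    several teachers (e.g. 2/4/8/32-bit), weighted adaptively.
    This is a sketch under assumed names and weighting, not the paper's exact method."""
    # Standard cross-entropy on ground-truth labels.
    ce = F.cross_entropy(student_logits, labels)

    # Adaptive teacher weights: here, teachers whose logits agree better with
    # the student's receive larger weights (one plausible heuristic).
    with torch.no_grad():
        sims = torch.stack([
            F.cosine_similarity(student_logits, t_logit, dim=1).mean()
            for t_logit in teacher_logits
        ])
        weights = F.softmax(sims, dim=0)

    kd, feat = 0.0, 0.0
    for w, t_logit, t_feat in zip(weights, teacher_logits, teacher_feats):
        # KL divergence between temperature-softened student and teacher logits.
        kd = kd + w * F.kl_div(
            F.log_softmax(student_logits / T, dim=1),
            F.softmax(t_logit / T, dim=1),
            reduction="batchmean") * (T * T)
        # L2 distance between intermediate features of student and teacher.
        feat = feat + w * F.mse_loss(student_feat, t_feat)

    return ce + alpha * (kd + feat)
```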

