07/09/2020

STQ-Nets: Unifying Network Binarization and Structured Pruning

Aurobindo Munagala, Ameya Prabhu, Anoop Namboodiri

Keywords: quantization, binary networks, binarization, pruning, compression, inference

Abstract: We discuss a formulation for network compression that combines two major paradigms: binarization and pruning. Past works on network binarization have demonstrated that networks are robust to the removal of activation/weight magnitude information and can perform comparably to full-precision networks using signs alone. Pruning focuses on generating efficient, sparse networks. Both compression paradigms aid deployment in portable settings, where storage, compute, and power are limited. We argue that these paradigms are complementary and can be combined to offer high levels of compression and speedup without significant accuracy loss. Intuitively, weights/activations closer to zero have higher binarization error, making them good candidates for pruning. Our proposed formulation incorporates speedups from binary convolution algorithms through structured pruning, enabling pruned parts of the network to be removed entirely post-training, and outperforms previous works that attempt the same by a significant margin. Overall, our method brings up to 89x layer-wise compression over the corresponding full-precision networks, with only a 0.33% accuracy loss on CIFAR-10 with ResNet-18 at a 40% PFR (Prune Factor Ratio for filters), and a 0.3% loss on ImageNet with ResNet-18 at a 19% PFR.
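To make the pruning intuition above concrete, here is a minimal NumPy sketch (not the paper's actual criterion or code): it binarizes a convolutional layer's weights to alpha * sign(W) with a single layer-wide scale alpha, measures each filter's approximation error, and selects the PFR fraction of filters with the largest error as candidates for structured removal. The function names, tensor shapes, and the layer-wide-alpha choice are illustrative assumptions.

```python
import numpy as np

def per_filter_binarization_error(layer_weights):
    """L2 error per filter when the layer is binarized to alpha * sign(W),
    with one scaling factor alpha shared across the layer (assumption).
    Weights near zero each incur an error of roughly alpha, so filters
    dominated by near-zero weights accumulate the largest total error."""
    alpha = np.abs(layer_weights).mean()              # layer-wide scale
    approx = alpha * np.sign(layer_weights)           # binary approximation of W
    err = (layer_weights - approx).reshape(layer_weights.shape[0], -1)
    return np.linalg.norm(err, axis=1)

def filters_to_prune(layer_weights, pfr):
    """Indices of the PFR fraction of filters with the highest binarization
    error -- candidates for structured (whole-filter) removal."""
    errors = per_filter_binarization_error(layer_weights)
    num_prune = int(round(pfr * layer_weights.shape[0]))
    return np.argsort(errors)[::-1][:num_prune]

# Toy example: 64 conv filters of shape 3x3x3, pruning 40% of them (PFR = 0.40).
rng = np.random.default_rng(0)
weights = rng.normal(size=(64, 3, 3, 3))
print(np.sort(filters_to_prune(weights, pfr=0.40)))
```

Because whole filters are removed, the remaining layer is simply a smaller dense binary convolution, so standard XNOR/popcount kernels can still be applied to it after pruning, which is what lets structured pruning preserve the binary-convolution speedups the abstract refers to.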

Published at the BMVC 2020 virtual conference.
