06/12/2021

DominoSearch: Find layer-wise fine-grained N:M sparse schemes from dense neural networks

Wei Sun, Aojun Zhou, Sander Stuijk, Rob Wijnhoven, Andrew Nelson, Hongsheng Li, Henk Corporaal

Keywords: deep learning

Abstract: Neural pruning is a widely used compression technique for Deep Neural Networks (DNNs). Recent innovations in hardware architectures (e.g. the Nvidia Ampere Sparse Tensor Core) and N:M fine-grained sparse neural network algorithms (i.e. every group of M weights contains N non-zero values) reveal a promising line of research in neural pruning. However, existing N:M algorithms only address the challenge of how to train N:M sparse neural networks in a uniform fashion (i.e. every layer has the same N:M sparsity) and suffer from a significant accuracy drop at high sparsity (i.e. when sparsity > 80%). To tackle this problem, we present a novel technique, DominoSearch, to find mixed N:M sparsity schemes from pre-trained dense deep neural networks that achieve higher accuracy than the uniform-sparsity scheme under equivalent complexity constraints (e.g. model size or FLOPs). For instance, for the same model size of 2.1M parameters (87.5% sparsity), our layer-wise N:M sparse ResNet18 outperforms its uniform counterpart by 2.1% top-1 accuracy on the large-scale ImageNet dataset. For the same computational complexity of 227M FLOPs, our layer-wise sparse ResNet18 outperforms the uniform one by 1.3% top-1 accuracy. Furthermore, our layer-wise fine-grained N:M sparse ResNet50 achieves 76.7% top-1 accuracy with 5.0M parameters. This is competitive with the results achieved by layer-wise unstructured sparsity, which is believed to be the upper bound of neural network pruning with respect to the accuracy-sparsity trade-off. We believe that our work can build a strong baseline for further sparse DNN research and encourage future hardware-algorithm co-design work. Our code and models are publicly available at https://github.com/NM-sparsity/DominoSearch.
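To make the N:M sparsity pattern concrete, below is a minimal illustrative sketch in PyTorch of fine-grained N:M pruning by magnitude: for every group of M consecutive weights, only the N entries with the largest absolute value are kept. This is not the DominoSearch algorithm itself (which searches for layer-wise N:M schemes); the group layout, the magnitude criterion, and the function name are assumptions made for illustration.

import torch

def apply_nm_sparsity(weight: torch.Tensor, n: int = 2, m: int = 4) -> torch.Tensor:
    # Illustrative N:M pruning (not the authors' DominoSearch method):
    # keep the N largest-magnitude values in every group of M weights.
    flat = weight.reshape(-1, m)                       # group weights into chunks of M
    idx = flat.abs().topk(n, dim=1).indices            # N largest magnitudes per group
    mask = torch.zeros_like(flat).scatter_(1, idx, 1.0)
    return (flat * mask).reshape(weight.shape)

# Example: a 2:4 scheme gives 50% sparsity; a 1:8 scheme gives the 87.5% sparsity
# regime discussed in the abstract. The weight count must be divisible by M here.
w = torch.randn(64, 64)
w_sparse = apply_nm_sparsity(w, n=2, m=4)
assert (w_sparse.reshape(-1, 4) != 0).sum(dim=1).max() <= 2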

Talk and paper published at the NeurIPS 2021 virtual conference.
