Boosting Deep Neural Network Efficiency with Dual-Module Inference

Abstract: Deep Neural Networks (DNNs) deliver high-quality results in machine learning tasks, but their memory-bound and compute-bound execution patterns make it challenging to meet stringent latency requirements and energy constraints. We propose big-little dual-module inference, which dynamically skips unnecessary memory accesses and computation to speed up DNN inference. Leveraging the error resilience of the nonlinear activation functions used in DNNs, we use a lightweight little module that approximates the original DNN layer (referred to as the big module) to compute activations in the insensitive region, where they are more error-resilient. The expensive memory accesses and computation of the big module can then be reduced, because its results are needed only in the sensitive region. For memory-bound models, our method reduces overall memory access by 40% on average and achieves 1.54x to 1.75x speedup on a commodity CPU-based server platform with negligible impact on model quality. For the compute-bound ResNet model, our method reduces the number of operations by 3.02x with only a 0.5% accuracy drop.
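The big-little mechanism described in the abstract can be illustrated for a single fully connected layer. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: the little module is assumed (hypothetically) to be a 4-bit uniformly quantized copy of the big module's weights; the sensitive region is assumed to be |a| < threshold for tanh and a > -threshold for ReLU, where a is the little module's approximate pre-activation; and the names `quantize`, `is_sensitive`, and `dual_module_layer` are hypothetical. A row of the big weight matrix is read only when its output is classified as sensitive, which is where the memory-access and computation savings come from.

```python
import math
import random
from typing import Callable, Dict, List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def dot(row: Vector, x: Vector) -> float:
    """Exact dot product of one weight row with the input."""
    return sum(w * v for w, v in zip(row, x))


def quantize(weights: Matrix, bits: int = 4) -> Matrix:
    """Hypothetical little module: a uniformly quantized copy of the big weights.

    In a real deployment the little weights would be stored at low precision so
    reading them costs far less memory traffic; here they stay plain floats.
    """
    scale = max(abs(v) for row in weights for v in row) or 1.0
    levels = 2 ** (bits - 1) - 1
    return [[round(v / scale * levels) * scale / levels for v in row] for row in weights]


def is_sensitive(a: float, activation: str, threshold: float) -> bool:
    """Assumed rule: decide from the approximate pre-activation whether to use the big module."""
    if activation == "tanh":
        # Near zero tanh is steep, so approximation errors pass through almost unchanged.
        return abs(a) < threshold
    if activation == "relu":
        # A clearly negative pre-activation is clipped to 0, so its error does not matter.
        return a > -threshold
    raise ValueError(f"unsupported activation: {activation}")


ACTIVATIONS: Dict[str, Callable[[float], float]] = {
    "tanh": math.tanh,
    "relu": lambda z: max(z, 0.0),
}


def dual_module_layer(big: Matrix, little: Matrix, x: Vector,
                      activation: str, threshold: float) -> Tuple[Vector, int]:
    """Return the layer output and the number of big-module rows actually read."""
    act = ACTIVATIONS[activation]
    out: Vector = []
    big_rows_read = 0
    for i, little_row in enumerate(little):
        a = dot(little_row, x)          # cheap approximate pre-activation
        if is_sensitive(a, activation, threshold):
            a = dot(big[i], x)          # fetch and use only this row of the big module
            big_rows_read += 1
        out.append(act(a))              # insensitive outputs keep the approximation
    return out, big_rows_read


if __name__ == "__main__":
    random.seed(0)
    n_out, n_in = 256, 64
    big = [[random.gauss(0.0, 0.2) for _ in range(n_in)] for _ in range(n_out)]
    little = quantize(big, bits=4)
    x = [random.gauss(0.0, 1.0) for _ in range(n_in)]
    exact = [math.tanh(dot(row, x)) for row in big]
    out, rows = dual_module_layer(big, little, x, "tanh", threshold=1.0)
    err = max(abs(o - e) for o, e in zip(out, exact))
    print(f"big-module rows read: {rows}/{n_out}, max abs error: {err:.4f}")
```

Raising `threshold` sends more outputs through the big module, trading savings for accuracy; with `threshold=float("inf")` every row is read and the result matches the exact layer. How the paper actually constructs the little module and selects the sensitive region may differ from these assumptions.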

06/12/2021


12/07/2020


Liu Liu, Lei Deng, Zhaodong Chen, Yuke Wang, Shuangchen Li, Jingwei Zhang, Yihua Yang, Zhenyu Gu, Yufei Ding, Yuan Xie


Similar Papers

Memory-efficient Patch-based Inference for Tiny Deep Learning

Ji Lin, Wei-Ming Chen, Han Cai, Chuang Gan, Song Han

Keywords: deep learning, machine learning, vision

Structured Multi-Hashing for Model Compression

Elad Eban, Yair Movshovitz-Attias, Hao Wu, Mark Sandler, Andrew Poon, Yerlan Idelbayev, Miguel Á. Carreira-Perpiñán

Keywords: compression, weight hashing, on device

Spike-Thrift: Towards Energy-Efficient Deep Spiking Neural Networks by Limiting Spiking Activity via Attention-Guided Compression

Souvik Kundu, Gourav Datta, Massoud Pedram, Peter A. Beerel


Network Pruning by Greedy Subnetwork Selection

Mao Ye, Chengyue Gong, Lizhen Nie, Denny Zhou, Adam Klivans, Qiang Liu

Keywords: Deep Learning - General

Train Big, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers

Zhuohan Li, Eric Wallace, Sheng Shen, Kevin Lin, Kurt Keutzer, Dan Klein, Joseph Gonzalez

Keywords: Applications - Language, Speech and Dialog

Revisiting Locally Supervised Learning: an Alternative to End-to-end Training

Yulin Wang, Zanlin Ni, Shiji Song, Le Yang, Gao Huang

Keywords: Deep learning, Locally supervised training

Greedy Optimization Provably Wins the Lottery: Logarithmic Number of Winning Tickets is Enough

Mao Ye, lemon woo, Qiang Liu


RETRIEVE: Coreset Selection for Efficient and Robust Semi-Supervised Learning

Krishnateja Killamsetty, Xujiang Zhao, Feng Chen, Rishabh Iyer

Keywords: optimization, semi-supervised learning

Dynamic Routing Networks

Shaofeng Cai, Yao Shu, Wei Wang


Minimizing FLOPs to Learn Efficient Sparse Representations

Biswajit Paria, Chih-Kuan Yeh, Ian E.H. Yen, Ning Xu, Pradeep Ravikumar, Barnabás Póczos

Keywords: sparse embeddings, deep representations, metric learning, regularization

GPU-Accelerated Primal Learning for Extremely Fast Large-Scale Classification

John Halloran, David M Rocke


Low-Rank Compression of Neural Nets: Learning the Rank of Each Layer

Yerlan Idelbayev, Miguel Á. Carreira-Perpiñán

Keywords: low-rank compression, rank selection, optimization, discrete-continuous optimization

Rethinking pruning for accelerating deep inference at the edge

Dawei Gao, Xiaoxi He, Zimu Zhou, Yongxin Tong, Ke Xu, Lothar Thiele

Keywords: automatic speech recognition, deep learning, name entity recognition, network pruning, sequence labelling

AC-GC: Lossy Activation Compression with Guaranteed Convergence

R David Evans, Tor Aamodt

Keywords: deep learning, optimization, graph learning

A Novel Sequential Coreset Method for Gradient Descent Algorithms

Jiawei Huang, Ruomin Huang, Wenjie Liu, Nikolaos Freris, Hu Ding

Keywords: Optimization

Probabilistic Connection Importance Inference and Lossless Compression of Deep Neural Networks

Xin Xing, Long Sha, Pengyu Hong, Zuofeng Shang, Jun S. Liu


Training Adversarially Robust Sparse Networks via Bayesian Connectivity Sampling

Ozan Özdenizci, Robert Legenstein

Keywords: Algorithms, Adversarial Examples

Dynamic Model Pruning with Feedback

Tao Lin, Sebastian U. Stich, Luis Barba, Daniil Dmitriev, Martin Jaggi

Keywords: network pruning, dynamic reparameterization, model compression

VS-Quant: Per-vector Scaled Quantization for Accurate Low-Precision Neural Network Inference

Steve Dai, Rangha Venkatesan, Mark Ren, Brian Zimmer, William Dally, Brucek Khailany

Keywords: Deep Learning -> Generative Models, Algorithms -> Similarity and Distance Learning

