FracBits: Mixed Precision Quantization via Fractional Bit-Widths

02/02/2021

Linjie Yang, Qing Jin

Abstract: Model quantization helps reduce the model size and latency of deep neural networks. Mixed precision quantization is favorable on customized hardware that supports arithmetic operations at multiple bit-widths, enabling maximum efficiency. We propose a novel learning-based algorithm that derives mixed precision models end-to-end under target computation constraints and model sizes. During optimization, the bit-width of each layer/kernel in the model takes a fractional value between two consecutive bit-widths, and this value can be adjusted gradually. With a differentiable regularization term, the resource constraints can be met during quantization-aware training, which yields an optimized mixed precision model. Our final models achieve comparable or better performance than previous mixed precision quantization methods on MobileNetV1/V2 and ResNet18 under different resource constraints on the ImageNet dataset.
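The abstract's core idea, a bit-width that sits between two consecutive integers and moves gradually under a differentiable resource penalty, can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the fractional quantizer linearly blends the outputs of the quantizers at floor(b) and floor(b) + 1, uses a toy symmetric uniform quantizer, and substitutes a hypothetical BitOPs cost (MACs × weight bits × activation bits) with a simple hinge penalty for the paper's regularization term. All function names here are hypothetical.

```python
import math
from typing import List, Tuple


def uniform_quantize(x: float, bits: int, x_max: float = 1.0) -> float:
    """Toy symmetric uniform quantizer on [-x_max, x_max] (assumption, not the paper's quantizer)."""
    if bits < 2:
        raise ValueError("this toy quantizer needs at least 2 bits")
    levels = 2 ** (bits - 1) - 1          # number of positive quantization levels
    step = x_max / levels
    x = max(-x_max, min(x_max, x))        # clip to the representable range
    return round(x / step) * step


def frac_quantize(x: float, b: float, x_max: float = 1.0) -> float:
    """Hypothetical fractional quantizer: blend Q_floor(b) and Q_floor(b)+1 by the fractional part of b."""
    lo = math.floor(b)
    frac = b - lo
    q_lo = uniform_quantize(x, lo, x_max)
    if frac == 0.0:                       # integer bit-width: a single quantizer suffices
        return q_lo
    q_hi = uniform_quantize(x, lo + 1, x_max)
    return (1.0 - frac) * q_lo + frac * q_hi


def frac_quantize_grad_b(x: float, b: float, x_max: float = 1.0) -> float:
    """d frac_quantize / d b for b strictly between two integers: Q_hi(x) - Q_lo(x)."""
    lo = math.floor(b)
    return uniform_quantize(x, lo + 1, x_max) - uniform_quantize(x, lo, x_max)


def bitops(macs: List[float], w_bits: List[float], a_bits: List[float]) -> float:
    """Hypothetical cost model: sum over layers of MACs * weight bits * activation bits."""
    return sum(m * bw * ba for m, bw, ba in zip(macs, w_bits, a_bits))


def penalty_and_grad(macs: List[float], w_bits: List[float], a_bits: List[float],
                     target: float) -> Tuple[float, List[float]]:
    """Hinge penalty max(0, cost - target) and its gradient w.r.t. each weight bit-width."""
    excess = max(0.0, bitops(macs, w_bits, a_bits) - target)
    # d cost / d w_bits[i] = macs[i] * a_bits[i]; the hinge zeroes it once under budget.
    grad = [m * ba if excess > 0.0 else 0.0 for m, ba in zip(macs, a_bits)]
    return excess, grad


def fit_to_budget(macs: List[float], w_bits: List[float], a_bits: List[float],
                  target: float, lr: float = 1e-4, steps: int = 100,
                  min_bits: float = 2.0, max_bits: float = 8.0):
    """Gradually lower fractional weight bit-widths until the hypothetical budget is met."""
    w = list(w_bits)
    for _ in range(steps):
        excess, grad = penalty_and_grad(macs, w, a_bits, target)
        if excess == 0.0:
            break
        # Gradient step on the penalty only (no task loss here); clamp to the supported range.
        w = [min(max_bits, max(min_bits, b - lr * g)) for b, g in zip(w, grad)]
    return w, bitops(macs, w, a_bits)
```

For instance, `fit_to_budget([100.0, 200.0], [4.0, 4.0], [8.0, 8.0], 7200.0)` starts from a cost of 9600 and takes eight small steps, ending near 3.36 and 2.72 weight bits with a cost of about 7040. The layer with more MACs receives the larger gradient and so gives up more bits, which illustrates the gradual, cost-aware adjustment the abstract describes; in real quantization-aware training, the task loss would typically pull the bit-widths back up.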

Video of the talk: https://slideslive.com/38948035

The talk and the corresponding paper were published at the AAAI 2021 virtual conference.

Similar Papers

02/02/2021

Harmonized Dense Knowledge Distillation Training for Multi-Exit Architectures

Xinglu Wang, Yingming Li

06/12/2020

Bayesian Bits: Unifying Quantization and Pruning

Mart van Baalen, Christos Louizos, Markus Nagel, Rana Ali Amjad, Ying Wang, Tijmen Blankevoort, Max Welling

06/12/2020

Memory-Efficient Learning of Stable Linear Dynamical Systems for Prediction and Control

Giorgos Mamakoukas, Orest Xherija, Todd Murphey

Keywords: Optimization -> Non-Convex Optimization, Optimization -> Stochastic Optimization

12/07/2020

Learning Autoencoders with Relational Regularization

Hongteng Xu, Dixin Luo, Ricardo Henao, Svati Shah, Lawrence Carin

Keywords: Deep Learning - Generative Models and Autoencoders

03/08/2020

An Interpretable and Sample Efficient Deep Kernel for Gaussian Process

Yijue Dai, Tianjian Zhang, Zhidi Lin, Feng Yin, Sergios Theodoridis, Shuguang Cui

03/05/2021

LambdaNetworks: Modeling long-range Interactions without Attention

Irwan Bello

Keywords: attention, neural networks, image classification, deep learning, vision, transformer

06/12/2021

Post-Training Quantization for Vision Transformer

Zhenhua Liu, Yunhe Wang, Kai Han, Wei Zhang, Siwei Ma, Wen Gao

Keywords: deep learning, transformers, vision

03/05/2021

Reducing the Computational Cost of Deep Generative Models with Binary Neural Networks

Thomas Bird, Friso Kingma, David Barber

Keywords: generative, binary, optimization, compression

02/02/2021

A Flexible Framework for Communication-Efficient Machine Learning

Sarit Khirirat, Sindri Magnússon, Arda Aytekin, Mikael Johansson

06/12/2020

Adaptive Discretization for Model-Based Reinforcement Learning

Sean Sinclair, Tianyu Wang, Gauri Jain, Sid Banerjee, Christina Yu

05/04/2021

Pipelined Backpropagation at Scale: Training Large Models without Batches

Atli Kosson, Vitaliy Chiley, Abhi Venigalla, Joel Hestness, Urs Koster

06/12/2020

Collegial Ensembles

Etai Littwin, Ben Myara, Sima Sabah, Joshua Susskind, Shuangfei Zhai, Oren Golan

06/12/2020

Delta-STN: Efficient Bilevel Optimization for Neural Networks using Structured Response Jacobians

Juhan Bae, Roger Grosse

14/06/2020

Exemplar Normalization for Learning Deep Representation

Ruimao Zhang, Zhanglin Peng, Lingyun Wu, Zhen Li, Ping Luo

Keywords: normalization, learning to normalize, sample-adaptive, deep learning, image classification, semantic segmentation

06/12/2020

Efficient Learning of Generative Models via Finite-Difference Score Matching

Tianyu Pang, Kun Xu, Chongxuan Li, Yang Song, Stefano Ermon, Jun Zhu

05/01/2021

Multi-Path Neural Networks for On-Device Multi-Domain Visual Classification

Qifei Wang, Junjie Ke, Joshua Greaves, Grace Chu, Gabriel Bender, Luciano Sbaiz, Alec Go, Andrew Howard, Ming-Hsuan Yang, Jeff Gilbert, Peyman Milanfar, Feng Yang

07/09/2020

From Quantized DNNs to Quantizable DNNs

Kunyuan Du, Ya Zhang, Haibing Guan

Keywords: Quantized DNNs, Dynamic Bit-width

06/12/2021

Global Filter Networks for Image Classification

Yongming Rao, Wenliang Zhao, Zheng Zhu, Jiwen Lu, Jie Zhou

Keywords: machine learning, robustness, transformers, vision

06/12/2020

Revisiting the Sample Complexity of Sparse Spectrum Approximation of Gaussian Processes

Minh Hoang, Nghia Hoang, Hai Pham, David Woodruff

Keywords: Deep Learning

14/06/2020

MUXConv: Information Multiplexing in Convolutional Neural Networks

Zhichao Lu, Kalyanmoy Deb, Vishnu Naresh Boddeti

Keywords: convolutional neural networks, neural architecture search, evolutionary algorithms

14/06/2020

APQ: Joint Search for Network Architecture, Pruning and Quantization Policy

Tianzhe Wang, Kuan Wang, Han Cai, Ji Lin, Zhijian Liu, Hanrui Wang, Yujun Lin, Song Han

Keywords: efficiency, model compression, joint design, neural architecture search, channel pruning, mixed-precision quantization

05/01/2021

RGPNet: A Real-Time General Purpose Semantic Segmentation

Elahe Arani, Shabbir Marzban, Andrei Pata, Bahram Zonooz

05/04/2021

VS-Quant: Per-vector Scaled Quantization for Accurate Low-Precision Neural Network Inference

Steve Dai, Rangha Venkatesan, Mark Ren, Brian Zimmer, William Dally, Brucek Khailany

Keywords: Deep Learning -> Generative Models, Algorithms -> Similarity and Distance Learning

02/02/2021

Memory and Computation-Efficient Kernel SVM via Binary Embedding and Ternary Model Coefficients

Zijian Lei, Liang Lan

18/07/2021

Randomized Dimensionality Reduction for Facility Location and Single-Linkage Clustering

Shyam Narayanan, Sandeep Silwal, Piotr Indyk, Or Zamir

Keywords: Algorithms, Dimensionality Reduction

06/12/2021

Lower Bounds and Optimal Algorithms for Smooth and Strongly Convex Decentralized Optimization Over Time-Varying Networks

Dmitry Kovalev, Elnur Gasanov, Alexander Gasnikov, Peter Richtarik

Keywords: optimization

18/07/2021

Group Fisher Pruning for Practical Network Compression

Liyang Liu, Shilong Zhang, Zhanghui Kuang, Aojun Zhou, Jing-Hao Xue, Xinjiang Wang, Yimin Chen, Wenming Yang, Qingmin Liao, Wayne Zhang

Keywords: Applications, Computer Vision

22/11/2021

Hardware-Aware Mixed-Precision Neural Networks using In-Train Quantization

Manoj Rohit Vemparala, Nael Fasfous, Lukas Frickenstein, Alexander Frickenstein, Anmol Singh, Driton Salihu, Christian Unger, Naveen Shankar Nagaraja, Walter Stechele

Keywords: Quantization, Inference, Neural Network Compression, Mixed Precision, Hardware Aware Networks

06/12/2020

Learning Discrete Energy-based Models via Auxiliary-variable Local Exploration

Hanjun Dai, Rishabh Singh, Bo Dai, Charles Sutton, Dale Schuurmans

03/05/2021

Neurally Augmented ALISTA

Freya Behrens, Jonathan Sauder, Peter Jung

Keywords: learned ISTA, unrolled algorithms, compressed sensing, sparse reconstruction

06/12/2020

Model Fusion via Optimal Transport

Sidak Pal Singh, Martin Jaggi

18/07/2021

Sparsifying Networks via Subdifferential Inclusion

Sagar Verma, Jean-Christophe Pesquet

Keywords: Optimization, Convex Optimization

22/11/2021

Parameter Efficient Dynamic Convolution via Tensor Decomposition

Zejiang Hou, Sun-Yuan Kung

Keywords: dynamic convolution, input-dependent reparameterization, parameter efficiency, tensor decomposition

26/04/2020

Learned step size quantization

Steven K. Esser, Jeffrey L. McKinstry, Deepika Bablani, Rathinakumar Appuswamy, Dharmendra S. Modha

Keywords: deep learning, low precision, classification, quantization

14/06/2020

Structured Compression by Weight Encryption for Unstructured Pruning and Quantization

Se Jung Kwon, Dongsoo Lee, Byeongwook Kim, Parichay Kapoor, Baeseong Park, Gu-Yeon Wei

Keywords: model compression, quantization, pruning, xor gate, parallelism, memory bandwidth, sparse matrix, structured format

03/05/2021

MALI: A memory efficient and reverse accurate integrator for Neural ODEs

Juntang Zhuang, Nicha C. Dvornek, Sekhar Tatikonda, James S. Duncan

Keywords: neural ode, memory efficient, gradient estimation, reverse accuracy

06/12/2020

Non-Euclidean Universal Approximation

Anastasis Kratsios, Eugene Bilokopytov

14/09/2020

Squeezing Correlated Neurons for Resource-Efficient Deep Neural Networks

Elbruz Ozen, Alex Orailoglu

Keywords: deep learning, information redundancy, pruning