
Hardware-Aware Mixed-Precision Neural Networks using In-Train Quantization

Manoj Rohit Vemparala, Nael Fasfous, Lukas Frickenstein, Alexander Frickenstein, Anmol Singh, Driton Salihu, Christian Unger, Naveen Shankar Nagaraja, Walter Stechele

Keywords: Quantization, Inference, Neural Network Compression, Mixed Precision, Hardware-Aware Networks

Abstract: Fixed-point quantization is an effective method to reduce the model size and computational demand of convolutional neural networks by lowering the numerical precision of all layers to a specific bit-width. Recent work shows that assigning layer-specific bit-widths has an advantage over uniform assignment, although it requires complex post-training search techniques and many GPU hours to identify the optimal bit-width strategy. To alleviate this, we propose an in-train quantization method that directly learns the optimal bit-widths for weights and activations during gradient-based training. We incorporate hardware-awareness into the gradient-based optimization to directly improve real hardware execution metrics. We replace the discrete and non-differentiable hardware measurements with a differentiable Gaussian process regressor. This provides accurate hardware predictions as an auxiliary loss to the gradient-descent optimizer, enabling hardware-friendly in-train quantization. Our hardware-aware mixed-precision ResNet56 achieves a 1.3× improvement in execution latency compared to uniform 4-bit quantization with no degradation in accuracy. Finally, we highlight the effectiveness of the in-train quantization method in the context of adversarial training, improving the trade-off between prediction accuracy and robustness.
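
As a rough illustration of the mechanism the abstract describes, the PyTorch sketch below combines a fake-quantizer with a continuous, learnable bit-width (trained through a straight-through estimator for the rounding) with a Gaussian-process surrogate whose smooth posterior mean stands in for discrete hardware latency measurements as an auxiliary loss term. Everything here is an assumption made for illustration: the class names LearnedBitQuant and GPLatencySurrogate, the RBF kernel, the clamping range, and the toy latency numbers are hypothetical and do not reproduce the authors' implementation.

import torch
import torch.nn as nn


class LearnedBitQuant(nn.Module):
    """Fake-quantizer with a continuous, learnable bit-width (hypothetical sketch)."""

    def __init__(self, init_bits: float = 8.0):
        super().__init__()
        self.bits = nn.Parameter(torch.tensor(init_bits))  # relaxed bit-width, updated by gradient descent

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        bits = self.bits.clamp(1.0, 8.0)                    # keep the relaxation in a sane range
        levels = 2.0 ** bits - 1.0                          # number of quantization steps
        scale = x.detach().abs().max().clamp(min=1e-8)
        xn = (x / scale).clamp(-1.0, 1.0)
        v = xn * levels
        v_q = v + (torch.round(v) - v).detach()             # straight-through estimator for the rounding
        return scale * v_q / levels                         # gradients reach both x and self.bits


class GPLatencySurrogate(nn.Module):
    """Differentiable GP posterior mean mu(b) = k(b, B) alpha over measured bit-width configs."""

    def __init__(self, train_bits: torch.Tensor, train_latency: torch.Tensor,
                 lengthscale: float = 2.0, noise: float = 1e-3):
        super().__init__()
        self.lengthscale = lengthscale
        K = self._rbf(train_bits, train_bits, lengthscale) + noise * torch.eye(train_bits.shape[0])
        self.register_buffer("X", train_bits)               # measured bit-width configurations
        self.register_buffer("alpha", torch.linalg.solve(K, train_latency.unsqueeze(1)).squeeze(1))

    @staticmethod
    def _rbf(a: torch.Tensor, b: torch.Tensor, ls: float) -> torch.Tensor:
        return torch.exp(-0.5 * torch.cdist(a, b) ** 2 / ls ** 2)

    def forward(self, bits: torch.Tensor) -> torch.Tensor:
        # The posterior mean is smooth in `bits`, so it can serve as a differentiable latency proxy.
        k = self._rbf(bits.unsqueeze(0), self.X, self.lengthscale)
        return (k @ self.alpha).squeeze()


# Toy usage: three quantized layers, a surrogate fitted on made-up latency measurements,
# and a combined loss of the form "task + lambda * predicted latency".
measured_bits = torch.tensor([[8., 8., 8.], [4., 4., 4.], [2., 4., 8.], [2., 2., 2.]])
measured_lat = torch.tensor([3.0, 1.6, 1.4, 0.9])           # hypothetical latency numbers
surrogate = GPLatencySurrogate(measured_bits, measured_lat)

quantizers = nn.ModuleList([LearnedBitQuant() for _ in range(3)])
weights = [torch.randn(16, 16, requires_grad=True) for _ in range(3)]
task_loss = sum(q(w).pow(2).mean() for q, w in zip(quantizers, weights))  # placeholder task loss
latency_loss = surrogate(torch.stack([q.bits for q in quantizers]))
(task_loss + 0.1 * latency_loss).backward()                  # per-layer bit-widths receive gradients

Because the surrogate's posterior mean is differentiable in the bit-width vector, the gradient of the predicted latency can flow back into the per-layer bit-width parameters alongside the task loss, which is the core idea stated in the abstract.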

The talk and the corresponding paper were published at the BMVC 2021 virtual conference.

