02/02/2021

Empowering Adaptive Early-Exit Inference with Latency Awareness

Xinrui Tan, Hongjia Li, Liming Wang, Xueqing Huang, Zhen Xu

Abstract: Adaptive early-exit inference trades accuracy for latency on the fly and has emerged as a promising way to accelerate deep learning inference. However, work in this area commonly relies on a group of thresholds to control the accuracy-latency trade-off, and no thorough, general methodology for determining these thresholds has been established, especially with regard to common requirements on average inference latency. To address this issue and enable latency-aware adaptive early-exit inference, we approximately formulate the threshold determination problem as finding the accuracy-maximizing threshold setting that meets a given average latency requirement, and then propose a threshold determination method to tackle the resulting non-convex problem. Theoretically, we prove that, for certain parameter settings, our method finds an approximate stationary point of the formulated problem. Empirically, across various models and multiple datasets (CIFAR-10, CIFAR-100, ImageNet, and two time-series datasets), we show that our method handles average latency requirements well and consistently finds good threshold settings in negligible time.
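
To make the role of the thresholds concrete, the following is a minimal sketch of how one threshold setting could be evaluated for accuracy and average latency. It is not the paper's threshold determination method, and it only scores a setting rather than searching for one. The exit rule (leave at the first exit whose confidence reaches its threshold, with the final exit always accepting), the function name `early_exit_stats`, and the toy numbers are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

# Toy validation data (made up for illustration): 4 samples, 3 exits.
# CONF[i][k]    = confidence of exit k on sample i
# CORRECT[i][k] = whether exit k predicts sample i correctly
# LATENCY[k]    = cumulative latency needed to reach exit k
CONF = [[0.9, 0.95, 0.99], [0.4, 0.8, 0.9], [0.3, 0.5, 0.7], [0.6, 0.65, 0.9]]
CORRECT = [[True, True, True], [False, True, True],
           [False, False, True], [False, True, True]]
LATENCY = [1.0, 2.0, 4.0]


def early_exit_stats(
    conf: Sequence[Sequence[float]],
    correct: Sequence[Sequence[bool]],
    exit_latency: Sequence[float],
    thresholds: Sequence[float],
) -> Tuple[List[int], float, float]:
    """Return (exit index per sample, accuracy, average latency).

    Assumed rule: a sample leaves at the first exit whose confidence is
    at least that exit's threshold; the last exit has no threshold.
    """
    n_exits = len(exit_latency)
    if len(thresholds) != n_exits - 1:
        raise ValueError("need one threshold per exit except the last")

    exit_idx: List[int] = []
    for sample_conf in conf:
        chosen = n_exits - 1  # fall through to the final exit by default
        for k, threshold in enumerate(thresholds):
            if sample_conf[k] >= threshold:
                chosen = k
                break
        exit_idx.append(chosen)

    n = len(exit_idx)
    accuracy = sum(1 for i, k in enumerate(exit_idx) if correct[i][k]) / n
    avg_latency = sum(exit_latency[k] for k in exit_idx) / n
    return exit_idx, accuracy, avg_latency


if __name__ == "__main__":
    for thresholds in ([0.7, 0.75], [0.5, 0.5]):
        print(thresholds, early_exit_stats(CONF, CORRECT, LATENCY, thresholds))
```

On this toy data, lowering the thresholds lets more samples leave early, which reduces average latency but also lowers accuracy. A latency-aware method of the kind the abstract describes would search over such settings for the most accurate one whose average latency stays within a given budget.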

The video of this talk cannot be embedded. You can watch it here:
https://slideslive.com/38949072
The talk and the corresponding paper were published at the AAAI 2021 virtual conference.
