Boosting the Throughput and Accelerator Utilization of Specialized CNN Inference Beyond Increasing Batch Size

Abstract: Datacenter vision systems widely use small, specialized convolutional neural networks (CNNs), each trained for a specific task, for high-throughput inference. These systems run on accelerators with massive computational capacity, which specialized CNNs underutilize because of their low arithmetic intensity. The result is suboptimal application-level throughput and poor returns on accelerator investment. Increasing batch size is the only known way to increase both application-level throughput and accelerator utilization for inference, but it yields diminishing returns: specialized CNNs utilize accelerators poorly even at large batch sizes. We propose FoldedCNNs, a new approach to CNN design that increases inference throughput and utilization beyond what large batch sizes achieve. FoldedCNNs rethink the structure of the inputs and layers of specialized CNNs to boost arithmetic intensity: f images with C channels each are concatenated into a single input with fC channels and jointly classified by a wider CNN. The increased arithmetic intensity raises the throughput and GPU utilization of specialized CNN inference by up to 2.5x and 2.8x, respectively, with accuracy close to that of the original CNN in most cases.
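
The input folding described in the abstract can be illustrated with a minimal sketch. It assumes a batch stored as nested Python lists in (image, channel, row, column) order. The name `fold_batch` is hypothetical and not taken from the paper. Only the concatenation of f images into one fC-channel input is shown; the wider CNN that classifies the folded input is not.

```python
def fold_batch(images, f):
    """Concatenate every f consecutive images along the channel axis.

    images: list of N images, each a list of C channels (each channel H x W).
    Returns a list of N // f folded inputs, each with f * C channels.
    This layout and function name are illustrative assumptions.
    """
    if f <= 0:
        raise ValueError("f must be a positive integer")
    if len(images) % f != 0:
        raise ValueError("number of images must be divisible by f")

    folded = []
    for start in range(0, len(images), f):
        group = images[start:start + f]
        # Channels of image 0 come first, then image 1, and so on.
        folded.append([channel for image in group for channel in image])
    return folded


if __name__ == "__main__":
    # Four 3-channel images of size 1x1, folded with f = 2.
    batch = [[[[float(i)]] for _ in range(3)] for i in range(4)]
    folded = fold_batch(batch, f=2)
    print(len(folded), len(folded[0]))  # number of folded inputs, channels per input
```

With f = 2 and C = 3, each folded input carries six channels, so a model consuming it would need to emit predictions for both original images jointly, as the abstract describes.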

05/01/2021

