Abstract:
Convolution is the most time-consuming part of the computation in convolutional neural networks (CNNs). Due to complex data dependencies and the growing size of models, convolution suffers from high data-movement overhead. This work provides a comprehensive analysis and methodologies to minimize communication for convolutions in CNNs. Through an in-depth analysis of I/O complexity under the red-blue pebble game model, we develop a general communication lower bound theory for composite algorithms that consist of several different sub-computations. Based on the proposed theory, we establish data movement lower bounds for three main convolution algorithms in CNNs: direct convolution, the image2col method, and the Winograd algorithm. Furthermore, guided by these I/O lower bound results, we design near communication-optimal strategies for each of the three convolution algorithms by fully exploiting data reuse. Our analysis demonstrates that these designs nearly attain the minimum communication in a two-level memory hierarchy.