Abstract:
Initialization, residual learning, and normalization are believed to be three indispensable techniques for training very deep convolutional neural networks and obtaining state-of-the-art performance. This paper shows that deep vanilla ConvNets without normalization nor residual structure can also be trained to achieve surprisingly good performance on standard image recognition benchmarks (ImageNet, and COCO). This is achieved by enforcing the convolution kernels to be near isometric during initialization and training, as well as by using a variant of ReLU that is shifted towards being isometric. Further experiments show that if combined with residual structure, such near isometric networks can achieve performances on par with the standard ResNet, even without normalization at all.