12/07/2020

Go Wide, Then Narrow: Efficient Training of Deep Thin Networks

Denny Zhou, Mao Ye, Chen Chen, Mingxing Tan, Tianjian Meng, Xiaodan Song, Quoc Le, Qiang Liu, Dale Schuurmans

Keywords: Deep Learning - Algorithms

Abstract: We propose an efficient algorithm for training very deep and thin networks with a theoretical guarantee. Our method is motivated by model compression and consists of three stages. In the first stage, we widen the deep thin network and train it until convergence. In the second stage, we use this well-trained deep wide network to warm up (initialize) the original deep thin network. In the last stage, we train the well-initialized deep thin network until convergence. The key ingredient of our method is the second stage, in which the thin network is gradually warmed up by imitating the intermediate outputs of the wide network from bottom to top. We establish a theoretical guarantee using mean field analysis, showing that our method is provably more efficient than training a deep thin network directly from scratch. We also conduct empirical evaluations on image classification and language modeling. Trained with our approach, ResNet50 can outperform a conventionally trained ResNet101, and BERT_BASE can be comparable to BERT_LARGE.
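At a high level, the three-stage procedure described in the abstract could be sketched as follows in PyTorch. Everything below is an illustrative assumption, not the authors' code: the block architecture, the per-block projection layers, the imitation loss, the step counts, and all names (make_blocks, forward_prefix, and so on) are hypothetical. The sketch only shows the bottom-to-top warm-up idea: the thin network's first k blocks are trained to match the wide network's k-th intermediate output before moving on to block k+1.

```python
import torch
import torch.nn as nn

# Hypothetical sketch of the three-stage "go wide, then narrow" procedure.
# Shapes, losses, and loops are illustrative assumptions.

WIDE, THIN, DEPTH = 256, 64, 4

def make_blocks(width):
    # A stack of simple MLP blocks of the given width.
    return nn.ModuleList(
        nn.Sequential(nn.Linear(width, width), nn.ReLU()) for _ in range(DEPTH)
    )

# Stage 1: the widened network, assumed already trained to convergence
# (its training loop is omitted here).
wide_blocks = make_blocks(WIDE)
thin_blocks = make_blocks(THIN)

# Projections aligning thin activations with wide ones for imitation,
# needed because the two networks have different widths.
projections = nn.ModuleList(nn.Linear(THIN, WIDE) for _ in range(DEPTH))
embed_wide, embed_thin = nn.Linear(32, WIDE), nn.Linear(32, THIN)

def forward_prefix(x, blocks, embed, k):
    # Run the input embedding and the first k blocks on x.
    h = embed(x)
    for block in blocks[:k]:
        h = block(h)
    return h

# Stage 2: warm up the thin network bottom-to-top by imitating the wide
# network's intermediate outputs, one block at a time; the wide teacher
# stays frozen throughout.
mse = nn.MSELoss()
for k in range(1, DEPTH + 1):
    params = list(thin_blocks[:k].parameters()) + list(projections[k - 1].parameters())
    opt = torch.optim.Adam(params + list(embed_thin.parameters()), lr=1e-3)
    for _ in range(100):                      # imitation steps per block (assumed)
        x = torch.randn(32, 32)               # stand-in for real training inputs
        with torch.no_grad():
            target = forward_prefix(x, wide_blocks, embed_wide, k)
        pred = projections[k - 1](forward_prefix(x, thin_blocks, embed_thin, k))
        loss = mse(pred, target)
        opt.zero_grad(); loss.backward(); opt.step()

# Stage 3: train the warmed-up thin network on the actual task loss
# until convergence (standard training loop, omitted).
```

One design point the sketch makes concrete: because the thin and wide networks have different hidden dimensions, some alignment mechanism (here, per-block linear projections) is needed before intermediate outputs can be compared, and the projections are discarded once warm-up is done.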

The talk and the paper were published at the ICML 2020 virtual conference.

