05/01/2021

Phase-Wise Parameter Aggregation for Improving SGD Optimization

Takumi Kobayashi

Keywords:

Abstract: Stochastic gradient descent (SGD) is successfully applied to train deep convolutional neural networks (CNNs) on various computer vision tasks. Since fixed step-size SGD converges to a so-called error plateau, it is typically combined with a decaying learning rate to reach a favorable optimum. In this paper, we propose a simple yet effective optimization method to improve SGD with a phase-wise decay of the learning rate. By analyzing both the loss surface around the error plateau and the structure of the SGD optimization process, the proposed method is formulated to improve convergence as well as initialization at each training phase by efficiently aggregating the CNN parameters along the optimization sequence. The method retains the simplicity of SGD, modifying the SGD procedure only a few times during training. Experimental results on image classification tasks thoroughly validate the effectiveness of the proposed method in comparison to other methods.
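The abstract does not give the exact formulation, but the general idea of phase-wise parameter aggregation can be illustrated with a minimal sketch: within each training phase (a fixed learning rate), the parameters visited by SGD are averaged, and the aggregate is used both as the phase's result and as the initialization of the next phase at the learning-rate decay boundary. The toy quadratic objective, the phase schedule, and the plain running-average rule below are illustrative assumptions, not the paper's actual method.

```python
import numpy as np

# Toy sketch of phase-wise parameter aggregation (illustrative only; the
# objective, phase schedule, and running-average rule are assumptions,
# not the paper's exact formulation).

rng = np.random.default_rng(0)

def noisy_grad(w):
    """Gradient of the quadratic loss 0.5*||w||^2 plus noise,
    standing in for a mini-batch gradient."""
    return w + 0.1 * rng.standard_normal(w.shape)

w = rng.standard_normal(10)                        # current parameters
phases = [(0.1, 200), (0.01, 200), (0.001, 200)]   # (learning rate, #steps) per phase

for lr, steps in phases:
    w_avg = np.zeros_like(w)          # running average of parameters in this phase
    for t in range(1, steps + 1):
        w -= lr * noisy_grad(w)       # plain SGD update
        w_avg += (w - w_avg) / t      # incremental mean of visited parameters
    # Use the aggregated parameters as the phase's result and as the
    # initialization for the next phase (learning-rate decay boundary).
    w = w_avg
    print(f"lr={lr:<6} ||w|| = {np.linalg.norm(w):.4f}")
```

In this sketch the averaging suppresses the gradient noise that keeps fixed step-size SGD oscillating on the error plateau, so each phase hands a lower-variance initialization to the next, smaller learning rate.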
