At Stability's Edge: How to Adjust Hyperparameters to Preserve Minima Selection in Asynchronous Training of Neural Networks?

26/04/2020

At Stability's Edge: How to Adjust Hyperparameters to Preserve Minima Selection in Asynchronous Training of Neural Networks?

Niv Giladi, Mor Shpigel Nacson, Elad Hoffer, Daniel Soudry

Keywords: implicit bias, stability, neural networks, generalization gap, asynchronous SGD

Abstract Paper Code Similar Papers

Abstract: Background: Recent developments have made it possible to accelerate neural networks training significantly using large batch sizes and data parallelism. Training in an asynchronous fashion, where delay occurs, can make training even more scalable. However, asynchronous training has its pitfalls, mainly a degradation in generalization, even after convergence of the algorithm. This gap remains not well understood, as theoretical analysis so far mainly focused on the convergence rate of asynchronous methods. Contributions: We examine asynchronous training from the perspective of dynamical stability. We find that the degree of delay interacts with the learning rate, to change the set of minima accessible by an asynchronous stochastic gradient descent algorithm. We derive closed-form rules on how the learning rate could be changed, while keeping the accessible set the same. Specifically, for high delay values, we find that the learning rate should be kept inversely proportional to the delay. We then extend this analysis to include momentum. We find momentum should be either turned off, or modified to improve training stability. We provide empirical experiments to validate our theoretical findings.

0

0

0

0

Share

This is an embedded video. Talk and the respective paper are published at ICLR 2020 virtual conference. If you are one of the authors of the paper and want to manage your upload, see the question "My papertalk has been externally embedded..." in the FAQ section.

Comments

Post Comment

no comments yet

Similar Papers

12/07/2020

Decoupled Greedy Learning of CNNs

Eugene Belilovsky, Michael Eickenberg, Edouard Oyallon

Keywords Paper

Optimization - Large Scale, Parallel and Distributed

0

0

0

0

16:04

26/04/2020

Budgeted Training: Rethinking Deep Neural Network Training Under Resource Constraints

Mengtian Li, Ersin Yumer, Deva Ramanan

Keywords Paper

budgeted training, learning rate schedule, linear schedule, annealing, learning rate decay

0

0

0

0

5:00

26/04/2020

Training Recurrent Neural Networks Online by Learning Explicit State Variables

Somjit Nath, Vincent Liu, Alan Chan and
Xin Li, Adam White, Martha White

Keywords Paper

Recurrent Neural Network, Partial Observability, Online Prediction, Incremental Learning

0

0

0

0

5:06

02/02/2021

Distribution Adaptive INT8 Quantization for Training CNNs

Kang Zhao, Sida Huang, Pan Pan and
Yinghan Li, Yingya Zhang, Zhenyu Gu, Yinghui Xu

Keywords Paper

0

0

0

0

16:42

13/04/2021

Towards flexible device participation in federated learning

Yichen Ruan, Xiaoxi Zhang, Shu-Che Liang, Carlee Joe-Wong

Keywords Paper

0

0

0

0

3:01

03/05/2021

Understanding the effects of data parallelism and sparsity on neural network training

Namhoon Lee, Thalaiyasingam Ajanthan, Philip Torr, Martin Jaggi

Keywords Paper

sparsity, neural network training, data parallelism

0

0

0

0

4:52

06/12/2021

Leveraging Recursive Gumbel-Max Trick for Approximate Inference in Combinatorial Spaces

Kirill Struminsky, Artyom Gadetsky, Denis Rakitin and
Danil Karpushkin, Dmitry Vetrov

Keywords Paper

deep learning, optimization

0

0

0

0

14:26

26/04/2020

Meta-Learning with Warped Gradient Descent

Sebastian Flennerhag, Andrei A. Rusu, Razvan Pascanu and
Francesco Visin, Hujun Yin, Raia Hadsell

Keywords Paper

meta-learning, transfer learning

0

0

0

0

13:43

06/12/2020

Reconciling Modern Deep Learning with Traditional Optimization Analyses: The Intrinsic Learning Rate

Zhiyuan Li, Kaifeng Lyu, Sanjeev Arora

Keywords Paper

0

0

0

0

3:23

18/07/2021

Opening the Blackbox: Accelerating Neural Differential Equations by Regularizing Internal Solver Heuristics

Avik Pal, Yingbo Ma, Viral Shah, Christopher Rackauckas

Keywords Paper

Deep Learning

0

0

0

0

5:11

06/12/2021

Training Feedback Spiking Neural Networks by Implicit Differentiation on the Equilibrium State

Mingqing Xiao, Qingyan Meng, Zongpeng Zhang and
Yisen Wang, Zhouchen Lin

Keywords Paper

deep learning

0

0

0

0

12:22

12/07/2020

Extrapolation for Large-batch Training in Deep Learning

Tao LIN, Lingjing Kong, Sebastian Stich, Martin Jaggi

Keywords Paper

Deep Learning - Algorithms

0

0

0

0

13:21

06/12/2021

Model, sample, and epoch-wise descents: exact solution of gradient flow in the random feature model

Antoine Bodin, Nicolas Macris

Keywords Paper

deep learning, optimization

0

0

0

0

15:00

13/04/2021

Critical parameters for scalable distributed learning with large batches and asynchronous updates

Sebastian Stich, Amirkeivan Mohtashami, Martin Jaggi

Keywords Paper

0

0

0

0

3:00

18/07/2021

On Monotonic Linear Interpolation of Neural Network Parameters

James Lucas, Juhan Bae, Michael Zhang and
Stanislav Fort, Richard Zemel, Roger Grosse

Keywords Paper

Deep Learning, Others

0

0

0

0

5:03

26/04/2020

SpikeGrad: An ANN-equivalent Computation Model for Implementing Backpropagation with Spikes

Johannes C. Thiele, Olivier Bichler, Antoine Dupret

Keywords Paper

spiking neural network, neuromorphic engineering, backpropagation

0

0

0

0

5:21

06/12/2020

A Generalized Neural Tangent Kernel Analysis for Two-layer Neural Networks

Zixiang Chen, Yuan Cao, Quanquan Gu, Tong Zhang

Keywords Paper

0

0

0

0

3:16

06/12/2021

Imitating Deep Learning Dynamics via Locally Elastic Stochastic Differential Equations

Jiayao Zhang, Hua Wang, Weijie Su

Keywords Paper

deep learning, optimization

0

0

0

0

13:45

03/05/2021

How Neural Networks Extrapolate: From Feedforward to Graph Neural Networks

Keyulu Xu, Mozhi Zhang, Jingling Li and
Simon Du, Ken-Ichi Kawarabayashi, Stefanie Jegelka

Keywords Paper

graph neural networks, out-of-distribution, deep learning, extrapolation, deep learning theory

0

0

0

1

17:06

02/02/2021

Any-Precision Deep Neural Networks

Haichao Yu, Haoxiang Li, Humphrey Shi and
Thomas S. Huang, Gang Hua

Keywords Paper

0

0

0

0

14:26

06/12/2020

Temporal Spike Sequence Learning via Backpropagation for Deep Spiking Neural Networks

Wenrui Zhang, Peng Li

Keywords Paper

0

0

0

0

3:06

04/07/2020

Dynamically Adjusting Transformer Batch Size by Monitoring Gradient Direction Change

Hongfei Xu, Josef van Genabith, Deyi Xiong, Qiuhui Liu

Keywords Paper

Dynamically Size, Monitoring Change, accelerating convergence, training

0

0

0

0

5:51

14/09/2020

ADMMiRNN: Training RNN with Stable Convergence via An Efficient ADMM Approach

Yu Tang, Dequan Sun, Linbo Qiao and
Jingjing Xiao , Zhiquan Lai, Dongsheng Li

Keywords Paper

0

0

0

0

14:51

06/12/2021

Beyond BatchNorm: Towards a Unified Understanding of Normalization in Deep Learning

Ekdeep S Lubana, Robert Dick, Hidenori Tanaka

Keywords Paper

deep learning

0

0

0

0

8:28

12/07/2020

On the Generalization Benefit of Noise in Stochastic Gradient Descent

Samuel Smith, Erich Elsen, Soham De

Keywords Paper

Deep Learning - General

0

0

0

0

15:18

06/12/2021

Adversarial Robustness with Semi-Infinite Constrained Learning

Alexander Robey, Luiz Chamon, George J. Pappas and
Hamed Hassani, Alejandro Ribeiro

Keywords Paper

theory, deep learning, optimization, robustness, adversarial robustness and security

0

0

0

0

14:48

06/12/2020

Bayesian Optimization for Iterative Learning

Vu Nguyen, Sebastian Schulze, Michael A Osborne

Keywords Paper

0

0

0

0

3:19

03/05/2021

Meta-Learning with Neural Tangent Kernels

Yufan Zhou, Zhenyi Wang, Jiayi Xian and
Changyou Chen, Jinhui Xu

Keywords Paper

neural tangent kernel, meta-learning

0

0

0

0

3:54

12/07/2020

Deep Reinforcement Learning with Smooth Policy

Qianli Shen, Yan Li, Haoming Jiang and
Zhaoran Wang, Tuo Zhao

Keywords Paper

Reinforcement Learning - Deep RL

0

0

0

0

9:51

03/05/2021

MONGOOSE: A Learnable LSH Framework for Efficient Neural Network Training

Beidi Chen, Zichang Liu, Binghui Peng and
Zhaozhuo Xu, Jonathan L Li, Tri Dao, Zhao Song, Anshumali Shrivastava, Christopher Re

Keywords Paper

Randomized Algorithms, Efficient Training, Large-scale Machine Learning, Large-scale Deep Learning

0

0

0

0

15:07

12/07/2020

Obtaining Adjustable Regularization for Free via Iterate Averaging

Jingfeng Wu, Vladimir Braverman, Lin Yang

Keywords Paper

Optimization - General

0

0

0

0

12:07

06/12/2021

Reliable Estimation of KL Divergence using a Discriminator in Reproducing Kernel Hilbert Space

Sandesh Ghimire, Aria Masoomi, Jennifer Dy

Keywords Paper

theory, deep learning, machine learning, kernel methods

0

0

0

0

14:58

12/07/2020

AdaScale SGD: A User-Friendly Algorithm for Distributed Training

Tyler Johnson, Pulkit Agrawal, Haijie Gu, Carlos Guestrin

Keywords Paper

Optimization - Large Scale, Parallel and Distributed

0

0

0

0

14:22

14/06/2020

Sideways: Depth-Parallel Training of Video Models

Mateusz Malinowski, Grzegorz Świrszcz, João Carreira, Viorica Pătrăucean

Keywords Paper

efficient training and inference, video models, video understanding, backpropagation, backprop, scalable machine learning, depth-parallel training

0

0

0

0

0:59

26/04/2020

Why Gradient Clipping Accelerates Training: A Theoretical Justification for Adaptivity

Jingzhao Zhang, Tianxing He, Suvrit Sra, Ali Jadbabaie

Keywords Paper

Adaptive methods, optimization, deep learning

1

0

0

0

14:15

06/12/2021

Fast Axiomatic Attribution for Neural Networks

Robin Hesse, Simone Schaub-Meyer, Stefan Roth

Keywords Paper

deep learning, interpretability

0

0

0

0

14:49

03/05/2021

On the Curse of Memory in Recurrent Neural Networks: Approximation and Optimization Analysis

Zhong Li, Jiequn Han, Weinan E, Qianxiao Li

Keywords Paper

universal approximation, optimization, curse of memory, recurrent neural network, dynamical system

0

0

0

0

5:00

06/12/2021

What training reveals about neural network complexity

Andreas Loukas, Marinos Poiitis, Stefanie Jegelka

Keywords Paper

deep learning

0

0

0

0

8:29

16/11/2020

Generative adversarial training of product of policies for robust and adaptive movement primitives

Emmanuel Pignat, Hakan Girgin, Sylvain Calinon

Keywords Paper

0

0

0

0

4:26

18/07/2021

Nondeterminism and Instability in Neural Network Optimization

Cecilia Summers, Michael J Dinneen

Keywords Paper

Deep Learning, Optimization for Deep Networks

0

0

0

0

5:12