Understanding gradient clipping in incremental gradient methods

13/04/2021

Jiang Qian, Yuren Wu, Bojin Zhuang, Shaojun Wang, Jing Xiao

Abstract: We provide a theoretical analysis of how gradient clipping affects the convergence of incremental gradient methods for minimizing an objective function that is the sum of a large number of component functions. We show that clipping the gradients of component functions biases the descent direction, and that this bias is affected by the clipping threshold, the norms of the gradients of the component functions, and the angles between the gradients of the component functions and the full gradient. We then propose some sufficient conditions under which incremental gradient methods with gradient clipping can be shown to converge under the more general relaxed smoothness assumption. We also empirically observe that the angles between the gradients of component functions and the full gradient generally decrease as the batch size increases, which may help to explain why larger batch sizes generally lead to faster convergence in training deep neural networks with gradient clipping.
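The bias described in the abstract can be illustrated with a small numerical sketch. This is not the authors' analysis or code: the helper names (clip, clipped_mean, cosine), the toy gradients, and the threshold values are all assumptions made for illustration. It uses standard norm clipping and compares the average of individually clipped component gradients with the full gradient at a single fixed point, ignoring the parameter updates an incremental method would make between components.

```python
import numpy as np

def clip(g, c):
    """Norm clipping: rescale g so its Euclidean norm is at most c."""
    norm = np.linalg.norm(g)
    return g if norm <= c else g * (c / norm)

def clipped_mean(grads, c):
    """Average of the individually clipped component gradients."""
    return np.mean([clip(g, c) for g in grads], axis=0)

def cosine(u, v):
    """Cosine of the angle between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical component gradients: one large, two small, pointing differently.
grads = np.array([[10.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
full = grads.mean(axis=0)               # full gradient (up to scaling)

clipped = clipped_mean(grads, c=1.0)    # only the large component is shrunk
unclipped = clipped_mean(grads, c=100.0)  # threshold too large to have any effect

print("full gradient:      ", full)
print("clipped average:    ", clipped)
print("cosine(full, clipped):", cosine(full, clipped))
```

In this toy setup the clipped average tilts away from the full gradient because only the large-norm component is shrunk, while a threshold above every component norm leaves the direction unchanged. This matches the abstract's qualitative statement that the bias depends on the threshold, the component norms, and the angles involved.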

Talk and the respective paper are published at the AISTATS 2021 virtual conference.

Similar Papers

26/04/2020

Gradient Descent Maximizes the Margin of Homogeneous Neural Networks

Kaifeng Lyu, Jian Li

Keywords Paper

margin, homogeneous, gradient descent

15:02

06/12/2021

On the Convergence of Step Decay Step-Size for Stochastic Optimization

Xiaoyu Wang, Sindri Magnússon, Mikael Johansson

Keywords Paper

deep learning, optimization, machine learning

14:58

26/04/2020

Why Gradient Clipping Accelerates Training: A Theoretical Justification for Adaptivity

Jingzhao Zhang, Tianxing He, Suvrit Sra, Ali Jadbabaie

Keywords Paper

Adaptive methods, optimization, deep learning

14:15

06/12/2021

ErrorCompensatedX: error compensation for variance reduced algorithms

Hanlin Tang, Yao Li, Ji Liu, Ming Yan

Keywords Paper

optimization

14:38

06/12/2020

Improved Analysis of Clipping Algorithms for Non-convex Optimization

Bohang Zhang, Jikai Jin, Cong Fang, Liwei Wang

3:16

12/07/2020

Analyzing the effect of neural network architecture on training performance

Karthik Abinav Sankararaman, Soham De, Zheng Xu, W. Ronny Huang, Tom Goldstein

Keywords Paper

Deep Learning - Theory

14:03

06/12/2021

Implicit Sparse Regularization: The Impact of Depth and Early Stopping

Jiangyuan Li, Thanh Nguyen, Chinmay Hegde, Ka Wai Wong

Keywords Paper

optimization

10:10

12/07/2020

Statistically Preconditioned Accelerated Gradient Method for Distributed Optimization

Hadrien Hendrikx, Lin Xiao, Sebastien Bubeck, Francis Bach, Laurent Massoulié

Keywords Paper

Optimization - Large Scale, Parallel and Distributed

14:37

04/07/2020

Dynamically Adjusting Transformer Batch Size by Monitoring Gradient Direction Change

Hongfei Xu, Josef van Genabith, Deyi Xiong, Qiuhui Liu

Keywords Paper

Dynamically Size, Monitoring Change, accelerating convergence, training

5:51

18/07/2021

Tensor Programs IV: Feature Learning in Infinite-Width Neural Networks

Greg Yang, Edward Hu

Keywords Paper

Theory, Deep learning Theory

5:22

26/04/2020

Adaptive Correlated Monte Carlo for Contextual Categorical Sequence Generation

Xinjie Fan, Yizhe Zhang, Zhendong Wang, Mingyuan Zhou

Keywords Paper

binary softmax, discrete variables, policy gradient, pseudo actions, reinforcement learning, variance reduction

4:59

03/05/2021

Rethinking the Role of Gradient-based Attribution Methods for Model Interpretability

Suraj Srinivas, François Fleuret

Keywords Paper

Interpretability, saliency maps, score-matching

15:08

26/04/2020

SlowMo: Improving Communication-Efficient Distributed SGD with Slow Momentum

Jianyu Wang, Vinayak Tantia, Nicolas Ballas, Michael Rabbat

Keywords Paper

distributed optimization, decentralized training methods, communication-efficient distributed training with momentum, large-scale parallel SGD

5:07

18/07/2021

Self Normalizing Flows

T. Anderson Keller, Jorn Peters, Priyank Jaini, Emiel Hoogeboom, Patrick Forré, Max Welling

Keywords Paper

Deep Learning, Generative Models

4:24

06/12/2020

A Bayesian Nonparametrics View into Deep Representations

Michał Jamroż, Marcin Kurdziel, Mateusz Opala

3:18

06/12/2020

On the Almost Sure Convergence of Stochastic Gradient Descent in Non-Convex Problems

Panayotis Mertikopoulos, Nadav Hallak, Ali Kavis, Volkan Cevher

3:27

12/07/2020

Extrapolation for Large-batch Training in Deep Learning

Tao LIN, Lingjing Kong, Sebastian Stich, Martin Jaggi

Keywords Paper

Deep Learning - Algorithms

13:21

05/04/2021

An Efficient Statistical-based Gradient Compression Technique for Distributed Training Systems

Ahmed M. Abdelmoniem, Ahmed Elzanaty, Mohamed-Slim Alouini, Marco Canini

4:13

05/04/2021

An Efficient Statistical-based Gradient Compression Technique for Distributed Training Systems

Ahmed M. Abdelmoniem, Ahmed Elzanaty, Mohamed-Slim Alouini, Marco Canini

22:37

12/07/2020

Unique Properties of Wide Minima in Deep Networks

Rotem Mulayoff, Tomer Michaeli

Keywords Paper

Deep Learning - Theory

14:35

06/12/2020

Implicit Bias in Deep Linear Classification: Initialization Scale vs Training Accuracy

Edward Moroshko, Blake Woodworth, Suriya Gunasekar, Jason Lee, Nati Srebro, Daniel Soudry

3:19

03/05/2021

Separation and Concentration in Deep Networks

John Zarka, Florentin Guth, Stéphane Mallat

Keywords Paper

concentration, mean separation, neural collapse, fisher ratio, image classification, variance reduction, deep learning

5:11

06/12/2020

Robustness Analysis of Non-Convex Stochastic Gradient Descent using Biased Expectations

Kevin Scaman, Cedric Malherbe

3:09

12/07/2020

Don't Waste Your Bits! Squeeze Activations and Gradients for Deep Neural Networks via TinyScript

Fangcheng Fu, Yuzheng Hu, Yihan He, Jiawei Jiang, Yingxia Shao, Ce Zhang, Bin Cui

Keywords Paper

Optimization - Large Scale, Parallel and Distributed

9:59

12/07/2020

Landscape Connectivity and Dropout Stability of SGD Solutions for Over-parameterized Neural Networks

Alexander Shevchenko, Marco Mondelli

Keywords Paper

Deep Learning - Theory

13:20

03/05/2021

Implicit Gradient Regularization

David Barrett, Benoit Dherin

Keywords Paper

regularization, theory, deep learning, implicit regularization, deep learning theory, theoretical issues in deep learning

4:55

03/05/2021

Rao-Blackwellizing the Straight-Through Gumbel-Softmax Gradient Estimator

Max B Paulus, Chris Maddison, Andreas Krause

Keywords Paper

softmax, gumbel, rao-blackwell, rao, straightthrough, straight-through, gumbel-softmax

13:25

06/12/2021

Linear Convergence in Federated Learning: Tackling Client Heterogeneity and Sparse Gradients

Aritra Mitra, Rayana Jaafar, George J. Pappas, Hamed Hassani

Keywords Paper

optimization, federated learning

14:43

03/05/2021

A unifying view on implicit bias in training linear neural networks

Chulhee (Charlie) Yun, Shankar Krishnan, Hossein Mobahi

Keywords Paper

convergence, implicit bias, gradient flow, implicit regularization, gradient descent

5:24

06/12/2021

Batch Normalization Orthogonalizes Representations in Deep Random Networks

Hadi Daneshmand, Amir Joudaki, Francis Bach

Keywords Paper

theory, deep learning, optimization, machine learning, generative model

12:49

06/12/2020

GCN meets GPU: Decoupling “When to Sample” from “How to Sample”

Morteza Ramezani, Weilin Cong, Mehrdad Mahdavi, Anand Sivasubramaniam, Mahmut Kandemir

3:24

06/12/2021

Rectangular Flows for Manifold Learning

Anthony Caterini, Gabriel Loaiza-Ganem, Geoff Pleiss, John Cunningham

Keywords Paper

deep learning, optimization, generative model

12:26

18/07/2021

Fast Stochastic Bregman Gradient Methods: Sharp Analysis and Variance Reduction

Radu Alexandru Dragomir, Mathieu Even, Hadrien Hendrikx

Keywords Paper

Optimization, Convex Optimization

5:22

06/12/2021

Diffusion Normalizing Flow

Qinsheng Zhang, Yongxin Chen

Keywords Paper

generative model

9:09

02/02/2021

Distribution Adaptive INT8 Quantization for Training CNNs

Kang Zhao, Sida Huang, Pan Pan, Yinghan Li, Yingya Zhang, Zhenyu Gu, Yinghui Xu

16:42

13/04/2021

Direct loss minimization for sparse gaussian processes

Yadi Wei, Rishit Sheth, Roni Khardon

3:24

12/07/2020

Accelerated Message Passing for Entropy-Regularized MAP Inference

Jonathan Lee, Aldo Pacchiano, Peter Bartlett, Michael Jordan

Keywords Paper

Probabilistic Inference - Models and Probabilistic Programming

14:57

23/08/2020

Rethinking pruning for accelerating deep inference at the edge

Dawei Gao, Xiaoxi He, Zimu Zhou, Yongxin Tong, Ke Xu, Lothar Thiele

Keywords Paper

automatic speech recognition, deep learning, name entity recognition, network pruning, sequence labelling

13:43

26/04/2020

SNODE: Spectral Discretization of Neural ODEs for System Identification

Alessio Quaglino, Marco Gallieri, Jonathan Masci, Jan Koutník

Keywords Paper

Recurrent neural networks, system identification, neural ODEs

5:00

06/12/2021

Settling the Variance of Multi-Agent Policy Gradients

Jakub Grudzien Kuba, Muning Wen, Linghui Meng, Shangding Gu, Haifeng Zhang, David Mguni, Jun Wang, Yaodong Yang

Keywords Paper

deep learning, reinforcement learning and planning

13:12