Explicit loss asymptotics in the gradient descent training of neural networks

06/12/2021

Explicit loss asymptotics in the gradient descent training of neural networks

Maksim Velikanov, Dmitry Yarotsky

Keywords: theory, deep learning, optimization

Abstract Paper Similar Papers

Abstract: Current theoretical results on optimization trajectories of neural networks trained by gradient descent typically have the form of rigorous but potentially loose bounds on the loss values. In the present work we take a different approach and show that the learning trajectory of a wide network in a lazy training regime can be characterized by an explicit asymptotic at large training times. Specifically, the leading term in the asymptotic expansion of the loss behaves as a power law $L(t) \sim C t^{-\xi}$ with exponent $\xi$ expressed only through the data dimension, the smoothness of the activation function, and the class of function being approximated. Our results are based on spectral analysis of the integral operator representing the linearized evolution of a large network trained on the expected loss. Importantly, the techniques we employ do not require a specific form of the data distribution, for example Gaussian, thus making our findings sufficiently universal.

0

0

0

0

Share

This is an embedded video. Talk and the respective paper are published at NeurIPS 2021 virtual conference. If you are one of the authors of the paper and want to manage your upload, see the question "My papertalk has been externally embedded..." in the FAQ section.

Comments

Post Comment

no comments yet

Similar Papers

09/07/2020

Implicit Bias of Gradient Descent for Wide Two-layer Neural Networks Trained with the Logistic Loss

Lénaïc Chizat, Francis Bach

Keywords Paper

Neural networks/deep learning, Non-convex optimization

0

0

0

0

14:41

12/07/2020

A Mean Field Analysis Of Deep ResNet And Beyond: Towards Provably Optimization Via Overparameterization From Depth

Yiping Lu, Chao Ma, Yulong Lu and
Jianfeng Lu, Lexing Ying

Keywords Paper

Deep Learning - Theory

0

0

0

0

4:37

09/07/2020

Kernel and Rich Regimes in Overparametrized Models

Blake E Woodworth, Suriya Gunasekar, Jason Lee and
Edward Moroshko, Pedro Henrique Pamplona Savarese, Itay Golan, Daniel Soudry, Nathan Srebro

Keywords Paper

Neural networks/deep learning,

0

0

0

0

13:29

06/12/2021

Robust Implicit Networks via Non-Euclidean Contractions

Saber Jafarpour, Alexander Davydov, Anton Proskurnikov, Francesco Bullo

Keywords Paper

theory, deep learning, machine learning, robustness, vision

0

0

0

0

14:59

06/12/2020

A Single-Loop Smoothed Gradient Descent-Ascent Algorithm for Nonconvex-Concave Min-Max Problems

Jiawei Zhang, Peijun Xiao, Ruoyu Sun, Zhiquan Luo

Keywords Paper

0

0

0

0

3:12

13/04/2021

A dynamical view on optimization algorithms of overparameterized neural networks

Zhiqi Bu, Shiyun Xu, Kan Chen

Keywords Paper

0

0

0

0

3:05

06/12/2021

Second-Order Neural ODE Optimizer

Guan-Horng Liu, Tianrong Chen, Evangelos Theodorou

Keywords Paper

deep learning, optimization, machine learning, vision

0

0

0

0

14:59

06/12/2021

Proxy Convexity: A Unified Framework for the Analysis of Neural Networks Trained by Gradient Descent

Spencer Frei, Quanquan Gu

Keywords Paper

deep learning, optimization

0

0

0

0

10:33

03/05/2021

Gradient Descent on Neural Networks Typically Occurs at the Edge of Stability

Jeremy Cohen, Simran Kaur, Yuanzhi Li and
Zico Kolter, Ameet Talwalkar

Keywords Paper

implicit bias, stability, science of deep learning, L-smoothness, trajectory, optimization, sharpness, implicit regularization, deep learning theory

0

0

0

0

5:06

18/07/2021

On the Explicit Role of Initialization on the Convergence and Implicit Bias of Overparametrized Linear Networks

Hancheng Min, Salma Tarmoun, Rene Vidal, Enrique Mallada

Keywords Paper

Theory

0

0

0

0

5:16

26/04/2020

Finite Depth and Width Corrections to the Neural Tangent Kernel

Boris Hanin, Mihai Nica

Keywords Paper

Neural Tangent Kernel, Finite Width Corrections, Random ReLU Net, Wide Networks, Deep Networks

0

0

0

0

5:09

06/12/2021

The Implicit Bias of Minima Stability: A View from Function Space

Rotem Mulayoff, Tomer Michaeli, Daniel Soudry

Keywords Paper

deep learning, optimization

0

0

0

0

13:51

06/12/2020

A Dynamical Central Limit Theorem for Shallow Neural Networks

Zhengdao Chen, Grant Rotskoff, Joan Bruna, Eric Vanden-Eijnden

Keywords Paper

0

0

0

0

3:22

12/07/2020

Dynamics of Deep Neural Networks and Neural Tangent Hierarchy

Jiaoyang Huang, Horng-Tzer Yau

Keywords Paper

Deep Learning - Theory

0

0

0

0

15:28

26/08/2020

Functional Gradient Boosting for Learning Residual-like Networks with Statistical Guarantees

Atsushi Nitanda, Taiji Suzuki

Keywords Paper

0

0

0

0

10:49

06/12/2021

Even your Teacher Needs Guidance: Ground-Truth Targets Dampen Regularization Imposed by Self-Distillation

Kenneth Borup, Lars N Andersen

Keywords Paper

theory, deep learning, optimization

0

0

0

0

6:00

04/08/2021

Modeling from Features: a Mean-field Framework for Over-parameterized Deep Neural Networks

Cong Fang, Jason Lee, Pengkun Yang, Tong Zhang

Keywords Paper

0

0

0

0

15:10

06/12/2020

Implicit Bias in Deep Linear Classification: Initialization Scale vs Training Accuracy

Edward Moroshko, Blake Woodworth, Suriya Gunasekar and
Jason Lee, Nati Srebro, Daniel Soudry

Keywords Paper

0

0

0

0

3:19

26/04/2020

Implicit Bias of Gradient Descent based Adversarial Training on Separable Data

Yan Li, Ethan X.Fang, Huan Xu, Tuo Zhao

Keywords Paper

implicit bias, adversarial training, robustness, gradient descent

0

0

0

0

4:53

06/12/2021

On the Provable Generalization of Recurrent Neural Networks

Lifu Wang, Bo Shen, Bo Hu, Xing Cao

Keywords Paper

theory, deep learning

0

0

0

0

5:01

06/12/2020

A Generalized Neural Tangent Kernel Analysis for Two-layer Neural Networks

Zixiang Chen, Yuan Cao, Quanquan Gu, Tong Zhang

Keywords Paper

0

0

0

0

3:16

26/04/2020

Towards Better Understanding of Adaptive Gradient Algorithms in Generative Adversarial Nets

Mingrui Liu, Youssef Mroueh, Jerret Ross and
Wei Zhang, Xiaodong Cui, Payel Das, Tianbao Yang

Keywords Paper

Generative Adversarial Nets, Adaptive Gradient Algorithms

0

0

0

0

5:08

26/04/2020

Effect of Activation Functions on the Training of Overparametrized Neural Nets

Abhishek Panigrahi, Abhishek Shetty, Navin Goyal

Keywords Paper

activation functions, deep learning theory, neural networks

0

0

0

0

5:13

18/07/2021

Classifying high-dimensional Gaussian mixtures: Where kernel methods fail and neural networks succeed

Maria Refinetti, Sebastian Goldt, FLORENT KRZAKALA, Lenka Zdeborova

Keywords Paper

Theory, Models of Learning and Generalization

0

0

0

0

4:24

26/04/2020

SNODE: Spectral Discretization of Neural ODEs for System Identification

Alessio Quaglino, Marco Gallieri, Jonathan Masci, Jan Koutník

Keywords Paper

Recurrent neural networks, system identification, neural ODEs

0

0

0

0

5:00

26/04/2020

Provable Benefit of Orthogonal Initialization in Optimizing Deep Linear Networks

Wei Hu, Lechao Xiao, Jeffrey Pennington

Keywords Paper

deep learning theory, non-convex optimization, orthogonal initialization

0

0

0

0

5:10

20/07/2020

A type of generalization error induced by initialization in deep neural networks

Yaoyu Zhang, Zhi-Qin John Xu, Tao Luo, Zheng Ma

Keywords Paper

0

0

0

0

17:33

18/07/2021

Sparsifying Networks via Subdifferential Inclusion

Sagar Verma, Jean-Christophe Pesquet

Keywords Paper

Optimization, Convex Optimization

0

0

0

0

5:10

18/07/2021

Data-driven Prediction of General Hamiltonian Dynamics via Learning Exactly-Symplectic Maps

Renyi Chen, Molei Tao

Keywords Paper

Algorithms, Time Series and Sequences

0

0

0

0

5:21

18/07/2021

The Heavy-Tail Phenomenon in SGD

Mert Gurbuzbalaban, Umut Simsekli, Lingjiong Zhu

Keywords Paper

Optimization, Stochastic Optimization

0

0

0

0

5:37

03/05/2021

NOVAS: Non-convex Optimization via Adaptive Stochastic Search for End-to-end Learning and Control

Ioannis Exarchos, Marcus A Pereira, Ziyi Wang, Evangelos Theodorou

Keywords Paper

deep neural networks, deep FBSDEs, stochastic control, nested optimization

0

0

0

0

5:35

06/12/2020

Provably Efficient Reinforcement Learning with Kernel and Neural Function Approximations

Zhuoran Yang, Chi Jin, Zhaoran Wang and
Mengdi Wang, Michael Jordan

Keywords Paper

0

0

0

0

3:42

12/07/2020

Communication-Efficient Distributed Stochastic AUC Maximization with Deep Neural Networks

Zhishuai Guo, Mingrui Liu, Zhuoning Yuan and
Li Shen, Wei Liu, Tianbao Yang

Keywords Paper

Optimization - Large Scale, Parallel and Distributed

0

0

0

0

14:42

12/07/2020

Training Neural Networks for and by Interpolation

Leonard Berrada, M. Pawan Kumar, Andrew Zisserman

Keywords Paper

Deep Learning - General

0

0

0

0

16:12

06/12/2021

Neural Active Learning with Performance Guarantees

Zhilei Wang, Pranjal Awasthi, Christoph Dann and
Ayush Sekhari, Claudio Gentile

Keywords Paper

deep learning, active learning

0

0

0

0

10:43

18/07/2021

On Monotonic Linear Interpolation of Neural Network Parameters

James Lucas, Juhan Bae, Michael Zhang and
Stanislav Fort, Richard Zemel, Roger Grosse

Keywords Paper

Deep Learning, Others

0

0

0

0

5:03

04/08/2021

Nonparametric Regression with Shallow Overparametrized Neural Networks Trained by GD with Early Stopping

Ilja Kuzborskij , Csaba Szepesvari

Keywords Paper

0

0

0

0

15:14

06/12/2021

Analytic Insights into Structure and Rank of Neural Network Hessian Maps

Sidak Pal Singh, Gregor Bachmann, Thomas Hofmann

Keywords Paper

deep learning, optimization, generative model

0

0

0

0

15:08

06/12/2021

Leveraging Recursive Gumbel-Max Trick for Approximate Inference in Combinatorial Spaces

Kirill Struminsky, Artyom Gadetsky, Denis Rakitin and
Danil Karpushkin, Dmitry Vetrov

Keywords Paper

deep learning, optimization

0

0

0

0

14:26

06/12/2020

The Surprising Simplicity of the Early-Time Learning Dynamics of Neural Networks

Wei Hu, Lechao Xiao, Ben Adlam, Jeffrey Pennington

Keywords Paper

0

0

0

0

3:20