On the Explicit Role of Initialization on the Convergence and Implicit Bias of Overparametrized Linear Networks

Abstract: Neural networks trained via gradient descent with random initialization and without any regularization enjoy good generalization performance in practice despite being highly overparametrized. A promising direction to explain this phenomenon is to study how initialization and overparametrization affect the convergence and implicit bias of training algorithms. In this paper, we present a novel analysis of single-hidden-layer linear networks trained under gradient flow, which connects initialization, optimization, and overparametrization. First, we show that the squared loss converges exponentially to its optimum at a rate that depends on the level of imbalance of the initialization. Second, we show that proper initialization constrains the dynamics of the network parameters to lie within an invariant set, and that minimizing the loss over this set leads to the min-norm solution. Finally, we show that large hidden-layer width, together with (properly scaled) random initialization, ensures proximity to such an invariant set during training, allowing us to derive a novel non-asymptotic upper bound on the distance between the trained network and the min-norm solution.
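As a rough numerical illustration of the link between initialization and the min-norm solution described in the abstract, the sketch below trains a single-hidden-layer linear network with plain gradient descent (a small-step stand-in for gradient flow) on an underdetermined least-squares problem. Everything in it is an assumption made for illustration and is not taken from the paper: the synthetic data, the sizes, the step size, the function and variable names, and in particular the notion of "proper" initialization, which here simply means that the columns of the first-layer weights start inside the row space of the data matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
n, D, h = 5, 20, 50  # samples, input dimension, hidden width (n < D)

# Synthetic data with controlled singular values (illustrative assumption).
Q, _ = np.linalg.qr(rng.standard_normal((D, n)))  # Q: D x n, orthonormal columns
X = np.diag(np.linspace(1.0, 2.0, n)) @ Q.T       # X: n x D, singular values in [1, 2]
y = rng.standard_normal(n)

# Reference: minimum-Euclidean-norm interpolating solution of X @ theta = y.
theta_min_norm = np.linalg.pinv(X) @ y

def train(X, y, W1, w2, lr=0.005, steps=20000):
    """Gradient descent on 0.5 * ||X @ W1 @ w2 - y||^2; returns theta = W1 @ w2."""
    W1, w2 = W1.copy(), w2.copy()
    for _ in range(steps):
        residual = X @ (W1 @ w2) - y
        g = X.T @ residual          # gradient w.r.t. the end-to-end vector theta
        grad_W1 = np.outer(g, w2)   # every column lies in the row space of X
        grad_w2 = W1.T @ g
        W1 -= lr * grad_W1
        w2 -= lr * grad_w2
    return W1 @ w2

# Orthogonal projector onto the row space of X.
P_row = np.linalg.pinv(X) @ X

# Small first layer, O(1) second layer, so the initialization is imbalanced.
W1_free = 0.01 * rng.standard_normal((D, h))   # unconstrained random initialization
W1_proj = P_row @ W1_free                      # same draw, projected onto the row space
w2_init = rng.standard_normal(h) / np.sqrt(h)

theta_proj = train(X, y, W1_proj, w2_init)
theta_free = train(X, y, W1_free, w2_init)

dist_proj = np.linalg.norm(theta_proj - theta_min_norm)
dist_free = np.linalg.norm(theta_free - theta_min_norm)
print("distance to min-norm solution, projected init:  ", dist_proj)
print("distance to min-norm solution, unprojected init:", dist_free)
```

In this sketch every update to the first layer has the form `X.T @ (...)`, so the part of `W1` outside the row space of `X` is never changed by training. Starting with that part equal to zero keeps the end-to-end vector in the row space, where the only interpolating solution is the min-norm one; the unprojected run is expected to retain a residual orthogonal component.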

06/12/2020

regularization, theory, deep learning, implicit regularization, deep learning theory, theoretical issues in deep learning

4:55

18/07/2021

Hancheng Min, Salma Tarmoun, René Vidal, Enrique Mallada

Similar Papers

The Surprising Simplicity of the Early-Time Learning Dynamics of Neural Networks

Wei Hu, Lechao Xiao, Ben Adlam, Jeffrey Pennington

Simple and Effective Regularization Methods for Training on Noisily Labeled Data with Generalization Guarantee

Wei Hu, Zhiyuan Li, Dingli Yu

deep learning theory, regularization, noisy labels

Implicit Gradient Regularization

David Barrett, Benoit Dherin

regularization, theory, deep learning, implicit regularization, deep learning theory, theoretical issues in deep learning

PHEW: Constructing Sparse Networks that Learn Fast and Generalize Well without Training Data

Shreyas Malakarjun Patil, Constantine Dovrolis

Deep Learning

Finite Depth and Width Corrections to the Neural Tangent Kernel

Boris Hanin, Mihai Nica

Neural Tangent Kernel, Finite Width Corrections, Random ReLU Net, Wide Networks, Deep Networks

Implicit Bias of Gradient Descent based Adversarial Training on Separable Data

Yan Li, Ethan X. Fang, Huan Xu, Tuo Zhao

implicit bias, adversarial training, robustness, gradient descent

Efficient proximal mapping of the path-norm regularizer of shallow networks

Fabian Latorre, Paul Rolland, Shaul Nadav Hallak, Volkan Cevher

Deep Learning - Algorithms

Training Adversarially Robust Sparse Networks via Bayesian Connectivity Sampling

Ozan Özdenizci, Robert Legenstein

Algorithms, Adversarial Examples

Implicit Bias of Gradient Descent for Wide Two-layer Neural Networks Trained with the Logistic Loss

Lénaïc Chizat, Francis Bach

Neural networks/deep learning, Non-convex optimization

Effect of Activation Functions on the Training of Overparametrized Neural Nets

Abhishek Panigrahi, Abhishek Shetty, Navin Goyal

activation functions, deep learning theory, neural networks

Integrated Latent Heterogeneity and Invariance Learning in Kernel Space

Jiashuo Liu, Zheyuan Hu, Peng Cui, Bo Li, Zheyan Shen

deep learning, reinforcement learning and planning, machine learning

ISTA-NAS: Efficient and Consistent Neural Architecture Search by Sparse Coding

Yibo Yang, Hongyang Li, Shan You, Fei Wang, Chen Qian, Zhouchen Lin

Towards Understanding the Dynamics of the First-Order Adversaries

Zhun Deng, Hangfeng He, Jiaoyang Huang, Weijie Su

Adversarial Examples

Confidence-Aware Learning for Deep Neural Networks

Sangheum Hwang, Jooyoung Moon, Jihyo Kim, Younghak Shin

Deep Learning - Algorithms

On the Acceleration of Deep Learning Model Parallelism With Staleness

An Xu, Zhouyuan Huo, Heng Huang

layer-wise staleness, asynchronous model parallelism, convolutional neural networks

FL-NTK: A Neural Tangent Kernel-based Framework for Federated Learning Analysis

Baihe Huang, Xiaoxiao Li, Zhao Song, Xin Yang

Theory, Deep learning Theory

Explicit loss asymptotics in the gradient descent training of neural networks

Maksim Velikanov, Dmitry Yarotsky

theory, deep learning, optimization

On the Regularization Properties of Structured Dropout

Ambar Pal, Connor Lane, René Vidal, Benjamin D. Haeffele

dropout, regularization, dropblock, dropconnect, neural networks, optimization, low rank, nuclear norm, k-support norm

On Monotonic Linear Interpolation of Neural Network Parameters

James Lucas, Juhan Bae, Michael Zhang, Stanislav Fort, Richard Zemel, Roger Grosse

Deep Learning, Others

A Single-Loop Smoothed Gradient Descent-Ascent Algorithm for Nonconvex-Concave Min-Max Problems

Jiawei Zhang, Peijun Xiao, Ruoyu Sun, Zhiquan Luo
