Implicit regularization for deep neural networks driven by an Ornstein-Uhlenbeck like process

Abstract: We consider networks trained via stochastic gradient descent to minimize $\ell_2$ loss, with the training labels perturbed by independent noise at each iteration. We characterize the behavior of the training dynamics near any parameter vector that achieves zero training error in terms of an implicit regularization term: the sum, over the data points, of the squared $\ell_2$ norm of the gradient of the model with respect to the parameter vector, evaluated at each data point. This holds for networks of any connectivity, width, depth, and choice of activation function. We interpret this implicit regularization term in three simple settings: matrix sensing, two-layer ReLU networks trained on one-dimensional data, and two-layer networks with sigmoid activations trained on a single data point. For these settings, we show why this new and general implicit regularization effect drives the networks towards "simple" models.
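The regularization term described in the abstract can be made concrete with a small sketch. The code below is only an illustration under stated assumptions, not the authors' implementation: it assumes a two-layer network with scalar input and sigmoid activations, f(x) = sum_j a_j * sigmoid(w_j * x), and assumes the label noise is Gaussian (the abstract only says it is independent). The function names `model_gradient`, `implicit_regularizer`, and `label_noise_sgd_step` are hypothetical. The regularizer is computed as the sum over data points of the squared $\ell_2$ norm of the gradient of the model output with respect to the parameters.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def model(a, w, x):
    """Assumed toy model: f(x) = sum_j a_j * sigmoid(w_j * x), scalar input x."""
    return sum(aj * sigmoid(wj * x) for aj, wj in zip(a, w))


def model_gradient(a, w, x):
    """Gradient of f(x) with respect to the parameters (a_1..a_m, w_1..w_m)."""
    s = [sigmoid(wj * x) for wj in w]
    grad_a = s  # df/da_j = sigmoid(w_j * x)
    # df/dw_j = a_j * sigmoid'(w_j * x) * x, with sigmoid' = s * (1 - s)
    grad_w = [aj * sj * (1.0 - sj) * x for aj, sj in zip(a, s)]
    return grad_a + grad_w


def implicit_regularizer(a, w, xs):
    """Sum over data points of the squared l2 norm of the model gradient."""
    return sum(g * g for x in xs for g in model_gradient(a, w, x))


def label_noise_sgd_step(a, w, x, y, lr, noise_std, rng):
    """One SGD step on the squared loss with a freshly perturbed label.

    Gaussian noise is an assumption made for this sketch.
    """
    noisy_y = y + rng.gauss(0.0, noise_std)
    residual = model(a, w, x) - noisy_y
    grad = model_gradient(a, w, x)
    m = len(a)
    # d/dtheta (f - y)^2 = 2 * (f - y) * grad_theta f
    new_a = [aj - lr * 2.0 * residual * g for aj, g in zip(a, grad[:m])]
    new_w = [wj - lr * 2.0 * residual * g for wj, g in zip(w, grad[m:])]
    return new_a, new_w


if __name__ == "__main__":
    rng = random.Random(0)
    a, w = [2.0], [0.0]
    for _ in range(1000):
        a, w = label_noise_sgd_step(a, w, x=1.0, y=1.0, lr=0.01,
                                    noise_std=0.1, rng=rng)
    print(implicit_regularizer(a, w, [1.0]))
```

Tracking `implicit_regularizer` along a run of `label_noise_sgd_step` is one way to observe, on a toy problem, whether the iterates drift toward zero-error parameters with a smaller regularizer value.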

26/04/2020

Guy Blanc, Neha Gupta, Gregory Valiant, Paul Valiant

Similar Papers

Gradient $\ell_1$ Regularization for Quantization Robustness

Milad Alizadeh, Arash Behboodi, Mart van Baalen, Christos Louizos, Tijmen Blankevoort, Max Welling

Keywords: quantization, regularization, robustness, gradient regularization

Asymmetric Heavy Tails and Implicit Bias in Gaussian Noise Injections

Alexander D Camuto, Xiaoyu Wang, Lingjiong Zhu, Christopher Holmes, Mert Gurbuzbalaban, Umut Simsekli

Keywords: Theory, Deep learning Theory

Fractional moment-preserving initialization schemes for training deep neural networks

Mert Gurbuzbalaban, Yuanhan Hu

The inductive bias of ReLU networks on orthogonally separable data

Mary Phuong, Christoph H Lampert

Keywords: implicit bias, extremal sector, gradient descent, inductive bias, max-margin, ReLU networks

Kernel and Rich Regimes in Overparametrized Models

Blake E Woodworth, Suriya Gunasekar, Jason Lee, Edward Moroshko, Pedro Henrique Pamplona Savarese, Itay Golan, Daniel Soudry, Nathan Srebro

Keywords: Neural networks/deep learning

Even your Teacher Needs Guidance: Ground-Truth Targets Dampen Regularization Imposed by Self-Distillation

Kenneth Borup, Lars N Andersen

Keywords: theory, deep learning, optimization

Unique Properties of Wide Minima in Deep Networks

Rotem Mulayoff, Tomer Michaeli

Keywords: Deep Learning - Theory

Quantized Frank-Wolfe: Faster Optimization, Lower Communication, and Projection Free

Mingrui Zhang, Lin Chen, Aryan Mokhtari, Hamed Hassani, Amin Karbasi

Adaptive End-to-End Budgeted Network Learning via Inverse Scale Space

Zuyuan Zhong, Chen Liu, Yanwei Fu

Keywords: deep learning, network architecture, growing network, budgeted network learning, pruning

Functional Regularization for Reinforcement Learning via Learned Fourier Features

Alexander Li, Deepak Pathak

Keywords: deep learning, optimization, reinforcement learning and planning

Spectra of the Conjugate Kernel and Neural Tangent Kernel for linear-width neural networks

Zhou Fan, Zhichao Wang

Understanding Generalization in Recurrent Neural Networks

Zhuozhuo Tu, Fengxiang He, Dacheng Tao

Keywords: generalization, recurrent neural networks, learning theory

Subquadratic Overparameterization for Shallow Neural Networks

ChaeHwan Song, Ali Ramezani-Kebrya, Thomas Pethick, Armin Eftekhari, Volkan Cevher

Keywords: theory, deep learning, optimization

Learning with gradient descent and weakly convex losses

Dominic Richards, Mike Rabbat

A Dynamical Central Limit Theorem for Shallow Neural Networks

Zhengdao Chen, Grant Rotskoff, Joan Bruna, Eric Vanden-Eijnden

Better Training using Weight-Constrained Stochastic Dynamics

Benedict Leimkuhler, Tiffany Vlaar, Timothée Pouchon, Amos Storkey

Keywords: Deep Learning, Bayesian Deep Learning

Generative Flows with Matrix Exponential

Changyi Xiao, Ligang Liu

Keywords: Deep Learning - Generative Models and Autoencoders

The Implicit Bias of Minima Stability: A View from Function Space

Rotem Mulayoff, Tomer Michaeli, Daniel Soudry

Keywords: deep learning, optimization

Noisy Recurrent Neural Networks

Soon Hoe Lim, N. Benjamin Erichson, Liam Hodgkinson, Michael W Mahoney

Keywords: theory, deep learning, machine learning, robustness

Task Agnostic Robust Learning on Corrupt Outputs by Correlation-Guided Mixture Density Networks

Sungjoon Choi, Sanghoon Hong, Kyungjae Lee, Sungbin Lim

Keywords: robust learning, bayesian deep learning, semi-supervised learning

Channel Permutations for N:M Sparsity
