The Implicit Bias of Depth: How Incremental Learning Drives Generalization

26/04/2020

Daniel Gissin, Shai Shalev-Shwartz, Amit Daniely

Keywords: gradient flow, gradient descent, implicit regularization, implicit bias, generalization, optimization, quadratic network, matrix sensing

Abstract: A leading hypothesis for the surprising generalization of neural networks is that the dynamics of gradient descent bias the model towards simple solutions, by searching through the solution space in an incremental order of complexity. We formally define the notion of incremental learning dynamics and derive the conditions on depth and initialization for which this phenomenon arises in deep linear models. Our main theoretical contribution is a dynamical depth separation result, proving that while shallow models can exhibit incremental learning dynamics, they require the initialization to be exponentially small for these dynamics to present themselves. However, once the model becomes deeper, the dependence becomes polynomial and incremental learning can arise in more natural settings. We complement our theoretical findings by experimenting with deep matrix sensing, quadratic neural networks and with binary classification using diagonal and convolutional linear networks, showing all of these models exhibit incremental learning.
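
The incremental ordering described in the abstract can be illustrated with a small sketch. It assumes a toy diagonal model in which each effective weight is u_i**depth and is fitted to a positive target s_i by plain gradient descent. The function name half_times, the targets, the initial weight w0, and the step size are all hypothetical choices for illustration and are not taken from the paper.

```python
def half_times(depth, targets, w0=1e-4, lr=0.01, steps=5000):
    """Gradient descent on 0.5 * sum_i (u_i**depth - s_i)**2 (toy model, assumed).

    Returns, for each coordinate, the first step at which the effective
    weight u_i**depth reaches half of its target s_i (None if it never does).
    """
    # Start every coordinate at the same small effective weight w0.
    u = [w0 ** (1.0 / depth)] * len(targets)
    hit = [None] * len(targets)
    for step in range(1, steps + 1):
        for i, s in enumerate(targets):
            w = u[i] ** depth
            # Chain rule: d/du [0.5 * (u**depth - s)**2]
            grad = depth * u[i] ** (depth - 1) * (w - s)
            u[i] -= lr * grad
            if hit[i] is None and u[i] ** depth >= 0.5 * s:
                hit[i] = step
    return hit
```

Calling half_times(2, [4.0, 2.0, 1.0]) or half_times(3, [4.0, 2.0, 1.0]) returns steps that increase from the largest target to the smallest, so the coordinates are fitted one after another. With depth 1 all coordinates reach the half-way point at the same step, so no ordering appears in this toy setting.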

The talk and the respective paper were published at the ICLR 2020 virtual conference.

Similar Papers

06/12/2021

Leveraging Recursive Gumbel-Max Trick for Approximate Inference in Combinatorial Spaces

Kirill Struminsky, Artyom Gadetsky, Denis Rakitin, Danil Karpushkin, Dmitry Vetrov

Keywords: deep learning, optimization

14:26

18/07/2021

Implicit Regularization in Tensor Factorization

Noam Razin, Asaf Maman, Nadav Cohen

Keywords: Theory, Deep learning Theory

5:11

06/12/2021

The staircase property: How hierarchical structure can guide deep learning

Emmanuel Abbe, Enric Boix-Adsera, Matthew S Brennan, Guy Bresler, Dheeraj Nagaraj

Keywords: deep learning, optimization

14:16

06/12/2021

Learning to Learn Dense Gaussian Processes for Few-Shot Learning

Ze Wang, Zichen Miao, Xiantong Zhen, Qiang Qiu

Keywords: deep learning, optimization, generative model, meta learning, kernel methods, few shot learning

5:21

18/07/2021

The Heavy-Tail Phenomenon in SGD

Mert Gurbuzbalaban, Umut Simsekli, Lingjiong Zhu

Keywords: Optimization, Stochastic Optimization

5:37

03/05/2021

Neural Mechanics: Symmetry and Broken Conservation Laws in Deep Learning Dynamics

Daniel Kunin, Javier Sagastuy-Brena, Surya Ganguli, Daniel L Yamins, Hidenori Tanaka

Keywords: geometry, stochastic differential equation, symmetry, learning dynamics, modified equation analysis, conservation law, physics, gradient flow, loss landscape, hessian

4:36

06/12/2021

Simple Stochastic and Online Gradient Descent Algorithms for Pairwise Learning

Zhenhuan Yang, Yunwen Lei, Puyu Wang, Tianbao Yang, Yiming Ying

Keywords: optimization, machine learning, privacy

14:40

06/12/2021

Representation Learning Beyond Linear Prediction Functions

Ziping Xu, Ambuj Tewari

Keywords: theory, deep learning, optimization, representation learning, few shot learning

11:00

12/07/2020

Finding trainable sparse networks through Neural Tangent Transfer

Tianlin Liu, Friedemann Zenke

Keywords: Deep Learning - Algorithms

13:43

12/07/2020

Deep Reinforcement Learning with Smooth Policy

Qianli Shen, Yan Li, Haoming Jiang, Zhaoran Wang, Tuo Zhao

Keywords: Reinforcement Learning - Deep RL

9:51

06/12/2021

Joint Inference for Neural Network Depth and Dropout Regularization

Kishan K C, Rui Li, MohammadMahdi Gilany

Keywords: deep learning, generative model, continual learning

11:01

02/02/2021

Deep Frequency Principle Towards Understanding Why Deeper Learning Is Faster

Zhiqin John Xu, Hanxu Zhou

19:40

12/07/2020

dS^2LBI: Exploring Structural Sparsity on Deep Network via Differential Inclusion Paths

Yanwei Fu, Chen Liu, Donghao Li, Xinwei Sun, Jinshan Zeng, Yuan Yao

Keywords: Deep Learning - Algorithms

12:45

06/12/2020

Efficient Variational Inference for Sparse Deep Learning with Theoretical Guarantee

Jincheng Bai, Qifan Song, Guang Cheng

3:11

18/07/2021

Sparsifying Networks via Subdifferential Inclusion

Sagar Verma, Jean-Christophe Pesquet

Keywords: Optimization, Convex Optimization

5:10

26/04/2020

Provable Benefit of Orthogonal Initialization in Optimizing Deep Linear Networks

Wei Hu, Lechao Xiao, Jeffrey Pennington

Keywords: deep learning theory, non-convex optimization, orthogonal initialization

5:10

03/05/2021

How Neural Networks Extrapolate: From Feedforward to Graph Neural Networks

Keyulu Xu, Mozhi Zhang, Jingling Li, Simon Du, Ken-Ichi Kawarabayashi, Stefanie Jegelka

Keywords: graph neural networks, out-of-distribution, deep learning, extrapolation, deep learning theory

17:06

06/07/2020

Bounding boxes for weakly supervised segmentation: Global constraints get close to full supervision

Hoel Kervadec, Jose Dolz, Shanshan Wang, Eric Granger, Ismail Ben Ayed

15:09

06/12/2021

Towards Sample-efficient Overparameterized Meta-learning

Yue Sun, Adhyyan Narang, Ibrahim Gulluk, Samet Oymak, Maryam Fazel

Keywords: theory, machine learning, meta learning, representation learning, few shot learning

13:54

06/12/2021

Going Beyond Linear RL: Sample Efficient Neural Function Approximation

Baihe Huang, Kaixuan Huang, Sham Kakade, Jason Lee, Qi Lei, Runzhe Wang, Jiaqi Yang

Keywords: theory, deep learning, reinforcement learning and planning, generative model

12:17

12/07/2020

Puzzle Mix: Exploiting Saliency and Local Statistics for Optimal Mixup

Jang-Hyun Kim, Wonho Choo, Hyun Oh Song

Keywords: Deep Learning - Algorithms

13:40

02/02/2021

Physarum Powered Differentiable Linear Programming Layers and Applications

Zihang Meng, Sathya N. Ravi, Vikas Singh

16:57

26/04/2020

Coherent Gradients: An Approach to Understanding Generalization in Gradient Descent-based Optimization

Satrajit Chatterjee

Keywords: generalization, deep learning

5:01

06/12/2021

Gradient Starvation: A Learning Proclivity in Neural Networks

Mohammad Pezeshki, Oumar Kaba, Yoshua Bengio, Aaron Courville, Doina Precup, Guillaume Lajoie

Keywords: theory, deep learning, optimization, robustness

10:52

22/06/2020

Learning Credal Sum-Product Networks

Amelie Levray, Vaishak Belle

Keywords: credal networks, imprecise probabilities, tractable learning

5:10

06/12/2020

Semialgebraic Optimization for Lipschitz Constants of ReLU Networks

Tong Chen, Jean Lasserre, Victor Magron, Edouard Pauwels

3:22

06/12/2021

Posterior Meta-Replay for Continual Learning

Christian Henning, Maria Cervera, Francesco D'Angelo, Johannes von Oswald, Regina Traber, Benjamin Ehret, Seijin Kobayashi, Benjamin F. Grewe, João Sacramento

Keywords: deep learning, continual learning

12:27

15/06/2020

Learning nonlinear loop invariants with gated continuous logic networks

Jianan Yao, Gabriel Ryan, Justin Wong, Suman Jana, Ronghui Gu

Keywords: Loop Invariant Inference, Continuous Logic Networks, Program Verification

14:18

18/07/2021

On Monotonic Linear Interpolation of Neural Network Parameters

James Lucas, Juhan Bae, Michael Zhang, Stanislav Fort, Richard Zemel, Roger Grosse

Keywords: Deep Learning, Others

5:03

06/12/2021

Robust Implicit Networks via Non-Euclidean Contractions

Saber Jafarpour, Alexander Davydov, Anton Proskurnikov, Francesco Bullo

Keywords: theory, deep learning, machine learning, robustness, vision

14:59

06/12/2020

Benchmarking Deep Inverse Models over time, and the Neural-Adjoint method

Ben Ren, Willie Padilla, Jordan Malof

3:17

12/07/2020

Generalization Error of Generalized Linear Models in High Dimensions

Melikasadat Emami, Mojtaba Sahraee-Ardakan, Parthe Pandit, Sundeep Rangan, Alyson Fletcher

Keywords: Supervised Learning

15:08

06/12/2021

Continuous vs. Discrete Optimization of Deep Neural Networks

Omer Elkabetz, Nadav Cohen

Keywords: theory, deep learning, optimization

9:51

06/12/2020

Modeling and Optimization Trade-off in Meta-learning

Katelyn Gao, Ozan Sener

3:21

06/12/2020

Improved Analysis of Clipping Algorithms for Non-convex Optimization

Bohang Zhang, Jikai Jin, Cong Fang, Liwei Wang

3:16

26/04/2020

SVQN: Sequential Variational Soft Q-Learning Networks

Shiyu Huang, Hang Su, Jun Zhu, Ting Chen

Keywords: reinforcement learning, POMDP, variational inference, generative model

4:52

04/08/2021

Outlier-Robust Learning of Ising Models Under Dobrushin's Condition

Ilias Diakonikolas, Daniel M. Kane, Alistair Stewart, Yuxin Sun

16:22

03/05/2021

How Much Over-parameterization Is Sufficient to Learn Deep ReLU Networks?

Zixiang Chen, Yuan Cao, Difan Zou, Quanquan Gu

Keywords: classification, neural tangent kernel, generalization error, (stochastic) gradient descent, deep ReLU networks

4:44

12/07/2020

Operation-Aware Soft Channel Pruning using Differentiable Masks

Minsoo Kang, Bohyung Han

Keywords: Applications - Computer Vision

14:56

06/12/2020

Interior Point Solving for LP-based prediction+optimisation

Jayanta Mandi, Tias Guns

3:28