The Surprising Simplicity of the Early-Time Learning Dynamics of Neural Networks

06/12/2020

Wei Hu, Lechao Xiao, Ben Adlam, Jeffrey Pennington

Abstract: Modern neural networks are often regarded as complex black-box functions whose behavior is difficult to understand owing to their nonlinear dependence on the data and the nonconvexity in their loss landscapes. In this work, we show that these common perceptions can be completely false in the early phase of learning. In particular, we formally prove that, for a class of well-behaved input distributions, the early-time learning dynamics of a two-layer fully-connected neural network can be mimicked by training a simple linear model on the inputs. We additionally argue that this surprising simplicity can persist in networks with more layers and with convolutional architecture, which we verify empirically. Key to our analysis is to bound the spectral norm of the difference between the Neural Tangent Kernel (NTK) and an affine transform of the data kernel; however, unlike many previous results utilizing the NTK, we do not require the network to have disproportionately large width, and the network is allowed to escape the kernel regime later in training.
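As a rough numerical illustration of the kernel comparison described in the abstract, the following Python sketch builds the empirical first-layer NTK of a small two-layer network and measures how well an affine transform of the data kernel reproduces it. The tanh activation, the standard Gaussian inputs, the sizes `n`, `d`, `m`, and the least-squares choice of the affine coefficients `a` and `b` are all assumptions made for this example; they are not the constants or the exact setting used in the paper.

```python
import numpy as np


def first_layer_ntk(X, W, v):
    """Empirical NTK of f(x) = v . tanh(W x / sqrt(d)) / sqrt(m), taken w.r.t. W only."""
    n, d = X.shape
    m = W.shape[0]
    pre = X @ W.T / np.sqrt(d)          # (n, m) pre-activations
    dact = 1.0 - np.tanh(pre) ** 2      # derivative of tanh at the pre-activations
    feat = dact * v                     # v_r * tanh'(w_r . x / sqrt(d)), shape (n, m)
    # <df/dW(x), df/dW(x')> = (1/m) sum_r feat_r(x) feat_r(x') * (x . x' / d)
    return (feat @ feat.T / m) * (X @ X.T / d)


def affine_fit(K, G):
    """Least-squares (Frobenius) fit K ~ a * G + b * ones; returns (a, b)."""
    A = np.stack([G.ravel(), np.ones(G.size)], axis=1)
    coef, *_ = np.linalg.lstsq(A, K.ravel(), rcond=None)
    return coef


# Assumed toy setting: Gaussian inputs, Gaussian first layer, random-sign second layer.
rng = np.random.default_rng(0)
n, d, m = 200, 500, 1000
X = rng.standard_normal((n, d))
W = rng.standard_normal((m, d))
v = rng.choice([-1.0, 1.0], size=m)

K = first_layer_ntk(X, W, v)            # empirical first-layer NTK, shape (n, n)
G = X @ X.T / d                         # data kernel
a, b = affine_fit(K, G)
R = K - (a * G + b)                     # residual after the affine transform

rel_fro = np.linalg.norm(R) / np.linalg.norm(K)          # Frobenius norms
rel_spec = np.linalg.norm(R, 2) / np.linalg.norm(K, 2)   # spectral norms
print(f"a={a:.4f}, b={b:.4f}, rel_fro={rel_fro:.4f}, rel_spec={rel_spec:.4f}")
```

The fit minimizes the Frobenius residual, which is a convenience of this sketch; the abstract's statement is about the spectral norm, so `rel_spec` is the quantity closer in spirit to the paper's bound.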

The talk and the corresponding paper were published at the NeurIPS 2020 virtual conference.

Similar Papers

18/07/2021

Training Adversarially Robust Sparse Networks via Bayesian Connectivity Sampling

Ozan Özdenizci, Robert Legenstein

Keywords: Algorithms, Adversarial Examples

6:27

03/05/2021

Go with the flow: Adaptive control for Neural ODEs

Mathieu Chalvidal, Matthew Ricci, Rufin VanRullen, Thomas Serre

Keywords: Neural ODEs, Normalizing flows, Hypernetworks, Optimal Control Theory

5:03

06/12/2020

An analytic theory of shallow networks dynamics for hinge loss classification

Franco Pellegrini, Giulio Biroli

Keywords: Deep Learning -> Optimization for Deep Networks

3:11

18/07/2021

On the Explicit Role of Initialization on the Convergence and Implicit Bias of Overparametrized Linear Networks

Hancheng Min, Salma Tarmoun, Rene Vidal, Enrique Mallada

Keywords: Theory

5:16

03/05/2021

A Temporal Kernel Approach for Deep Learning with Continuous-time Information

Da Xu, Chuanwei Ruan, Evren Korpeoglu, Sushant Kumar, Kannan Achan

Keywords: Reparameterization, Random Feature, Spectral Distribution, Continuous-time System, Kernel Learning, Learning Theory

4:20

26/04/2020

Simple and Effective Regularization Methods for Training on Noisily Labeled Data with Generalization Guarantee

Wei Hu, Zhiyuan Li, Dingli Yu

Keywords: deep learning theory, regularization, noisy labels

5:13

09/07/2020

Implicit Bias of Gradient Descent for Wide Two-layer Neural Networks Trained with the Logistic Loss

Lénaïc Chizat, Francis Bach

Keywords: Neural networks/deep learning, Non-convex optimization

14:41

26/08/2020

Orthogonal Gradient Descent for Continual Learning

Mehrdad Farajtabar, Navid Azizan, Alex Mott, Ang Li

13:33

12/07/2020

Confidence-Aware Learning for Deep Neural Networks

Sangheum Hwang, Jooyoung Moon, Jihyo Kim, Younghak Shin

Keywords: Deep Learning - Algorithms

14:05

26/04/2020

Finite Depth and Width Corrections to the Neural Tangent Kernel

Boris Hanin, Mihai Nica

Keywords: Neural Tangent Kernel, Finite Width Corrections, Random ReLU Net, Wide Networks, Deep Networks

5:09

06/12/2021

Integrated Latent Heterogeneity and Invariance Learning in Kernel Space

Jiashuo Liu, Zheyuan Hu, Peng Cui, Bo Li, Zheyan Shen

Keywords: deep learning, reinforcement learning and planning, machine learning

11:11

06/12/2020

Efficient Variational Inference for Sparse Deep Learning with Theoretical Guarantee

Jincheng Bai, Qifan Song, Guang Cheng

3:11

06/12/2021

Asymptotics of representation learning in finite Bayesian neural networks

Jacob Zavatone-Veth, Abdulkadir Canatar, Ben Ruben, Cengiz Pehlevan

Keywords: deep learning, representation learning

14:09

18/07/2021

Classifying high-dimensional Gaussian mixtures: Where kernel methods fail and neural networks succeed

Maria Refinetti, Sebastian Goldt, Florent Krzakala, Lenka Zdeborova

Keywords: Theory, Models of Learning and Generalization

4:24

03/05/2021

Exploring the Uncertainty Properties of Neural Networks’ Implicit Priors in the Infinite-Width Limit

Ben Adlam, Jaehoon Lee, Lechao Xiao, Jeffrey Pennington, Jasper Snoek

Keywords: Deep Learning, Bayesian Neural Networks, Neural Network Gaussian Process, Infinite-Width Limit, Uncertainty, Gaussian Process

4:34

26/04/2020

Effect of Activation Functions on the Training of Overparametrized Neural Nets

Abhishek Panigrahi, Abhishek Shetty, Navin Goyal

Keywords: activation functions, deep learning theory, neural networks

5:13

06/12/2021

Intrinsic Dimension, Persistent Homology and Generalization in Neural Networks

Tolga Birdal, Aaron Lou, Leonidas Guibas, Umut Simsekli

Keywords: theory, deep learning, optimization

14:38

06/12/2020

Learning Parities with Neural Networks

Amit Daniely, Eran Malach

3:21

06/12/2020

MetaPerturb: Transferable Regularizer for Heterogeneous Tasks and Architectures

Jeong Un Ryu, JWoong Shin, Hae Beom Lee, Sung Ju Hwang

3:32

12/07/2020

Efficient proximal mapping of the path-norm regularizer of shallow networks

Fabian Latorre, Paul Rolland, Shaul Nadav Hallak, Volkan Cevher

Keywords: Deep Learning - Algorithms

11:32

30/11/2020

Bridging Adversarial and Statistical Domain Transfer via Spectral Adaptation Networks

Christoph Raab, Philipp Väth, Peter Meier, Frank-Michael Schleif

10:07

12/07/2020

Defense Through Diverse Directions

Christopher Bender, Yang Li, Yifeng Shi, Michael K. Reiter, Junier Oliva

Keywords: Adversarial Examples

15:06

18/07/2021

PHEW: Constructing Sparse Networks that Learn Fast and Generalize Well without Training Data

Shreyas Malakarjun Patil, Constantine Dovrolis

Keywords: Deep Learning

5:20

06/12/2021

Meta-Learning Sparse Implicit Neural Representations

Jaeho Lee, Jihoon Tack, Namhoon Lee, Jinwoo Shin

Keywords: deep learning, optimization, meta learning, representation learning

8:41

14/06/2020

On the Acceleration of Deep Learning Model Parallelism With Staleness

An Xu, Zhouyuan Huo, Heng Huang

Keywords: layer-wise staleness, asynchronous model parallelism, convolutional neural networks

1:01

22/11/2021

FFNB: Forgetting-Free Neural Blocks for Deep Continual Learning

Hichem Sahbi, Haoming Zhan

Keywords: Continual and incremental learning, lifelong learning, catastrophic interference, catastrophic forgetting, dynamic neural networks, visual recognition

3:05

03/05/2021

Theoretical Analysis of Self-Training with Deep Networks on Unlabeled Data

Colin Wei, Kendrick Shen, Yining Chen, Tengyu Ma

Keywords: deep learning theory, semi-supervised learning theory, unsupervised learning theory, domain adaptation theory

14:46

12/07/2020

Revisiting Spatial Invariance with Low-Rank Local Connectivity

Gamaleldin Elsayed, Prajit Ramachandran, Jon Shlens, Simon Kornblith

Keywords: Deep Learning - General

14:48

06/12/2021

Local Signal Adaptivity: Provable Feature Learning in Neural Networks Beyond Kernels

Stefani Karp, Ezra Winston, Yuanzhi Li, Aarti Singh

Keywords: theory, deep learning, optimization, machine learning, vision, kernel methods

13:22

03/05/2021

How Neural Networks Extrapolate: From Feedforward to Graph Neural Networks

Keyulu Xu, Mozhi Zhang, Jingling Li, Simon Du, Ken-Ichi Kawarabayashi, Stefanie Jegelka

Keywords: graph neural networks, out-of-distribution, deep learning, extrapolation, deep learning theory

17:06

06/12/2020

A Generalized Neural Tangent Kernel Analysis for Two-layer Neural Networks

Zixiang Chen, Yuan Cao, Quanquan Gu, Tong Zhang

3:16

06/12/2020

Over-parameterized Adversarial Training: An Analysis Overcoming the Curse of Dimensionality

Yi Zhang, Orestis Plevrakis, Simon Du, Xingguo Li, Zhao Song, Sanjeev Arora

2:56

06/12/2021

What can linearized neural networks actually say about generalization?

Guillermo Ortiz-Jimenez, Seyed-Mohsen Moosavi-Dezfooli, Pascal Frossard

Keywords: theory, deep learning

9:46

18/07/2021

Addressing Catastrophic Forgetting in Few-Shot Problems

Pauching Yap, Hippolyt Ritter, David Barber

Keywords: Applications, Computer Vision, Deep Learning, CNN Architectures; Deep Learning, Generative Models, Algorithms, Multitask, Transfer, and Meta Learning

5:11

03/05/2021

MetaNorm: Learning to Normalize Few-Shot Batches Across Domains

Yingjun Du, Xiantong Zhen, Ling Shao, Cees G Snoek

Keywords: batch normalization, Meta-learning, few-shot domain generalization

5:48

03/08/2020

Hidden Markov Nonlinear ICA: Unsupervised Learning from Nonstationary Time Series

Hermanni Hälvä, Aapo Hyvarinen

7:57

06/12/2021

When Are Solutions Connected in Deep Networks?

Quynh Nguyen, Pierre Bréchet, Marco Mondelli

Keywords: theory, deep learning, optimization

14:44

06/12/2020

From Boltzmann Machines to Neural Networks and Back Again

Surbhi Goel, Adam Klivans, Frederic Koehler

Keywords: Algorithms -> Nonlinear Dimensionality Reduction and Manifold Learning, Algorithms -> Regression

3:26

18/07/2021

FL-NTK: A Neural Tangent Kernel-based Framework for Federated Learning Analysis

Baihe Huang, Xiaoxiao Li, Zhao Song, Xin Yang

Keywords: Theory, Deep learning Theory

4:49

26/08/2020

Adaptive, Distribution-Free Prediction Intervals for Deep Networks

Danijel Kivaranovic, Kory D. Johnson, Hannes Leeb

16:48