Learning Over-Parametrized Two-Layer ReLU Neural Networks beyond NTK

09/07/2020

Yuanzhi Li, Tengyu Ma, Hongyang R Zhang

Keywords: Neural networks/deep learning, Matrix/tensor estimation, Non-convex optimization

Abstract: We consider the dynamics of gradient descent for learning a two-layer neural network. We assume the input $x\in\mathbb{R}^d$ is drawn from a Gaussian distribution and the label of $x$ satisfies $f^{\star}(x) = a^{\top}|W^{\star}x|$, where $a\in\mathbb{R}^d$ is a nonnegative vector and $W^{\star} \in\mathbb{R}^{d\times d}$ is an orthonormal matrix. We show that an over-parameterized two-layer neural network with ReLU activation, trained by gradient descent from random initialization, can provably learn the ground truth network with population loss at most $o(1/d)$ in polynomial time with polynomially many samples. On the other hand, we prove that any kernel method, including the Neural Tangent Kernel, with a polynomial number of samples in $d$, has population loss at least $\Omega(1 / d)$.
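The setting in the abstract can be illustrated with a small NumPy sketch. It is not the paper's algorithm or analysis: the helper names (`make_teacher`, `teacher`, `student`, `gd_step`), the dimensions, the step size, the empirical squared loss and the initialization of the output weights are all assumptions chosen for illustration. It samples a nonnegative $a$ and an orthonormal $W^{\star}$, labels Gaussian inputs by $f^{\star}(x) = a^{\top}|W^{\star}x|$, and runs plain full-batch gradient descent on a wider two-layer ReLU network.

```python
import numpy as np

def make_teacher(d, rng):
    """Sample a nonnegative a and an orthonormal W* (hypothetical helper)."""
    W_star, _ = np.linalg.qr(rng.standard_normal((d, d)))  # Q factor is orthonormal
    a = np.abs(rng.standard_normal(d))                      # nonnegative weights
    return a, W_star

def teacher(X, a, W_star):
    """Labels f*(x) = a^T |W* x| for each row x of X."""
    return np.abs(X @ W_star.T) @ a

def student(X, W, c):
    """Two-layer ReLU network: sum_j c_j * relu(w_j^T x)."""
    return np.maximum(X @ W.T, 0.0) @ c

def loss(X, y, W, c):
    """Empirical squared loss, 0.5 * mean of squared residuals."""
    return 0.5 * np.mean((student(X, W, c) - y) ** 2)

def gd_step(X, y, W, c, lr):
    """One full-batch gradient descent step on the empirical squared loss."""
    n = X.shape[0]
    pre = X @ W.T                   # (n, m) pre-activations
    act = np.maximum(pre, 0.0)      # (n, m) ReLU activations
    resid = act @ c - y             # (n,) prediction errors
    grad_c = act.T @ resid / n      # gradient w.r.t. output weights
    grad_W = ((resid[:, None] * (pre > 0)) * c).T @ X / n  # gradient w.r.t. hidden weights
    return W - lr * grad_W, c - lr * grad_c

rng = np.random.default_rng(0)
d, m, n = 5, 20, 2000               # assumed sizes; m > 2d so the student is wider than needed
a, W_star = make_teacher(d, rng)
X = rng.standard_normal((n, d))     # Gaussian inputs
y = teacher(X, a, W_star)

# Since |z| = relu(z) + relu(-z), a ReLU network with 2d neurons represents f* exactly.
exact = student(X, np.vstack([W_star, -W_star]), np.concatenate([a, a]))

# Random hidden weights; the constant output weights are an assumption of this sketch.
W = rng.standard_normal((m, d)) / np.sqrt(d)
c = np.full(m, 1.0 / m)
loss_before = loss(X, y, W, c)
for _ in range(200):
    W, c = gd_step(X, y, W, c, lr=0.01)
loss_after = loss(X, y, W, c)
```

Comparing `loss_before` with `loss_after` shows whether training made progress on this sample; the sketch says nothing about the $o(1/d)$ population-loss guarantee or the kernel lower bound, which are the paper's results.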

The talk and the corresponding paper were published at the COLT 2020 virtual conference.

Similar Papers

06/12/2020

Agnostic Learning of a Single Neuron with Gradient Descent

Spencer Frei, Yuan Cao, Quanquan Gu

3:10

06/12/2021

On the Provable Generalization of Recurrent Neural Networks

Lifu Wang, Bo Shen, Bo Hu, Xing Cao

Keywords: theory, deep learning

5:01

03/05/2021

A unifying view on implicit bias in training linear neural networks

Chulhee (Charlie) Yun, Shankar Krishnan, Hossein Mobahi

Keywords: convergence, implicit bias, gradient flow, implicit regularization, gradient descent

5:24

03/05/2021

Deep Networks and the Multiple Manifold Problem

Sam Buchanan, Dar Gilboa, John Wright

Keywords: low-dimensional structure, overparameterized neural networks, deep learning

5:14

06/12/2020

A Universal Approximation Theorem of Deep Neural Networks for Expressing Probability Distributions

Yulong Lu, Jianfeng Lu

2:55

12/07/2020

Frequency Bias in Neural Networks for Input of Non-Uniform Density

Ronen Basri, Meirav Galun, Amnon Geifman, David Jacobs, Yoni Kasten, Shira Kritchman

Keywords: Deep Learning - Theory

11:18

06/12/2020

Sample Efficient Reinforcement Learning via Low-Rank Matrix Estimation

Devavrat Shah, Dogyoon Song, Zhi Xu, Yuzhe Yang

3:22

09/07/2020

A Corrective View of Neural Networks: Representation, Memorization and Learning

Dheeraj M Nagaraj, Guy Bresler

Keywords: Neural networks/deep learning, Learning with algebraic or combinatorial structure, Supervised learning

13:38

26/04/2020

Generalization of Two-layer Neural Networks: An Asymptotic Viewpoint

Jimmy Ba, Murat Erdogdu, Taiji Suzuki, Denny Wu, Tianzong Zhang

Keywords: Neural Networks, Generalization, High-dimensional Statistics

6:17

06/12/2020

Provably Efficient Neural GTD for Off-Policy Learning

Hoi-To Wai, Zhuoran Yang, Zhaoran Wang, Mingyi Hong

3:40

06/12/2021

The Implicit Bias of Minima Stability: A View from Function Space

Rotem Mulayoff, Tomer Michaeli, Daniel Soudry

Keywords: deep learning, optimization

13:51

06/12/2020

Asymptotic normality and confidence intervals for derivatives of 2-layers neural network in the random features model

Yiwei Shen, Pierre C Bellec

3:12

12/07/2020

Spectrum Dependent Learning Curves in Kernel Regression and Wide Neural Networks

Blake Bordelon, Abdulkadir Canatar, Cengiz Pehlevan

Keywords: General Machine Learning Techniques

14:55

26/04/2020

On the Global Convergence of Training Deep Linear ResNets

Difan Zou, Philip M. Long, Quanquan Gu

4:56

03/05/2021

Large-width functional asymptotics for deep Gaussian neural networks

Daniele Bracale, Stefano Favaro, Sandra Fortini, Stefano Peluchetti

Keywords: deep learning theory, stochastic process, Gaussian process, infinitely wide neural network

4:48

06/12/2021

Even your Teacher Needs Guidance: Ground-Truth Targets Dampen Regularization Imposed by Self-Distillation

Kenneth Borup, Lars N Andersen

Keywords: theory, deep learning, optimization

6:00

09/07/2020

Tree-projected gradient descent for estimating gradient-sparse parameters on graphs

Sheng Xu, Zhou Fan, Sahand Negahban

Keywords: High-dimensional statistics, Combinatorial optimization, Learning from complex/structured data (e.g. networks, time series), Non-convex optimization

16:00

20/07/2020

A type of generalization error induced by initialization in deep neural networks

Yaoyu Zhang, Zhi-Qin John Xu, Tao Luo, Zheng Ma

17:33

06/12/2021

Efficient Algorithms for Learning Depth-2 Neural Networks with General ReLU Activations

Pranjal Awasthi, Alex Tang, Aravindan Vijayaraghavan

Keywords: theory, deep learning

14:31

06/12/2020

Tight Nonparametric Convergence Rates for Stochastic Gradient Descent under the Noiseless Linear Model

Raphaël Berthier, Francis Bach, Pierre Gaillard

Keywords: Optimization -> Non-Convex Optimization, Deep Learning -> Optimization for Deep Networks

3:05

06/12/2020

Beyond Lazy Training for Over-parameterized Tensor Decomposition

Xiang Wang, Chenwei Wu, Jason Lee, Tengyu Ma, Rong Ge

3:16

09/07/2020

Kernel and Rich Regimes in Overparametrized Models

Blake E Woodworth, Suriya Gunasekar, Jason Lee, Edward Moroshko, Pedro Henrique Pamplona Savarese, Itay Golan, Daniel Soudry, Nathan Srebro

Keywords: Neural networks/deep learning

13:29

06/12/2020

Matrix Inference and Estimation in Multi-Layer Models

Parthe Pandit, Moji Sahraee Ardakan, Sundeep Rangan, Phil Schniter, Alyson Fletcher

3:24

06/12/2020

A Single-Loop Smoothed Gradient Descent-Ascent Algorithm for Nonconvex-Concave Min-Max Problems

Jiawei Zhang, Peijun Xiao, Ruoyu Sun, Zhiquan Luo

3:12

06/12/2020

Statistical-Query Lower Bounds via Functional Gradients

Surbhi Goel, Aravind Gollakota, Adam Klivans

3:24

06/12/2021

Robust Implicit Networks via Non-Euclidean Contractions

Saber Jafarpour, Alexander Davydov, Anton Proskurnikov, Francesco Bullo

Keywords: theory, deep learning, machine learning, robustness, vision

14:59

26/04/2020

Gradient Descent Maximizes the Margin of Homogeneous Neural Networks

Kaifeng Lyu, Jian Li

Keywords: margin, homogeneous, gradient descent

15:02

18/07/2021

Data-driven Prediction of General Hamiltonian Dynamics via Learning Exactly-Symplectic Maps

Renyi Chen, Molei Tao

Keywords: Algorithms, Time Series and Sequences

5:21

09/07/2020

Winnowing with Gradient Descent

Ehsan Amid, Manfred K. Warmuth

Keywords: Online learning

14:22

03/05/2021

Optimal Rates for Averaged Stochastic Gradient Descent under Neural Tangent Kernel Regime

Atsushi Nitanda, Taiji Suzuki

Keywords: stochastic gradient descent, neural tangent kernel, over-parameterization, two-layer neural network

18:48

18/07/2021

Besov Function Approximation and Binary Classification on Low-Dimensional Manifolds Using Convolutional Residual Networks

Hao Liu, Minshuo Chen, Tuo Zhao, Wenjing Liao

Keywords: Applications, Computer Vision, Theory, Deep learning Theory

5:14

03/05/2021

Why Are Convolutional Nets More Sample-Efficient than Fully-Connected Nets?

Zhiyuan Li, Yi Zhang, Sanjeev Arora

Keywords: equivariance, fully-connected, sample complexity separation, convolutional neural networks

15:18

12/07/2020

Improving the Sample and Communication Complexity for Decentralized Non-Convex Optimization: Joint Gradient Estimation and Tracking

Haoran Sun, Songtao Lu, Mingyi Hong

Keywords: Optimization - Non-convex

13:56

12/07/2020

Dynamics of Deep Neural Networks and Neural Tangent Hierarchy

Jiaoyang Huang, Horng-Tzer Yau

Keywords: Deep Learning - Theory

15:28

09/07/2020

Learning Polynomials in Few Relevant Dimensions

Sitan Chen, Raghu Meka

Keywords: Regression, Convex optimization, High-dimensional statistics, Non-convex optimization

15:03

18/07/2021

Geometry of the Loss Landscape in Overparameterized Neural Networks: Symmetries and Invariances

Berfin Simsek, François Ged, Arthur Jacot, Francesco Spadaro, Clement Hongler, Wulfram Gerstner, Johanni Brea

Keywords: Theory, Algorithms, Representation Learning, Algorithms, Large Scale Learning; Applications, Natural Language Processing; Deep Learning, Efficient Inference Methods

5:05

06/12/2021

An Improved Analysis of Gradient Tracking for Decentralized Machine Learning

Anastasiia Koloskova, Tao Lin, Sebastian Stich

Keywords: optimization, machine learning

7:22

13/04/2021

Learning with gradient descent and weakly convex losses

Dominic Richards, Mike Rabbat

3:20

06/12/2020

Dynamical mean-field theory for stochastic gradient descent in Gaussian mixture classification

Francesca Mignacco, Florent Krzakala, Pierfrancesco Urbani, Lenka Zdeborová

3:21

09/07/2020

On the Multiple Descent of Minimum-Norm Interpolants and Restricted Lower Isometry of Kernels

Tengyuan Liang, Alexander Rakhlin, Xiyu Zhai

Keywords: Supervised learning, Excess risk bounds and generalization error bounds, High-dimensional statistics, Kernel methods, Regression

14:56