26/04/2020

Generalization of Two-layer Neural Networks: An Asymptotic Viewpoint

Jimmy Ba, Murat Erdogdu, Taiji Suzuki, Denny Wu, Tianzong Zhang

Keywords: Neural Networks, Generalization, High-dimensional Statistics

Abstract: This paper investigates the generalization properties of two-layer neural networks in high dimensions, i.e., when the number of samples $n$, features $d$, and neurons $h$ tend to infinity at the same rate. Specifically, we derive the exact population risk of the unregularized least squares regression problem with two-layer neural networks when either the first or the second layer is trained using gradient flow under different initialization setups. When only the second-layer coefficients are optimized, we recover the \textit{double descent} phenomenon: a cusp in the population risk appears at $h\approx n$, and further overparameterization decreases the risk. In contrast, when the first-layer weights are optimized, we highlight how different scales of initialization lead to different inductive biases, and show that the resulting risk is \textit{independent} of overparameterization. Our theoretical and experimental results suggest that previously studied model setups that provably give rise to \textit{double descent} might not translate to optimizing two-layer neural networks.
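
The second-layer setting described in the abstract can be probed numerically with a random-features regression: freeze the first-layer weights, fit the second layer by min-norm least squares (the limit of gradient flow from zero initialization on this linear-in-the-coefficients problem), and sweep the hidden width $h$ around $n$. The sketch below is illustrative only and not the authors' exact setup; the Gaussian data model, linear teacher, ReLU activation, and the specific sizes are assumptions chosen for a quick demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, n_test = 300, 100, 2000                 # samples, input dimension, test points (assumed)
beta = rng.standard_normal(d) / np.sqrt(d)    # linear teacher (assumption)

X = rng.standard_normal((n, d))
y = X @ beta + 0.1 * rng.standard_normal(n)   # noisy training labels
X_test = rng.standard_normal((n_test, d))
y_test = X_test @ beta

for h in [50, 150, 280, 300, 320, 600, 1200]:          # sweep hidden width around h ~ n
    W = rng.standard_normal((d, h)) / np.sqrt(d)       # frozen first-layer weights
    F = np.maximum(X @ W, 0)                           # ReLU features, train
    F_test = np.maximum(X_test @ W, 0)                 # ReLU features, test
    a = np.linalg.pinv(F) @ y                          # min-norm least squares on second layer
    risk = np.mean((F_test @ a - y_test) ** 2)         # empirical proxy for population risk
    print(f"h={h:5d}  test risk={risk:.3f}")
```

Under these assumptions the printed test risk typically spikes near $h\approx n$ and then decreases again as $h$ grows, which is the double-descent behavior the paper analyzes exactly in the asymptotic regime.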

Talk and the respective paper are published at the ICLR 2020 virtual conference.
