Neural Networks Learning and Memorization with (almost) no Over-Parameterization

Abstract: Many results in recent years established polynomial-time learnability of various models via neural network algorithms (e.g. \cite{andoni2014learning, daniely2016toward, daniely2017sgd, cao2019generalization, ziwei2019polylogarithmic, zou2019improved, ma2019comparative, du2018gradient, arora2019fine, song2019quadratic, oymak2019towards, ge2019mildly, brutzkus2018sgd}). However, unless the model is linearly separable~\cite{brutzkus2018sgd}, or the activation is a polynomial~\cite{ge2019mildly}, these results require very large networks -- much larger than what is needed for the mere existence of a good predictor. In this paper we prove that SGD on depth-two neural networks can memorize samples, learn polynomials with bounded weights, and learn certain kernel spaces, with {\em near optimal} network size, sample complexity, and runtime. In particular, we show that SGD on a depth-two network with $\tilde{O}\left(\frac{m}{d}\right)$ hidden neurons (and hence $\tilde{O}(m)$ parameters) can memorize $m$ random labeled points in $\sphere^{d-1}$.
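The step from $\tilde{O}(m/d)$ hidden neurons to $\tilde{O}(m)$ parameters in the abstract can be illustrated with a small counting sketch. It is only an illustration under stated assumptions: the function name `depth_two_param_count`, the `log_power` knob standing in for the unspecified polylogarithmic factor hidden by $\tilde{O}$, and the choice to ignore bias terms are all hypothetical and not taken from the paper.

```python
import math


def depth_two_param_count(m: int, d: int, log_power: float = 0.0) -> dict:
    """Count the weights of a depth-two network sized for m points in dimension d.

    Assumption: the hidden width is (m / d) * log(m)**log_power, rounded up.
    The abstract only states O~(m/d); log_power is a hypothetical stand-in
    for the polylog factor. Bias terms are ignored.
    """
    polylog = math.log(m) ** log_power if m > 1 else 1.0
    hidden = math.ceil((m / d) * polylog)
    first_layer = hidden * d   # each hidden neuron has d input weights
    output_layer = hidden      # one output weight per hidden neuron
    return {"hidden": hidden, "params": first_layer + output_layer}


# Example: m = 10,000 points in dimension d = 100, no polylog factor.
print(depth_two_param_count(10_000, 100))
```

With `log_power=0.0` the first layer alone has about `m` weights, so the total parameter count is roughly the number of points to memorize, which is the sense in which the size is near optimal.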

06/12/2021

Amit Daniely

Similar Papers

Parametric Complexity Bounds for Approximating PDEs with Neural Networks

Tanya Marwah, Zachary Lipton, Andrej Risteski

Keywords: theory, deep learning, optimization

Avoiding Kernel Fixed Points: Computing with ELU and GELU Infinite Networks

Russell Tsuchida, Tim Pearce, Chris van der Heide, Fred Roosta, Marcus Gallagher

How Much Over-parameterization Is Sufficient to Learn Deep ReLU Networks?

Zixiang Chen, Yuan Cao, Difan Zou, Quanquan Gu

Keywords: classification, neural tangent kernel, generalization error, (stochastic) gradient descent, deep ReLU networks

Set2Graph: Learning Graphs From Sets

Hadar Serviansky, Nimrod Segol, Jonathan Shlomi, Kyle Cranmer, Eilam Gross, Haggai Maron, Yaron Lipman

Beyond Linearization: On Quadratic and Higher-Order Approximation of Wide Neural Networks

Yu Bai, Jason D. Lee

Keywords: Neural Tangent Kernels, over-parametrized neural networks, deep learning theory

Deep Unfolding Network for Image Super-Resolution

Kai Zhang, Luc Van Gool, Radu Timofte

Keywords: super-resolution, unfolding, degradation model, gaussian kernel, deblurring

GLSearch: Maximum Common Subgraph Detection via Learning to Search

Yunsheng Bai, Derek Xu, Yizhou Sun, Wei Wang

Keywords: Probabilistic Methods, Causal Inference, Algorithms, Networks and Relational Learning

Kernel and Rich Regimes in Overparametrized Models

Blake E Woodworth, Suriya Gunasekar, Jason Lee, Edward Moroshko, Pedro Henrique Pamplona Savarese, Itay Golan, Daniel Soudry, Nathan Srebro

Keywords: Neural networks/deep learning

Training Quantized Neural Networks to Global Optimality via Semidefinite Programming

Burak Bartan, Mert Pilanci

Keywords: Optimization, Combinatorial Optimization

On Generalization Bounds of a Family of Recurrent Neural Networks

Minshuo Chen, Xingguo Li, Tuo Zhao

MomentumRNN: Integrating Momentum into Recurrent Neural Networks

Tan Nguyen, Richard Baraniuk, Andrea Bertozzi, Stanley Osher, Bao Wang

Span Recovery for Deep Neural Networks with Applications to Input Obfuscation

Rajesh Jayaram, David P. Woodruff, Qiuyi Zhang

Keywords: Span recovery, low rank neural networks, adversarial attack

Lipschitz constant estimation of Neural Networks via sparse polynomial optimization

Fabian Latorre, Paul Rolland, Volkan Cevher

Keywords: robust networks, Lipschitz constant, polynomial optimization

Recurrent Quantum Neural Networks

Johannes Bausch

UnICORNN: A recurrent model for learning very long time dependencies

T. Konstantin Rusch, Siddhartha Mishra

Keywords: Deep Learning, Architectures

Towards Understanding Learning in Neural Networks with Linear Teachers

Roei Sarussi, Alon Brutzkus, Amir Globerson

Keywords: Probabilistic Methods, Theory, MCMC

The phase diagram of approximation rates for deep neural networks

Dmitry Yarotsky, Anton Zhevnerchuk

On Universal Equivariant Set Networks

Nimrod Segol, Yaron Lipman

Keywords: deep learning, universality, set functions, equivariance

A Convergence Analysis of Gradient Descent on Graph Neural Networks

Pranjal Awasthi, Abhimanyu Das, Sreenivas Gollapudi

Keywords: theory, deep learning, optimization, graph learning

Nimble: Efficiently Compiling Dynamic Neural Networks for Model Inference

Haichen Shen, Jared Roesch, Zhi Chen, wweic Chen, Yong Wu, Mu Li, Vin Sharma, Zachary Tatlock, Yida Wang
