Why Are Convolutional Nets More Sample-Efficient than Fully-Connected Nets?

03/05/2021

Why Are Convolutional Nets More Sample-Efficient than Fully-Connected Nets?

Zhiyuan Li, Yi Zhang, Sanjeev Arora

Keywords: equivariance, fully-connected, sample complexity separation, convolutional neural networks

Abstract Paper Similar Papers

Abstract: Convolutional neural networks often dominate fully-connected counterparts in generalization performance, especially on image classification tasks. This is often explained in terms of \textquotedblleft better inductive bias.\textquotedblright\ However, this has not been made mathematically rigorous, and the hurdle is that the sufficiently wide fully-connected net can always simulate the convolutional net. Thus the training algorithm plays a role. The current work describes a natural task on which a provable sample complexity gap can be shown, for standard training algorithms. We construct a single natural distribution on $\mathbb{R}^d\times\{\pm 1\}$ on which any orthogonal-invariant algorithm (i.e. fully-connected networks trained with most gradient-based methods from gaussian initialization) requires $\Omega(d^2)$ samples to generalize while $O(1)$ samples suffice for convolutional architectures. Furthermore, we demonstrate a single target function, learning which on all possible distributions leads to an $O(1)$ vs $\Omega(d^2/\varepsilon)$ gap. The proof relies on the fact that SGD on fully-connected network is orthogonal equivariant. Similar results are achieved for $\ell_2$ regression and adaptive training algorithms, e.g. Adam and AdaGrad, which are only permutation equivariant.

0

0

0

0

Share

This is an embedded video. Talk and the respective paper are published at ICLR 2021 virtual conference. If you are one of the authors of the paper and want to manage your upload, see the question "My papertalk has been externally embedded..." in the FAQ section.

Comments

Post Comment

no comments yet

Similar Papers

07/09/2020

Mish: A Self Regularized Non-Monotonic Activation Function

Diganta Misra

Keywords Paper

activation functions, non-linear dynamics, loss landscapes

0

0

0

0

10:37

26/04/2020

Finite Depth and Width Corrections to the Neural Tangent Kernel

Boris Hanin, Mihai Nica

Keywords Paper

Neural Tangent Kernel, Finite Width Corrections, Random ReLU Net, Wide Networks, Deep Networks

0

0

0

0

5:09

03/05/2021

Deep Networks and the Multiple Manifold Problem

Sam Buchanan, Dar Gilboa, John Wright

Keywords Paper

low-dimensional structure, overparameterized neural networks, deep learning

0

0

0

0

5:14

26/04/2020

Span Recovery for Deep Neural Networks with Applications to Input Obfuscation

Rajesh Jayaram, David P. Woodruff, Qiuyi Zhang

Keywords Paper

Span recovery, low rank neural networks, adversarial attack

0

0

0

0

5:19

22/11/2021

SamplingAug: On the Importance of Patch Sampling Augmentation for Single Image Super-Resolution

Shizun Wang, Ming Lu, Kaixin Chen and
Jiaming Liu, Xiaoqi Li, Chuang Zhang, Ming Wu

Keywords Paper

Super-Resolution, Patch Sampling

0

0

0

0

2:18

09/07/2020

A Corrective View of Neural Networks: Representation, Memorization and Learning

Dheeraj M Nagaraj, Guy Bresler

Keywords Paper

Neural networks/deep learning, Learning with algebraic or combinatorial structure, Supervised learning

0

0

0

0

13:38

03/05/2021

Global Convergence of Three-layer Neural Networks in the Mean Field Regime

Huy Tuan Pham, Phan-Minh Nguyen

Keywords Paper

deep learning theory

0

0

0

0

15:41

18/07/2021

Towards Certifying L-infinity Robustness using Neural Networks with L-inf-dist Neurons

Bohang Zhang, Tianle Cai, Zhou Lu and
Di He, Liwei Wang

Keywords Paper

Algorithms, Adversarial Examples

0

0

0

0

5:11

03/05/2021

Vector-output ReLU Neural Network Problems are Copositive Programs: Convex Analysis of Two Layer Networks and Polynomial-time Algorithms

Arda Sahiner, Tolga Ergen, John M Pauly, Mert Pilanci

Keywords Paper

convolutional neural networks, convex duality, copositive programming, nonnegative PCA, semi-nonnegative matrix factorization, computational complexity, global optima, semi-infinite duality, theory, convex optimization, neural networks

0

0

0

0

6:08

18/07/2021

Besov Function Approximation and Binary Classification on Low-Dimensional Manifolds Using Convolutional Residual Networks

Hao Liu, Minshuo Chen, Tuo Zhao, Wenjing Liao

Keywords Paper

Applications, Computer Vision, , Theory, Deep learning Theory

0

0

0

0

5:14

06/12/2020

Enabling certification of verification-agnostic networks via memory-efficient semidefinite programming

Sumanth Dathathri, Krishnamurthy Dvijotham, Alexey Kurakin and
Aditi Raghunathan, Jonathan Uesato, Rudy Bunel, Shreya Shankar, Jacob Steinhardt, Ian Goodfellow, Percy Liang, Pushmeet Kohli

Keywords Paper

0

0

0

0

3:23

02/02/2021

A Recipe for Global Convergence Guarantee in Deep Neural Networks

Kenji Kawaguchi, Qingyun Sun

Keywords Paper

0

0

0

0

17:15

18/07/2021

A Functional Perspective on Learning Symmetric Functions with Neural Networks

Aaron Zweig, Joan Bruna

Keywords Paper

Theory, Deep learning Theory

0

0

0

0

5:36

02/02/2021

DPFPS: Dynamic and Progressive Filter Pruning for Compressing Convolutional Neural Networks from Scratch

Xiaofeng Ruan, Yufan Liu, Bing Li and
Chunfeng Yuan, Weiming Hu

Keywords Paper

0

0

0

0

14:38

06/12/2020

Non-Euclidean Universal Approximation

Anastasis Kratsios, Eugene Bilokopytov

Keywords Paper

0

0

0

0

3:34

22/11/2021

Adaptive End-to-End Budgeted Network Learning via Inverse Scale Space

Zuyuan Zhong, Chen Liu, Yanwei Fu

Keywords Paper

deep learning, network architecture, growing network, budgeted network learning, pruning

0

0

0

0

2:58

06/12/2020

Asymptotic normality and confidence intervals for derivatives of 2-layers neural network in the random features model

Yiwei Shen, Pierre C Bellec

Keywords Paper

0

0

0

0

3:12

06/12/2021

Stochastic Solutions for Linear Inverse Problems using the Prior Implicit in a Denoiser

Zahra Kadkhodaie, Eero P Simoncelli

Keywords Paper

deep learning, self-supervised learning

0

0

0

0

14:45

06/12/2021

It Has Potential: Gradient-Driven Denoisers for Convergent Solutions to Inverse Problems

Regev Cohen, Yochai Blau, Daniel Freedman, Ehud Rivlin

Keywords Paper

deep learning

0

0

0

0

5:22

06/12/2021

The Implicit Bias of Minima Stability: A View from Function Space

Rotem Mulayoff, Tomer Michaeli, Daniel Soudry

Keywords Paper

deep learning, optimization

0

0

0

0

13:51

09/07/2020

Kernel and Rich Regimes in Overparametrized Models

Blake E Woodworth, Suriya Gunasekar, Jason Lee and
Edward Moroshko, Pedro Henrique Pamplona Savarese, Itay Golan, Daniel Soudry, Nathan Srebro

Keywords Paper

Neural networks/deep learning,

0

0

0

0

13:29

06/12/2021

Analysis of one-hidden-layer neural networks via the resolvent method

Vanessa Piccolo, Dominik Schröder

Keywords Paper

theory, deep learning

0

0

0

0

11:28

26/04/2020

Cyclical Stochastic Gradient MCMC for Bayesian Deep Learning

Ruqi Zhang, Chunyuan Li, Jianyi Zhang and
Changyou Chen, Andrew Gordon Wilson

Keywords Paper

0

0

0

0

14:59

06/12/2020

A Single-Loop Smoothed Gradient Descent-Ascent Algorithm for Nonconvex-Concave Min-Max Problems

Jiawei Zhang, Peijun Xiao, Ruoyu Sun, Zhiquan Luo

Keywords Paper

0

0

0

0

3:12

03/05/2021

Learning N:M Fine-grained Structured Sparse Neural Networks From Scratch

Aojun Zhou, Yukun Ma, Junnan Zhu and
Jianbo Liu, Zhijie Zhang, Kun Yuan, Wenxiu Sun, Hongsheng Li

Keywords Paper

sparsity, efficient training and inference.

0

0

0

0

5:09

06/12/2021

Subquadratic Overparameterization for Shallow Neural Networks

ChaeHwan Song, Ali Ramezani-Kebrya, Thomas Pethick and
Armin Eftekhari, Volkan Cevher

Keywords Paper

theory, deep learning, optimization

0

0

0

0

5:23

14/06/2020

Deep Unfolding Network for Image Super-Resolution

Kai Zhang, Luc Van Gool, Radu Timofte

Keywords Paper

super-resolution, unfolding, degradation model, gaussian kernel, deblurring

0

0

0

0

1:01

26/04/2020

Effect of Activation Functions on the Training of Overparametrized Neural Nets

Abhishek Panigrahi, Abhishek Shetty, Navin Goyal

Keywords Paper

activation functions, deep learning theory, neural networks

0

0

0

0

5:13

06/12/2020

ISTA-NAS: Efficient and Consistent Neural Architecture Search by Sparse Coding

Yibo Yang, Hongyang Li, Shan You and
Fei Wang, Chen Qian, Zhouchen Lin

Keywords Paper

0

0

0

0

3:19

26/04/2020

Implicit Bias of Gradient Descent based Adversarial Training on Separable Data

Yan Li, Ethan X.Fang, Huan Xu, Tuo Zhao

Keywords Paper

implicit bias, adversarial training, robustness, gradient descent

0

0

0

0

4:53

18/07/2021

Data-driven Prediction of General Hamiltonian Dynamics via Learning Exactly-Symplectic Maps

Renyi Chen, Molei Tao

Keywords Paper

Algorithms, Time Series and Sequences

0

0

0

0

5:21

12/07/2020

Disentangling Trainability and Generalization in Deep Neural Networks

Lechao Xiao, Jeffrey Pennington, Samuel Schoenholz

Keywords Paper

Deep Learning - Theory

0

0

0

0

15:10

26/04/2020

Fast Neural Network Adaptation via Parameter Remapping and Architecture Search

Jiemin Fang, Yuzhu Sun, Kangjian Peng* and
Qian Zhang, Yuan Li, Wenyu Liu, Xinggang Wang

Keywords Paper

1

0

0

0

4:39

18/07/2021

Understanding self-supervised learning dynamics without contrastive pairs

Yuandong Tian, Xinlei Chen, Surya Ganguli

Keywords Paper

Deep Learning, Optimization for Deep Networks

0

0

0

0

18:16

02/02/2021

Asynchronous Optimization Methods for Efficient Training of Deep Neural Networks with Guarantees

Vyacheslav Kungurtsev, Malcolm Egan, Bapi Chatterjee, Dan Alistarh

Keywords Paper

0

0

0

0

19:56

14/06/2020

Discrete Model Compression With Resource Constraint for Deep Neural Networks

Shangqian Gao, Feihu Huang, Jian Pei, Heng Huang

Keywords Paper

covutional neural networks, model compression, channel pruning, discrete optimization

0

0

0

0

1:01

12/07/2020

Second-Order Provable Defenses against Adversarial Attacks

Sahil Singla, Soheil Feizi

Keywords Paper

Adversarial Examples

0

0

0

0

12:45

14/06/2020

Generalized Zero-Shot Learning via Over-Complete Distribution

Rohit Keshari, Richa Singh, Mayank Vatsa

Keywords Paper

deep learning, zero-shot leaning, cvae, triplet loss, center loss

0

0

0

0

0:50

06/12/2020

NVAE: A Deep Hierarchical Variational Autoencoder

Arash Vahdat, Jan Kautz

Keywords Paper

0

0

0

0

3:37

12/07/2020

Superpolynomial Lower Bounds for Learning One-Layer Neural Networks using Gradient Descent

Surbhi Goel, Aravind Gollakota, Zhihan Jin and
Sushrut Karmalkar, Adam Klivans

Keywords Paper

Learning Theory

0

0

0

0

14:38