Over-parameterized Adversarial Training: An Analysis Overcoming the Curse of Dimensionality

06/12/2020

Over-parameterized Adversarial Training: An Analysis Overcoming the Curse of Dimensionality

Yi Zhang, Orestis Plevrakis, Simon Du, Xingguo Li, Zhao Song, Sanjeev Arora

Keywords:

Abstract Paper Similar Papers

Abstract: Adversarial training is a popular method to give neural nets robustness against adversarial perturbations. In practice adversarial training leads to low robust training loss. However, a rigorous explanation for why this happens under natural conditions is still missing. Recently a convergence theory of standard (non-adversarial) supervised training was developed by various groups for {\em very overparametrized} nets. It is unclear how to extend these results to adversarial training because of the min-max objective. Recently, a first step towards this direction was made by Gao et al. using tools from online learning, but they require the width of the net to be \emph{exponential} in input dimension $d$, and with an unnatural activation function. Our work proves convergence to low robust training loss for \emph{polynomial} width instead of exponential, under natural assumptions and with ReLU activations. A key element of our proof is showing that ReLU networks near initialization can approximate the step function, which may be of independent interest.

0

0

0

0

Share

This is an embedded video. Talk and the respective paper are published at NeurIPS 2020 virtual conference. If you are one of the authors of the paper and want to manage your upload, see the question "My papertalk has been externally embedded..." in the FAQ section.

Comments

Post Comment

no comments yet

Similar Papers

26/04/2020

Effect of Activation Functions on the Training of Overparametrized Neural Nets

Abhishek Panigrahi, Abhishek Shetty, Navin Goyal

Keywords Paper

activation functions, deep learning theory, neural networks

0

0

0

0

5:13

26/04/2020

Implicit Bias of Gradient Descent based Adversarial Training on Separable Data

Yan Li, Ethan X.Fang, Huan Xu, Tuo Zhao

Keywords Paper

implicit bias, adversarial training, robustness, gradient descent

0

0

0

0

4:53

06/12/2020

The Surprising Simplicity of the Early-Time Learning Dynamics of Neural Networks

Wei Hu, Lechao Xiao, Ben Adlam, Jeffrey Pennington

Keywords Paper

0

0

0

0

3:20

18/07/2021

On the Explicit Role of Initialization on the Convergence and Implicit Bias of Overparametrized Linear Networks

Hancheng Min, Salma Tarmoun, Rene Vidal, Enrique Mallada

Keywords Paper

Theory

0

0

0

0

5:16

03/05/2021

How Neural Networks Extrapolate: From Feedforward to Graph Neural Networks

Keyulu Xu, Mozhi Zhang, Jingling Li and
Simon Du, Ken-Ichi Kawarabayashi, Stefanie Jegelka

Keywords Paper

graph neural networks, out-of-distribution, deep learning, extrapolation, deep learning theory

0

0

0

1

17:06

26/04/2020

Finite Depth and Width Corrections to the Neural Tangent Kernel

Boris Hanin, Mihai Nica

Keywords Paper

Neural Tangent Kernel, Finite Width Corrections, Random ReLU Net, Wide Networks, Deep Networks

0

0

0

0

5:09

18/07/2021

FL-NTK: A Neural Tangent Kernel-based Framework for Federated Learning Analysis

Baihe Huang, Xiaoxiao Li, Zhao Song, Xin Yang

Keywords Paper

Theory, Deep learning Theory

0

0

0

0

4:49

06/12/2020

Towards Better Generalization of Adaptive Gradient Methods

Yingxue Zhou, Belhal Karimi, Jinxing Yu and
Zhiqiang Xu, Ping Li

Keywords Paper

0

0

0

0

3:21

06/12/2021

Gradient Descent on Two-layer Nets: Margin Maximization and Simplicity Bias

Kaifeng Lyu, Zhiyuan Li, Runzhe Wang, Sanjeev Arora

Keywords Paper

deep learning, optimization, machine learning

0

0

0

0

14:56

02/02/2021

Large Batch Optimization for Deep Learning Using New Complete Layer-Wise Adaptive Rate Scaling

Zhouyuan Huo, Bin Gu, Heng Huang

Keywords Paper

0

0

0

0

15:17

14/06/2020

On the Acceleration of Deep Learning Model Parallelism With Staleness

An Xu, Zhouyuan Huo, Heng Huang

Keywords Paper

layer-wise staleness, asynchronous model parallelism, convolutional neural networks.

0

0

0

0

1:01

03/05/2021

How Much Over-parameterization Is Sufficient to Learn Deep ReLU Networks?

Zixiang Chen, Yuan Cao, Difan Zou, Quanquan Gu

Keywords Paper

classification, neural tangent kernel, generalization error, (stochastic) gradient descent, deep ReLU networks

0

0

0

0

4:44

14/06/2020

Towards Unified INT8 Training for Convolutional Neural Network

Feng Zhu, Ruihao Gong, Fengwei Yu and
Xianglong Liu, Yanfei Wang, Zhelong Li, Xiuqi Yang, Junjie Yan

Keywords Paper

int8 training, gradient quantization, direction sensitive gradient clipping, learning rate scaling, gradient distribution

0

0

0

0

1:01

02/02/2021

Fast and Scalable Adversarial Training of Kernel SVM via Doubly Stochastic Gradients

Huimin Wu, Zhengmian Hu, Bin Gu

Keywords Paper

0

0

0

0

14:04

18/07/2021

The Implicit Bias for Adaptive Optimization Algorithms on Homogeneous Neural Networks

Bohan Wang, Qi Meng, Wei Chen, Tie-Yan Liu

Keywords Paper

Theory, Deep learning Theory

0

0

0

0

16:53

18/07/2021

Training Adversarially Robust Sparse Networks via Bayesian Connectivity Sampling

Ozan Özdenizci, Robert Legenstein

Keywords Paper

Algorithms, Adversarial Examples

0

0

0

1

6:27

12/07/2020

Training Binary Neural Networks using the Bayesian Learning Rule

Xiangming Meng, Roman Bachmann, Mohammad Emtiyaz Khan

Keywords Paper

Deep Learning - General

0

0

0

0

10:27

26/04/2020

Provable Benefit of Orthogonal Initialization in Optimizing Deep Linear Networks

Wei Hu, Lechao Xiao, Jeffrey Pennington

Keywords Paper

deep learning theory, non-convex optimization, orthogonal initialization

0

0

0

0

5:10

06/12/2020

Improved Analysis of Clipping Algorithms for Non-convex Optimization

Bohang Zhang, Jikai Jin, Cong Fang, Liwei Wang

Keywords Paper

0

0

0

0

3:16

06/12/2021

Proxy Convexity: A Unified Framework for the Analysis of Neural Networks Trained by Gradient Descent

Spencer Frei, Quanquan Gu

Keywords Paper

deep learning, optimization

0

0

0

0

10:33

06/12/2020

In search of robust measures of generalization

Gintare Karolina Dziugaite, Alexandre Drouin, Brady Neal and
Nitarshan Rajkumar, Ethan Caballero, Linbo Wang, Ioannis Mitliagkas, Dan Roy

Keywords Paper

0

0

0

0

3:20

06/12/2020

Why Do Deep Residual Networks Generalize Better than Deep Feedforward Networks? --- A Neural Tangent Kernel Perspective

Kaixuan Huang, Yuqing Wang, Molei Tao, Tuo Zhao

Keywords Paper

Algorithms -> Uncertainty Estimation; Theory -> Frequentist Statistics; Theory -> Large Deviations and Asymptotic Analysis; The, Algorithms -> Kernel Methods

0

0

0

0

2:59

03/05/2021

A Diffusion Theory For Deep Learning Dynamics: Stochastic Gradient Descent Exponentially Favors Flat Minima

Zeke Xie, Issei Sato, Masashi Sugiyama

Keywords Paper

flat minima, SGD, deep learning dynamics, stochastic optimization, diffusion

0

0

0

0

4:37

06/12/2020

A Generalized Neural Tangent Kernel Analysis for Two-layer Neural Networks

Zixiang Chen, Yuan Cao, Quanquan Gu, Tong Zhang

Keywords Paper

0

0

0

0

3:16

06/12/2021

Intrinsic Dimension, Persistent Homology and Generalization in Neural Networks

Tolga Birdal, Aaron Lou, Leonidas Guibas, Umut Simsekli

Keywords Paper

theory, deep learning, optimization

0

0

0

0

14:38

26/04/2020

Escaping Saddle Points Faster with Stochastic Momentum

Jun-Kun Wang, Chi-Heng Lin, Jacob Abernethy

Keywords Paper

SGD, momentum, escaping saddle point

0

0

0

0

5:26

13/04/2021

On the generalization properties of adversarial training

Yue Xing, Qifan Song, Guang Cheng

Keywords Paper

0

0

0

0

3:05

06/12/2021

Integrated Latent Heterogeneity and Invariance Learning in Kernel Space

Jiashuo Liu, Zheyuan Hu, Peng Cui and
Bo Li, Zheyan Shen

Keywords Paper

deep learning, reinforcement learning and planning, machine learning

0

0

0

0

11:11

12/07/2020

Towards a General Theory of Infinite-Width Limits of Neural Classifiers

Eugene Golikov

Keywords Paper

Deep Learning - Theory

0

0

0

0

14:29

09/07/2020

Implicit Bias of Gradient Descent for Wide Two-layer Neural Networks Trained with the Logistic Loss

Lénaïc Chizat, Francis Bach

Keywords Paper

Neural networks/deep learning, Non-convex optimization

0

0

0

0

14:41

06/12/2020

Learning Parities with Neural Networks

Amit Daniely, Eran Malach

Keywords Paper

0

0

0

0

3:21

06/12/2021

Boost Neural Networks by Checkpoints

Feng Wang, Guoyizhe Wei, Qiao Liu and
Jinxiang Ou, xian wei, Hairong Lv

Keywords Paper

deep learning

1

0

0

0

4:45

06/12/2021

What can linearized neural networks actually say about generalization?

Guillermo Ortiz-Jimenez, Seyed-Mohsen Moosavi-Dezfooli, Pascal Frossard

Keywords Paper

theory, deep learning

0

0

0

0

9:46

06/12/2021

A self consistent theory of Gaussian Processes captures feature learning effects in finite CNNs

Gadi Naveh, Zohar Ringel

Keywords Paper

theory, deep learning, optimization, kernel methods

0

0

0

0

9:13

03/05/2021

Theoretical Analysis of Self-Training with Deep Networks on Unlabeled Data

Colin Wei, Kendrick Shen, Yining Chen, Tengyu Ma

Keywords Paper

deep learning theory, semi-supervised learning theory, unsupervised learning theory, domain adaptation theory

1

1

0

0

14:46

14/06/2020

Fast Sparse ConvNets

Erich Elsen, Marat Dukhan, Trevor Gale, Karen Simonyan

Keywords Paper

vision, convolutional networks, cnns, efficient inference, sparsity, mobile, edge, tensorflow, xnnpack

0

0

0

0

1:01

06/12/2021

When Expressivity Meets Trainability: Fewer than $n$ Neurons Can Work

Jiawei Zhang, Yushun Zhang, Mingyi Hong and
Ruoyu Sun, Zhi-Quan Luo

Keywords Paper

deep learning, optimization

0

0

0

0

11:22

03/05/2021

Net-DNF: Effective Deep Modeling of Tabular Data

Liran Katzir, Gal Elidan, Ran El-Yaniv

Keywords Paper

Neural Networks, Predictive Modeling, Tabular Data, Architectures

0

0

0

0

5:10

22/11/2021

FFNB: Forgetting-Free Neural Blocks for Deep Continual Learning

Hichem Sahbi, Haoming Zhan

Keywords Paper

Continual and incremental learning, lifelong learning, catastrophic interference, catastrophic forgetting, dynamic neural networks, visual recognition

0

0

0

0

3:05

19/08/2021

On the Neural Tangent Kernel of Deep Networks with Orthogonal Initialization

Wei Huang, Weitao Du, Richard Yi Da Xu

Keywords Paper

Machine Learning, Deep Learning, Learning Theory

0

0

0

0

14:51