Optimization and Generalization of Shallow Neural Networks with Quadratic Activation Functions

06/12/2020

Stefano Sarao Mannelli, Eric Vanden-Eijnden, Lenka Zdeborová

Keywords: Algorithms -> Model Selection and Structure Learning; Algorithms -> Representation Learning; Theory -> Computational Complexity; Reinforcement Learning and Planning -> Markov Decision Processes

Abstract: We study the dynamics of optimization and the generalization properties of one-hidden-layer neural networks with quadratic activation function in the overparametrized regime where the layer width m is larger than the input dimension d. We consider a teacher-student scenario where the teacher has the same structure as the student, with a hidden layer of smaller width m* <= m. We describe how the empirical loss landscape is affected by the number n of data samples and the width m* of the teacher network. In particular, we determine how the probability that there be no spurious minima on the empirical loss depends on n, d, and m*, thereby establishing conditions under which the neural network can in principle recover the teacher. We also show that under the same conditions gradient descent dynamics on the empirical loss converges and leads to small generalization error, i.e., it enables recovery in practice. Finally, we characterize the time-convergence rate of gradient descent in the limit of a large number of samples. These results are confirmed by numerical experiments.
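
The teacher-student setup described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' code: the 1/m output normalization, the standard Gaussian inputs, the squared loss, the sizes (d=5, m=8, m*=2, n=200), the step size, and all function names are assumptions chosen only for illustration.

```python
import numpy as np


def quadratic_net(W, X):
    """Output (1/m) * sum_j (w_j . x)^2 for each row x of X; W has shape (m, d)."""
    return np.mean((X @ W.T) ** 2, axis=1)


def empirical_loss(W, X, y):
    """Half mean squared error between network outputs and labels."""
    return 0.5 * np.mean((quadratic_net(W, X) - y) ** 2)


def gradient(W, X, y):
    """Gradient of empirical_loss with respect to W, shape (m, d)."""
    m, n = W.shape[0], X.shape[0]
    pre = X @ W.T                          # pre-activations, shape (n, m)
    residual = quadratic_net(W, X) - y     # shape (n,)
    # dL/dw_j = (1/n) * sum_i residual_i * (2/m) * (w_j . x_i) * x_i
    return (2.0 / m) * (pre * residual[:, None]).T @ X / n


def run_demo(d=5, m=8, m_star=2, n=200, lr=0.02, steps=3000, seed=0):
    """Train a width-m student on labels from a width-m_star teacher (m_star <= m, m > d)."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))            # assumed Gaussian inputs
    W_star = rng.standard_normal((m_star, d))  # teacher weights
    y = quadratic_net(W_star, X)               # noiseless teacher labels
    W = rng.standard_normal((m, d))            # random student initialization
    initial = empirical_loss(W, X, y)
    for _ in range(steps):                     # plain full-batch gradient descent
        W -= lr * gradient(W, X, y)
    final = empirical_loss(W, X, y)
    # Generalization error estimated on fresh samples from the same distribution.
    X_test = rng.standard_normal((5000, d))
    test_err = empirical_loss(W, X_test, quadratic_net(W_star, X_test))
    return initial, final, test_err


if __name__ == "__main__":
    initial, final, test_err = run_demo()
    print(f"initial train loss: {initial:.4g}")
    print(f"final train loss:   {final:.4g}")
    print(f"test error:         {test_err:.4g}")
```

Varying n, d, and m_star in run_demo gives a rough, informal way to explore how the number of samples and the teacher width affect whether gradient descent reaches low training and test error.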

The talk and the respective paper were published at the NeurIPS 2020 virtual conference.

Similar Papers

06/12/2021

Locality defeats the curse of dimensionality in convolutional teacher-student scenarios

Alessandro Favero, Francesco Cagnetta, Matthieu Wyart

Keywords: deep learning, kernel methods

13:08

04/08/2021

A Local Convergence Theory for Mildly Over-Parameterized Two-Layer Neural Network

Mo Zhou, Rong Ge, Chi Jin

17:47

02/02/2021

Stochastic Precision Ensemble: Self-Knowledge Distillation for Quantized Deep Neural Networks

Yoonho Boo, Sungho Shin, Jungwook Choi, Wonyong Sung

19:03

19/08/2021

Comparing Kullback-Leibler Divergence and Mean Squared Error Loss in Knowledge Distillation

Taehyeon Kim, Jaehoon Oh, Nak Yil Kim, Sangwook Cho, Se-Young Yun

Keywords: Machine Learning, Classification, Deep Learning

12:43

18/07/2021

On Learnability via Gradient Method for Two-Layer ReLU Neural Networks in Teacher-Student Setting

Shunta Akiyama, Taiji Suzuki

Keywords: Theory, Deep learning Theory

5:17

06/12/2021

Learning curves of generic features maps for realistic datasets with a teacher-student model

Bruno Loureiro, Cedric Gerbelot, Hugo Cui, Sebastian Goldt, Florent Krzakala, Marc Mezard, Lenka Zdeborová

Keywords: deep learning, machine learning, kernel methods

12:59

14/06/2020

Few Sample Knowledge Distillation for Efficient Network Compression

Tianhong Li, Jianguo Li, Zhuang Liu, Changshui Zhang

Keywords: efficient network compression, few samples, knowledge distillation

1:01

03/05/2021

SEED: Self-supervised Distillation For Visual Representation

Jacob Zhiyuan Fang, Jianfeng Wang, Lijuan Wang, Lei Zhang, 'YZ' Yezhou Yang, Zicheng Liu

Keywords: Representation Learning, Self Supervised Learning, Knowledge Distillation

5:09

06/12/2021

Exponential Separation between Two Learning Models and Adversarial Robustness

Grzegorz Gluch, Ruediger Urbanke

Keywords: theory, robustness, adversarial robustness and security

15:11

06/12/2020

Curriculum Learning by Dynamic Instance Hardness

Tianyi Zhou, Shengjie Wang, Jeff A Bilmes

3:24

02/02/2021

Learning to Reweight with Deep Interactions

Yang Fan, Yingce Xia, Lijun Wu, Shufang Xie, Weiqing Liu, Jiang Bian, Tao Qin, Xiang-Yang Li

14:06

03/05/2021

Knowledge distillation via softmax regression representation learning

Jing Yang, Brais Martinez, Adrian Bulat, Georgios Tzimiropoulos

4:56

12/07/2020

Student Specialization in Deep Rectified Networks With Finite Width and Input Dimension

Yuandong Tian

Keywords: Deep Learning - Theory

15:31

06/12/2020

Learning to Mutate with Hypergradient Guided Population

Zhiqiang Tao, Yaliang Li, Bolin Ding, Ce Zhang, Jingren Zhou, Yun Fu

3:17

06/12/2020

Knowledge Distillation in Wide Neural Networks: Risk Bound, Data Efficiency and Imperfect Teacher

Guangda Ji, Zhanxing Zhu

3:19

04/08/2021

The Effects of Mild Over-parameterization on the Optimization Landscape of Shallow ReLU Neural Networks

Itay M Safran, Gilad Yehudai, Ohad Shamir

17:58

03/05/2021

Knowledge Distillation as Semiparametric Inference

Tri Dao, Govinda Kamath, Vasilis Syrgkanis, Lester Mackey

Keywords: generalization bounds, knowledge distillation, model compression, loss correction, orthogonal machine learning, cross-fitting, semiparametric inference

5:10

06/12/2021

Spatial Ensemble: a Novel Model Smoothing Mechanism for Student-Teacher Framework

Tengteng Huang, Yifan Sun, Xun Wang, Haotian Yao, Chi Zhang

10:07

06/12/2021

Generalization of Model-Agnostic Meta-Learning Algorithms: Recurring and Unseen Tasks

Alireza Fallah, Aryan Mokhtari, Asuman Ozdaglar

Keywords: theory, optimization, meta learning

14:42

14/09/2020

Companion Guided Soft Margin for Face Recognition

Yingcheng Su, Yichao Wu, Zhenmao Li, Ken Chen, Ding Liang, Xiaolin Hu, Junjie Yan

Keywords: face recognition, companion guided soft margin, sample-wise adaptive margin

15:44

06/12/2021

Teaching an Active Learner with Contrastive Examples

Chaoqi Wang, Adish Singla, Yuxin Chen

Keywords: optimization, active learning

10:10

12/07/2020

Training Neural Networks for and by Interpolation

Leonard Berrada, M. Pawan Kumar, Andrew Zisserman

Keywords: Deep Learning - General

16:12

19/04/2021

Annealing knowledge distillation

Aref Jafari, Mehdi Rezagholizadeh, Pranav Sharma, Ali Ghodsi

12:38

06/12/2021

Training Over-parameterized Models with Non-decomposable Objectives

Harikrishna Narasimhan, Aditya Menon

Keywords: optimization, machine learning, fairness

8:28

03/05/2021

Dataset Meta-Learning from Kernel Ridge-Regression

Timothy Nguyen, Zhourong Chen, Jaehoon Lee

Keywords: dataset corruption, infinite-width networks, neural kernels, kernel-ridge regression, dataset compression, dataset distillation, meta-learning

4:59

19/08/2021

Self-boosting for Feature Distillation

Yulong Pei, Yanyun Qu, Junping Zhang

Keywords: Computer Vision, 2D and 3D Computer Vision, Recognition

12:57

12/07/2020

Time-Consistent Self-Supervision for Semi-Supervised Learning

Tianyi Zhou, Shengjie Wang, Jeff Bilmes

Keywords: Unsupervised and Semi-Supervised Learning

14:37

18/07/2021

A statistical perspective on distillation

Aditya Menon, Ankit Singh Rawat, Sashank Jakkam Reddi, Seungyeon Kim, Sanjiv Kumar

Keywords: Theory, Deep learning Theory

4:56

18/07/2021

Zero-Shot Knowledge Distillation from a Decision-Based Black-Box Model

Zi Wang

Keywords: Deep Learning

5:08

12/07/2020

Teaching with Limited Information on the Learner's Behaviour

Ferdinando Cicalese, Francisco Sergio de Freitas Filho, Eduardo Laber, Marco Molinaro

Keywords: Learning Theory

15:07

14/06/2020

Neural Networks Are More Productive Teachers Than Human Raters: Active Mixup for Data-Efficient Knowledge Distillation From a Blackbox Model

Dongdong Wang, Yandong Li, Liqiang Wang, Boqing Gong

Keywords: blackbox knowledge distillation, data-efficient learning, active learning, mixup

4:59

02/02/2021

The Sample Complexity of Teaching by Reinforcement on Q-Learning

Xuezhou Zhang, Shubham Bharti, Yuzhe Ma, Adish Singla, Xiaojin Zhu

14:48

03/05/2021

Rethinking Soft Labels for Knowledge Distillation: A Bias–Variance Tradeoff Perspective

Helong Zhou, Liangchen Song, Jiajie Chen, Ye Zhou, Guoli Wang, Junsong Yuan, Qian Zhang

Keywords: teacher-student model, soft labels, Knowledge distillation

2:20

03/05/2021

Meta-learning with negative learning rates

Alberto Bernacchia

Keywords: Meta-learning

5:19

06/12/2021

Unsupervised Representation Transfer for Small Networks: I Believe I Can Distill On-the-Fly

Hee Min Choi, Hyoa Kang, Dokwan Oh

Keywords: self-supervised learning, representation learning

3:35

13/04/2021

Curriculum learning by optimizing learning dynamics

Tianyi Zhou, Shengjie Wang, Jeff Bilmes

3:03

13/04/2021

Contrastive learning of strong-mixing continuous-time stochastic processes

Bingbin Liu, Pradeep Ravikumar, Andrej Risteski

2:57

13/04/2021

Understanding robustness in teacher-student setting: A new perspective

Zhuolin Yang, Zhaoxi Chen, Tiffany Cai, Xinyun Chen, Bo Li, Yuandong Tian

3:03

14/06/2020

Distilling Cross-Task Knowledge via Relationship Matching

Han-Jia Ye, Su Lu, De-Chuan Zhan

Keywords: knowledge distillation, model reuse, knowledge transfer, cross-task learning, embedding learning

4:54

13/04/2021

Learning with risk-averse feedback under potentially heavy tails

Matthew Holland, El Mehdi Haress

2:44