On the training dynamics of deep networks with $L_2$ regularization

06/12/2020

Aitor Lewkowycz, Guy Gur-Ari

Abstract: We study the role of $L_2$ regularization in deep learning, and uncover simple relations between the performance of the model, the $L_2$ coefficient, the learning rate, and the number of training steps. These empirical relations hold when the network is overparameterized. They can be used to predict the optimal regularization parameter of a given model. In addition, based on these observations we propose a dynamical schedule for the regularization parameter that improves performance and speeds up training. We test these proposals in modern image classification settings. Finally, we show that these empirical relations can be understood theoretically in the context of infinitely wide networks. We derive the gradient flow dynamics of such networks, and compare the role of $L_2$ regularization in this context with that of linear models.
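
To make the quantities named in the abstract concrete, here is a minimal sketch of plain gradient descent with an $L_2$ penalty on a one-parameter linear model. It is not the authors' code and not their proposed schedule. The function names `ridge_gd` and `ridge_closed_form`, the toy data, and the hyperparameter values are illustrative assumptions. The sketch shows how the $L_2$ coefficient `lam`, the learning rate `lr`, and the number of training steps enter the update, and compares the result with the closed-form ridge solution available in this linear case.

```python
def ridge_gd(xs, ys, lam, lr, steps, w0=0.0):
    """Gradient descent on 0.5*mean((w*x - y)^2) + 0.5*lam*w^2 for a scalar w."""
    n = len(xs)
    w = w0
    for _ in range(steps):
        # Gradient of the data term with respect to w.
        data_grad = sum((w * x - y) * x for x, y in zip(xs, ys)) / n
        # The L2 penalty contributes lam * w to the gradient.
        w -= lr * (data_grad + lam * w)
    return w


def ridge_closed_form(xs, ys, lam):
    """Minimizer of the same objective: mean(x*y) / (mean(x^2) + lam)."""
    n = len(xs)
    sxy = sum(x * y for x, y in zip(xs, ys)) / n
    sxx = sum(x * x for x in xs) / n
    return sxy / (sxx + lam)


# Toy data (assumed for illustration only).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.1, 1.9, 4.2, 5.8]

w_gd = ridge_gd(xs, ys, lam=0.1, lr=0.05, steps=2000)
w_cf = ridge_closed_form(xs, ys, lam=0.1)
print(w_gd, w_cf)
```

In this linear setting the penalty simply shrinks the solution toward zero. The abstract's claims concern deep, overparameterized networks, where the interplay between `lam`, `lr`, and the number of steps is what the paper studies empirically.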

The talk and the respective paper are published at the NeurIPS 2020 virtual conference.

Similar Papers

06/12/2021

Even your Teacher Needs Guidance: Ground-Truth Targets Dampen Regularization Imposed by Self-Distillation

Kenneth Borup, Lars N Andersen

Keywords: theory, deep learning, optimization

6:00

03/05/2021

Initialization and Regularization of Factorized Neural Layers

Misha Khodak, Neil Tenenholtz, Lester Mackey, Nicolo Fusi

Keywords: matrix factorization, knowledge distillation, multi-head attention, model compression

4:25

06/12/2021

A Mathematical Framework for Quantifying Transferability in Multi-source Transfer Learning

Xinyi Tong, Xiangxiang Xu, Shao-Lun Huang, Lizhong Zheng

Keywords: theory, deep learning, machine learning, vision, transfer learning

13:27

06/12/2021

$(\textrm{Implicit})^2$: Implicit Layers for Implicit Representations

Zhichun Huang, Shaojie Bai, J. Zico Kolter

Keywords: deep learning, representation learning

12:23

06/12/2020

Conditioning and Processing: Techniques to Improve Information-Theoretic Generalization Bounds

Hassan Hafez-Kolahi, Zeinab Golgooni, Shohreh Kasaei, Mahdieh Soleymani

3:25

18/07/2021

Prediction-Centric Learning of Independent Cascade Dynamics from Partial Observations

Mateusz Wilinski, Andrey Lokhov

Keywords: Probabilistic Methods, Approximate Inference

6:26

05/01/2021

Holistic Filter Pruning for Efficient Deep Neural Networks

Lukas Enderich, Fabian Timm, Wolfram Burgard

5:00

06/12/2021

Gradient Starvation: A Learning Proclivity in Neural Networks

Mohammad Pezeshki, Oumar Kaba, Yoshua Bengio, Aaron Courville, Doina Precup, Guillaume Lajoie

Keywords: theory, deep learning, optimization, robustness

10:52

06/12/2021

Model, sample, and epoch-wise descents: exact solution of gradient flow in the random feature model

Antoine Bodin, Nicolas Macris

Keywords: deep learning, optimization

15:00

26/04/2020

Asymptotics of Wide Networks from Feynman Diagrams

Ethan Dyer, Guy Gur-Ari

4:40

19/08/2021

Regularising Knowledge Transfer by Meta Functional Learning

Pan Li, Yanwei Fu, Shaogang Gong

Keywords: Machine Learning, Classification, Transfer, Adaptation, Multi-task Learning, Weakly Supervised Learning

13:41

06/12/2021

Constrained Robust Submodular Partitioning

Shengjie Wang, Tianyi Zhou, Chandrashekhar Lavania, Jeff A Bilmes

Keywords: optimization, machine learning

15:20

06/12/2020

Spectra of the Conjugate Kernel and Neural Tangent Kernel for linear-width neural networks

Zhou Fan, Zhichao Wang

3:25

26/04/2020

Gradients as Features for Deep Representation Learning

Fangzhou Mu, Yingyu Liang, Yin Li

Keywords: representation learning, gradient features, deep learning

5:07

26/04/2020

Adversarially robust transfer learning

Ali Shafahi, Parsa Saadatpanah, Chen Zhu, Amin Ghiasi, Christoph Studer, David Jacobs, Tom Goldstein

4:58

18/07/2021

Bayesian Structural Adaptation for Continual Learning

Abhishek Kumar, Sunabha Chatterjee, Piyush Rai

Keywords: Probabilistic Methods, Bayesian Methods

7:39

12/07/2020

Learning Similarity Metrics for Numerical Simulations

Georg Kohl, Kiwon Um, Nils Thuerey

Keywords: General Machine Learning Techniques

15:16

13/04/2021

Learn to expect the unexpected: Probably approximately correct domain generalization

Vikas Garg, Adam Tauman Kalai, Katrina Ligett, Steven Wu

3:01

02/02/2021

DeepCollaboration: Collaborative Generative and Discriminative Models for Class Incremental Learning

Bo Cui, Guyue Hu, Shan Yu

15:13

14/09/2020

Learning a Sequence of Sentiment Classification Tasks

Zixuan Ke, Bing Liu, Hao Wang, Lei Shu

14:23

12/07/2020

Generative Flows with Matrix Exponential

Changyi Xiao, Ligang Liu

Keywords: Deep Learning - Generative Models and Autoencoders

8:47

06/12/2021

Information-theoretic generalization bounds for black-box learning algorithms

Hrayr Harutyunyan, Maxim Raginsky, Greg Ver Steeg, Aram Galstyan

Keywords: theory, deep learning

13:59

03/05/2021

Understanding the effects of data parallelism and sparsity on neural network training

Namhoon Lee, Thalaiyasingam Ajanthan, Philip Torr, Martin Jaggi

Keywords: sparsity, neural network training, data parallelism

4:52

03/05/2021

Gradient Projection Memory for Continual Learning

Gobinda Saha, Isha Garg, Kaushik Roy

Keywords: Continual Learning, Representation Learning, Computer Vision, Deep learning

17:12

05/01/2021

Analyzing Deep Neural Network's Transferability via Frechet Distance

Yifan Ding, Liqiang Wang, Boqing Gong

4:59

03/05/2021

Dataset Meta-Learning from Kernel Ridge-Regression

Timothy Nguyen, Zhourong Chen, Jaehoon Lee

Keywords: dataset corruption, infinite-width networks, neural kernels, kernel-ridge regression, dataset compression, dataset distillation, meta-learning

4:59

06/12/2020

A Group-Theoretic Framework for Data Augmentation

Shuxiao Chen, Edgar Dobriban, Jane Lee

3:28

18/07/2021

Model Performance Scaling with Multiple Data Sources

Tatsunori Hashimoto

Keywords: Algorithms, Supervised Learning

4:50

02/02/2021

A General Class of Transfer Learning Regression without Implementation Cost

Shunya Minami, Song Liu, Stephen Wu, Kenji Fukumizu, Ryo Yoshida

14:13

26/04/2020

Frequency-based Search-control in Dyna

Yangchen Pan, Jincheng Mei, Amir-massoud Farahmand

Keywords: Model-based reinforcement learning, search-control, Dyna, frequency of a signal

4:32

13/04/2021

Fractional moment-preserving initialization schemes for training deep neural networks

Mert Gurbuzbalaban, Yuanhan Hu

3:05

18/07/2021

A Wasserstein Minimax Framework for Mixed Linear Regression

Theo Diamandis, Yonina Eldar, Alireza Fallah, Farzan Farnia, Asuman Ozdaglar

Keywords: Algorithms, Multimodal Learning

25:41

06/12/2020

Adaptive Gradient Quantization for Data-Parallel SGD

Fartash Faghri, Iman Tabrizian, Ilia Markov, Dan Alistarh, Dan Roy, Ali Ramezani-Kebrya

3:20

06/12/2021

Imitating Deep Learning Dynamics via Locally Elastic Stochastic Differential Equations

Jiayao Zhang, Hua Wang, Weijie Su

Keywords: deep learning, optimization

13:45

03/05/2021

IEPT: Instance-Level and Episode-Level Pretext Tasks for Few-Shot Learning

Manli Zhang, Jianhong Zhang, Zhiwu Lu, Tao Xiang, Mingyu Ding, Songfang Huang

Keywords: self-supervised learning, few-shot learning, episode-level pretext task

5:03

09/07/2020

Reasoning About Generalization via Conditional Mutual Information

Thomas Steinke, Lydia Zakynthinou

Keywords: Information theory, Adaptive data analysis, Excess risk bounds and generalization error bounds

14:45

06/12/2021

The staircase property: How hierarchical structure can guide deep learning

Emmanuel Abbe, Enric Boix-Adsera, Matthew S Brennan, Guy Bresler, Dheeraj Nagaraj

Keywords: deep learning, optimization

14:16

09/07/2020

Kernel and Rich Regimes in Overparametrized Models

Blake E Woodworth, Suriya Gunasekar, Jason Lee, Edward Moroshko, Pedro Henrique Pamplona Savarese, Itay Golan, Daniel Soudry, Nathan Srebro

Keywords: Neural networks/deep learning

13:29

26/04/2020

Functional Regularisation for Continual Learning with Gaussian Processes

Michalis K. Titsias, Jonathan Schwarz, Alexander G. de G. Matthews, Razvan Pascanu, Yee Whye Teh

Keywords: Continual Learning, Gaussian Processes, Lifelong learning, Incremental Learning

4:31

02/02/2021

Progressive Multi-task Learning with Controlled Information Flow for Joint Entity and Relation Extraction

Kai Sun, Richong Zhang, Samuel Mensah, Yongyi Mao, Xudong Liu

13:45