Batch Normalization Biases Residual Blocks Towards the Identity Function in Deep Networks

06/12/2020

Batch Normalization Biases Residual Blocks Towards the Identity Function in Deep Networks

Soham De, Sam Smith

Keywords:

Abstract Paper Similar Papers

Abstract: Batch normalization dramatically increases the largest trainable depth of residual networks, and this benefit has been crucial to the empirical success of deep residual networks on a wide range of benchmarks. We show that this key benefit arises because, at initialization, batch normalization downscales the residual branch relative to the skip connection, by a normalizing factor on the order of the square root of the network depth. This ensures that, early in training, the function computed by normalized residual blocks in deep networks is close to the identity function (on average). We use this insight to develop a simple initialization scheme that can train deep residual networks without normalization. We also provide a detailed empirical study of residual networks, which clarifies that, although batch normalized networks can be trained with larger learning rates, this effect is only beneficial in specific compute regimes, and has minimal benefits when the batch size is small.

0

0

0

0

Share

This is an embedded video. Talk and the respective paper are published at NeurIPS 2020 virtual conference. If you are one of the authors of the paper and want to manage your upload, see the question "My papertalk has been externally embedded..." in the FAQ section.

Comments

Post Comment

no comments yet

Similar Papers

12/07/2020

Small-GAN: Speeding up GAN Training using Core-Sets

Samrath Sinha, Han Zhang, Anirudh Goyal and
Yoshua Bengio, Hugo Larochelle, Augustus Odena

Keywords Paper

Deep Learning - General

0

0

0

0

14:41

03/05/2021

MetaNorm: Learning to Normalize Few-Shot Batches Across Domains

Yingjun Du, Xiantong Zhen, Ling Shao, Cees G Snoek

Keywords Paper

batch normalization, Meta-learning, few-shot domain generalization

0

0

0

0

5:48

03/05/2021

Deconstructing the Regularization of BatchNorm

Yann Dauphin, Ekin Cubuk

Keywords Paper

understanding neural networks, batch normalization, regularization, deep learning

0

0

0

0

5:09

14/09/2020

Squeezing Correlated Neurons for Resource-Efficient Deep Neural Networks

Elbruz Ozen, Alex Orailoglu

Keywords Paper

deep learning, information redundancy, pruning

0

0

0

0

14:48

26/04/2020

Minimizing FLOPs to Learn Efficient Sparse Representations

Biswajit Paria, Chih-Kuan Yeh, Ian E.H. Yen and
Ning Xu, Pradeep Ravikumar, Barnabás Póczos

Keywords Paper

sparse embeddings, deep representations, metric learning, regularization

0

0

0

0

4:41

14/06/2020

Semantic Drift Compensation for Class-Incremental Learning

Lu Yu, Bartłomiej Twardowski, Xialei Liu and
Luis Herranz, Kai Wang, Yongmei Cheng, Shangling Jui, Joost van de Weijer

Keywords Paper

incremental learning, metric learning, semantic drift, deep neural networks, image classification, embedding networks, classification networks, catastrophic forgetting, task agnostic, nearest class mean classifier

0

0

0

0

0:59

06/12/2021

Efficient Training of Retrieval Models using Negative Cache

Erik Lindgren, Sashank Reddi, Ruiqi Guo, Sanjiv Kumar

Keywords Paper

deep learning, machine learning

0

0

0

0

10:41

14/06/2020

GreedyNAS: Towards Fast One-Shot NAS With Greedy Supernet

Shan You, Tao Huang, Mingmin Yang and
Fei Wang, Chen Qian, Changshui Zhang

Keywords Paper

neural architecture search, supernet, one-shot nas, single path, greedy algorithm, exploration and exploitation, searching efficiency

0

0

0

0

1:01

06/12/2021

Dense Unsupervised Learning for Video Segmentation

Nikita Araslanov, Simone Schaub-Meyer, Stefan Roth

Keywords Paper

self-supervised learning, representation learning

0

0

0

0

13:34

06/12/2020

Top-KAST: Top-K Always Sparse Training

Sid Jayakumar, Razvan Pascanu, Jack Rae and
Simon Osindero, Erich Elsen

Keywords Paper

0

0

0

0

3:18

03/05/2021

Neural Pruning via Growing Regularization

Huan Wang, Can Qin, Yulun Zhang, Yun Fu

Keywords Paper

deep neural network pruning, regularization, Hessian matrix, model compression

0

0

0

0

6:15

12/07/2020

Overfitting in adversarially robust deep learning

Eric Wong, Leslie Rice, Zico Kolter

Keywords Paper

Adversarial Examples

0

0

0

0

14:44

03/05/2021

Revisiting Locally Supervised Learning: an Alternative to End-to-end Training

Yulin Wang, Zanlin Ni, Shiji Song and
Le Yang, Gao Huang

Keywords Paper

Deep learning, Locally supervised training

1

0

0

1

5:03

06/12/2021

RETRIEVE: Coreset Selection for Efficient and Robust Semi-Supervised Learning

Krishnateja Killamsetty, Xujiang Zhao, Feng Chen, Rishabh Iyer

Keywords Paper

optimization, semi-supervised learning

0

0

0

0

13:59

26/04/2020

Four Things Everyone Should Know to Improve Batch Normalization

Cecilia Summers, Michael J. Dinneen

Keywords Paper

batch normalization

0

0

0

0

4:32

06/12/2021

Sparse Flows: Pruning Continuous-depth Models

Lucas Liebenwein, Ramin Hasani, Alexander Amini, Daniela Rus

Keywords Paper

deep learning, generative model

0

0

0

0

12:51

06/12/2021

Efficient Neural Network Training via Forward and Backward Propagation Sparsification

Xiao Zhou, Weizhong Zhang, Zonghao Chen and
SHIZHE DIAO, Tong Zhang

Keywords Paper

deep learning, optimization

0

0

0

0

7:48

06/12/2021

Powerpropagation: A sparsity inducing weight reparameterisation

Jonathan Schwarz, Siddhant M Jayakumar, Razvan Pascanu and
Peter E Latham, Yee Teh

Keywords Paper

deep learning, optimization, continual learning

0

0

0

1

9:08

06/12/2021

Batch Active Learning at Scale

Gui Citovsky, Giulia DeSalvo, Claudio Gentile and
Lazaros Karydas, Anand Rajagopalan, Afshin Rostamizadeh, Sanjiv Kumar

Keywords Paper

active learning

0

0

0

0

12:19

12/07/2020

Analyzing the effect of neural network architecture on training performance

Karthik Abinav Sankararaman, Soham De, Zheng Xu and
W. Ronny Huang, Tom Goldstein

Keywords Paper

Deep Learning - Theory

0

0

0

0

14:03

06/12/2021

Boost Neural Networks by Checkpoints

Feng Wang, Guoyizhe Wei, Qiao Liu and
Jinxiang Ou, xian wei, Hairong Lv

Keywords Paper

deep learning

1

0

0

0

4:45

06/12/2020

SuperLoss: A Generic Loss for Robust Curriculum Learning

Thibault Castells, Philippe Weinzaepfel, Jerome Revaud

Keywords Paper

, Probabilistic Methods -> MCMC

0

0

0

0

3:26

03/05/2021

AdamP: Slowing Down the Slowdown for Momentum Optimizers on Scale-invariant Weights

Byeongho Heo, Sanghyuk Chun, Seong Joon Oh and
Dongyoon Han, Sangdoo Yun, Gyuwan Kim, Youngjung Uh, Jung-Woo Ha

Keywords Paper

effective learning rate, normalize layer, scale-invariant weights, momentum optimizer

0

0

0

0

5:16

18/07/2021

CARTL: Cooperative Adversarially-Robust Transfer Learning

Dian Chen, Hongxin Hu, Qian Wang and
Li Yinli, Cong Wang, Chao Shen, Qi Li

Keywords Paper

Algorithms, Adversarial Examples

0

0

0

0

19:32

07/09/2020

Meta-RetinaNet for Few-shot Object Detection

Shaoqi Li, Wenfeng Song, Shuai Li and
Aimin Hao, Hong Qin

Keywords Paper

Few shot, object detection, meta-learning, Meta-RetinaNet, Balanced Loss, coefficient vector

0

0

0

0

8:51

06/12/2020

On Warm-Starting Neural Network Training

Jordan Ash, Ryan Adams

Keywords Paper

0

0

0

0

2:30

07/09/2020

Paying more Attention to Snapshots of Iterative Pruning: Improving Model Compression via Ensemble Distillation

Duong Le, Nhan Vo, Nam Thoai

Keywords Paper

network pruning, knowledge distillation, ensemble learning

0

0

0

0

8:30

14/06/2020

Recursive Least-Squares Estimator-Aided Online Learning for Visual Tracking

Jin Gao, Weiming Hu, Yan Lu

Keywords Paper

online learning, visual tracking, continual learning, recursive least-squares estimation, deep learning, memory retention, recursive learning, mini-batch sgd, normal equation, mlp layer

0

0

0

0

5:01

03/05/2021

Class Normalization for (Continual)? Generalized Zero-Shot Learning

Ivan Skorokhodov, Mohamed Elhoseiny

Keywords Paper

initialization, normalization, zero-shot learning, continual learning

0

0

0

0

4:45

03/05/2021

Auxiliary Task Update Decomposition: The Good, the Bad and the Neutral

Lucio Dery, Yann Dauphin, David Grangier

Keywords Paper

multitask learning, deeplearning, pre-training, gradient decomposition

0

0

0

0

5:22

02/02/2021

Incremental Embedding Learning via Zero-Shot Translation

Kun Wei, Cheng Deng, Xu Yang, Maosen Li

Keywords Paper

0

0

0

0

13:42

14/06/2020

Conditional Channel Gated Networks for Task-Aware Continual Learning

Davide Abati, Jakub Tomczak, Tijmen Blankevoort and
Simone Calderara, Rita Cucchiara, Babak Ehteshami Bejnordi

Keywords Paper

continual learning, channel gating, conditional computation, incremental learning, lifelong learning, hard attention

0

0

0

0

5:01

12/07/2020

Safe Deep Semi-Supervised Learning for Unseen-Class Unlabeled Data

Lan-Zhe Guo, Zhen-Yu Zhang, Yuan Jiang and
Yufeng Li, Zhi-Hua Zhou

Keywords Paper

Unsupervised and Semi-Supervised Learning

0

0

0

0

13:58

03/05/2021

Neurally Augmented ALISTA

Freya Behrens, Jonathan Sauder, Peter Jung

Keywords Paper

learned ISTA, unrolled algorithms, compressed sensing, sparse reconstruction

0

0

0

0

5:18

06/12/2021

One Loss for All: Deep Hashing with a Single Cosine Similarity based Learning Objective

Jiun Tian Hoe, Kam Woh Ng, Tianyu Zhang and
Chee Seng Chan, Yi-Zhe Song, Tao Xiang

Keywords Paper

machine learning

0

0

0

0

11:39

26/04/2020

Don't Use Large Mini-batches, Use Local SGD

Tao Lin, Sebastian U. Stich, Kumar Kshitij Patel, Martin Jaggi

Keywords Paper

0

0

0

0

4:36

06/12/2021

R-Drop: Regularized Dropout for Neural Networks

xiaobo liang, Lijun Wu, Juntao Li and
Yue Wang, Qi Meng, Tao Qin, Wei Chen, Min Zhang, Tie-Yan Liu

Keywords Paper

deep learning, machine learning, transformers, vision, language

0

0

0

0

13:02

18/07/2021

Self-Damaging Contrastive Learning

Ziyu Jiang, Tianlong Chen, Bobak Mortazavi, Zhangyang Wang

Keywords Paper

Algorithms, Unsupervised Learning

0

0

0

1

5:10

19/04/2021

Zero-shot neural passage retrieval via domain-targeted synthetic question generation

Ji Ma, Ivan Korotkov, Yinfei Yang and
Keith Hall, Ryan McDonald

Keywords Paper

0

0

0

0

12:47

02/02/2021

Large Batch Optimization for Deep Learning Using New Complete Layer-Wise Adaptive Rate Scaling

Zhouyuan Huo, Bin Gu, Heng Huang

Keywords Paper

0

0

0

0

15:17