03/05/2021

Deconstructing the Regularization of BatchNorm

Yann Dauphin, Ekin Cubuk

Keywords: understanding neural networks, batch normalization, regularization, deep learning

Abstract: Batch normalization (BatchNorm) has become a standard technique in deep learning. Its popularity is in no small part due to its often positive effect on generalization. Despite this success, the regularization effect of the technique is still poorly understood. This study aims to decompose BatchNorm into separate mechanisms that are much simpler. We identify three effects of BatchNorm and assess their impact directly with ablations and interventions. Our experiments show that preventing explosive growth at the final layer at initialization and during training can recover a large part of BatchNorm's generalization boost. This regularization mechanism can lift accuracy by $2.9\%$ for Resnet-50 on Imagenet without BatchNorm. We show it is linked to other methods like Dropout and recent initializations like Fixup. Surprisingly, this simple mechanism matches the improvement of $0.9\%$ of the more complex Dropout regularization for the state-of-the-art Efficientnet-B8 model on Imagenet. This demonstrates the underrated effectiveness of simple regularizations and sheds light on directions to further improve generalization for deep nets.
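
The mechanism described in the abstract, preventing explosive growth of the final-layer outputs at initialization, can be illustrated with a short sketch. The snippet below is an assumption-laden illustration (not the authors' code) in the spirit of the Fixup-style initialization the abstract mentions, which scales down or zeroes the last layer so its outputs start small; the names HIDDEN, NUM_CLASSES, FINAL_SCALE, and make_head are illustrative, not from the paper.

import torch
import torch.nn as nn

HIDDEN, NUM_CLASSES = 512, 1000
FINAL_SCALE = 0.0  # Fixup zero-initializes the final layer; a small nonzero scale also works

def make_head(hidden: int, num_classes: int, scale: float) -> nn.Linear:
    # Build a classification head whose initial output magnitude is controlled
    # by shrinking (or zeroing) its weights, so logits cannot explode at init.
    head = nn.Linear(hidden, num_classes)
    with torch.no_grad():
        head.weight.mul_(scale)  # scale down the weights at initialization
        head.bias.zero_()        # start from zero logits
    return head

if __name__ == "__main__":
    head = make_head(HIDDEN, NUM_CLASSES, FINAL_SCALE)
    x = torch.randn(8, HIDDEN) * 10.0   # even large pre-logit activations...
    print(head(x).abs().max().item())   # ...yield tiny (here zero) logits at initialization

With a standard initialization the maximum logit grows with the scale of the incoming activations; with the scaled-down head it stays near zero at initialization, which is the kind of control over the final layer that the abstract credits with recovering much of BatchNorm's generalization boost.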

Embedded video: the talk and paper are published at the ICLR 2021 virtual conference.
