Batch Normalization Orthogonalizes Representations in Deep Random Networks

06/12/2021

Hadi Daneshmand, Amir Joudaki, Francis Bach

Keywords: theory, deep learning, optimization, machine learning, generative model

Abstract: This paper underlines an elegant property of batch normalization (BN): successive batch normalizations with random linear updates make samples increasingly orthogonal. We establish a non-asymptotic characterization of the interplay between depth, width, and the orthogonality of deep representations. More precisely, we prove that, under a mild assumption, the deviation of the representations from orthogonality decays rapidly with depth, up to a term inversely proportional to the network width. This result has two main implications, one theoretical and one practical: 1) Theoretically, as the depth grows, the distribution of the outputs contracts to a Wasserstein-2 ball around an isotropic normal distribution. Furthermore, the radius of this Wasserstein ball shrinks with the width of the network. 2) Practically, the orthogonality of the representations directly influences the performance of stochastic gradient descent (SGD). When representations are initially aligned, we observe that SGD wastes many iterations disentangling representations before classification. Nevertheless, we experimentally show that starting optimization from orthogonal representations is sufficient to accelerate SGD, with no need for BN.
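
The orthogonalization effect described in the abstract can be explored numerically with a minimal NumPy sketch. This is an illustration under stated assumptions, not the authors' code: BN is modeled as rescaling each feature to unit second moment across the batch (no mean subtraction, no learned scale or shift), the "random linear updates" are i.i.d. Gaussian weight matrices, and the deviation from orthogonality is measured as the Frobenius distance between the trace-normalized Gram matrix of the batch and I/n. The names `batch_norm`, `orthogonality_gap`, and `simulate` are hypothetical.

```python
import numpy as np


def batch_norm(h):
    """Rescale each feature (row) to unit second moment across the batch (columns).

    Assumption: mean subtraction and learned parameters are omitted.
    """
    scale = np.sqrt(np.mean(h ** 2, axis=1, keepdims=True))
    return h / scale


def orthogonality_gap(h):
    """Frobenius distance between the trace-normalized Gram matrix and I/n.

    The value is zero exactly when the n samples (columns of h) are
    mutually orthogonal and have equal norms.
    """
    n = h.shape[1]
    gram = h.T @ h
    return np.linalg.norm(gram / np.trace(gram) - np.eye(n) / n)


def simulate(depth, width, batch, seed=0):
    """Pass a batch of nearly aligned samples through `depth` random linear + BN layers."""
    rng = np.random.default_rng(seed)
    # Strongly aligned start: every sample is one shared direction plus small noise.
    shared = rng.standard_normal((width, 1))
    h = shared + 0.1 * rng.standard_normal((width, batch))
    gaps = [orthogonality_gap(h)]
    for _ in range(depth):
        # Random Gaussian weights, scaled so activations keep a comparable magnitude.
        w = rng.standard_normal((width, width)) / np.sqrt(width)
        h = batch_norm(w @ h)
        gaps.append(orthogonality_gap(h))
    return gaps


if __name__ == "__main__":
    for width in (32, 256):
        gaps = simulate(depth=30, width=width, batch=8)
        print(f"width={width}: gap at input {gaps[0]:.3f}, after 30 layers {gaps[-1]:.3f}")
```

Running the script prints the gap at the input and after 30 layers for two widths, which gives a rough feel for how depth and width interact in this toy setting. It does not reproduce the paper's experiments.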

The talk and the corresponding paper were published at the NeurIPS 2021 virtual conference.

Similar Papers

02/02/2021

Delving into Variance Transmission and Normalization: Shift of Average Gradient Makes the Network Collapse

Yuxiang Liu, Jidong Ge, Chuanyi Li, Jie Gui

14:49

03/05/2021

AdamP: Slowing Down the Slowdown for Momentum Optimizers on Scale-invariant Weights

Byeongho Heo, Sanghyuk Chun, Seong Joon Oh, Dongyoon Han, Sangdoo Yun, Gyuwan Kim, Youngjung Uh, Jung-Woo Ha

Keywords: effective learning rate, normalize layer, scale-invariant weights, momentum optimizer

5:16

12/07/2020

Landscape Connectivity and Dropout Stability of SGD Solutions for Over-parameterized Neural Networks

Alexander Shevchenko, Marco Mondelli

Keywords: Deep Learning - Theory

13:20

13/04/2021

Understanding gradient clipping in incremental gradient methods

Jiang Qian, Yuren Wu, Bojin Zhuang, Shaojun Wang, Jing Xiao

3:17

26/04/2020

Neural tangent kernels, transportation mappings, and universal approximation

Ziwei Ji, Matus Telgarsky, Ruicheng Xian

Keywords: Neural Tangent Kernel, universal approximation, Barron, transport mapping

4:48

03/05/2021

Separation and Concentration in Deep Networks

John Zarka, Florentin Guth, Stéphane Mallat

Keywords: concentration, mean separation, neural collapse, fisher ratio, image classification, variance reduction, deep learning

5:11

18/07/2021

Momentum Residual Neural Networks

Michael Sander, Pierre Ablin, Mathieu Blondel, Gabriel Peyré

Keywords: Deep Learning

5:07

14/06/2020

Towards Discriminability and Diversity: Batch Nuclear-Norm Maximization Under Label Insufficient Situations

Shuhao Cui, Shuhui Wang, Junbao Zhuo, Liang Li, Qingming Huang, Qi Tian

Keywords: discriminability, diversity, nuclear-norm, domain adaptation, open-set

4:56

06/12/2021

Adversarial Examples in Multi-Layer Random ReLU Networks

Peter Bartlett, Sebastien Bubeck, Yeshwanth Cherapanamjeri

Keywords: theory, adversarial robustness and security

10:49

14/06/2020

Forward and Backward Information Retention for Accurate Binary Neural Networks

Haotong Qin, Ruihao Gong, Xianglong Liu, Mingzhu Shen, Ziran Wei, Fengwei Yu, Jingkuan Song

Keywords: model compression, binary neural networks, deep learning, quantization, computer vision

1:00

06/12/2021

Dynamics of Stochastic Momentum Methods on Large-scale, Quadratic Models

Courtney Paquette, Elliot Paquette

Keywords: theory, optimization

15:10

18/07/2021

Understanding self-supervised learning dynamics without contrastive pairs

Yuandong Tian, Xinlei Chen, Surya Ganguli

Keywords: Deep Learning, Optimization for Deep Networks

18:16

26/04/2020

Minimizing FLOPs to Learn Efficient Sparse Representations

Biswajit Paria, Chih-Kuan Yeh, Ian E.H. Yen, Ning Xu, Pradeep Ravikumar, Barnabás Póczos

Keywords: sparse embeddings, deep representations, metric learning, regularization

4:41

06/12/2021

Collapsed Variational Bounds for Bayesian Neural Networks

Marcin Tomczak, Siddharth Swaroop, Andrew Foong, Richard Turner

Keywords: deep learning, optimization, generative model

5:44

14/06/2020

Continual Learning With Extended Kronecker-Factored Approximate Curvature

Janghyeon Lee, Hyeong Gwon Hong, Donggyu Joo, Junmo Kim

Keywords: continual learning, curvature approximation, extended k-fac

1:01

14/06/2020

Regularizing CNN Transfer Learning With Randomised Regression

Yang Zhong, Atsuto Maki

Keywords: transfer learning, network regularization, randomised regression, pseudo task regularization, limited samples

0:58

06/12/2021

Large-Scale Learning with Fourier Features and Tensor Decompositions

Frederiek Wesel, Kim Batselier

Keywords: machine learning, kernel methods

15:01

14/06/2020

Controllable Orthogonalization in Training DNNs

Lei Huang, Li Liu, Fan Zhu, Diwen Wan, Zehuan Yuan, Bo Li, Ling Shao

Keywords: orthogonalization, weight normalization, newtons iteration, dynamic isometry, lipschitz continuity, regularization, orthogonality, deep learning, gans, small batch size

5:00

06/12/2020

A novel variational form of the Schatten-$p$ quasi-norm

Paris Giampouras, Rene Vidal, Athanasios Rontogiannis, Benjamin Haeffele

3:14

06/12/2020

Random Reshuffling: Simple Analysis with Vast Improvements

Konstantin Mishchenko, Ahmed Khaled Ragab Bayoumi, Peter Richtarik

Keywords: Reinforcement Learning and Planning -> Planning; Reinforcement Learning and Planning -> Reinforcement Learning, Reinforcement Learning and Planning

3:08

26/08/2020

AMAGOLD: Amortized Metropolis Adjustment for Efficient Stochastic Gradient MCMC

Ruqi Zhang, A. Feder Cooper, Christopher De Sa

16:26

06/12/2021

How can classical multidimensional scaling go wrong?

Rishi Sonthalia, Greg Van Buskirk, Benjamin Raichel, Anna Gilbert

Keywords: machine learning, robustness

14:59

06/12/2021

Lower Bounds and Optimal Algorithms for Smooth and Strongly Convex Decentralized Optimization Over Time-Varying Networks

Dmitry Kovalev, Elnur Gasanov, Alexander Gasnikov, Peter Richtarik

Keywords: optimization

15:02

13/04/2021

Associative convolutional layers

Hamed Omidvar, Vahideh Akhlaghi, Hao Su, Massimo Franceschetti, Rajesh Gupta

3:09

13/04/2021

Fractional moment-preserving initialization schemes for training deep neural networks

Mert Gurbuzbalaban, Yuanhan Hu

3:05

18/07/2021

Tensor Programs IV: Feature Learning in Infinite-Width Neural Networks

Greg Yang, Edward Hu

Keywords: Theory, Deep learning Theory

5:22

26/04/2020

Improved Sample Complexities for Deep Neural Networks and Robust Classification via an All-Layer Margin

Colin Wei, Tengyu Ma

Keywords: deep learning theory, generalization bounds, adversarially robust generalization, data-dependent generalization bounds

5:30

02/02/2021

Multi-Proxy Wasserstein Classifier for Image Classification

Benlin Liu, Yongming Rao, Jiwen Lu, Jie Zhou, Cho-Jui Hsieh

12:05

30/11/2020

Channel Pruning for Accelerating Convolutional Neural Networks via Wasserstein Metric

Haoran Duan, Hui Li

5:23

06/12/2020

GCN meets GPU: Decoupling “When to Sample” from “How to Sample”

Morteza Ramezani, Weilin Cong, Mehrdad Mahdavi, Anand Sivasubramaniam, Mahmut Kandemir

3:24

05/01/2021

Exploiting the Redundancy in Convolutional Filters for Parameter Reduction

Kumara Kahatapitiya, Ranga Rodrigo

5:10

26/04/2020

Regularizing activations in neural networks via distribution matching with the Wasserstein metric

Taejong Joo, Donggu Kang, Byunghoon Kim

Keywords: regularization, Wasserstein metric, deep learning

5:26

06/12/2021

Deeply Shared Filter Bases for Parameter-Efficient Convolutional Neural Networks

Woochul Kang, Daeyeon Kim

Keywords: deep learning, machine learning, vision

13:17

06/12/2021

Spatio-Temporal Variational Gaussian Processes

Oliver Hamelijnck, William Wilkinson, Niki Loppi, Arno Solin, Theodoros Damoulas

Keywords: generative model, kernel methods

6:04

05/04/2021

Accelerate Inference of CNNs for Video Analysis While Preserving Exactness Exploiting Activation Sparsity

Toshiaki Wakatsuki, Sekitoshi Kanai, Yasuhiro Fujiwara

4:40

05/04/2021

Accelerate Inference of CNNs for Video Analysis While Preserving Exactness Exploiting Activation Sparsity

Toshiaki Wakatsuki, Sekitoshi Kanai, Yasuhiro Fujiwara

17:56

18/07/2021

Positive-Negative Momentum: Manipulating Stochastic Gradient Noise to Improve Generalization

Zeke Xie, Li Yuan, Zhanxing Zhu, Masashi Sugiyama

Keywords: Optimization, Stochastic Optimization

5:17

18/07/2021

Accelerating Feedforward Computation via Parallel Nonlinear Equation Solving

Yang Song, Chenlin Meng, Renjie Liao, Stefano Ermon

Keywords: Deep Learning

5:53

12/07/2020

The continuous categorical: a novel simplex-valued exponential family

Elliott Gordon-Rodriguez, Gabriel Loaiza-Ganem, John Cunningham

Keywords: Probabilistic Inference - Models and Probabilistic Programming

14:59

06/12/2021

The Implicit Bias of Minima Stability: A View from Function Space

Rotem Mulayoff, Tomer Michaeli, Daniel Soudry

Keywords: deep learning, optimization

13:51