The inductive bias of ReLU networks on orthogonally separable data

03/05/2021

Mary Phuong, Christoph H Lampert

Keywords: implicit bias, extremal sector, gradient descent, inductive bias, max-margin, ReLU networks

Abstract: We study the inductive bias of two-layer ReLU networks trained by gradient flow. We identify a class of easy-to-learn ("orthogonally separable") datasets, and characterise the solution that ReLU networks trained on such datasets converge to. Irrespective of network width, the solution turns out to be a combination of two max-margin classifiers: one corresponding to the positive data subset and one corresponding to the negative data subset. The proof is based on the recently introduced concept of extremal sectors, for which we prove a number of properties in the context of orthogonal separability. In particular, we prove stationarity of activation patterns from some time $T$ onwards, which enables a reduction of the ReLU network to an ensemble of linear subnetworks.
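The abstract names orthogonal separability without defining it. As a rough illustration only, the sketch below assumes the pairwise inner-product condition: two points with the same label have a strictly positive inner product, and two points with opposite labels have a non-positive one. This page does not spell that condition out, so treat it as an assumption to check against the paper. The function name and the toy data are hypothetical.

```python
from itertools import combinations


def is_orthogonally_separable(xs, ys):
    """Check the assumed pairwise condition on a labelled dataset.

    Assumption (not stated on this page): same-label pairs need a strictly
    positive inner product, opposite-label pairs a non-positive one.
    """
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    for (x_i, y_i), (x_j, y_j) in combinations(zip(xs, ys), 2):
        ip = dot(x_i, x_j)
        if y_i == y_j and ip <= 0:
            return False
        if y_i != y_j and ip > 0:
            return False
    return True


# Toy data, made up for illustration: two classes in opposite quadrants.
xs_good = [(1.0, 0.2), (0.8, 0.5), (-1.0, -0.1), (-0.6, -0.9)]
ys_good = [1, 1, -1, -1]

# Linearly separable, but the two opposite-label points form an acute angle.
xs_bad = [(1.0, 2.0), (1.0, 1.0)]
ys_bad = [1, -1]

print(is_orthogonally_separable(xs_good, ys_good))  # True
print(is_orthogonally_separable(xs_bad, ys_bad))    # False
```

The second dataset shows that, under this assumed condition, linear separability alone is not sufficient. This is consistent with the abstract's description of such datasets as a restricted, easy-to-learn class.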

The talk and the corresponding paper were published at the ICLR 2021 virtual conference.

Similar Papers

06/12/2021

Gradient Starvation: A Learning Proclivity in Neural Networks

Mohammad Pezeshki, Oumar Kaba, Yoshua Bengio, Aaron Courville, Doina Precup, Guillaume Lajoie

Keywords: theory, deep learning, optimization, robustness

10:52

03/05/2021

Neurally Augmented ALISTA

Freya Behrens, Jonathan Sauder, Peter Jung

Keywords: learned ISTA, unrolled algorithms, compressed sensing, sparse reconstruction

5:18

06/12/2020

Directional convergence and alignment in deep learning

Ziwei Ji, Matus Telgarsky

3:21

06/12/2020

Domain Adaptation with Conditional Distribution Matching and Generalized Label Shift

Remi Tachet des Combes, Han Zhao, Yu-Xiang Wang, Geoffrey Gordon

3:19

18/07/2021

On the Implicit Bias of Initialization Shape: Beyond Infinitesimal Mirror Descent

Shahar Azulay, Edward Moroshko, Mor Shpigel Nacson, Blake Woodworth, Nati Srebro, Amir Globerson, Daniel Soudry

Keywords: Probabilistic Methods, MCMC, Theory, Deep learning Theory

15:38

03/05/2021

Learning explanations that are hard to vary

Giambattista Parascandolo, Alexander Neitz, Antonio Orvieto, Luigi Gresele, Bernhard Schoelkopf

Keywords: invariances, gradient alignment, consistency

5:16

06/12/2021

Even your Teacher Needs Guidance: Ground-Truth Targets Dampen Regularization Imposed by Self-Distillation

Kenneth Borup, Lars N Andersen

Keywords: theory, deep learning, optimization

6:00

22/11/2021

Adaptive End-to-End Budgeted Network Learning via Inverse Scale Space

Zuyuan Zhong, Chen Liu, Yanwei Fu

Keywords: deep learning, network architecture, growing network, budgeted network learning, pruning

2:58

02/02/2021

Harmonized Dense Knowledge Distillation Training for Multi-Exit Architectures

Xinglu Wang, Yingming Li

15:12

12/07/2020

dS^2LBI: Exploring Structural Sparsity on Deep Network via Differential Inclusion Paths

Yanwei Fu, Chen Liu, Donghao Li, Xinwei Sun, Jinshan Zeng, Yuan Yao

Keywords: Deep Learning - Algorithms

12:45

13/04/2021

Fractional moment-preserving initialization schemes for training deep neural networks

Mert Gurbuzbalaban, Yuanhan Hu

3:05

18/07/2021

Asynchronous Decentralized Optimization With Implicit Stochastic Variance Reduction

Kenta Niwa, Guoqiang Zhang, W. Bastiaan Kleijn, Noboru Harada, Hiroshi Sawada, Akinori Fujino

Keywords: Optimization, Distributed and Parallel Optimization

5:41

13/04/2021

Logistic q-learning

Joan Bas-Serrano, Sebastian Curi, Andreas Krause, Gergely Neu

2:44

06/12/2021

Robust Implicit Networks via Non-Euclidean Contractions

Saber Jafarpour, Alexander Davydov, Anton Proskurnikov, Francesco Bullo

Keywords: theory, deep learning, machine learning, robustness, vision

14:59

18/07/2021

Variational Empowerment as Representation Learning for Goal-Conditioned Reinforcement Learning

Jongwook Choi, Archit Sharma, Honglak Lee, Sergey Levine, Shixiang Gu

Keywords: Neuroscience and Cognitive Science, Neuroscience, Reinforcement Learning and Planning, Algorithms, Representation Learning; Algorithms, Sparse Coding and Dimensionality Expansion; Applications, Matrix and Ten

5:16

06/12/2021

Lower Bounds and Optimal Algorithms for Smooth and Strongly Convex Decentralized Optimization Over Time-Varying Networks

Dmitry Kovalev, Elnur Gasanov, Alexander Gasnikov, Peter Richtarik

Keywords: optimization

15:02

07/09/2020

Lifted Regression/Reconstruction Networks

Rasmus Høier, Christopher Zach

Keywords: Lifted neural networks, Lipschitz continuity, adversarial robustness, energy-based models

8:23

02/02/2021

Deep Low-Contrast Image Enhancement using Structure Tensor Representation

Hyungjoo Jung, Hyunsung Jang, Namkoo Ha, Kwanghoon Sohn

16:31

26/04/2020

Target-Embedding Autoencoders for Supervised Representation Learning

Daniel Jarrett, Mihaela van der Schaar

Keywords: autoencoders, supervised learning, representation learning, target-embedding, label-embedding

10:47

03/05/2021

Initialization and Regularization of Factorized Neural Layers

Misha Khodak, Neil Tenenholtz, Lester Mackey, Nicolo Fusi

Keywords: matrix factorization, knowledge distillation, multi-head attention, model compression

4:25

12/07/2020

Unique Properties of Wide Minima in Deep Networks

Rotem Mulayoff, Tomer Michaeli

Keywords: Deep Learning - Theory

14:35

18/07/2021

Offline Reinforcement Learning with Fisher Divergence Critic Regularization

Ilya Kostrikov, Rob Fergus, Jonathan Tompson, Ofir Nachum

Keywords: Reinforcement Learning and Planning, Deep RL

4:49

02/02/2021

Visual Transfer For Reinforcement Learning Via Wasserstein Domain Confusion

Josh Roy, George D. Konidaris

17:01

09/07/2020

Implicit regularization for deep neural networks driven by an Ornstein-Uhlenbeck like process

Guy Blanc, Neha Gupta, Gregory Valiant, Paul Valiant

Keywords: Neural networks/deep learning, Stochastic optimization

12:17

14/06/2020

MTL-NAS: Task-Agnostic Neural Architecture Search Towards General-Purpose Multi-Task Learning

Yuan Gao, Haoping Bai, Zequn Jie, Jiayi Ma, Kui Jia, Wei Liu

Keywords: neural architecture search, general-purpose multi-task learning, task-agnostic search space, single-shot gradient-based search algorithm, minimal entropy regularization

1:00

06/12/2021

$(\textrm{Implicit})^2$: Implicit Layers for Implicit Representations

Zhichun Huang, Shaojie Bai, J. Zico Kolter

Keywords: deep learning, representation learning

12:23

18/07/2021

Recomposing the Reinforcement Learning Building Blocks with Hypernetworks

Elad Sarafian, Shai Keynan, Sarit Kraus

Keywords: Reinforcement Learning and Planning, Deep RL

5:16

18/07/2021

A Wasserstein Minimax Framework for Mixed Linear Regression

Theo Diamandis, Yonina Eldar, Alireza Fallah, Farzan Farnia, Asuman Ozdaglar

Keywords: Algorithms, Multimodal Learning

25:41

03/05/2021

Separation and Concentration in Deep Networks

John Zarka, Florentin Guth, Stéphane Mallat

Keywords: concentration, mean separation, neural collapse, fisher ratio, image classification, variance reduction, deep learning

5:11

06/12/2021

Uniform Convergence of Interpolators: Gaussian Width, Norm Bounds and Benign Overfitting

Frederic Koehler, Lijia Zhou, Danica J Sutherland, Nathan Srebro

Keywords: theory

19:56

22/11/2021

Domain Attention Consistency for Multi-Source Domain Adaptation

Zhongying Deng, Kaiyang Zhou, Yongxin Yang, Tao Xiang

Keywords: Transferable Attribute Learning, Domain Attention Consistency, Multi-Source Domain Adaptation

9:24

06/12/2020

A Contour Stochastic Gradient Langevin Dynamics Algorithm for Simulations of Multi-modal Distributions

Wei Deng, Guang Lin, Faming Liang

3:26

06/12/2020

Spectra of the Conjugate Kernel and Neural Tangent Kernel for linear-width neural networks

Zhou Fan, Zhichao Wang

3:25

18/07/2021

Stochastic Sign Descent Methods: New Algorithms and Better Theory

Mher Safaryan, Peter Richtarik

Keywords: Optimization, Distributed and Parallel Optimization

5:12

02/02/2021

Learning Cycle-Consistent Cooperative Networks via Alternating MCMC Teaching for Unsupervised Cross-Domain Translation

Jianwen Xie, Zilong Zheng, Xiaolin Fang, Song-Chun Zhu, Ying Nian Wu

14:48

18/07/2021

Revealing the Structure of Deep Neural Networks via Convex Duality

Tolga Ergen, Mert Pilanci

Keywords: Theory, Deep learning Theory

5:35

03/05/2021

Generalized Energy Based Models

Michael Arbel, Liang Zhou, Arthur Gretton

Keywords: Generative Models, Optimization, Density estimation, Adversarial training, MCMC, Sampling

4:42

26/04/2020

Gradient Descent Maximizes the Margin of Homogeneous Neural Networks

Kaifeng Lyu, Jian Li

Keywords: margin, homogeneous, gradient descent

15:02

03/05/2021

Deep Networks and the Multiple Manifold Problem

Sam Buchanan, Dar Gilboa, John Wright

Keywords: low-dimensional structure, overparameterized neural networks, deep learning

5:14

22/11/2021

Multi-Source Domain Adaptation via supervised contrastive learning and confident consistency regularization

Marin Scalbert, Florent Couzinié-Devy, Maria Vakalopoulou

Keywords: unsupervised domain adaptation, contrastive learning, semi-supervised learning, consistency regularization, domain shift

2:57