Directional convergence and alignment in deep learning

Abstract: In this paper, we show that although the minimizers of cross-entropy and related classification losses are off at infinity, network weights learned by gradient flow converge in direction, with an immediate corollary that network predictions, training errors, and the margin distribution also converge. This proof holds for deep homogeneous networks — a broad class of networks allowing for ReLU, max-pooling, linear, and convolutional layers — and we additionally provide empirical support not just close to the theory (e.g., the AlexNet), but also on non-homogeneous networks (e.g., the DenseNet). If the network further has locally Lipschitz gradients, we show that these gradients also converge in direction, and asymptotically align with the gradient flow path, with consequences on margin maximization, convergence of saliency maps, and a few other settings. Our analysis complements and is distinct from the well-known neural tangent and mean-field theories, and in particular makes no requirements on network width and initialization, instead merely requiring perfect classification accuracy. The proof proceeds by developing a theory of unbounded nonsmooth Kurdyka-Łojasiewicz inequalities for functions definable in an o-minimal structure, and is also applicable outside deep learning.
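
The directional convergence described in the abstract can be illustrated on the simplest homogeneous model, a linear predictor. The sketch below is not the authors' experiment: it assumes hypothetical toy data, a linear (1-homogeneous) predictor, and plain gradient descent with a small step size as a stand-in for gradient flow. The function name and all parameters are illustrative. On separable data the logistic loss has no finite minimizer, so the weight norm keeps growing while the normalized weights w / ||w|| settle down.

```python
import numpy as np


def directional_convergence_demo(steps=20000, lr=0.1, seed=0):
    """Gradient descent on the logistic loss for a linear predictor.

    Returns the weights at the halfway point, the final weights, and the
    smallest final training margin y_i * <w, x_i>.
    """
    rng = np.random.default_rng(seed)

    # Hypothetical toy data: 40 points in the plane, labeled by the sign of
    # the first coordinate and pushed at least 1.0 away from the boundary,
    # so the data are linearly separable.
    X = rng.normal(size=(40, 2))
    X[:, 0] += np.where(X[:, 0] >= 0, 1.0, -1.0)
    y = np.where(X[:, 0] >= 0, 1.0, -1.0)

    w = 0.1 * rng.normal(size=2)
    w_mid = None
    for t in range(steps):
        margins = y * (X @ w)
        # Derivative of log(1 + exp(-m)) with respect to m is -1 / (1 + exp(m)).
        coeff = -1.0 / (1.0 + np.exp(margins))
        grad = (X * (y * coeff)[:, None]).mean(axis=0)
        w = w - lr * grad
        if t + 1 == steps // 2:
            w_mid = w.copy()

    return w_mid, w, float((y * (X @ w)).min())


if __name__ == "__main__":
    w_mid, w_final, min_margin = directional_convergence_demo()
    cosine = w_mid @ w_final / (np.linalg.norm(w_mid) * np.linalg.norm(w_final))
    print("norm at halfway point:", np.linalg.norm(w_mid))
    print("norm at the end:      ", np.linalg.norm(w_final))
    print("cosine between the two directions:", cosine)
    print("smallest training margin:", min_margin)
```

Comparing the halfway and final iterates is a crude check: the norm should still be increasing while the cosine between the two directions stays close to 1. The paper's results concern deep homogeneous networks under gradient flow, which this linear toy case only hints at.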

19/08/2021

Data Mining, Feature Extraction, Selection and Dimensionality Reduction, Mining Graphs, Semi Structured Data, Complex Data, Mining Text, Web, Social Media

14:58

06/12/2021

Ziwei Ji, Matus Telgarsky

Similar Papers

Learning Stochastic Equivalence based on Discrete Ricci Curvature

Xuan Guo, Qiang Tian, Wang Zhang, Wenjun Wang, Pengfei Jiao

Data Mining, Feature Extraction, Selection and Dimensionality Reduction, Mining Graphs, Semi Structured Data, Complex Data, Mining Text, Web, Social Media

Continuous vs. Discrete Optimization of Deep Neural Networks

Omer Elkabetz, Nadav Cohen

theory, deep learning, optimization

Deep Networks Provably Classify Data on Curves

Tingran Wang, Sam Buchanan, Dar Gilboa, John Wright

theory, deep learning, optimization, machine learning, kernel methods

Adversarial Examples in Multi-Layer Random ReLU Networks

Peter Bartlett, Sebastien Bubeck, Yeshwanth Cherapanamjeri

theory, adversarial robustness and security

Improved Sample Complexities for Deep Neural Networks and Robust Classification via an All-Layer Margin

Colin Wei, Tengyu Ma

deep learning theory, generalization bounds, adversarially robust generalization, data-dependent generalization bounds

The Heavy-Tail Phenomenon in SGD

Mert Gurbuzbalaban, Umut Simsekli, Lingjiong Zhu

Optimization, Stochastic Optimization

The inductive bias of ReLU networks on orthogonally separable data

Mary Phuong, Christoph H Lampert

implicit bias, extremal sector, gradient descent, inductive bias, max-margin, ReLU networks

Deep Networks and the Multiple Manifold Problem

Sam Buchanan, Dar Gilboa, John Wright

low-dimensional structure, overparameterized neural networks, deep learning

A Mean Field Analysis Of Deep ResNet And Beyond: Towards Provably Optimization Via Overparameterization From Depth

Yiping Lu, Chao Ma, Yulong Lu, Jianfeng Lu, Lexing Ying

Communication-Efficient Distributed Optimization in Networks with Gradient Tracking and Variance Reduction

Boyue Li, Shicong Cen, Yuxin Chen, Yuejie Chi

Multi-Proxy Wasserstein Classifier for Image Classification

Benlin Liu, Yongming Rao, Jiwen Lu, Jie Zhou, Cho-Jui Hsieh

Second-Order Provable Defenses against Adversarial Attacks

Sahil Singla, Soheil Feizi

Orthogonalizing Convolutional Layers with the Cayley Transform

Asher Trockman, Zico Kolter

Lipschitz constrained networks, orthogonal layers, adversarial robustness

Estimating Lipschitz constants of monotone deep equilibrium models

Chirag Pabbaraju, Ezra Winston, Zico Kolter

deep equilibrium models, Lipschitz constants

Monotone operator equilibrium networks

Ezra Winston, J. Zico Kolter

On the Verification of Neural ODEs with Stochastic Guarantees

Sophie Grunbacher, Ramin Hasani, Mathias Lechner, Jacek Cyranka, Scott A. Smolka, Radu Grosu

Invertible DenseNets with Concatenated LipSwish

Yura Perugachi-Diaz, Jakub M. Tomczak, Sandjai Bhulai

Finite-Sample Analysis of Off-Policy Natural Actor-Critic Algorithm

Sajad Khodadadian, Zaiwei Chen, Siva Maguluri

Analytic Insights into Structure and Rank of Neural Network Hessian Maps

Sidak Pal Singh, Gregor Bachmann, Thomas Hofmann

deep learning, optimization, generative model

Unique Properties of Wide Minima in Deep Networks

Rotem Mulayoff, Tomer Michaeli

Curvature-corrected learning dynamics in deep neural networks

Gradient Descent on Two-layer Nets: Margin Maximization and Simplicity Bias

Kaifeng Lyu, Zhiyuan Li, Runzhe Wang, Sanjeev Arora
