Tensor Programs IV: Feature Learning in Infinite-Width Neural Networks

Abstract: As its width tends to infinity, a deep neural network's behavior under gradient descent can become simplified and predictable (e.g. given by the Neural Tangent Kernel (NTK)), if it is parametrized appropriately (e.g. the NTK parametrization). However, we show that the standard and NTK parametrizations of a neural network do not admit infinite-width limits that can *learn* features, which is crucial for pretraining and transfer learning such as with BERT. We propose simple modifications to the standard parametrization to allow for feature learning in the limit. Using the *Tensor Programs* technique, we derive explicit formulas for such limits. On Word2Vec and few-shot learning on Omniglot via MAML, two canonical tasks that rely crucially on feature learning, we compute these limits exactly. We find that they outperform both NTK baselines and finite-width networks, with the latter approaching the infinite-width feature learning performance as width increases.

06/12/2021

Tensor Programs IV: Feature Learning in Infinite-Width Neural Networks

Greg Yang, Edward Hu

Comments

Similar Papers

The Implicit Bias of Minima Stability: A View from Function Space

Rotem Mulayoff, Tomer Michaeli, Daniel Soudry

Keywords Abstract Paper

deep learning, optimization

Robust Implicit Networks via Non-Euclidean Contractions

Saber Jafarpour, Alexander Davydov, Anton Proskurnikov, Francesco Bullo

Keywords Abstract Paper

theory, deep learning, machine learning, robustness, vision

Subquadratic Overparameterization for Shallow Neural Networks

ChaeHwan Song, Ali Ramezani-Kebrya, Thomas Pethick and Armin Eftekhari, Volkan Cevher

Keywords Abstract Paper

theory, deep learning, optimization

Nonparametric Regression with Shallow Overparametrized Neural Networks Trained by GD with Early Stopping

Ilja Kuzborskij , Csaba Szepesvari

Keywords Abstract Paper

Don't Waste Your Bits! Squeeze Activations and Gradients for Deep Neural Networks via TinyScript

Fangcheng Fu, Yuzheng Hu, Yihan He and Jiawei Jiang, Yingxia Shao, Ce Zhang, Bin Cui

Keywords Abstract Paper

Optimization - Large Scale, Parallel and Distributed

Reweighting Augmented Samples by Minimizing the Maximal Expected Loss

Mingyang Yi, LU HOU, Lifeng Shang and Xin Jiang, Qun Liu, Zhi-Ming Ma

Keywords Abstract Paper

sample reweighting, data augmentation

Gradient Descent on Two-layer Nets: Margin Maximization and Simplicity Bias

Kaifeng Lyu, Zhiyuan Li, Runzhe Wang, Sanjeev Arora

Keywords Abstract Paper

deep learning, optimization, machine learning

Sparsifying Networks via Subdifferential Inclusion

Sagar Verma, Jean-Christophe Pesquet

Keywords Abstract Paper

Optimization, Convex Optimization

Finite Depth and Width Corrections to the Neural Tangent Kernel

Boris Hanin, Mihai Nica

Keywords Abstract Paper

Neural Tangent Kernel, Finite Width Corrections, Random ReLU Net, Wide Networks, Deep Networks

Improved Sample Complexities for Deep Neural Networks and Robust Classification via an All-Layer Margin

Colin Wei, Tengyu Ma

Keywords Abstract Paper

deep learning theory, generalization bounds, adversarially robust generalization, data-dependent generalization bounds

HRank: Filter Pruning Using High-Rank Feature Map

Mingbao Lin, Rongrong Ji, Yan Wang and Yichen Zhang, Baochang Zhang, Yonghong Tian, Ling Shao

Keywords Abstract Paper

network pruning, neural network compression and acceleration, high-rank feature map, efficient deep learning computing

The Limitations of Large Width in Neural Networks: A Deep Gaussian Process Perspective

Geoff Pleiss, John Cunningham

Keywords Abstract Paper

deep learning, kernel methods

Why Gradient Clipping Accelerates Training: A Theoretical Justification for Adaptivity

Jingzhao Zhang, Tianxing He, Suvrit Sra, Ali Jadbabaie

Keywords Abstract Paper

Adaptive methods, optimization, deep learning

When Expressivity Meets Trainability: Fewer than $n$ Neurons Can Work

Jiawei Zhang, Yushun Zhang, Mingyi Hong and Ruoyu Sun, Zhi-Quan Luo

Keywords Abstract Paper

deep learning, optimization

Dynamics of Deep Neural Networks and Neural Tangent Hierarchy

Jiaoyang Huang, Horng-Tzer Yau

Keywords Abstract Paper

A Single-Loop Smoothed Gradient Descent-Ascent Algorithm for Nonconvex-Concave Min-Max Problems

Jiawei Zhang, Peijun Xiao, Ruoyu Sun, Zhiquan Luo

Keywords Abstract Paper

Non-Euclidean Universal Approximation

Anastasis Kratsios, Eugene Bilokopytov

Keywords Abstract Paper

Stochastic Solutions for Linear Inverse Problems using the Prior Implicit in a Denoiser

Zahra Kadkhodaie, Eero P Simoncelli

Keywords Abstract Paper

deep learning, self-supervised learning

Unique Properties of Wide Minima in Deep Networks

Rotem Mulayoff, Tomer Michaeli

Keywords Abstract Paper

Curvature-corrected learning dynamics in deep neural networks

Keywords Abstract Paper

Phase-Wise Parameter Aggregation for Improving SGD Optimization

Keywords Abstract Paper

Finite Versus Infinite Neural Networks: an Empirical Study

Keywords Paper

Keywords Paper

ChaeHwan Song, Ali Ramezani-Kebrya, Thomas Pethick and
Armin Eftekhari, Volkan Cevher

Keywords Paper

Keywords Paper

Fangcheng Fu, Yuzheng Hu, Yihan He and
Jiawei Jiang, Yingxia Shao, Ce Zhang, Bin Cui

Keywords Paper

Mingyang Yi, LU HOU, Lifeng Shang and
Xin Jiang, Qun Liu, Zhi-Ming Ma

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Mingbao Lin, Rongrong Ji, Yan Wang and
Yichen Zhang, Baochang Zhang, Yonghong Tian, Ling Shao

Keywords Paper

Keywords Paper

Keywords Paper

Jiawei Zhang, Yushun Zhang, Mingyi Hong and
Ruoyu Sun, Zhi-Quan Luo

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Jaehoon Lee, Sam Schoenholz, Jeffrey Pennington and
Ben Adlam, Lechao Xiao, Roman Novak, Jascha Sohl-Dickstein

Keywords Paper

Yiping Lu, Chao Ma, Yulong Lu and
Jianfeng Lu, Lexing Ying

Keywords Paper

Sanjeev Arora, Simon S. Du, Zhiyuan Li and
Ruslan Salakhutdinov, Ruosong Wang, Dingli Yu

Keywords Paper

Mingjian Zhu, Kai Han, Enhua Wu and
Qiulin Zhang, Ying Nie, Zhenzhong Lan, Yunhe Wang

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Erik Daxberger, Eric Nalisnick, James Allingham and
Javier Antorán, Jose Miguel Hernandez-Lobato

Keywords Paper

Yufeng Zhang, Qi Cai, Zhuoran Yang and
Yongxin Chen, Zhaoran Wang

Keywords Paper

Sophie Grunbacher, Ramin Hasani, Mathias Lechner and
Jacek Cyranka, Scott A. Smolka, Radu Grosu

Keywords Paper