Training Neural Networks for and by Interpolation

Abstract: In modern supervised learning, many deep neural networks are able to interpolate the data: the empirical loss can be driven to near zero on all samples simultaneously. In this work, we explicitly exploit this interpolation property for the design of a new optimization algorithm for deep learning, which we term Adaptive Learning-rates for Interpolation with Gradients (ALI-G). ALI-G retains the two main advantages of Stochastic Gradient Descent (SGD), which are (i) a low computational cost per iteration and (ii) good generalization performance in practice. At each iteration, ALI-G exploits the interpolation property to compute an adaptive learning-rate in closed form. In addition, ALI-G clips the learning-rate to a maximal value, which we prove to be helpful for non-convex problems. Crucially, in contrast to the learning-rate of SGD, the maximal learning-rate of ALI-G does not require a decay schedule. This makes ALI-G considerably easier to tune than SGD. We prove the convergence of ALI-G in various stochastic settings. Notably, we tackle the realistic case where the interpolation property is satisfied up to some tolerance. We also provide experiments on a variety of deep learning architectures and tasks: (i) learning a differentiable neural computer; (ii) training a wide residual network on the SVHN data set; (iii) training a Bi-LSTM on the SNLI data set; and (iv) training wide residual networks and densely connected networks on the CIFAR data sets. ALI-G produces state-of-the-art results among adaptive methods, and even yields comparable performance with SGD, which requires manually tuned learning-rate schedules. Furthermore, ALI-G is simple to implement in any standard deep learning framework and can be used as a drop-in replacement in existing code.
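The abstract does not state the closed-form learning-rate. As a rough illustration only, the sketch below assumes a Polyak-style step, loss / (squared gradient norm + delta), clipped to a constant maximal value `max_lr`, and applies it to a toy least-squares problem that can be interpolated exactly. The function names, `delta`, and the toy data are hypothetical choices made for this example, not the authors' implementation.

```python
import random


def clipped_polyak_step(w, loss, grad, max_lr, delta=1e-8):
    """One illustrative update with step = min(loss / (||grad||^2 + delta), max_lr).

    Assumption: this Polyak-style formula is a stand-in for the closed-form
    learning-rate mentioned in the abstract, which does not spell it out.
    """
    grad_sq_norm = sum(g * g for g in grad)
    step = min(loss / (grad_sq_norm + delta), max_lr)  # clip to the maximal value
    new_w = [wi - step * gi for wi, gi in zip(w, grad)]
    return new_w, step


def sample_loss_and_grad(w, x, y):
    """Squared error 0.5 * (w.x - y)^2 on a single sample, with its gradient."""
    residual = sum(wi * xi for wi, xi in zip(w, x)) - y
    return 0.5 * residual ** 2, [residual * xi for xi in x]


def train(data, w, max_lr=0.5, epochs=100, seed=0):
    """Stochastic training loop: one sample per update, constant max_lr."""
    data = list(data)  # copy so the caller's list is not shuffled in place
    rng = random.Random(seed)
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            loss, grad = sample_loss_and_grad(w, x, y)
            w, _ = clipped_polyak_step(w, loss, grad, max_lr)
    return w


# Toy data generated by y = 2*x1 - 1*x2, so a zero-loss solution exists.
toy_data = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0), ([1.0, 1.0], 1.0)]
weights = train(toy_data, [0.0, 0.0])
```

In this sketch `max_lr` stays constant for the whole run; no decay schedule is applied.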

06/12/2021

06/12/2020

Leonard Berrada, M. Pawan Kumar, Andrew Zisserman

Similar Papers

Simple Stochastic and Online Gradient Descent Algorithms for Pairwise Learning

Zhenhuan Yang, Yunwen Lei, Puyu Wang, Tianbao Yang, Yiming Ying

optimization, machine learning, privacy

Regularizing Meta-Learning via Gradient Dropout

Hung-Yu Tseng, Yi-Wen Chen, Yi-Hsuan Tsai, Sifei Liu, Yen-Yu Lin, Ming-Hsuan Yang

Dataset Meta-Learning from Kernel Ridge-Regression

Timothy Nguyen, Zhourong Chen, Jaehoon Lee

dataset corruption, infinite-width networks, neural kernels, kernel-ridge regression, dataset compression, dataset distillation, meta-learning

Bayesian Meta-Learning for the Few-Shot Setting via Deep Kernels

Massimiliano Patacchiola, Jack Turner, Elliot Crowley, Michael O'Boyle, Amos Storkey

Deep Learning; Deep Learning -> CNN Architectures; Theory -> Spaces of Functions and Kernels, Theory

A Single-Loop Smoothed Gradient Descent-Ascent Algorithm for Nonconvex-Concave Min-Max Problems

Jiawei Zhang, Peijun Xiao, Ruoyu Sun, Zhiquan Luo

Provable Benefit of Orthogonal Initialization in Optimizing Deep Linear Networks

Wei Hu, Lechao Xiao, Jeffrey Pennington

deep learning theory, non-convex optimization, orthogonal initialization

Understanding Over-parameterization in Generative Adversarial Networks

Yogesh Balaji, Mohammadmahdi Sajedi, Neha Kalibhat, Mucong Ding, Dominik Stöger, Mahdi Soltanolkotabi, Soheil Feizi

min-max optimization, Over-parameterization, GAN

Safe Deep Semi-Supervised Learning for Unseen-Class Unlabeled Data

Lan-Zhe Guo, Zhen-Yu Zhang, Yuan Jiang, Yufeng Li, Zhi-Hua Zhou

Unsupervised and Semi-Supervised Learning

Sparsifying Networks via Subdifferential Inclusion

Sagar Verma, Jean-Christophe Pesquet

Optimization, Convex Optimization

Initialization and Regularization of Factorized Neural Layers

Misha Khodak, Neil Tenenholtz, Lester Mackey, Nicolo Fusi

matrix factorization, knowledge distillation, multi-head attention, model compression

Extrapolation for Large-batch Training in Deep Learning

Tao Lin, Lingjing Kong, Sebastian Stich, Martin Jaggi

Deep Learning - Algorithms

R-Drop: Regularized Dropout for Neural Networks

Xiaobo Liang, Lijun Wu, Juntao Li, Yue Wang, Qi Meng, Tao Qin, Wei Chen, Min Zhang, Tie-Yan Liu

deep learning, machine learning, transformers, vision, language

DivideMix: Learning with Noisy Labels as Semi-supervised Learning

Junnan Li, Richard Socher, Steven C.H. Hoi

label noise, semi-supervised learning

$(\textrm{Implicit})^2$: Implicit Layers for Implicit Representations

Zhichun Huang, Shaojie Bai, J. Zico Kolter

deep learning, representation learning

Robust Unsupervised Learning via L-statistic Minimization

Andreas Maurer, Daniela Angela Parletta, Andrea Paudice, Massimiliano Pontil

Theory, Statistical Learning Theory

Self-supervised learning for pairwise data refinement

Gustavo Hernandez Abrego, Bowen Liang, Wei Wang, Zarana Parekh, Yinfei Yang, Yunhsuan Sung

Unsupervised Data Augmentation for Consistency Training

Qizhe Xie, Zihang Dai, Eduard Hovy, Thang Luong, Quoc V Le

Leveraging Recursive Gumbel-Max Trick for Approximate Inference in Combinatorial Spaces

Kirill Struminsky, Artyom Gadetsky, Denis Rakitin, Danil Karpushkin, Dmitry Vetrov

deep learning, optimization

Sparse Spectrum Warped Input Measures for Nonstationary Kernel Learning

Anthony Tompkins, Rafael Oliveira, Fabio Ramos

Reweighting Augmented Samples by Minimizing the Maximal Expected Loss

Mingyang Yi, Lu Hou, Lifeng Shang, Xin Jiang, Qun Liu, Zhi-Ming Ma

sample reweighting, data augmentation

Sparse Flows: Pruning Continuous-depth Models

Yasaman Esfandiari, Sin Yong Tan, Zhanhong Jiang, Aditya Balu, Ethan Herron, Chinmay Hegde, Soumik Sarkar

Hedi Xia, Vai Suliafu, Hangjie Ji, Tan Nguyen, Andrea Bertozzi, Stanley Osher, Bao Wang

Atal Sahu, Aritra Dutta, Ahmed M. Abdelmoniem, Trambak Banerjee, Marco Canini, Panos Kalnis

Gen Li, Changxiao Cai, Yuxin Chen, Yuantao Gu, Yuting Wei, Yuejie Chi

Yufeng Zhang, Qi Cai, Zhuoran Yang, Yongxin Chen, Zhaoran Wang