Optimal Regularization can Mitigate Double Descent

Abstract: Recent empirical and theoretical studies have shown that many learning algorithms -- from linear regression to neural networks -- can have test performance that is non-monotonic in quantities such as the sample size and model size. This striking phenomenon, often referred to as "double descent", has raised the question of whether we need to rethink our current understanding of generalization. In this work, we study whether the double-descent phenomenon can be avoided by using optimal regularization. Theoretically, we prove that for certain linear regression models with isotropic data distribution, optimally-tuned $\ell_2$ regularization achieves monotonic test performance as we grow either the sample size or the model size. We also demonstrate empirically that optimally-tuned $\ell_2$ regularization can mitigate double descent for more general models, including neural networks. Our results suggest that it may also be informative to study the test risk scalings of various algorithms in the context of appropriately tuned regularization.
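The sample-wise claim in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the authors' experimental setup: it assumes isotropic Gaussian covariates, a fixed unit-norm ground-truth vector, Gaussian label noise, and a hand-picked grid of ridge penalties. The helper names (`mean_excess_risk`, `sweep`), the dimension, the noise level, and the grid are all hypothetical choices made for illustration. The penalty is tuned by an oracle that knows the true parameter, which is only possible in simulation.

```python
import numpy as np

def mean_excess_risk(n, d, lams, sigma, beta, rng, trials=50):
    """Average ||beta_hat - beta||^2 per penalty over random training sets.

    For isotropic covariates this equals the test risk minus the noise floor.
    lam == 0 uses the minimum-norm least-squares solution (no regularization).
    """
    risks = np.zeros(len(lams))
    for _ in range(trials):
        X = rng.standard_normal((n, d))                 # isotropic covariates
        y = X @ beta + sigma * rng.standard_normal(n)   # noisy linear labels
        for i, lam in enumerate(lams):
            if lam == 0.0:
                beta_hat = np.linalg.pinv(X) @ y
            else:
                beta_hat = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
            risks[i] += np.sum((beta_hat - beta) ** 2)
    return risks / trials

def sweep(d=20, sigma=0.5, sample_sizes=(5, 10, 15, 20, 25, 30, 40, 60),
          lams=(0.0, 0.1, 0.3, 1.0, 3.0, 10.0, 30.0), seed=0):
    """Return (sample sizes, unregularized risk, oracle-tuned ridge risk)."""
    rng = np.random.default_rng(seed)
    beta = np.ones(d) / np.sqrt(d)      # assumed ground truth with unit norm
    unreg, tuned = [], []
    for n in sample_sizes:
        r = mean_excess_risk(n, d, lams, sigma, beta, rng)
        unreg.append(r[0])              # lams[0] == 0.0: no regularization
        tuned.append(r.min())           # best penalty on the grid (oracle)
    return list(sample_sizes), unreg, tuned

if __name__ == "__main__":
    for n, u, t in zip(*sweep()):
        print(f"n={n:3d}  unregularized={u:10.3f}  tuned ridge={t:6.3f}")
```

Under these assumptions, the unregularized column should spike near n = d (the interpolation threshold), while the tuned column stays bounded. The grid search is a stand-in for the optimal penalty analyzed in the paper; a finer grid or a closed-form choice could replace it.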

18/07/2021

Preetum Nakkiran, Prayaag Venkat, Sham M Kakade, Tengyu Ma

Similar Papers

Efficient Statistical Tests: A Neural Tangent Kernel Approach

Sheng Jia, Ehsan Nezhadarya, Yuhuai Wu, Jimmy Ba

Keywords: Deep Learning

Beyond Linearization: On Quadratic and Higher-Order Approximation of Wide Neural Networks

Yu Bai, Jason D. Lee

Keywords: Neural Tangent Kernels, over-parametrized neural networks, deep learning theory

On the Role of Optimization in Double Descent: A Least Squares Study

Ilja Kuzborskij, Csaba Szepesvari, Omar Rivasplata, Amal Rannen-Triki, Razvan Pascanu

Keywords: theory, deep learning, optimization

Fractal Structure and Generalization Properties of Stochastic Optimization Algorithms

Alexander Camuto, George Deligiannidis, Murat Erdogdu, Mert Gurbuzbalaban, Umut Simsekli, Lingjiong Zhu

Keywords: theory, deep learning, optimization

A Linear-time Independence Criterion Based on a Finite Basis Approximation

Longfei Yan, W. Bastiaan Kleijn, Thushara Abhayapala

Asymptotics of Ridge Regression in Convolutional Models

Moji Sahraee-Ardakan, Tung Mai, Anup Rao, Ryan A. Rossi, Sundeep Rangan, Alyson Fletcher

Keywords: Theory

Why Are Learned Indexes So Effective?

Paolo Ferragina, Fabrizio Lillo, Giorgio Vinciguerra

Keywords: Applications - Other

Towards Lower Bounds on the Depth of ReLU Neural Networks

Christoph Hertrich, Amitabh Basu, Marco Di Summa, Martin Skutella

Keywords: theory, deep learning, optimization

Using Random Effects to Account for High-Cardinality Categorical Features and Repeated Measures in Deep Neural Networks

Giora Simchoni, Saharon Rosset

Keywords: deep learning, machine learning, vision

Adjusting for Autocorrelated Errors in Neural Networks for Time Series

Fan-Keng Sun, Chris Lang, Duane Boning

Keywords: deep learning

Spectrum Dependent Learning Curves in Kernel Regression and Wide Neural Networks

Blake Bordelon, Abdulkadir Canatar, Cengiz Pehlevan

Keywords: General Machine Learning Techniques

Model Fusion via Optimal Transport

Sidak Pal Singh, Martin Jaggi

Entropic gradient descent algorithms and wide flat minima

Fabrizio Pittorino, Carlo Lucibello, Christoph Feinauer, Gabriele Perugini, Carlo Baldassi, Elizaveta Demyanenko, Riccardo Zecchina

Keywords: flat minima, belief-propagation, statistical physics, entropic algorithms

The Neural Tangent Kernel in High Dimensions: Triple Descent and a Multi-Scale Theory of Generalization

Ben Adlam, Jeffrey Pennington

Keywords: Deep Learning - Theory

Constructing a provably adversarially-robust classifier from a high accuracy one

Grzegorz Gluch, Rüdiger Urbanke

Out-of-Distribution Generalization in Kernel Regression

Abdulkadir Canatar, Blake Bordelon, Cengiz Pehlevan

Keywords: theory, deep learning, machine learning

Regularized ERM on random subspaces

Andrea Della Vecchia, Jaouad Mourtada, Ernesto De Vito, Lorenzo Rosasco

Double Trouble in Double Descent: Bias and Variance(s) in the Lazy Regime

Stéphane d'Ascoli, Maria Refinetti, Giulio Biroli, Florent Krzakala

Keywords: Deep Learning - Theory

Representation Learning Beyond Linear Prediction Functions

Ziping Xu, Ambuj Tewari

Keywords: theory, deep learning, optimization, representation learning, few shot learning

Provable Benefits of Overparameterization in Model Compression: From Double Descent to Pruning Neural Networks

Xiangyu Chang, Yingcong Li, Samet Oymak, Christos Thrampoulidis

Adaptive Sampling for Minimax Fair Classification
