The Benefits of Implicit Regularization from SGD in Least Squares Problems

Abstract: Stochastic gradient descent (SGD) exhibits strong algorithmic regularization effects in practice, which has been hypothesized to play an important role in the generalization of modern machine learning approaches. In this work, we seek to understand these issues in the simpler setting of linear regression (including both underparameterized and overparameterized regimes), where our goal is to make sharp instance-based comparisons of the implicit regularization afforded by (unregularized) average SGD with the explicit regularization of ridge regression. For a broad class of least squares problem instances (that are natural in high-dimensional settings), we show: (1) for every problem instance and for every ridge parameter, (unregularized) SGD, when provided with \emph{logarithmically} more samples than that provided to the ridge algorithm, generalizes no worse than the ridge solution (provided SGD uses a tuned constant stepsize); (2) conversely, there exist instances (in this wide problem class) where optimally-tuned ridge regression requires \emph{quadratically} more samples than SGD in order to have the same generalization performance. Taken together, our results show that, up to the logarithmic factors, the generalization performance of SGD is always no worse than that of ridge regression in a wide range of overparameterized problems, and, in fact, could be much better for some problem instances. More generally, our results show how algorithmic regularization has important consequences even in simpler (overparameterized) convex settings.

06/12/2021

The Benefits of Implicit Regularization from SGD in Least Squares Problems

Difan Zou, Jingfeng Wu, Vladimir Braverman, Quanquan Gu, Dean Foster, Sham Kakade

Comments

Similar Papers

Non-convex Distributionally Robust Optimization: Non-asymptotic Analysis

Jikai Jin, Bohang Zhang, Haiyang Wang, Liwei Wang

Keywords Abstract Paper

Benign Overfitting of Constant-Stepsize SGD for Linear Regression

Difan Zou, Jingfeng Wu, Vladimir Braverman and Quanquan Gu, Sham Kakade

Keywords Abstract Paper

Implicit Regularization and Convergence for Weight Normalization

Xiaoxia (Shirley) Wu, Edgar Dobriban, Tongzheng Ren and Shanshan Wu, Zhiyuan Li, Suriya Gunasekar, Rachel Ward, Qiang Liu

Keywords Abstract Paper

Leveraging Non-uniformity in First-order Non-convex Optimization

Jincheng Mei, Yue Gao, Bo Dai and Csaba Szepesvari, Dale Schuurmans

Keywords Abstract Paper

Theory, RL, Decisions and Control Theory

Learning To Scale Mixed-Integer Programs

Timo Berthold, Gregor Hendel

Keywords Abstract Paper

Improving the Sample and Communication Complexity for Decentralized Non-Convex Optimization: Joint Gradient Estimation and Tracking

Haoran Sun, Songtao Lu, Mingyi Hong

Keywords Abstract Paper

Smooth Bilevel Programming for Sparse Regularization

Clarice Poon, Gabriel Peyré

Keywords Abstract Paper

Implicit differentiation of Lasso-type models for hyperparameter optimization

Quentin Bertrand, Quentin Klopfenstein, Mathieu Blondel and Samuel Vaiter, Alexandre Gramfort, Joseph Salmon

Keywords Abstract Paper

Understanding Over-parameterization in Generative Adversarial Networks

Yogesh Balaji, Mohammadmahdi Sajedi, Neha Kalibhat and Mucong Ding, Dominik Stöger, Mahdi Soltanolkotabi, Soheil Feizi

Keywords Abstract Paper

min-max optimization, Over-parameterization, GAN

Closing the Gap: Tighter Analysis of Alternating Stochastic Gradient Methods for Bilevel Problems

Tianyi Chen, Yuejiao Sun, Wotao Yin

Keywords Abstract Paper

theory, optimization, reinforcement learning and planning, machine learning

Regularized Submodular Maximization at Scale

Ehsan Kazemi, shervin minaee, Moran Feldman, Amin Karbasi

Keywords Abstract Paper

Optimization, Combinatorial Optimization

Nonasymptotic Guarantees for Spiked Matrix Recovery with Generative Priors

Jorio Cocola, Paul Hand, Vlad Voroninski

Keywords Abstract Paper

Functions with average smoothness: structure, algorithms, and learning

Yair Ashlagi, Lee-Ad Gottlieb, Aryeh Kontorovich

Keywords Abstract Paper

Stability of Stochastic Gradient Descent on Nonsmooth Convex Losses

Raef Bassily, Vitaly Feldman, Cristóbal Guzmán, Kunal Talwar

Keywords Abstract Paper

The Performance Analysis of Generalized Margin Maximizers on Separable Data

Fariborz Salehi, Ehsan Abbasi, Babak Hassibi

Keywords Abstract Paper

Extrapolation for Large-batch Training in Deep Learning

Tao LIN, Lingjing Kong, Sebastian Stich, Martin Jaggi

Keywords Abstract Paper

Obtaining Adjustable Regularization for Free via Iterate Averaging

Jingfeng Wu, Vladimir Braverman, Lin Yang

Keywords Abstract Paper

Determinantal point processes based on orthogonal polynomials for sampling minibatches in SGD

Rémi Bardenet, Subhroshekhar Ghosh, Meixia LIN

Keywords Abstract Paper

optimization, machine learning

SLURP: Side Learning Uncertainty for Regression Problems

Xuanlong Yu, Gianni Franchi, Emanuel Aldea

Keywords Abstract Paper

Uncertainty estimation, Confidence estimation, Auxiliary model, Monocular depth, Optical flow

Stochastic Recursive Variance-Reduced Cubic Regularization Methods

Dongruo Zhou, Quanquan Gu

Keywords Abstract Paper

Optimal approximation for unconstrained non-submodular minimization

Marwa El Halabi, Stefanie Jegelka

Keywords Abstract Paper

Improved Penalty Method via Doubly Stochastic Gradients for Bilevel Hyperparameter Optimization

Wanli Shi, Bin Gu

Keywords Abstract Paper

Least Squares Regression with Markovian Data: Fundamental Limits and Algorithms

Dheeraj Nagaraj, Xian Wu, Guy Bresler and Prateek Jain, Praneeth Netrapalli

Keywords Abstract Paper

A Second look at Exponential and Cosine Step Sizes: Simplicity, Adaptivity, and Performance

Keywords Paper

Difan Zou, Jingfeng Wu, Vladimir Braverman and
Quanquan Gu, Sham Kakade

Keywords Paper

Xiaoxia (Shirley) Wu, Edgar Dobriban, Tongzheng Ren and
Shanshan Wu, Zhiyuan Li, Suriya Gunasekar, Rachel Ward, Qiang Liu

Keywords Paper

Jincheng Mei, Yue Gao, Bo Dai and
Csaba Szepesvari, Dale Schuurmans

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Quentin Bertrand, Quentin Klopfenstein, Mathieu Blondel and
Samuel Vaiter, Alexandre Gramfort, Joseph Salmon

Keywords Paper

Yogesh Balaji, Mohammadmahdi Sajedi, Neha Kalibhat and
Mucong Ding, Dominik Stöger, Mahdi Soltanolkotabi, Soheil Feizi

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Dheeraj Nagaraj, Xian Wu, Guy Bresler and
Prateek Jain, Praneeth Netrapalli

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Mingrui Zhang, Zebang Shen, Aryan Mokhtari and
Hamed Hassani, Amin Karbasi

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Tommaso d'Orsi, Chih-Hung Liu, Rajai Nasser and
Gleb Novikov, David Steurer, Stefan Tiegel

Keywords Paper

Marine Le Morvan, Julie Josse, Thomas Moreau and
Erwan Scornet, Gael Varoquaux

Keywords Paper

Keywords Paper