09/07/2020

Gradient descent follows the regularization path for general losses

Ziwei Ji, Miroslav Dudik, Robert Schapire, Matus Telgarsky

Keywords: Loss functions, Classification, Convex optimization

Abstract: Recent work across many machine learning disciplines has highlighted that standard descent methods, even without explicit regularization, do not merely minimize the training error, but also exhibit an implicit bias. This bias is typically towards a certain regularized solution, and relies upon the details of the learning process, for instance the use of the cross-entropy loss.

In this work, we show that for empirical risk minimization over linear predictors with arbitrary convex, strictly decreasing losses, if the risk does not attain its infimum, then the gradient-descent path and the algorithm-independent regularization path converge to the same direction (whenever either converges to a direction). Using this result, we provide a justification for the widely-used exponentially-tailed losses (such as the exponential loss or the logistic loss): for such losses, convergence to a direction is necessarily convergence to the maximum-margin direction, whereas other losses, such as polynomially-tailed losses, may induce convergence to a direction with a poor margin.
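To make the convergence claim concrete, below is a minimal, hypothetical sketch (not the authors' code; the data, step size, and iteration count are invented for illustration): gradient descent on the logistic loss over linearly separable data, where the risk has no finite minimizer, so the iterate norm grows without bound while the iterate direction stabilizes. For an exponentially-tailed loss such as the logistic loss, the paper identifies that limit direction as the maximum-margin direction.

import numpy as np

# Toy linearly separable data (invented for illustration).
X = np.array([[2.0, 1.0], [1.5, 2.0], [-1.0, -1.5], [-2.0, -0.5]])
y = np.array([1.0, 1.0, -1.0, -1.0])

def risk_gradient(w):
    # Gradient of the empirical logistic risk
    # R(w) = mean_i log(1 + exp(-y_i <x_i, w>)).
    margins = y * (X @ w)
    s = -y / (1.0 + np.exp(margins))  # per-example derivative w.r.t. <x_i, w>
    return (X * s[:, None]).mean(axis=0)

w = np.zeros(2)
eta = 0.5  # step size, chosen ad hoc for this toy problem
for t in range(100_000):
    w -= eta * risk_gradient(w)

print("norm of iterate:", np.linalg.norm(w))           # grows without bound
print("direction of iterate:", w / np.linalg.norm(w))  # stabilizes

On such data, the printed direction can be compared against a hard-margin SVM solution (e.g., sklearn.svm.LinearSVC with a very large C). Under a polynomially-tailed loss, by contrast, the limit direction may have a poor margin, which is the cautionary point of the abstract.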

Talk and paper published at the COLT 2020 virtual conference.
