Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence

13/04/2021

Nicolas Loizou, Sharan Vaswani, Issam Hadj Laradji, Simon Lacoste-Julien

Abstract: We propose a stochastic variant of the classical Polyak step-size (Polyak, 1987) commonly used in the subgradient method. Although computing the Polyak step-size requires knowledge of the optimal function values, this information is readily available for typical modern machine learning applications. Consequently, the proposed stochastic Polyak step-size (SPS) is an attractive choice for setting the learning rate for stochastic gradient descent (SGD). We provide theoretical convergence guarantees for SGD equipped with SPS in different settings, including strongly convex, convex and non-convex functions. Furthermore, our analysis results in novel convergence guarantees for SGD with a constant step-size. We show that SPS is particularly effective when training over-parameterized models capable of interpolating the training data. In this setting, we prove that SPS enables SGD to converge to the true solution at a fast rate without requiring the knowledge of any problem-dependent constants or additional computational overhead. We experimentally validate our theoretical results via extensive experiments on synthetic and real datasets. We demonstrate the strong performance of SGD with SPS compared to state-of-the-art optimization methods when training over-parameterized models.
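The abstract does not spell out the step-size rule, so the following is only a minimal sketch of the idea. It assumes the usual Polyak form applied to one sampled loss, step = (f_i(x) - f_i*) / (c * ||grad f_i(x)||^2), capped at an upper bound. The names `sps_step_size` and `sgd_with_sps`, the constant `c = 0.5`, the cap `gamma_max`, and the toy least-squares problem are illustrative assumptions, not taken from the paper. The toy system is consistent, so every per-sample optimal value f_i* is 0, which mimics the interpolation setting mentioned above.

```python
import random


def sps_step_size(loss_i, grad_i, f_i_star=0.0, c=0.5, gamma_max=10.0):
    """Capped Polyak-type step size for one sampled loss f_i (illustrative)."""
    grad_sq_norm = sum(g * g for g in grad_i)
    if grad_sq_norm == 0.0:
        return 0.0  # zero gradient: no update for this sample
    return min((loss_i - f_i_star) / (c * grad_sq_norm), gamma_max)


def sgd_with_sps(A, b, steps=2000, seed=0):
    """SGD on f_i(x) = 0.5 * (a_i . x - b_i)^2 using the step size above."""
    rng = random.Random(seed)
    x = [0.0] * len(A[0])
    for _ in range(steps):
        i = rng.randrange(len(A))  # sample one data point uniformly
        residual = sum(a * xj for a, xj in zip(A[i], x)) - b[i]
        loss_i = 0.5 * residual ** 2
        grad_i = [residual * a for a in A[i]]
        gamma = sps_step_size(loss_i, grad_i)  # assumes f_i* = 0
        x = [xj - gamma * g for xj, g in zip(x, grad_i)]
    return x


# Toy consistent linear system: some x fits every row exactly.
A = [[1.0, 2.0], [3.0, 1.0], [1.0, -1.0]]
x_true = [1.0, -2.0]
b = [sum(a * t for a, t in zip(row, x_true)) for row in A]
x_hat = sgd_with_sps(A, b)
print(x_hat)
```

No learning rate is tuned by hand here: the step is computed from the sampled loss value and gradient alone. For this particular loss and `c = 0.5`, the uncapped step reduces to 1 / ||a_i||^2, so each update projects onto the hyperplane of the sampled equation.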

The talk and the respective paper were published at the AISTATS 2021 virtual conference.

Similar Papers

06/12/2021

Fast Training Method for Stochastic Compositional Optimization Problems

Hongchang Gao, Heng Huang

Keywords: optimization, machine learning, meta learning

14:00

06/12/2020

Bayesian Attention Modules

Xinjie Fan, Shujian Zhang, Bo Chen, Mingyuan Zhou

3:32

26/04/2020

Unbiased Contrastive Divergence Algorithm for Training Energy-Based Latent Variable Models

Yixuan Qiu, Lingsong Zhang, Xiao Wang

Keywords: energy model, restricted Boltzmann machine, contrastive divergence, unbiased Markov chain Monte Carlo, distribution coupling

4:34

12/07/2020

Improving Transformer Optimization Through Better Initialization

Xiao Shi Huang, Felipe Perez, Jimmy Ba, Maksims Volkovs

Keywords: Sequential, Network, and Time-Series Modeling

14:52

02/02/2021

GLISTER: Generalization based Data Subset Selection for Efficient and Robust Learning

Krishnateja Killamsetty, Durga Sivasubramanian, Ganesh Ramakrishnan, Rishabh Iyer

19:14

18/07/2021

A Second look at Exponential and Cosine Step Sizes: Simplicity, Adaptivity, and Performance

Xiaoyu Li, Zhenxun Zhuang, Francesco Orabona

Keywords: Optimization, Non-Convex Optimization

5:07

26/08/2020

One Sample Stochastic Frank-Wolfe

Mingrui Zhang, Zebang Shen, Aryan Mokhtari, Hamed Hassani, Amin Karbasi

6:05

03/05/2021

Dataset Meta-Learning from Kernel Ridge-Regression

Timothy Nguyen, Zhourong Chen, Jaehoon Lee

Keywords: dataset corruption, infinite-width networks, neural kernels, kernel-ridge regression, dataset compression, dataset distillation, meta-learning

4:59

13/04/2021

Differentiating the value function by using convex duality

Sheheryar Mehmood, Peter Ochs

2:55

06/12/2021

Differentiable Optimization of Generalized Nondecomposable Functions using Linear Programs

Zihang Meng, Lopamudra Mukherjee, Yichao Wu, Vikas Singh, Sathya Narayanan Ravi

Keywords: deep learning, optimization

13:21

06/12/2020

Parabolic Approximation Line Search for DNNs

Maximus Mutschler, Andreas Zell

3:19

06/12/2020

Task-Robust Model-Agnostic Meta-Learning

Liam Collins, Aryan Mokhtari, Sanjay Shakkottai

3:17

12/07/2020

Empirical Study of the Benefits of Overparameterization in Learning Latent Variable Models

Rares-Darius Buhai, Yoni Halpern, Yoon Kim, Andrej Risteski, David Sontag

Keywords: Probabilistic Inference - Models and Probabilistic Programming

15:04

06/12/2021

Meta-learning to Improve Pre-training

Aniruddh Raghu, Jonathan Lorraine, Simon Kornblith, Matthew McDermott, David Duvenaud

Keywords: deep learning, optimization, graph learning, meta learning

12:57

02/02/2021

Frugal Optimization for Cost-related Hyperparameters

Qingyun Wu, Chi Wang, Silu Huang

16:07

18/07/2021

Training Data Subset Selection for Regression with Controlled Generalization Error

Durga S, Rishabh Iyer, Ganesh Ramakrishnan, Abir De

Keywords: Algorithms, Online Learning, Supervised Learning

4:15

06/12/2021

A Faster Decentralized Algorithm for Nonconvex Minimax Problems

Wenhan Xian, Feihu Huang, Yanfu Zhang, Heng Huang

Keywords: optimization, machine learning, adversarial robustness and security

13:59

15/06/2020

Learning fast and precise numerical analysis

Jingxuan He, Gagandeep Singh, Markus Püschel, Martin Vechev

Keywords: Abstract interpretation, Performance optimization, Machine learning, Numerical domains

14:20

12/07/2020

Extrapolation for Large-batch Training in Deep Learning

Tao LIN, Lingjing Kong, Sebastian Stich, Martin Jaggi

Keywords: Deep Learning - Algorithms

13:21

19/08/2021

Fine-grained Generalization Analysis of Structured Output Prediction

Waleed Mustafa, Yunwen Lei, Antoine Ledent, Marius Kloft

Keywords: Machine Learning, Learning Theory, Structured Prediction

15:46

12/07/2020

Overfitting in adversarially robust deep learning

Eric Wong, Leslie Rice, Zico Kolter

Keywords: Adversarial Examples

14:44

18/07/2021

Sparsifying Networks via Subdifferential Inclusion

Sagar Verma, Jean-Christophe Pesquet

Keywords: Optimization, Convex Optimization

5:10

13/04/2021

Critical parameters for scalable distributed learning with large batches and asynchronous updates

Sebastian Stich, Amirkeivan Mohtashami, Martin Jaggi

3:00

03/05/2021

GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding

Dmitry Lepikhin, HyoukJoong Lee, Yuanzhong Xu, Dehao Chen, Orhan Firat, Yanping Huang, Maxim Krikun, Noam Shazeer, Zhifeng Chen

5:07

06/12/2020

Efficient Learning of Generative Models via Finite-Difference Score Matching

Tianyu Pang, Kun Xu, Chongxuan LI, Yang Song, Stefano Ermon, Jun Zhu

2:59

05/01/2021

Learning Data Augmentation With Online Bilevel Optimization for Image Classification

Saypraseuth Mounsaveng, Issam Laradji, Ismail Ben Ayed, David Vazquez, Marco Pedersoli

4:36

12/07/2020

Obtaining Adjustable Regularization for Free via Iterate Averaging

Jingfeng Wu, Vladimir Braverman, Lin Yang

Keywords: Optimization - General

12:07

04/08/2021

Adversarially Robust Low Dimensional Representations

Pranjal Awasthi, Vaggos Chatziafratis, Xue Chen, Aravindan Vijayaraghavan

20:19

06/12/2020

A Contour Stochastic Gradient Langevin Dynamics Algorithm for Simulations of Multi-modal Distributions

Wei Deng, Guang Lin, Faming Liang

3:26

06/12/2020

Cream of the Crop: Distilling Prioritized Paths For One-Shot Neural Architecture Search

Houwen Peng, Hao Du, Hongyuan Yu, QI LI, Jing Liao, Jianlong Fu

3:12

06/12/2020

Counterexample-Guided Learning of Monotonic Neural Networks

Aishwarya Sivaraman, Golnoosh Farnadi, Todd Millstein, Guy Van den Broeck

3:22

14/09/2020

Orthant Based Proximal Stochastic Gradient Method for l1-Regularized Optimization

Tianyi Chen, Tianyu Ding, Bo Ji, Guanyi Wang, Yixin Shi, Jing Tian, Sheng Yi, Xiao Tu, Zhihui Zhu

Keywords: stochastic learning, sparsity, orthant prediction

15:18

06/12/2020

Understanding Approximate Fisher Information for Fast Convergence of Natural Gradient Descent in Wide Neural Networks

Ryo Karakida, Kazuki Osawa

3:19

02/02/2021

Asynchronous Optimization Methods for Efficient Training of Deep Neural Networks with Guarantees

Vyacheslav Kungurtsev, Malcolm Egan, Bapi Chatterjee, Dan Alistarh

19:56

12/07/2020

Multi-Task Learning with User Preferences: Gradient Descent with Controlled Ascent in Pareto Optimization

Debabrata Mahapatra, Vaibhav Rajan

Keywords: Transfer, Multitask and Meta-learning

15:35

18/07/2021

Scalable Normalizing Flows for Permutation Invariant Densities

Marin Biloš, Stephan Günnemann

Keywords: Deep Learning, Generative Models

5:10

05/04/2021

Pipelined Backpropagation at Scale: Training Large Models without Batches

Atli Kosson, Vitaliy Chiley, Abhi Venigalla, Joel Hestness, Urs Koster

4:14

05/04/2021

Pipelined Backpropagation at Scale: Training Large Models without Batches

Atli Kosson, Vitaliy Chiley, Abhi Venigalla, Joel Hestness, Urs Koster

18:00

06/12/2020

Revisiting the Sample Complexity of Sparse Spectrum Approximation of Gaussian Processes

Minh Hoang, Nghia Hoang, Hai Pham, David Woodruff

Keywords: Deep Learning

3:25

12/07/2020

Training Neural Networks for and by Interpolation

Leonard Berrada, M. Pawan Kumar, Andrew Zisserman

Keywords: Deep Learning - General

16:12