On the (asymptotic) convergence of Stochastic Gradient Descent and Stochastic Heavy Ball

Abstract: We study stochastic gradient descent (SGD) and the stochastic heavy ball method (SHB, otherwise known as the momentum method) for the general stochastic approximation problem. For SGD, in the convex and smooth setting, we provide the first almost sure asymptotic convergence rates for a weighted average of the iterates. More precisely, we show that the convergence rate of the function values is arbitrarily close to $o(1/\sqrt{k})$, and is exactly $o(1/k)$ in the so-called overparametrized case. We show that these results still hold when using a decreasing step size version of stochastic line search and stochastic Polyak stepsizes, thereby giving the first proof of convergence of these methods in the non-overparametrized regime. Using a substantially different analysis, we show that these rates hold for SHB as well, but at the last iterate. This distinction is important because it is the last iterate of SGD and SHB that is used in practice. We also show that the last iterate of SHB converges to a minimizer almost surely. Additionally, we prove that the function values of the deterministic HB converge at an $o(1/k)$ rate, which is faster than the previously known $O(1/k)$. Finally, in the nonconvex setting, we prove similar rates on the lowest gradient norm along the trajectory of SGD.
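
For readers unfamiliar with the two methods named in the abstract, the following is a minimal sketch of the textbook heavy ball update $x_{k+1} = x_k - \alpha_k \nabla f_i(x_k) + \beta (x_k - x_{k-1})$, which reduces to SGD when $\beta = 0$. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy least-squares data, the step size schedule $\alpha_k = \alpha_0/\sqrt{k+1}$, the constant momentum $\beta$, and the function names. The toy problem is noise-free, so it loosely mimics the overparametrized (interpolation) case in which all per-sample gradients vanish at the minimizer.

```python
import random

# Toy interpolation problem (an assumption for illustration): every sample
# is fit exactly by X_STAR, so the stochastic gradients vanish at the optimum.
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
X_STAR = [2.0, -1.0]
B = [sum(a * x for a, x in zip(row, X_STAR)) for row in A]


def loss(x):
    """Average of the per-sample losses 0.5 * (<a_i, x> - b_i)^2."""
    return sum(
        0.5 * (sum(a * xj for a, xj in zip(row, x)) - b) ** 2
        for row, b in zip(A, B)
    ) / len(A)


def grad_i(i, x):
    """Gradient of the i-th per-sample loss at x."""
    residual = sum(a * xj for a, xj in zip(A[i], x)) - B[i]
    return [residual * a for a in A[i]]


def stochastic_heavy_ball(x0, steps, alpha0=0.1, beta=0.5, seed=0):
    """Run the heavy ball update with one sampled gradient per step.

    The schedule alpha0 / sqrt(k + 1) and the constant beta are illustrative
    choices, not the parameters analyzed in the paper. beta = 0 gives SGD.
    """
    rng = random.Random(seed)
    x_prev, x = list(x0), list(x0)
    for k in range(steps):
        i = rng.randrange(len(A))          # sample one data point uniformly
        g = grad_i(i, x)                   # stochastic gradient at x_k
        alpha = alpha0 / (k + 1) ** 0.5    # decreasing step size
        x_next = [
            xj - alpha * gj + beta * (xj - pj)  # gradient step plus momentum
            for xj, gj, pj in zip(x, g, x_prev)
        ]
        x_prev, x = x, x_next
    return x                               # last iterate


x0 = [0.0, 0.0]
x_sgd = stochastic_heavy_ball(x0, steps=2000, beta=0.0)
x_shb = stochastic_heavy_ball(x0, steps=2000, beta=0.5)
print(loss(x0), loss(x_sgd), loss(x_shb))
```

The function returns the last iterate rather than an average, which is the quantity the abstract's SHB results concern; the SGD results in the abstract are stated for a weighted average of the iterates, which this sketch does not compute.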

13/04/2021

Othmane Sebbouh, Robert M Gower, Aaron Defazio

Similar Papers

SGD for structured nonconvex functions: Learning rates, minibatching and interpolation

Robert Gower, Othmane Sebbouh, Nicolas Loizou

Convergence Rates of Gradient Descent and MM Algorithms for Bradley-Terry Models

Milan Vojnovic, Se-Young Yun, Kaifang Zhou

Robustness Analysis of Non-Convex Stochastic Gradient Descent using Biased Expectations

Kevin Scaman, Cedric Malherbe

On the Almost Sure Convergence of Stochastic Gradient Descent in Non-Convex Problems

Panayotis Mertikopoulos, Nadav Hallak, Ali Kavis, Volkan Cevher

Last Iterate is Slower than Averaged Iterate in Smooth Convex-Concave Saddle Point Problems

Noah Golowich, Sarath Pattathil, Constantinos Daskalakis, Asuman Ozdaglar

Keywords: Convex optimization, Economics, game theory, and incentives, Non-convex optimization

The Role of Momentum Parameters in the Optimal Convergence of Adaptive Polyak's Heavy-ball Methods

Wei Tao, Sheng Long, Gaowei Wu, Qing Tao

Keywords: optimal convergence, convex optimization, momentum methods, Deep learning, adaptive heavy-ball methods

High-probability Bounds for Non-Convex Stochastic Optimization with Heavy Tails

Ashok Cutkosky, Harsh Mehta

Keywords: deep learning, optimization

Sparse Convex Optimization via Adaptively Regularized Hard Thresholding

Kyriakos Axiotis, Maxim Sviridenko

A Comprehensively Tight Analysis of Gradient Descent for PCA

Zhiqiang Xu, Ping Li

Fast Stochastic Bregman Gradient Methods: Sharp Analysis and Variance Reduction

Radu Alexandru Dragomir, Mathieu Even, Hadrien Hendrikx

Keywords: Optimization, Convex Optimization

Sinkhorn Barycenter via Functional Gradient Descent

Zebang Shen, Zhenfu Wang, Alejandro Ribeiro, Hamed Hassani

Near-Optimal Methods for Minimizing Star-Convex Functions and Beyond

Oliver Hinder, Aaron Sidford, Nimit S Sohoni

Keywords: Non-convex optimization

A first-order primal-dual method with adaptivity to local smoothness

Maria-Luiza Vladarean, Yura Malitsky, Volkan Cevher

Low-Rank Extragradient Method for Nonsmooth and Low-Rank Matrix Optimization Problems

Atara Kaplan, Dan Garber

Keywords: optimization, machine learning

The Wasserstein Proximal Gradient Algorithm

Adil Salim, Anna Korba, Giulia Luise

Stochastic Optimization for Non-convex Inf-Projection Problems

Yan Yan, Yi Xu, Lijun Zhang, Wang Xiaoyu, Tianbao Yang

Explicit regularization of stochastic gradient methods through duality

Anant Raj, Francis Bach

Averaging on the Bures-Wasserstein manifold: dimension-free convergence of gradient descent

Jason Altschuler, Sinho Chewi, Patrik R Gerber, Austin Stromme

Keywords: optimization, optimal transport

KALE Flow: A Relaxed KL Gradient Flow for Probabilities with Disjoint Support

Pierre Glaser, Michael Arbel, Arthur Gretton

Keywords: generative model, kernel methods, optimal transport

Adaptive Gradient Descent without Descent

Konstantin Mishchenko, Yura Malitsky

On Linear Stochastic Approximation: Fine-grained Polyak-Ruppert and Non-Asymptotic Concentration

Wenlong Mou, Chris Junchi Li, Martin Wainwright, Peter Bartlett, Michael Jordan

Keywords: Stochastic optimization, Concentration inequalities, Convex optimization, Reinforcement learning

How Good is SGD with Random Shuffling?

Itay M Safran, Ohad Shamir

Keywords: Convex optimization
