On the Almost Sure Convergence of Stochastic Gradient Descent in Non-Convex Problems

Abstract: In this paper, we analyze the trajectories of stochastic gradient descent (SGD) with the aim of understanding their convergence properties in non-convex problems. We first show that the sequence of iterates generated by SGD remains bounded and converges with probability $1$ under a very broad range of step-size schedules. Subsequently, we prove that the algorithm's rate of convergence to local minimizers with a positive-definite Hessian is $O(1/n^p)$ if the method is run with a $Θ(1/n^p)$ step-size. This provides an important guideline for tuning the algorithm's step-size as it suggests that a cool-down phase with a vanishing step-size could lead to significant performance gains; we demonstrate this heuristic using ResNet architectures on CIFAR. Finally, going beyond existing positive probability guarantees, we show that SGD avoids strict saddle points/manifolds with probability $1$ for the entire spectrum of step-size policies considered.
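
As a rough illustration of the vanishing step-size schedule described in the abstract, the sketch below runs SGD with a step-size of the form $\gamma/n^p$ on a one-dimensional toy problem. Everything specific here is an assumption made for illustration and is not taken from the paper: the objective $f(x) = (x^2 - 1)^2$ (minimizers at $\pm 1$ with a positive second derivative, and a strict local maximum at $0$), the additive Gaussian gradient noise, and the values of `gamma`, `p`, `noise_std`, and `seed`.

```python
import random


def grad_f(x):
    """Gradient of the toy non-convex objective f(x) = (x**2 - 1)**2."""
    return 4.0 * x * (x * x - 1.0)


def sgd(x0, n_steps, gamma=0.1, p=0.75, noise_std=0.5, seed=0):
    """Run SGD with step-size gamma / n**p and additive Gaussian gradient noise.

    All default values are illustrative assumptions, not settings from the paper.
    """
    rng = random.Random(seed)
    x = x0
    for n in range(1, n_steps + 1):
        step = gamma / n ** p  # Theta(1/n^p) step-size schedule
        noisy_grad = grad_f(x) + rng.gauss(0.0, noise_std)
        x -= step * noisy_grad
    return x


x_final = sgd(x0=0.3, n_steps=20000)
dist = abs(abs(x_final) - 1.0)  # distance to the nearest minimizer (+1 or -1)
print(f"final iterate: {x_final:.4f}, distance to minimizer: {dist:.2e}")
```

Changing `p` or `n_steps` and re-running gives a quick, informal feel for how the schedule affects the final distance to the minimizer; it is not a substitute for the paper's analysis or its ResNet experiments.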

04/08/2021

Panayotis Mertikopoulos, Nadav Hallak, Ali Kavis, Volkan Cevher

Similar Papers

Convergence rates and approximation results for SGD and its continuous-time counterpart

Xavier Fontaine, Valentin De Bortoli, Alain Durmus

A Distributional Analysis of Sampling-Based Reinforcement Learning Algorithms

Philip Amortila, Doina Precup, Prakash Panangaden, Marc G. Bellemare

Fast Stochastic Bregman Gradient Methods: Sharp Analysis and Variance Reduction

Radu Alexandru Dragomir, Mathieu Even, Hadrien Hendrikx

Keywords: Optimization, Convex Optimization

A study of condition numbers for first-order optimization

Charles Guille-Escuret, Manuela Girotti, Baptiste Goujaud, Ioannis Mitliagkas

On the (asymptotic) convergence of Stochastic Gradient Descent and Stochastic Heavy Ball

Othmane Sebbouh, Robert M Gower, Aaron Defazio

Fine-Grained Analysis of Stability and Generalization for Stochastic Gradient Descent

Yunwen Lei, Yiming Ying

Keywords: Learning Theory

Revisiting Frank-Wolfe for Polytopes: Strict Complementarity and Sparsity

Dan Garber

SGD for structured nonconvex functions: Learning rates, minibatching and interpolation

Robert Gower, Othmane Sebbouh, Nicolas Loizou

One Sample Stochastic Frank-Wolfe

Mingrui Zhang, Zebang Shen, Aryan Mokhtari, Hamed Hassani, Amin Karbasi

On the Verification of Neural ODEs with Stochastic Guarantees

Sophie Grunbacher, Ramin Hasani, Mathias Lechner, Jacek Cyranka, Scott A. Smolka, Radu Grosu

Explicit regularization of stochastic gradient methods through duality

Anant Raj, Francis Bach

On Convergence of Gradient Expected Sarsa(λ)

Long Yang, Gang Zheng, Yu Zhang, Qian Zheng, Pengfei Li, Gang Pan

Gradient Descent Maximizes the Margin of Homogeneous Neural Networks

Kaifeng Lyu, Jian Li

Keywords: margin, homogeneous, gradient descent

Robustness Analysis of Non-Convex Stochastic Gradient Descent using Biased Expectations

Kevin Scaman, Cedric Malherbe

Sinkhorn Barycenter via Functional Gradient Descent

Zebang Shen, Zhenfu Wang, Alejandro Ribeiro, Hamed Hassani

Langevin Monte Carlo without smoothness

Niladri Chatterji, Jelena Diakonikolas, Michael Jordan, Peter Bartlett

Asymptotic Guarantees for Generative Modeling Based on the Smooth Wasserstein Distance

Ziv Goldfeld, Kristjan Greenewald, Kengo Kato

CSER: Communication-efficient SGD with Error Reset

Cong Xie, Shuai Zheng, Sanmi Koyejo, Indranil Gupta, Mu Li, Haibin Lin

Linear Convergence of Adaptive Stochastic Gradient Descent

Yuege Xie, Xiaoxia Wu, Rachel Ward

Tensor Programs IV: Feature Learning in Infinite-Width Neural Networks

Greg Yang, Edward Hu

Keywords: Theory, Deep learning Theory

Last iterate convergence of SGD for Least-Squares in the Interpolation regime.

Aditya Vardhan Varre, Loucas Pillaud-Vivien, Nicolas Flammarion

Keywords: deep learning, optimization

STORM+: Fully Adaptive SGD with Recursive Momentum for Nonconvex Optimization

Kfir Levy, Ali Kavis, Volkan Cevher

Keywords: optimization

Spatio-Temporal Variational Gaussian Processes

Oliver Hamelijnck, William Wilkinson, Niki Loppi, Arno Solin, Theodoros Damoulas

Keywords: generative model, kernel methods