RMSprop converges with proper hyper-parameter

Abstract: Despite the existence of divergence examples, RMSprop remains one of the most popular algorithms in machine learning. Towards closing the gap between theory and practice, we prove that RMSprop converges with a proper choice of hyper-parameters under certain conditions. More specifically, we prove that when the hyper-parameter $\beta_2$ is close enough to $1$, RMSprop and its random shuffling version converge to a bounded region in general, and to critical points in the interpolation regime. It is worth mentioning that our results do not depend on the "bounded gradient" assumption, which is often the key assumption used by existing theoretical work on Adam-type adaptive gradient methods. Removing this assumption allows us to establish a phase transition from divergence to non-divergence for RMSprop. Finally, based on our theory, we conjecture that in practice there is a critical threshold $\beta_2^*$ such that RMSprop generates reasonably good results only if $1 > \beta_2 \ge \beta_2^*$. We provide empirical evidence for such a phase transition in our numerical experiments.
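
For readers who want to see where $\beta_2$ enters the algorithm, the following is a minimal sketch of the standard RMSprop update on a toy quadratic. It is an illustration only, not the authors' code or experimental setup: the function names `rmsprop_step` and `run_rmsprop`, the step size, the `eps` term, the zero initialization of the second-moment estimate, and the objective $f(x) = x^2/2$ are all assumptions chosen for the example.

```python
import math


def rmsprop_step(x, v, grad, lr=0.01, beta2=0.99, eps=1e-8):
    """One RMSprop update on lists of floats; returns (new_x, new_v).

    v is the exponential moving average of squared gradients,
    and beta2 controls how slowly that average forgets the past.
    """
    new_v = [beta2 * vi + (1.0 - beta2) * g * g for vi, g in zip(v, grad)]
    new_x = [
        xi - lr * g / (math.sqrt(vi) + eps)
        for xi, g, vi in zip(x, grad, new_v)
    ]
    return new_x, new_v


def run_rmsprop(grad_fn, x0, steps=2000, lr=0.01, beta2=0.99):
    """Run RMSprop from x0 with a constant step size (hypothetical helper)."""
    x = list(x0)
    v = [0.0] * len(x)  # assumption: second-moment estimate starts at zero
    for _ in range(steps):
        x, v = rmsprop_step(x, v, grad_fn(x), lr=lr, beta2=beta2)
    return x


# Toy objective f(x) = 0.5 * x^2, whose gradient is x itself.
x_final = run_rmsprop(lambda x: list(x), [1.0])
print(abs(x_final[0]))
```

Because the updated average satisfies $v \ge (1-\beta_2) g^2$, each coordinate moves by at most `lr / sqrt(1 - beta2)` per step in this sketch, so with a constant step size the iterate on this toy problem settles into a small neighborhood of the minimizer rather than converging exactly.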

26/08/2020

06/12/2021

Naichen Shi, Dawei Li, Mingyi Hong, Ruoyu Sun

Similar Papers

One Sample Stochastic Frank-Wolfe

Mingrui Zhang, Zebang Shen, Aryan Mokhtari, Hamed Hassani, Amin Karbasi

Task-Robust Model-Agnostic Meta-Learning

Liam Collins, Aryan Mokhtari, Sanjay Shakkottai

Convergence of Meta-Learning with Task-Specific Adaptation over Partial Parameters

Kaiyi Ji, Jason Lee, Yingbin Liang, H. Vincent Poor

Minimally distorted Adversarial Examples with a Fast Adaptive Boundary Attack

Francesco Croce, Matthias Hein

Memory and Computation-Efficient Kernel SVM via Binary Embedding and Ternary Model Coefficients

Zijian Lei, Liang Lan

Fast Approximation of the Sliced-Wasserstein Distance Using Concentration of Random Projections

Kimia Nadjahi, Alain Durmus, Pierre E Jacob, Roland Badeau, Umut Simsekli

machine learning, generative model, optimal transport

Group testing and local search: is there a computational-statistical gap?

Fotis Iliopoulos, Ilias Zadik

Adversarially Robust Low Dimensional Representations

Pranjal Awasthi, Vaggos Chatziafratis, Xue Chen, Aravindan Vijayaraghavan

Fast Deterministic CUR Matrix Decomposition with Accuracy Assurance

Yasutoshi Ida, Sekitoshi Kanai, Yasuhiro Fujiwara, Tomoharu Iwata, Koh Takeuchi, Hisashi Kashima

Unbiased Contrastive Divergence Algorithm for Training Energy-Based Latent Variable Models

Yixuan Qiu, Lingsong Zhang, Xiao Wang

energy model, restricted Boltzmann machine, contrastive divergence, unbiased Markov chain Monte Carlo, distribution coupling

Consistent Estimation for PCA and Sparse Regression with Oblivious Outliers

Tommaso d'Orsi, Chih-Hung Liu, Rajai Nasser, Gleb Novikov, David Steurer, Stefan Tiegel

Whitening and Second Order Optimization Both Make Information in the Dataset Unusable During Training, and Can Reduce or Prevent Generalization

Neha Wadia, Daniel Duckworth, Samuel Schoenholz, Ethan Dyer, Jascha Sohl-Dickstein

Optimization, Probabilistic Methods, Topic Models, Probabilistic Methods, Latent Variable Models

Boosting Frank-Wolfe by Chasing Gradients

Cyrille Combettes, Sebastian Pokutta

PAGE: A Simple and Optimal Probabilistic Gradient Estimator for Nonconvex Optimization

Zhize Li, Hongyan Bao, Xiangliang Zhang, Peter Richtarik

Bayesian Optimization of Function Networks

Raul Astudillo, Peter Frazier

optimization, reinforcement learning and planning, kernel methods

An Improved Analysis of Stochastic Gradient Descent with Momentum

Yanli Liu, Yuan Gao, Wotao Yin

Iteratively Reweighted Least Squares for Basis Pursuit with Global Linear Convergence Rate

Christian Kümmerle, Claudio Mayrink Verdun, Dominik Stöger

theory, optimization, machine learning

SAdam: A Variant of Adam for Strongly Convex Functions

Guanghui Wang, Shiyin Lu, Quan Cheng, Wei-wei Tu, Lijun Zhang

Online convex optimization, Adaptive online learning, Adam

Stability and Generalization of Bilevel Programming in Hyperparameter Optimization

Fan Bao, Guoqiang Wu, Chongxuan LI, Jun Zhu, Bo Zhang

Low-Rank Extragradient Method for Nonsmooth and Low-Rank Matrix Optimization Problems

Atara Kaplan, Dan Garber

optimization, machine learning

Sharper Generalization Bounds for Learning with Gradient-dominated Objective Functions

Yunwen Lei, Yiming Ying

generalization bounds, non-convex learning

Implicit Finite-Horizon Approximation and Efficient Optimal Algorithms for Stochastic Shortest Path

Liyu Chen, Mehdi Jafarnia-Jahromi, Rahul Jain, Haipeng Luo

Convex Representation Learning for Generalized Invariance in Semi-Inner-Product Space

Yingyi Ma, Vignesh Ganapathiraman, Yaoliang Yu, Xinhua Zhang
