Leveraging Non-uniformity in First-order Non-convex Optimization

Abstract: Classical global convergence results for first-order methods rely on uniform smoothness and the \L{}ojasiewicz inequality. Motivated by properties of objective functions that arise in machine learning, we propose a non-uniform refinement of these notions, leading to \emph{Non-uniform Smoothness} (NS) and \emph{Non-uniform \L{}ojasiewicz inequality} (N\L{}). The new definitions inspire new geometry-aware first-order methods that are able to converge to global optimality faster than the classical $\Omega(1/t^2)$ lower bounds. To illustrate the power of these geometry-aware methods and their corresponding non-uniform analysis, we consider two important problems in machine learning: policy gradient optimization in reinforcement learning (PG), and generalized linear model training in supervised learning (GLM). For PG, we find that normalizing the gradient ascent method can accelerate convergence to $O(e^{- c \cdot t})$ (where $c > 0$) while incurring less overhead than existing algorithms. For GLM, we show that geometry-aware normalized gradient descent can also achieve a linear convergence rate, which significantly improves the best known results. We additionally show that the proposed geometry-aware gradient descent methods escape landscape plateaus faster than standard gradient descent. Experimental results are used to illustrate and complement the theoretical findings.
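The contrast the abstract draws for PG can be illustrated on a toy problem. The following is a minimal sketch, not the authors' implementation: it assumes a one-state (bandit) softmax policy, a hypothetical reward vector `r`, and a hand-picked step size `eta`, and it compares a plain gradient ascent step with a step whose gradient is divided by its Euclidean norm.

```python
import numpy as np

def softmax(theta):
    # Numerically stable softmax over action preferences.
    e = np.exp(theta - theta.max())
    return e / e.sum()

def pg_gradient(theta, r):
    # Gradient of the expected reward pi(theta)^T r for a softmax policy.
    pi = softmax(theta)
    return pi * (r - pi @ r)

def run(r, steps, eta, normalize):
    # Gradient ascent from a uniform policy; returns the sub-optimality gap per step.
    theta = np.zeros_like(r)
    gaps = []
    for _ in range(steps):
        g = pg_gradient(theta, r)
        if normalize:
            norm = np.linalg.norm(g)
            if norm == 0.0:
                break  # gradient underflowed to zero; nothing left to normalize
            g = g / norm
        theta = theta + eta * g
        gaps.append(r.max() - softmax(theta) @ r)
    return np.array(gaps)

# Hypothetical three-armed bandit; rewards and step size are illustrative assumptions.
r = np.array([1.0, 0.8, 0.1])
gap_pg = run(r, steps=200, eta=0.4, normalize=False)
gap_npg = run(r, steps=200, eta=0.4, normalize=True)
print("standard PG final gap:  ", gap_pg[-1])
print("normalized PG final gap:", gap_npg[-1])
```

On this toy instance the normalized variant should drive the gap down far faster than the unnormalized one, which is consistent with the sublinear versus linear rates described in the abstract; it is an illustration only and says nothing about the general MDP setting.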

12/07/2020

Jincheng Mei, Yue Gao, Bo Dai, Csaba Szepesvari, Dale Schuurmans

Similar Papers

Accelerated Message Passing for Entropy-Regularized MAP Inference

Jonathan Lee, Aldo Pacchiano, Peter Bartlett, Michael Jordan

Keywords: Probabilistic Inference - Models and Probabilistic Programming

A simpler approach to accelerated optimization: iterative averaging meets optimism

Pooria Joulani, Anant Raj, András György, Csaba Szepesvari

Keywords: Online Learning, Active Learning, and Bandits

Geometric Insights into the Convergence of Nonlinear TD Learning

David Brandfonbrener, Joan Bruna

Keywords: TD, nonlinear, convergence, value estimation, reinforcement learning

Rectangular Flows for Manifold Learning

Anthony Caterini, Gabriel Loaiza-Ganem, Geoff Pleiss, John Cunningham

Keywords: deep learning, optimization, generative model

An Online Riemannian PCA for Stochastic Canonical Correlation Analysis

Zihang Meng, Rudrasis Chakraborty, Vikas Singh

Keywords: optimization, fairness

Simple Stochastic and Online Gradient Descent Algorithms for Pairwise Learning

Zhenhuan Yang, Yunwen Lei, Puyu Wang, Tianbao Yang, Yiming Ying

Keywords: optimization, machine learning, privacy

Convergence rates and approximation results for SGD and its continuous-time counterpart

Xavier Fontaine, Valentin De Bortoli, Alain Durmus

PAGE: A Simple and Optimal Probabilistic Gradient Estimator for Nonconvex Optimization

Zhize Li, Hongyan Bao, Xiangliang Zhang, Peter Richtarik

Keywords: Optimization

Improving the Sample and Communication Complexity for Decentralized Non-Convex Optimization: Joint Gradient Estimation and Tracking

Haoran Sun, Songtao Lu, Mingyi Hong

Keywords: Optimization - Non-convex

Non-convex Distributionally Robust Optimization: Non-asymptotic Analysis

Jikai Jin, Bohang Zhang, Haiyang Wang, Liwei Wang

Keywords: optimization

Generalization Guarantee of SGD for Pairwise Learning

Yunwen Lei, Mingrui Liu, Yiming Ying

Keywords: optimization, machine learning

One Sample Stochastic Frank-Wolfe

Mingrui Zhang, Zebang Shen, Aryan Mokhtari, Hamed Hassani, Amin Karbasi

Understanding Over-parameterization in Generative Adversarial Networks

Yogesh Balaji, Mohammadmahdi Sajedi, Neha Kalibhat, Mucong Ding, Dominik Stöger, Mahdi Soltanolkotabi, Soheil Feizi

Keywords: min-max optimization, Over-parameterization, GAN

Explicit regularization of stochastic gradient methods through duality

Anant Raj, Francis Bach

Kernelized Wasserstein Natural Gradient

M Arbel, A Gretton, W Li, G Montufar

Keywords: kernel methods, natural gradient, information geometry, Wasserstein metric

Sharper Generalization Bounds for Learning with Gradient-dominated Objective Functions

Yunwen Lei, Yiming Ying

Keywords: generalization bounds, non-convex learning

Hamiltonian Monte Carlo using an adjoint-differentiated Laplace approximation: Bayesian inference for latent Gaussian models and beyond

Charles Margossian, Aki Vehtari, Daniel Simpson, Raj Agrawal

Communication-Efficient Distributed Optimization with Quantized Preconditioners

Foivos Alimisis, Peter Davies, Dan Alistarh

Keywords: Optimization, Distributed and Parallel Optimization

Improved Penalty Method via Doubly Stochastic Gradients for Bilevel Hyperparameter Optimization

Wanli Shi, Bin Gu

An Empirical Process Approach to the Union Bound: Practical Algorithms for Combinatorial and Linear Bandits

Julian Katz-Samuels, Lalit Jain, zohar karnin, Kevin Jamieson

Improved Analysis of Clipping Algorithms for Non-convex Optimization

Bohang Zhang, Jikai Jin, Cong Fang, Liwei Wang
