One Sample Stochastic Frank-Wolfe

Abstract: One of the beauties of the projected gradient descent method lies in its rather simple mechanism and yet stable behavior with inexact, stochastic gradients, which has led to its widespread use in many machine learning applications. However, once we replace the projection operator with a simpler linear program, as is done in the Frank-Wolfe method, both simplicity and stability take a serious hit. The aim of this paper is to bring them back without sacrificing efficiency. We propose the first one-sample stochastic Frank-Wolfe algorithm, called 1-SFW, that avoids the need to carefully tune the batch size, step size, learning rate, and other complicated hyperparameters. In particular, 1-SFW achieves the optimal convergence rate of $\mathcal{O}(1/\epsilon^2)$ for reaching an $\epsilon$-suboptimal solution in the stochastic convex setting, and a $(1-1/e)-\epsilon$ approximate solution for a stochastic monotone DR-submodular maximization problem. Moreover, in a general non-convex setting, 1-SFW finds an $\epsilon$-first-order stationary point after at most $\mathcal{O}(1/\epsilon^3)$ iterations, achieving the current best known convergence rate. All of this is made possible by designing a novel unbiased momentum estimator that governs the stability of the optimization process while using a single sample at each iteration.
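To make the projection-free idea in the abstract concrete, here is a minimal sketch of a stochastic Frank-Wolfe loop that draws one gradient sample per iteration and smooths it with a momentum average. It is not the 1-SFW estimator: the abstract does not spell out the unbiased momentum estimator, so this sketch substitutes a plain (biased) running average of sampled gradients. The toy objective, the function names (`lmo_simplex`, `momentum_sfw`, `squared_distance`), and the step-size schedules are all assumptions chosen for illustration.

```python
import random


def lmo_simplex(direction):
    """Linear minimization oracle over the probability simplex.

    The minimizer of <direction, v> over the simplex is the vertex at the
    coordinate where `direction` is smallest, so no projection is needed.
    """
    best = min(range(len(direction)), key=lambda j: direction[j])
    vertex = [0.0] * len(direction)
    vertex[best] = 1.0
    return vertex


def momentum_sfw(target, steps=2000, noise=0.3, seed=0):
    """Toy problem (assumed): minimize 0.5 * E||x - (target + xi)||^2 over
    the simplex, where xi is Gaussian noise. One sample per iteration."""
    rng = random.Random(seed)
    n = len(target)
    x = [1.0 / n] * n          # start at the centre of the simplex
    d = [0.0] * n              # running (momentum) gradient estimate
    for t in range(1, steps + 1):
        rho = t ** (-2.0 / 3.0)    # averaging weight (assumed schedule)
        gamma = 1.0 / (t + 1)      # Frank-Wolfe step size (assumed schedule)
        # Single stochastic gradient sample: x - (target + xi).
        g = [x[i] - target[i] - rng.gauss(0.0, noise) for i in range(n)]
        # Biased momentum average; the paper's estimator is unbiased.
        d = [(1.0 - rho) * d[i] + rho * g[i] for i in range(n)]
        # Linear program instead of a projection.
        v = lmo_simplex(d)
        # Convex combination keeps x inside the simplex.
        x = [(1.0 - gamma) * x[i] + gamma * v[i] for i in range(n)]
    return x


def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


if __name__ == "__main__":
    target = [0.05, 0.9, 0.05]
    x = momentum_sfw(target)
    print("iterate:", x)
    print("squared distance to target:", squared_distance(x, target))
```

Because every update is a convex combination of points in the simplex, the iterate stays feasible without any projection step; the only per-iteration subproblem is the linear minimization in `lmo_simplex`.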

12/07/2020

Mingrui Zhang, Zebang Shen, Aryan Mokhtari, Hamed Hassani, Amin Karbasi

Similar Papers

A simpler approach to accelerated optimization: iterative averaging meets optimism

Pooria Joulani, Anant Raj, András György, Csaba Szepesvari

Keywords: Online Learning, Active Learning, and Bandits

Boosting Frank-Wolfe by Chasing Gradients

Cyrille Combettes, Sebastian Pokutta

Smooth Bilevel Programming for Sparse Regularization

Clarice Poon, Gabriel Peyré

STORM+: Fully Adaptive SGD with Recursive Momentum for Nonconvex Optimization

Kfir Levy, Ali Kavis, Volkan Cevher

Efficient Projection-free Algorithms for Saddle Point Problems

Cheng Chen, Luo Luo, Weinan Zhang, Yong Yu

Task-Robust Model-Agnostic Meta-Learning

Liam Collins, Aryan Mokhtari, Sanjay Shakkottai

PAGE: A Simple and Optimal Probabilistic Gradient Estimator for Nonconvex Optimization

Zhize Li, Hongyan Bao, Xiangliang Zhang, Peter Richtarik

On the Convergence Theory of Gradient-Based Model-Agnostic Meta-Learning Algorithms

Alireza Fallah, Aryan Mokhtari, Asuman Ozdaglar

A Graduated Filter Method for Large Scale Robust Estimation

Huu Le, Christopher Zach

Keywords: robust fitting, bundle adjustment, non-convex, poor local minima, non-linear least squares, graduated non-convexity

Gaussian-Smoothed Optimal Transport: Metric Structure and Statistical Efficiency

Ziv Goldfeld, Kristjan Greenewald

Efficient Learning of Generative Models via Finite-Difference Score Matching

Tianyu Pang, Kun Xu, Chongxuan Li, Yang Song, Stefano Ermon, Jun Zhu

A Catalyst Framework for Minimax Optimization

Junchi Yang, Siqi Zhang, Negar Kiyavash, Niao He

Fast Deterministic CUR Matrix Decomposition with Accuracy Assurance

Yasutoshi Ida, Sekitoshi Kanai, Yasuhiro Fujiwara, Tomoharu Iwata, Koh Takeuchi, Hisashi Kashima

An efficient nonconvex reformulation of stagewise convex optimization problems

Rudy Bunel, Oliver Hinder, Srinadh Bhojanapalli, Krishnamurthy Dvijotham

Sharper Generalization Bounds for Learning with Gradient-dominated Objective Functions

Yunwen Lei, Yiming Ying

Keywords: generalization bounds, non-convex learning

RMSprop converges with proper hyper-parameter

Naichen Shi, Dawei Li, Mingyi Hong, Ruoyu Sun

Keywords: convergence, hyperparameter, RMSprop

Communication-Efficient Frank-Wolfe Algorithm for Nonconvex Decentralized Distributed Learning

Wenhan Xian, Feihu Huang, Heng Huang

Closing the Gap: Tighter Analysis of Alternating Stochastic Gradient Methods for Bilevel Problems

Tianyi Chen, Yuejiao Sun, Wotao Yin

Keywords: theory, optimization, reinforcement learning and planning, machine learning

Accelerated Stochastic Gradient-free and Projection-free Methods

Feihu Huang, Lue Tao, Songcan Chen

Stochastic Bias-Reduced Gradient Methods

Hilal Asi, Yair Carmon, Arun Jambulapati, Yujia Jin, Aaron Sidford

Keywords: theory, optimization, privacy

On Efficient Low Distortion Ultrametric Embedding

Vincent Cohen-Addad, Karthik C. S., Guillaume Lagarde

Self-concordant analysis of Frank-Wolfe algorithm

Mathias Staudigl, Pavel Dvurechenskii, Shimrit Shtern, Kamil Safin, Petr Ostroukhov

Leveraging Non-uniformity in First-order Non-convex Optimization

Jincheng Mei, Yue Gao, Bo Dai, Csaba Szepesvari, Dale Schuurmans

Keywords: Theory, RL, Decisions and Control Theory
