18/07/2021

Stochastic Sign Descent Methods: New Algorithms and Better Theory

Mher Safaryan, Peter Richtarik

Keywords: Optimization, Distributed and Parallel Optimization

Abstract: Various gradient compression schemes have been proposed to mitigate the communication cost in distributed training of large-scale machine learning models. Sign-based methods, such as signSGD (Bernstein et al., 2018), have recently been gaining popularity because of their simple compression rule and connection to adaptive gradient methods, like ADAM. In this paper, we analyze sign-based methods for non-convex optimization in three key settings: (i) standard single node, (ii) parallel with shared data, and (iii) distributed with partitioned data. For the single-machine case, we generalize the previous analysis of signSGD, relying on intuitive bounds on success probabilities and allowing even biased estimators. Furthermore, we extend the analysis to the parallel setting within a parameter-server framework, where exponentially fast noise reduction is guaranteed with respect to the number of nodes, while maintaining 1-bit compression in both directions and using small mini-batch sizes. Next, we identify a fundamental issue that prevents signSGD from converging in the distributed environment. To resolve this issue, we propose a new sign-based method, Stochastic Sign Descent with Momentum (SSDM), which converges under the standard bounded variance assumption at the optimal asymptotic rate. We validate several aspects of our theoretical findings with numerical experiments.
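To illustrate the sign-based compression rule the abstract refers to, below is a minimal sketch of the basic single-node signSGD update from Bernstein et al. (2018), x_{k+1} = x_k - γ · sign(g_k). This is not the authors' SSDM method; the quadratic test objective, step size, and function names are illustrative assumptions.

```python
# Minimal sketch of the basic signSGD update (Bernstein et al., 2018).
# Not the SSDM method from this paper; objective and step size are illustrative.
import numpy as np

def sign_sgd(stochastic_grad, x0, step_size=0.01, num_iters=1000, rng=None):
    """Run signSGD: each step moves by step_size along -sign(gradient estimate)."""
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(num_iters):
        g = stochastic_grad(x, rng)       # stochastic gradient estimate
        x -= step_size * np.sign(g)       # 1-bit-per-coordinate update direction
    return x

# Hypothetical usage: noisy gradient of f(x) = 0.5 * ||x||^2
rng = np.random.default_rng(0)
noisy_grad = lambda x, rng: x + 0.1 * rng.standard_normal(x.shape)
x_final = sign_sgd(noisy_grad, x0=np.ones(10), rng=rng)
```

Because only the sign of each coordinate is used, a worker in the parallel setting would need to communicate just one bit per coordinate, which is the communication saving the paper's analysis builds on.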

Talk and paper published at the ICML 2021 virtual conference.

