Gap-Aware Mitigation of Gradient Staleness

26/04/2020

Saar Barkai, Ido Hakimi, Assaf Schuster

Keywords: distributed, asynchronous, large scale, gradient staleness, staleness penalization, sgd, deep learning, neural networks, optimization

Abstract: Cloud computing is becoming increasingly popular as a platform for distributed training of deep neural networks. Synchronous stochastic gradient descent (SSGD) suffers from substantial slowdowns due to stragglers when the environment is non-dedicated, as is common in cloud computing. Asynchronous SGD (ASGD) methods are immune to these slowdowns but are rarely used because of gradient staleness, which hampers convergence. Recent techniques have had limited success in mitigating gradient staleness when scaling up to many workers (computing nodes). In this paper we define the Gap as a measure of gradient staleness and propose Gap-Aware (GA), a novel asynchronous distributed method that penalizes stale gradients linearly in the Gap and performs well even when scaling to large numbers of workers. Our evaluation on the CIFAR, ImageNet, and WikiText-103 datasets shows that GA outperforms the currently accepted gradient penalization method in final test accuracy. We also provide a convergence rate proof for GA. Contrary to prior beliefs, we show that if GA is applied, momentum becomes beneficial in asynchronous environments, even when the number of workers scales up.
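
The abstract does not give the exact formula for the Gap or for the penalty, so the following is only a minimal sketch of one way a gap-based penalty could look. It assumes the gap is the Euclidean distance between the master's current parameters and the copy the worker computed its gradient on, measured in units of a typical step size, and that the stale gradient is divided by (1 + gap). The names gap, gap_aware_update, and step_scale are hypothetical and are not taken from the paper or its code.

```python
import math


def gap(master_params, worker_params, step_scale):
    """Assumed gap measure: distance between the master's parameters and the
    (possibly stale) parameters the worker used, in units of step_scale."""
    if step_scale <= 0:
        raise ValueError("step_scale must be positive")
    dist = math.sqrt(
        sum((m - w) ** 2 for m, w in zip(master_params, worker_params))
    )
    return dist / step_scale


def gap_aware_update(master_params, worker_params, grad, lr, step_scale):
    """One SGD step on the master, with the gradient damped by 1 / (1 + gap).

    With a gap of zero this reduces to a plain SGD step; the divisor grows
    linearly as the worker's parameter copy drifts from the master's.
    """
    penalty = 1.0 + gap(master_params, worker_params, step_scale)
    return [p - lr * g / penalty for p, g in zip(master_params, grad)]


# A fresh gradient (no gap) is applied at full strength.
fresh = gap_aware_update([1.0, 2.0], [1.0, 2.0], [0.5, 0.5], lr=0.1, step_scale=1.0)

# A stale gradient (gap of 1 in these units) is applied at half strength.
stale = gap_aware_update([4.0, 6.0], [1.0, 2.0], [1.0, 1.0], lr=0.1, step_scale=5.0)
```

In this sketch a worker whose parameters still match the master's is treated exactly like synchronous SGD, while a worker that has fallen behind contributes a proportionally smaller update.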

The talk and the respective paper are published at the ICLR 2020 virtual conference.

Similar Papers

14/06/2020

On the Acceleration of Deep Learning Model Parallelism With Staleness

An Xu, Zhouyuan Huo, Heng Huang

Keywords: layer-wise staleness, asynchronous model parallelism, convolutional neural networks

1:01

18/07/2021

Training Adversarially Robust Sparse Networks via Bayesian Connectivity Sampling

Ozan Özdenizci, Robert Legenstein

Keywords: Algorithms, Adversarial Examples

6:27

18/07/2021

Addressing Catastrophic Forgetting in Few-Shot Problems

Pauching Yap, Hippolyt Ritter, David Barber

Keywords: Applications, Computer Vision, Deep Learning, CNN Architectures; Deep Learning, Generative Models, Algorithms, Multitask, Transfer, and Meta Learning

5:11

25/07/2020

Accelerated convergence for counterfactual learning to rank

Rolf Jagerman, Maarten Rijke

Keywords: unbiased learning, counterfactual learning, learning to rank

14:21

06/12/2020

Improved Analysis of Clipping Algorithms for Non-convex Optimization

Bohang Zhang, Jikai Jin, Cong Fang, Liwei Wang

3:16

13/07/2020

Model-Switching: Dealing with Fluctuating Workloads in Machine-Learning-as-a-Service Systems

Jeff Zhang, Sameh Elnikety, Shuayb Zarar, Atul Gupta, Siddharth Garg

17:05

26/04/2020

Distributionally Robust Neural Networks

Shiori Sagawa, Pang Wei Koh, Tatsunori B. Hashimoto, Percy Liang

Keywords: distributionally robust optimization, deep learning, robustness, generalization, regularization

5:22

06/12/2020

Constant-Expansion Suffices for Compressed Sensing with Generative Priors

Constantinos Daskalakis, Dhruv Rohatgi, Emmanouil Zampetakis

3:13

13/04/2021

Federated learning with compression: Unified analysis and sharp guarantees

Farzin Haddadpour, Mohammad Mahdi Kamani, Aryan Mokhtari, Mehrdad Mahdavi

3:03

06/12/2021

Asynchronous Decentralized SGD with Quantized and Local Updates

Giorgi Nadiradze, Amirmojtaba Sabour, Peter Davies, Shigang Li, Dan Alistarh

Keywords: optimization, machine learning, graph learning

12:37

14/06/2020

EcoNAS: Finding Proxies for Economical Neural Architecture Search

Dongzhan Zhou, Xinchi Zhou, Wenwei Zhang, Chen Change Loy, Shuai Yi, Xuesen Zhang, Wanli Ouyang

Keywords: neural architecture search, evaluation proxy, acceleration, evolutionary algorithm, image recognition

1:01

03/05/2021

MetaNorm: Learning to Normalize Few-Shot Batches Across Domains

Yingjun Du, Xiantong Zhen, Ling Shao, Cees G Snoek

Keywords: batch normalization, Meta-learning, few-shot domain generalization

5:48

06/12/2021

Gradient-based Hyperparameter Optimization Over Long Horizons

Paul Micaelli, Amos Storkey

Keywords: optimization, meta learning

14:44

09/07/2020

A Greedy Anytime Algorithm for Sparse PCA

Dan Vilenchik, Adam Soffer, Guy Holtzman

Keywords: Non-convex optimization, Combinatorial optimization, Computational complexity, High-dimensional statistics, Unsupervised and semi-supervised learning

15:31

18/07/2021

Crowdsourcing via Annotator Co-occurrence Imputation and Provable Symmetric Nonnegative Matrix Factorization

Shahana Ibrahim, Xiao Fu

Keywords: Algorithms, Crowdsourcing

15:55

18/07/2021

Globally-Robust Neural Networks

Klas Leino, Zifan Wang, Matt Fredrikson

Keywords: Social Aspects of Machine Learning, AI Safety

7:55

06/12/2021

Powerpropagation: A sparsity inducing weight reparameterisation

Jonathan Schwarz, Siddhant M Jayakumar, Razvan Pascanu, Peter E Latham, Yee Teh

Keywords: deep learning, optimization, continual learning

9:08

06/12/2021

Fast Routing under Uncertainty: Adaptive Learning in Congestion Games via Exponential Weights

Dong Quan Vu, Kimon Antonakopoulos, Panayotis Mertikopoulos

Keywords: theory

10:19

12/07/2020

Fast and Three-rious: Speeding Up Weak Supervision with Triplet Methods

Dan Fu, Mayee Chen, Frederic Sala, Sarah Hooper, Kayvon Fatahalian, Christopher Re

Keywords: Probabilistic Inference - Models and Probabilistic Programming

15:01

13/04/2021

Local stochastic gradient descent ascent: Convergence analysis and communication efficiency

Yuyang Deng, Mehrdad Mahdavi

2:58

06/12/2021

Intrinsic Dimension, Persistent Homology and Generalization in Neural Networks

Tolga Birdal, Aaron Lou, Leonidas Guibas, Umut Simsekli

Keywords: theory, deep learning, optimization

14:38

13/04/2021

A linearly convergent algorithm for decentralized optimization: Sending less bits for free!

Dmitry Kovalev, Anastasia Koloskova, Martin Jaggi, Peter Richtarik, Sebastian Stich

3:07

06/12/2021

Exponential Graph is Provably Efficient for Decentralized Deep Training

Bicheng Ying, Kun Yuan, Yiming Chen, Hanbin Hu, PAN PAN, Wotao Yin

Keywords: deep learning, optimization, graph learning

14:16

14/09/2020

An efficient K-means clustering algorithm for tall data

Marco Capó, Aritz Pérez, Jose A. Lozano

14:46

03/05/2021

Scaling the Convex Barrier with Active Sets

Alessandro De Palma, Harkirat Singh Behl, Rudy R Bunel, Philip Torr, M. Pawan Kumar

Keywords: Optimisation for Deep Learning, Neural Network Bounding, Neural Network Verification

5:15

06/12/2021

A Faster Decentralized Algorithm for Nonconvex Minimax Problems

Wenhan Xian, Feihu Huang, Yanfu Zhang, Heng Huang

Keywords: optimization, machine learning, adversarial robustness and security

13:59

12/07/2020

Active Learning on Attributed Graphs via Graph Cognizant Logistic Regression and Preemptive Query Generation

Florence Regol, Soumyasundar Pal, Yingxue Zhang, Mark Coates

Keywords: Online Learning, Active Learning, and Bandits

8:16

14/06/2020

Towards Unified INT8 Training for Convolutional Neural Network

Feng Zhu, Ruihao Gong, Fengwei Yu, Xianglong Liu, Yanfei Wang, Zhelong Li, Xiuqi Yang, Junjie Yan

Keywords: int8 training, gradient quantization, direction sensitive gradient clipping, learning rate scaling, gradient distribution

1:01

06/12/2021

AC/DC: Alternating Compressed/DeCompressed Training of Deep Neural Networks

Alexandra Peste, Eugenia Iofinova, Adrian Vladu, Dan Alistarh

Keywords: deep learning

14:01

18/07/2021

The Implicit Bias for Adaptive Optimization Algorithms on Homogeneous Neural Networks

Bohan Wang, Qi Meng, Wei Chen, Tie-Yan Liu

Keywords: Theory, Deep learning Theory

16:53

13/07/2020

Stratus: Clouds with Microarchitectural Resource Management

Kaveh Razavi, Animesh Trivedi

14:31

23/08/2020

DeepTriage: Automated transfer assistance for incidents in cloud services

Phuong Pham, Vivek Jain, Lukas Dauterman, Justin Ormont, Navendu Jain

Keywords: incident management, deep learning, transfer assistant, incident transfer, incident triage

10:44

12/07/2020

Non-Stationary Bandits with Intermediate Observations

Claire Vernade, András György, Timothy Mann

Keywords: Online Learning, Active Learning, and Bandits

14:40

03/05/2021

On the Bottleneck of Graph Neural Networks and its Practical Implications

Uri Alon, Eran Yahav

Keywords: GNNs, graphs, over-squashing, bottleneck, understanding, limitations

5:16

12/07/2020

On hyperparameter tuning in general clustering problems

Xinjie Fan, Yuguang Yue, Purnamrita Sarkar, Y. X. Rachel Wang

Keywords: Unsupervised and Semi-Supervised Learning

12:53

18/07/2021

PC-MLP: Model-based Reinforcement Learning with Policy Cover Guided Exploration

Yuda Song, Wen Sun

Keywords: Reinforcement Learning and Planning

5:13

06/12/2020

MetaPerturb: Transferable Regularizer for Heterogeneous Tasks and Architectures

Jeong Un Ryu, JWoong Shin, Hae Beom Lee, Sung Ju Hwang

3:32

12/07/2020

Communication-Efficient Distributed Stochastic AUC Maximization with Deep Neural Networks

Zhishuai Guo, Mingrui Liu, Zhuoning Yuan, Li Shen, Wei Liu, Tianbao Yang

Keywords: Optimization - Large Scale, Parallel and Distributed

14:42

18/07/2021

Lipschitz normalization for self-attention layers with application to graph neural networks

George Dasoulas, Kevin Scaman, Aladin Virmaux

Keywords: Deep Learning

4:53

03/08/2020

Genuinely distributed byzantine machine learning

El-Mahdi El-Mhamdi, Rachid Guerraoui, Arsany Guirguis, Lê Nguyên Hoang, Sébastien Rouault

Keywords: distributed machine learning, byzantine parameter servers, byzantine fault tolerance

22:05