Accelerating Gossip SGD with Periodic Global Averaging

18/07/2021


Yiming Chen, Kun Yuan, Yingya Zhang, Pan Pan, Yinghui Xu, Wotao Yin

Keywords: Optimization, Distributed and Parallel Optimization


Abstract: Communication overhead hinders the scalability of large-scale distributed training. Gossip SGD, in which each node averages only with its neighbors, is more communication-efficient than the prevalent parallel SGD. However, its convergence rate depends inversely on the quantity $1-\beta$, which measures network connectivity. On large and sparse networks, where $1-\beta \to 0$, Gossip SGD needs more iterations to converge, which offsets its communication benefit. This paper introduces Gossip-PGA, which adds Periodic Global Averaging to accelerate Gossip SGD. For non-convex problems, its transient stage (the number of iterations required to reach the asymptotic linear-speedup stage) improves from $\Omega(\beta^4 n^3/(1-\beta)^4)$ to $\Omega(\beta^4 n^3 H^4)$. In Gossip-PGA, the influence of the network topology can be controlled through the averaging period $H$. Its transient-stage complexity is also superior to that of local SGD, which is of order $\Omega(n^3 H^4)$. Empirical results from large-scale training on image classification (ResNet50) and language modeling (BERT) validate our theoretical findings.
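The mechanism in the abstract can be illustrated with a small NumPy simulation. This is a minimal sketch, not the authors' implementation, and it rests on several assumptions: a ring topology with uniform 1/3 mixing weights, synthetic quadratic local losses $f_i(x) = \tfrac{1}{2}\|x - b_i\|^2$, Gaussian gradient noise, a "local step, then gossip" update order, and the hypothetical helper names `ring_mixing_matrix`, `spectral_beta`, `run` and `transient_orders`. It takes $\beta$ to be the second-largest eigenvalue magnitude of the mixing matrix $W$ (a common definition, assumed here) and evaluates the three transient-stage orders quoted above with all constants dropped.

```python
import numpy as np

# Sketch under stated assumptions: ring topology, quadratic losses, Gaussian noise.
# These choices are illustrative and are not the paper's experimental setup.


def ring_mixing_matrix(n: int) -> np.ndarray:
    """Symmetric, doubly stochastic ring: each node weights itself and its two neighbors by 1/3."""
    assert n >= 3, "a ring with distinct left/right neighbors needs at least 3 nodes"
    W = np.zeros((n, n))
    for i in range(n):
        for j in (i - 1, i, i + 1):
            W[i, j % n] = 1 / 3
    return W


def spectral_beta(W: np.ndarray) -> float:
    """beta = second-largest eigenvalue magnitude of W; 1 - beta measures connectivity."""
    mags = np.sort(np.abs(np.linalg.eigvalsh(W)))[::-1]
    return float(mags[1])


def run(n=16, dim=10, steps=400, lr=0.05, sigma=0.1, H=None, seed=0):
    """Gossip SGD on f_i(x) = 0.5 * ||x - b_i||^2; with H set, also average globally every H steps.

    Returns the consensus distance (mean squared deviation of node models from
    their average), averaged over the second half of the run.
    """
    rng = np.random.default_rng(seed)
    W = ring_mixing_matrix(n)
    b = rng.normal(size=(n, dim))  # heterogeneous local optima (synthetic data)
    x = np.zeros((n, dim))  # row i holds node i's model
    history = []
    for k in range(1, steps + 1):
        grad = (x - b) + sigma * rng.normal(size=(n, dim))  # noisy local gradients
        x = W @ (x - lr * grad)  # local SGD step, then neighbor averaging
        if H is not None and k % H == 0:
            x = np.tile(x.mean(axis=0), (n, 1))  # periodic global averaging (all-reduce)
        history.append(np.mean(np.sum((x - x.mean(axis=0)) ** 2, axis=1)))
    return float(np.mean(history[steps // 2:]))


def transient_orders(n: int, beta: float, H: int) -> dict:
    """Transient-stage orders quoted in the abstract, with constants dropped."""
    return {
        "gossip_sgd": beta**4 * n**3 / (1 - beta) ** 4,
        "gossip_pga": beta**4 * n**3 * H**4,
        "local_sgd": n**3 * H**4,
    }
```

With the default arguments, `run(H=8)` should report a noticeably smaller consensus distance than `run(H=None)`: on a 16-node ring $\beta \approx 0.95$, so gossip alone removes disagreement slowly, while each global average resets it to zero. With constants dropped, the Gossip-PGA order is below the Gossip SGD order exactly when $H < 1/(1-\beta)$, and always below the local SGD order because $\beta < 1$.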


The talk and the paper were published at the ICML 2021 virtual conference.


Similar Papers

23/08/2020

Edge-consensus learning: Deep learning on P2P networks with nonhomogeneous data

Kenta Niwa, Noboru Harada, Guoqiang Zhang, W. Bastiaan Kleijn

Keywords: alternating direction method of multipliers (ADMM), asynchronous communication, deep neural network (DNN), non-independent and identically distributed (non-iid) data, peer-to-peer (P2P) network, primal-dual method of multipliers (PDMM)


06/12/2021

Exponential Graph is Provably Efficient for Decentralized Deep Training

Bicheng Ying, Kun Yuan, Yiming Chen, Hanbin Hu, Pan Pan, Wotao Yin

Keywords: deep learning, optimization, graph learning


03/08/2020

Brief announcement: Deterministic lower bound for dynamic balanced graph partitioning

Maciej Pacut, Mahmoud Parham, Stefan Schmid

Keywords: online algorithms, graph partitioning, self-adjusting networks


06/12/2021

Asynchronous Decentralized SGD with Quantized and Local Updates

Giorgi Nadiradze, Amirmojtaba Sabour, Peter Davies, Shigang Li, Dan Alistarh

Keywords: optimization, machine learning, graph learning


18/07/2021

Consistent Nonparametric Methods for Network Assisted Covariate Estimation

Xueyu Mao, Deepayan Chakrabarti, Purnamrita Sarkar

Keywords: Algorithms, Networks and Relational Learning


03/08/2020

Computing shortest paths and diameter in the hybrid network model

Fabian Kuhn, Philipp Schneider


06/12/2020

Online Influence Maximization under Linear Threshold Model

Shuai Li, Fang Kong, Kejie Tang, Qizhi Li, Wei Chen


06/12/2021

Distributed Saddle-Point Problems Under Data Similarity

Aleksandr Beznosikov, Gesualdo Scutari, Alexander Rogozin, Alexander Gasnikov

Keywords: optimization


06/12/2021

The Limitations of Large Width in Neural Networks: A Deep Gaussian Process Perspective

Geoff Pleiss, John Cunningham

Keywords: deep learning, kernel methods


18/07/2021

PHEW: Constructing Sparse Networks that Learn Fast and Generalize Well without Training Data

Shreyas Malakarjun Patil, Constantine Dovrolis

Keywords: Deep Learning


13/04/2021

Federated learning with compression: Unified analysis and sharp guarantees

Farzin Haddadpour, Mohammad Mahdi Kamani, Aryan Mokhtari, Mehrdad Mahdavi


06/12/2020

Improved Analysis of Clipping Algorithms for Non-convex Optimization

Bohang Zhang, Jikai Jin, Cong Fang, Liwei Wang


25/07/2020

Accelerated convergence for counterfactual learning to rank

Rolf Jagerman, Maarten de Rijke

Keywords: unbiased learning, counterfactual learning, learning to rank


05/12/2020

Rumor detection on Twitter using multiloss hierarchical BiLSTM with an attenuation factor

Yudianto Sujana, Jiawen Li, Hung-Yu Kao


26/04/2020

Network Deconvolution

Chengxi Ye, Matthew Evanusa, Hua He, Anton Mitrokhin, Tom Goldstein, James A. Yorke, Cornelia Fermüller, Yiannis Aloimonos

Keywords: convolutional networks, network deconvolution, whitening


06/12/2020

Efficient Low Rank Gaussian Variational Inference for Neural Networks

Marcin Tomczak, Siddharth Swaroop, Richard Turner

Keywords: Probabilistic Methods -> Latent Variable Models, Probabilistic Methods -> Topic Models


06/12/2021

RelaySum for Decentralized Deep Learning on Heterogeneous Data

Thijs Vogels, Lie He, Anastasiia Koloskova, Sai Praneeth Karimireddy, Tao Lin, Sebastian Stich, Martin Jaggi

Keywords: deep learning, optimization, machine learning, privacy


03/05/2021

Simple Spectral Graph Convolution

Hao Zhu, Piotr Koniusz

Keywords: Graph Convolutional Network, Oversmoothing


02/02/2021

STL-SGD: Speeding Up Local SGD with Stagewise Communication Period

Shuheng Shen, Yifei Cheng, Jingchang Liu, Linli Xu


06/12/2021

Fast Routing under Uncertainty: Adaptive Learning in Congestion Games via Exponential Weights

Dong Quan Vu, Kimon Antonakopoulos, Panayotis Mertikopoulos

Keywords: theory


18/07/2021

Training Adversarially Robust Sparse Networks via Bayesian Connectivity Sampling

Ozan Özdenizci, Robert Legenstein

Keywords: Algorithms, Adversarial Examples


02/02/2021

Adversarial Permutation Guided Node Representations for Link Prediction

Indradyumna Roy, Abir De, Soumen Chakrabarti


06/12/2021

Delayed Gradient Averaging: Tolerate the Communication Latency for Federated Learning

Ligeng Zhu, Hongzhou Lin, Yao Lu, Yujun Lin, Song Han

Keywords: optimization, machine learning, federated learning


26/08/2020

Communication-Efficient Distributed Optimization in Networks with Gradient Tracking and Variance Reduction

Boyue Li, Shicong Cen, Yuxin Chen, Yuejie Chi


14/06/2020

Attention Scaling for Crowd Counting

Xiaoheng Jiang, Li Zhang, Mingliang Xu, Tianzhu Zhang, Pei Lv, Bing Zhou, Xin Yang, Yanwei Pang

Keywords: convolutional neural network, crowd counting, density attention, attention scaling


26/08/2020

Accelerated Primal-Dual Algorithms for Distributed Smooth Convex Optimization over Networks

Jinming Xu, Ye Tian, Ying Sun, Gesualdo Scutari


14/06/2020

ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks

Qilong Wang, Banggu Wu, Pengfei Zhu, Peihua Li, Wangmeng Zuo, Qinghua Hu

Keywords: channel attention, efficient, adaptive 1d convolution, deep cnns, image classification, object detection, instance segmentation


05/04/2021

An Efficient Statistical-based Gradient Compression Technique for Distributed Training Systems

Ahmed M. Abdelmoniem, Ahmed Elzanaty, Mohamed-Slim Alouini, Marco Canini



18/07/2021

Optimal Complexity in Decentralized Training

Yucheng Lu, Christopher De Sa

Keywords: Optimization, Distributed and Parallel Optimization


19/08/2021

Learning Deeper Non-Monotonic Networks by Softly Transferring Solution Space

Zheng-Fan Wu, Hui Xue, Weimin Bai

Keywords: Machine Learning, Kernel Methods, Deep Learning, Classification


03/05/2021

Multiplicative Filter Networks

Rizal Fathony, Anit Kumar Sahu, Devin Willmott, Zico Kolter

Keywords: Fourier Features, Implicit Neural Representations, Deep Architectures


06/12/2021

Communication-efficient SGD: From Local SGD to One-Shot Averaging

Artin Spiridonoff, Alex Olshevsky, Yannis Paschalidis

Keywords: optimization


12/07/2020

Low Bias Low Variance Gradient Estimates for Hierarchical Boolean Stochastic Networks

Adeel Pervez, Taco Cohen, Efstratios Gavves

Keywords: Deep Learning - Generative Models and Autoencoders


16/11/2020

Best-First Beam Search

Clara Meister, Ryan Cotterell, Tim Vieira

Keywords: nlp tasks, exact search, decoding, heuristic algorithm


14/09/2020

Mend The Learning Approach, Not the Data: Insights for Ranking E-Commerce Products

Muhammad Umer Anwaar, Dmytro Rybalko, Martin Kleinsteuber

Keywords: information retrieval, ranking and preference learning, learning to rank, e-commerce search, implicit feedback, counterfactual risk minimization, dataset, mining data logs


14/06/2020

Dynamic Graph Message Passing Networks

Li Zhang, Dan Xu, Anurag Arnab, Philip H.S. Torr

Keywords: message passing, graph convolutional networks, semantic segmentation, object detection, instance segmentation, representation learning


02/02/2021

Continuous Self-Attention Models with Neural ODE Networks

Jing Zhang, Peng Zhang, Baiwen Kong, Junqiu Wei, Xin Jiang


06/12/2020

LoCo: Local Contrastive Representation Learning

Yuwen Xiong, Mengye Ren, Raquel Urtasun


06/12/2020

Sharp Representation Theorems for ReLU Networks with Precise Dependence on Depth

Guy Bresler, Dheeraj Nagaraj

Keywords: Algorithms -> Representation Learning, Deep Learning -> Embedding Approaches
