05/04/2021

Towards Scalable Distributed Training of Deep Learning on Public Cloud Clusters

Shaohuai Shi, Xianhao Zhou, Shutao Song, Xingyao Wang, Zilin Zhu, Xue Huang, Xinan Jiang, Feihu Zhou, Zhenyu Guo, Liqiang Xie, Rui Lan, Xianbin Ouyang, Yan Zhang, Jieqian Wei, Jing Gong, Weiliang Lin, Ping Gao, Peng Meng, Xiaomin Xu, Chenyang Guo, Bo Yang, Zhibo Chen, Yongjian Wu, Xiaowen Chu

Keywords:

Abstract: Distributed training techniques have been widely deployed for training large-scale deep models on dense-GPU clusters. However, on public cloud clusters, the moderate interconnection bandwidth between instances prevents traditional state-of-the-art distributed training systems from scaling well when training large models. In this paper, we propose a new computation- and communication-efficient top-k sparsification communication library for distributed training. To further improve system scalability, we optimize I/O with a simple yet efficient multi-level data caching mechanism and optimize the update operation by introducing a novel parallel tensor operator. Experimental results on a 16-node Tencent Cloud cluster (each node with 8 Nvidia Tesla V100 GPUs) show that our system is 25%-40% faster than existing state-of-the-art systems on CNNs and Transformer. We finally break the record on DAWNBench for training ResNet-50 to 93% top-5 accuracy on ImageNet.
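The core idea behind top-k sparsification is to communicate only the largest-magnitude gradient entries instead of the full dense gradient. The snippet below is a minimal illustrative sketch in PyTorch, not the authors' library: the function names, the `density` parameter, and the error-feedback residual are assumptions chosen for clarity.

```python
import torch

def topk_compress(grad: torch.Tensor, density: float = 0.01):
    """Keep only the largest-magnitude `density` fraction of gradient entries."""
    flat = grad.flatten()
    k = max(1, int(flat.numel() * density))
    _, indices = torch.topk(flat.abs(), k)
    values = flat[indices]                 # keep the original signs
    return values, indices                 # the pair that would be communicated

def topk_decompress(values: torch.Tensor, indices: torch.Tensor, shape):
    """Scatter the received (values, indices) pair back into a dense tensor."""
    flat = torch.zeros(shape, dtype=values.dtype, device=values.device).flatten()
    flat[indices] = values
    return flat.view(shape)

# Usage sketch with error feedback (residual accumulation), a common companion
# to top-k compression; variable names here are illustrative.
grad = torch.randn(4, 1024)
residual = torch.zeros_like(grad)
values, indices = topk_compress(grad + residual, density=0.01)
residual = grad + residual - topk_decompress(values, indices, grad.shape)
```

In a distributed setting, only `values` and `indices` would be exchanged between workers, and the residual keeps the dropped gradient mass so it is not lost across iterations.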

The video of this talk cannot be embedded. You can watch it here:
https://slideslive.com/38952751
The talk and the respective paper were published at the MLSYS 2021 virtual conference.

