Pipelined Backpropagation at Scale: Training Large Models without Batches

05/04/2021

Pipelined Backpropagation at Scale: Training Large Models without Batches

Atli Kosson, Vitaliy Chiley, Abhi Venigalla, Joel Hestness, Urs Koster

Keywords:

Abstract Paper Similar Papers

Abstract: New hardware can substantially increase the speed and efficiency of deep neural network training. To guide the development of future hardware architectures, it is pertinent to explore the hardware and machine learning properties of alternative training algorithms. In this work we evaluate the use of small batch, fine-grained Pipelined Backpropagation, an asynchronous pipeline parallel training algorithm that has significant hardware advantages. We introduce two methods, Spike Compensation and Linear Weight Prediction, that effectively mitigate the downsides caused by the asynchronicity of Pipelined Backpropagation and outperform existing techniques in our setting. We show that appropriate normalization and small batch sizes can also aid training. With our methods, fine-grained Pipelined Backpropagation using a batch size of one can match the accuracy of SGD for multiple networks trained on CIFAR-10 and ImageNet. Simple scaling rules allow the use of existing hyperparameters for traditional training without additional tuning.

The video of this talk cannot be embedded. You can watch it here:

https://slideslive.com/38952698

(Link will open in new window)

0

0

0

0

Share

This is an embedded video. Talk and the respective paper are published at MLSYS 2021 virtual conference. If you are one of the authors of the paper and want to manage your upload, see the question "My papertalk has been externally embedded..." in the FAQ section.

Comments

Post Comment

no comments yet

Similar Papers

05/04/2021

Pipelined Backpropagation at Scale: Training Large Models without Batches

Atli Kosson, Vitaliy Chiley, Abhi Venigalla and
Joel Hestness, Urs Koster

Keywords Paper

0

0

0

0

18:00

14/06/2020

Augment Your Batch: Improving Generalization Through Instance Repetition

Elad Hoffer, Tal Ben-Nun, Itay Hubara and
Niv Giladi, Torsten Hoefler, Daniel Soudry

Keywords Paper

generalization, augmentation, regularization, large-batch, deep-learning, convolutional-networks

0

0

0

0

1:00

18/07/2021

Learning Neural Network Subspaces

Mitchell Wortsman, Maxwell Horton, Carlos Guestrin and
Ali Farhadi, Mohammad Rastegari

Keywords Paper

Deep Learning, Applications, Dialog- or Communication-Based Learning, Algorithms, Representation Learning

0

0

0

0

5:07

05/04/2021

PipeMare: Asynchronous Pipeline Parallel DNN Training

Bowen Yang, Jian Zhang, Jonathan Li and
Christopher Re, Christopher Aberger, Christopher De Sa

Keywords Paper

0

0

0

0

16:57

03/05/2021

Dataset Meta-Learning from Kernel Ridge-Regression

Timothy Nguyen, Zhourong Chen, Jaehoon Lee

Keywords Paper

dataset corruption, infinite-width networks, neural kernels, kernel-ridge regression, dataset compression, dataset distillation, meta-learning

0

0

0

0

4:59

26/04/2020

Shifted and Squeezed 8-bit Floating Point format for Low-Precision Training of Deep Neural Networks

Leopold Cambier, Anahita Bhiwandiwalla, Ting Gong and
Oguz H. Elibol, Mehran Nekuii, Hanlin Tang

Keywords Paper

Low-precision training, numerics, deep learning

0

0

0

0

4:46

02/02/2021

Fitting the Search Space of Weight-sharing NAS with Graph Convolutional Networks

Xin Chen, Lingxi Xie, Jun Wu and
Longhui Wei, Yuhui Xu, Qi Tian

Keywords Paper

0

0

0

0

15:02

06/12/2020

ShiftAddNet: A Hardware-Inspired Deep Network

Haoran You, Xiaohan Chen, Yongan Zhang and
Chaojian Li, Sicheng Li, Zihao Liu, Zhangyang Wang, Yingyan Lin

Keywords Paper

0

0

0

0

3:25

02/02/2021

Provable Benefits of Overparameterization in Model Compression: From Double Descent to Pruning Neural Networks

Xiangyu Chang, Yingcong Li, Samet Oymak, Christos Thrampoulidis

Keywords Paper

0

0

0

0

18:14

11/08/2020

A computational approach to packet classification

Alon Rashelbach, Ori Rottenstreich, Mark Silberstein

Keywords Paper

Neural Networks, Virtual Switches, Packet Classification

0

0

0

0

16:56

06/12/2020

MomentumRNN: Integrating Momentum into Recurrent Neural Networks

Tan Nguyen, Richard Baraniuk, Andrea Bertozzi and
Stanley Osher, Bao Wang

Keywords Paper

0

0

0

0

3:09

26/04/2020

Training Recurrent Neural Networks Online by Learning Explicit State Variables

Somjit Nath, Vincent Liu, Alan Chan and
Xin Li, Adam White, Martha White

Keywords Paper

Recurrent Neural Network, Partial Observability, Online Prediction, Incremental Learning

0

0

0

0

5:06

02/02/2021

Any-Precision Deep Neural Networks

Haichao Yu, Haoxiang Li, Humphrey Shi and
Thomas S. Huang, Gang Hua

Keywords Paper

0

0

0

0

14:26

26/04/2020

Chameleon: Adaptive Code Optimization for Expedited Deep Neural Network Compilation

Byung Hoon Ahn, Prannoy Pilligundla, Amir Yazdanbakhsh, Hadi Esmaeilzadeh

Keywords Paper

Reinforcement Learning, Learning to Optimize, Combinatorial Optimization, Compilers, Code Optimization, Neural Networks, ML for Systems, Learning for Systems

0

0

0

0

4:55

18/07/2021

Opening the Blackbox: Accelerating Neural Differential Equations by Regularizing Internal Solver Heuristics

Avik Pal, Yingbo Ma, Viral Shah, Christopher Rackauckas

Keywords Paper

Deep Learning

0

0

0

0

5:11

06/12/2021

BatchQuant: Quantized-for-all Architecture Search with Robust Quantizer

Haoping Bai, Meng Cao, Ping Huang, Jiulong Shan

Keywords Paper

deep learning, optimization

0

0

0

0

4:12

02/02/2021

Large Batch Optimization for Deep Learning Using New Complete Layer-Wise Adaptive Rate Scaling

Zhouyuan Huo, Bin Gu, Heng Huang

Keywords Paper

0

0

0

0

15:17

08/12/2020

Dynamic Curriculum Learning for Low-Resource Neural Machine Translation

Chen Xu, Bojie Hu, Yufan Jiang and
Kai Feng, Zeyang Wang, Shen Huang, Qi Ju, Tong Xiao, Jingbo Zhu

Keywords Paper

0

0

0

0

13:28

06/12/2020

Certified Monotonic Neural Networks

Xingchao Liu, Aaron Han, Na Zhang, Qiang Liu

Keywords Paper

0

0

0

0

3:22

03/05/2021

Meta-Learning with Neural Tangent Kernels

Yufan Zhou, Zhenyi Wang, Jiayi Xian and
Changyou Chen, Jinhui Xu

Keywords Paper

neural tangent kernel, meta-learning

0

0

0

0

3:54

06/12/2021

GradInit: Learning to Initialize Neural Networks for Stable and Efficient Training

Chen Zhu, Renkun Ni, Zheng Xu and
Kezhi Kong, W. Ronny Huang, Tom Goldstein

Keywords Paper

deep learning, transformers, vision

0

0

0

0

13:17

18/07/2021

ActNN: Reducing Training Memory Footprint via 2-Bit Activation Compressed Training

Jianfei Chen, Lianmin Zheng, Zhewei Yao and
Dequan Wang, Ion Stoica, Michael Mahoney, Joseph E Gonzalez

Keywords Paper

Algorithms, Large Scale Learning

0

0

0

0

18:54

03/05/2021

Reducing the Computational Cost of Deep Generative Models with Binary Neural Networks

Thomas Bird, Friso Kingma, David Barber

Keywords Paper

generative, binary, optimization, compression

0

0

0

0

5:14

16/11/2020

Reproducible and Efficient Benchmarks for Hyperparameter Optimization of Neural Machine Translation Systems

Xuan Zhang, Kevin Duh

Keywords Paper

hyperparameter selection, neural systems, automatic optimization, nmt

0

0

0

0

11:38

12/07/2020

Learning to Rank Learning Curves

Martin Wistuba, Tejaswini Pedapati

Keywords Paper

Transfer, Multitask and Meta-learning

0

0

0

0

15:34

14/06/2020

Computing the Testing Error Without a Testing Set

Ciprian A. Corneanu, Sergio Escalera, Aleix M. Martinez

Keywords Paper

deep learning, algebraic topology, generalization, object recognition, facial analysis, semantic segmentation

0

0

0

0

4:43

06/12/2020

Cream of the Crop: Distilling Prioritized Paths For One-Shot Neural Architecture Search

Houwen Peng, Hao Du, Hongyuan Yu and
QI LI, Jing Liao, Jianlong Fu

Keywords Paper

0

0

0

0

3:12

06/12/2020

Counterexample-Guided Learning of Monotonic Neural Networks

Aishwarya Sivaraman, Golnoosh Farnadi, Todd Millstein, Guy Van den Broeck

Keywords Paper

0

0

0

0

3:22

06/12/2021

NAS-Bench-x11 and the Power of Learning Curves

Shen Yan, Colin White, Yash Savani, Frank Hutter

Keywords Paper

deep learning

0

0

0

0

14:03

26/04/2020

SpikeGrad: An ANN-equivalent Computation Model for Implementing Backpropagation with Spikes

Johannes C. Thiele, Olivier Bichler, Antoine Dupret

Keywords Paper

spiking neural network, neuromorphic engineering, backpropagation

0

0

0

0

5:21

13/04/2021

Faster & more reliable tuning of neural networks: Bayesian optimization with importance sampling

Setareh Ariafar, Zelda Mariet, Dana Brooks and
Jennifer Dy, Jasper Snoek

Keywords Paper

0

0

0

0

3:01

26/04/2020

Don't Use Large Mini-batches, Use Local SGD

Tao Lin, Sebastian U. Stich, Kumar Kshitij Patel, Martin Jaggi

Keywords Paper

0

0

0

0

4:36

12/07/2020

Evolving Machine Learning Algorithms From Scratch

Esteban Real, Chen Liang, David So, Quoc Le

Keywords Paper

Transfer, Multitask and Meta-learning

0

0

0

0

15:01

18/07/2021

Sparsifying Networks via Subdifferential Inclusion

Sagar Verma, Jean-Christophe Pesquet

Keywords Paper

Optimization, Convex Optimization

0

0

0

0

5:10

14/06/2020

APQ: Joint Search for Network Architecture, Pruning and Quantization Policy

Tianzhe Wang, Kuan Wang, Han Cai and
Ji Lin, Zhijian Liu, Hanrui Wang, Yujun Lin, Song Han

Keywords Paper

efficiency, model compression, joint design, neural architecture search, channel pruning, mixed-precision quantization

0

0

0

0

1:00

12/07/2020

Small Data, Big Decisions: Model Selection in the Small-Data Regime

Jorg Bornschein, Francesco Visin, Simon Osindero

Keywords Paper

Deep Learning - General

0

0

0

0

11:47

08/12/2020

E.T.: Entity-Transformers. Coreference augmented Neural Language Model for richer mention representations via Entity-Transformer blocks

Nikolaos Stylianou, Ioannis Vlahavas

Keywords Paper

0

0

0

0

8:49

02/02/2021

Finding Sparse Structures for Domain Specific Neural Machine Translation

Jianze Liang, Chengqi Zhao, Mingxuan Wang and
Xipeng Qiu, Lei Li

Keywords Paper

0

0

0

0

14:45

06/12/2021

Boost Neural Networks by Checkpoints

Feng Wang, Guoyizhe Wei, Qiao Liu and
Jinxiang Ou, xian wei, Hairong Lv

Keywords Paper

deep learning

1

0

0

0

4:45

06/12/2021

EvoGrad: Efficient Gradient-Based Meta-Learning and Hyperparameter Optimization

Ondrej Bohdal, Yongxin Yang, Timothy Hospedales

Keywords Paper

deep learning, optimization, graph learning, meta learning, few shot learning

0

0

0

0

14:09