AdaScale SGD: A User-Friendly Algorithm for Distributed Training

12/07/2020

Tyler Johnson, Pulkit Agrawal, Haijie Gu, Carlos Guestrin

Keywords: Optimization - Large Scale, Parallel and Distributed

Abstract: When using large-batch training to speed up stochastic gradient descent, learning rates must adapt to new batch sizes in order to maximize speed-ups and preserve model quality. Re-tuning learning rates is resource intensive, while fixed scaling rules often degrade model quality. We propose AdaScale SGD, an algorithm that reliably adapts learning rates to large-batch training. By continually adapting to the gradient's variance, AdaScale automatically achieves speed-ups for a wide range of batch sizes. We formally describe this quality with AdaScale’s convergence bound, which maintains final objective values, even as batch sizes grow large and the number of iterations decreases. In empirical comparisons, AdaScale trains well beyond the batch size limits of popular “linear learning rate scaling” rules. This includes large-batch training with no model degradation for machine translation, image classification, object detection, and speech recognition tasks. AdaScale's qualitative behavior is similar to that of "warm-up" heuristics, but unlike warm-up, this behavior emerges naturally from a principled mechanism. The algorithm introduces negligible computational overhead and no new hyperparameters, making AdaScale an attractive choice for large-scale training in practice.
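
The abstract contrasts AdaScale with "linear learning rate scaling" but states neither rule. As a rough single-step illustration only, the sketch below compares the two, assuming the gain-ratio form r = (σ² + μ²) / (σ²/S + μ²) from the AdaScale paper, where S is the batch-size scale factor, σ² the gradient variance at the base batch size, and μ² the squared norm of the expected gradient. The function names and inputs are hypothetical; in real training σ² and μ² have to be estimated on the fly rather than supplied.

```python
def gain_ratio(grad_sq_norm: float, grad_variance: float, scale: int) -> float:
    """Hypothetical helper: gain r = (var + mu2) / (var / S + mu2), with 1 <= r <= S.

    grad_sq_norm  -- assumed estimate of the squared norm of the expected gradient (mu2)
    grad_variance -- assumed estimate of the gradient variance at the base batch size (var)
    scale         -- batch-size scale factor S (large batch / base batch)
    """
    if scale < 1:
        raise ValueError("scale must be >= 1")
    denom = grad_variance / scale + grad_sq_norm
    if denom == 0.0:
        # No gradient signal and no noise: fall back to no gain.
        return 1.0
    return (grad_variance + grad_sq_norm) / denom


def linear_scaling_lr(base_lr: float, scale: int) -> float:
    """Fixed rule: multiply the base learning rate by the scale factor."""
    return base_lr * scale


def adascale_lr(base_lr: float, grad_sq_norm: float,
                grad_variance: float, scale: int) -> float:
    """Adaptive rule (sketch): multiply the base learning rate by the gain ratio."""
    return base_lr * gain_ratio(grad_sq_norm, grad_variance, scale)


if __name__ == "__main__":
    base_lr, scale = 0.1, 8
    # Illustrative (mu2, var) pairs, from noise-dominated to signal-dominated.
    for mu2, var in [(0.0, 1.0), (1.0, 1.0), (1.0, 0.0)]:
        print(mu2, var,
              linear_scaling_lr(base_lr, scale),
              adascale_lr(base_lr, mu2, var, scale))
```

Under this assumed form, the gain approaches S when gradient noise dominates, where it matches linear scaling, and approaches 1 when noise is negligible, which is one way to read the abstract's claim that the learning rate adapts to the gradient's variance.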

The talk and the corresponding paper were published at the ICML 2020 virtual conference.

Similar Papers

03/05/2021

Neurally Augmented ALISTA

Freya Behrens, Jonathan Sauder, Peter Jung

Keywords: learned ISTA, unrolled algorithms, compressed sensing, sparse reconstruction

5:18

19/04/2021

Keep learning: Self-supervised meta-learning for learning from inference

Akhil Kedia, Sai Chetan Chinthakindi

11:27

18/07/2021

Opening the Blackbox: Accelerating Neural Differential Equations by Regularizing Internal Solver Heuristics

Avik Pal, Yingbo Ma, Viral Shah, Christopher Rackauckas

Keywords: Deep Learning

5:11

13/04/2021

Critical parameters for scalable distributed learning with large batches and asynchronous updates

Sebastian Stich, Amirkeivan Mohtashami, Martin Jaggi

3:00

02/02/2021

Harmonized Dense Knowledge Distillation Training for Multi-Exit Architectures

Xinglu Wang, Yingming Li

15:12

06/12/2020

On Warm-Starting Neural Network Training

Jordan Ash, Ryan Adams

2:30

26/04/2020

Training Recurrent Neural Networks Online by Learning Explicit State Variables

Somjit Nath, Vincent Liu, Alan Chan, Xin Li, Adam White, Martha White

Keywords: Recurrent Neural Network, Partial Observability, Online Prediction, Incremental Learning

5:06

12/07/2020

Understanding Self-Training for Gradual Domain Adaptation

Ananya Kumar, Tengyu Ma, Percy Liang

Keywords: Learning Theory

15:17

06/12/2020

Task-Robust Model-Agnostic Meta-Learning

Liam Collins, Aryan Mokhtari, Sanjay Shakkottai

3:17

18/07/2021

Large-Scale Meta-Learning with Continual Trajectory Shifting

JWoong Shin, Hae Beom Lee, Boqing Gong, Sung Ju Hwang

Keywords: Algorithms, Multitask, Transfer, and Meta Learning

6:14

26/04/2020

Budgeted Training: Rethinking Deep Neural Network Training Under Resource Constraints

Mengtian Li, Ersin Yumer, Deva Ramanan

Keywords: budgeted training, learning rate schedule, linear schedule, annealing, learning rate decay

5:00

26/04/2020

At Stability's Edge: How to Adjust Hyperparameters to Preserve Minima Selection in Asynchronous Training of Neural Networks?

Niv Giladi, Mor Shpigel Nacson, Elad Hoffer, Daniel Soudry

Keywords: implicit bias, stability, neural networks, generalization gap, asynchronous SGD

5:03

03/05/2021

Remembering for the Right Reasons: Explanations Reduce Catastrophic Forgetting

Sayna Ebrahimi, Suzanne Petryk, Akash Gokul, William Gan, Joseph E Gonzalez, Marcus Rohrbach, Trevor Darrell

Keywords: Explainability, Catastrophic Forgetting, Continual Learning, XAI, Lifelong Learning

5:13

04/07/2020

Dynamically Adjusting Transformer Batch Size by Monitoring Gradient Direction Change

Hongfei Xu, Josef van Genabith, Deyi Xiong, Qiuhui Liu

Keywords: Dynamically Size, Monitoring Change, accelerating convergence, training

5:51

02/02/2021

GLISTER: Generalization based Data Subset Selection for Efficient and Robust Learning

Krishnateja Killamsetty, Durga Sivasubramanian, Ganesh Ramakrishnan, Rishabh Iyer

19:14

06/12/2021

Batch Active Learning at Scale

Gui Citovsky, Giulia DeSalvo, Claudio Gentile, Lazaros Karydas, Anand Rajagopalan, Afshin Rostamizadeh, Sanjiv Kumar

Keywords: active learning

12:19

14/09/2020

Predicting Future Classifiers for Evolving Non-Linear Decision Boundaries

Kanishka Khandelwal, Devendra Dhaka, Vivek Barsopia

Keywords: concept drift, data streams, classification

15:19

26/08/2020

An Empirical Study of Stochastic Gradient Descent with Structured Covariance Noise

Yeming Wen, Kevin Luk, Maxime Gazeau, Guodong Zhang, Harris Chan, Jimmy Ba

8:44

18/07/2021

Self Normalizing Flows

T. Anderson Keller, Jorn Peters, Priyank Jaini, Emiel Hoogeboom, Patrick Forré, Max Welling

Keywords: Deep Learning, Generative Models

4:24

06/12/2021

A Minimalist Approach to Offline Reinforcement Learning

Scott Fujimoto, Shixiang (Shane) Gu

Keywords: reinforcement learning and planning, generative model

8:31

07/09/2020

On the Exploration of Incremental Learning for Fine-grained Image Retrieval

Wei Chen, Yu Liu, Weiping Wang, Tinne Tuytelaars, Erwin M. Bakker, Michael Lew

Keywords: Incremental learning, Fine-grained image retrieval, Catastrophic forgetting, Maximum Mean Discrepancy

8:32

06/12/2021

Leveraging Recursive Gumbel-Max Trick for Approximate Inference in Combinatorial Spaces

Kirill Struminsky, Artyom Gadetsky, Denis Rakitin, Danil Karpushkin, Dmitry Vetrov

Keywords: deep learning, optimization

14:26

26/04/2020

SlowMo: Improving Communication-Efficient Distributed SGD with Slow Momentum

Jianyu Wang, Vinayak Tantia, Nicolas Ballas, Michael Rabbat

Keywords: distributed optimization, decentralized training methods, communication-efficient distributed training with momentum, large-scale parallel SGD

5:07

03/05/2021

Denoising Diffusion Implicit Models

Jiaming Song, Chenlin Meng, Stefano Ermon

Keywords: denoising score matching, variational inference, generative models, variational autoencoders

5:05

13/04/2021

Approximate data deletion from machine learning models

Zachary Izzo, Mary Anne Smart, Kamalika Chaudhuri, James Zou

3:18

06/12/2021

Efficient Generalization with Distributionally Robust Learning

Soumyadip Ghosh, Mark Squillante, Ebisa Wollega

Keywords: optimization, machine learning

14:57

06/12/2020

Reconciling Modern Deep Learning with Traditional Optimization Analyses: The Intrinsic Learning Rate

Zhiyuan Li, Kaifeng Lyu, Sanjeev Arora

3:23

06/12/2020

Look-ahead Meta Learning for Continual Learning

Gunshi Gupta, Karmesh Yadav, Liam Paull

3:21

13/04/2021

Faster & more reliable tuning of neural networks: Bayesian optimization with importance sampling

Setareh Ariafar, Zelda Mariet, Dana Brooks, Jennifer Dy, Jasper Snoek

3:01

03/05/2021

Gradient Projection Memory for Continual Learning

Gobinda Saha, Isha Garg, Kaushik Roy

Keywords: Continual Learning, Representation Learning, Computer Vision, Deep learning

17:12

22/11/2021

Meta-learning the Learning Trends Shared Across Tasks

Jathushan Rajasegaran, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Mubarak Shah

Keywords: Meta-learning, Few-shot learning

2:38

14/09/2020

Automatic Tuning of Stochastic Gradient Descent with Bayesian Optimisation

Victor Picheny, Vincent Dutordoir, Artem Artemev, Nicolas Durrande

Keywords: learning rate, gaussian process, variational inference

15:13

18/07/2021

Improved Denoising Diffusion Probabilistic Models

Alexander Nichol, Prafulla Dhariwal

Keywords: Deep Learning, Generative Models, Theory, Game Theory and Computational Economics, Reinforcement Learning and Planning, Multi-Agent RL

4:25

06/12/2020

Task-Agnostic Online Reinforcement Learning with an Infinite Mixture of Gaussian Processes

Mengdi Xu, Wenhao Ding, Jiacheng Zhu, Zuxin Liu, Baiming Chen, Ding Zhao

3:21

02/02/2021

Infinite Gaussian Mixture Modeling with an Improved Estimation of the Number of Clusters

Avi Matza, Yuval Bistritz

20:14

14/06/2020

Online Depth Learning Against Forgetting in Monocular Videos

Zhenyu Zhang, Stéphane Lathuilière, Elisa Ricci, Nicu Sebe, Yan Yan, Jian Yang

Keywords: depth estimation, online adaptation, domain adaptation, meta-learning, online learning

0:59

06/12/2020

Submodular Meta-Learning

Arman Adibi, Aryan Mokhtari, Hamed Hassani

3:17

12/07/2020

Online Learned Continual Compression with Adaptive Quantization Modules

Lucas Caccia, Eugene Belilovsky, Massimo Caccia, Joelle Pineau

Keywords: Applications - Other

18:24

02/02/2021

Any-Precision Deep Neural Networks

Haichao Yu, Haoxiang Li, Humphrey Shi, Thomas S. Huang, Gang Hua

14:26