Accumulated Decoupled Learning with Gradient Staleness Mitigation for Convolutional Neural Networks

18/07/2021

Accumulated Decoupled Learning with Gradient Staleness Mitigation for Convolutional Neural Networks

Huiping Zhuang, Zhenyu Weng, Fulin Luo, Kar-Ann Toh, Haizhou Li, Zhiping Lin

Keywords: Optimization, Distributed and Parallel Optimization

Abstract Paper Similar Papers

Abstract: Gradient staleness is a major side effect in decoupled learning when training convolutional neural networks asynchronously. Existing methods that ignore this effect might result in reduced generalization and even divergence. In this paper, we propose an accumulated decoupled learning (ADL), which includes a module-wise gradient accumulation in order to mitigate the gradient staleness. Unlike prior arts ignoring the gradient staleness, we quantify the staleness in such a way that its mitigation can be quantitatively visualized. As a new learning scheme, the proposed ADL is theoretically shown to converge to critical points in spite of its asynchronism. Extensive experiments on CIFAR-10 and ImageNet datasets are conducted, demonstrating that ADL gives promising generalization results while the state-of-the-art methods experience reduced generalization and divergence. In addition, our ADL is shown to have the fastest training speed among the compared methods.

0

0

0

0

Share

This is an embedded video. Talk and the respective paper are published at ICML 2021 virtual conference. If you are one of the authors of the paper and want to manage your upload, see the question "My papertalk has been externally embedded..." in the FAQ section.

Comments

Post Comment

no comments yet

Similar Papers

02/02/2021

Tackling Instance-Dependent Label Noise via a Universal Probabilistic Model

Qizhou Wang, Bo Han, Tongliang Liu and
Gang Niu, Jian Yang, Chen Gong

Keywords Paper

0

0

0

0

14:56

02/02/2021

Distribution Adaptive INT8 Quantization for Training CNNs

Kang Zhao, Sida Huang, Pan Pan and
Yinghan Li, Yingya Zhang, Zhenyu Gu, Yinghui Xu

Keywords Paper

0

0

0

0

16:42

14/06/2020

On the Acceleration of Deep Learning Model Parallelism With Staleness

An Xu, Zhouyuan Huo, Heng Huang

Keywords Paper

layer-wise staleness, asynchronous model parallelism, convolutional neural networks.

0

0

0

0

1:01

06/12/2020

Improved Analysis of Clipping Algorithms for Non-convex Optimization

Bohang Zhang, Jikai Jin, Cong Fang, Liwei Wang

Keywords Paper

0

0

0

0

3:16

02/02/2021

Amata: An Annealing Mechanism for Adversarial Training Acceleration

Nanyang Ye, Qianxiao Li, Xiao-Yun Zhou, Zhanxing Zhu

Keywords Paper

0

0

0

0

14:30

14/06/2020

Conditional Channel Gated Networks for Task-Aware Continual Learning

Davide Abati, Jakub Tomczak, Tijmen Blankevoort and
Simone Calderara, Rita Cucchiara, Babak Ehteshami Bejnordi

Keywords Paper

continual learning, channel gating, conditional computation, incremental learning, lifelong learning, hard attention

0

0

0

0

5:01

06/12/2020

Towards Better Generalization of Adaptive Gradient Methods

Yingxue Zhou, Belhal Karimi, Jinxing Yu and
Zhiqiang Xu, Ping Li

Keywords Paper

0

0

0

0

3:21

02/02/2021

Step-Ahead Error Feedback for Distributed Training with Compressed Gradient

An Xu, Zhouyuan Huo, Heng Huang

Keywords Paper

0

0

0

0

18:26

12/07/2020

Extrapolation for Large-batch Training in Deep Learning

Tao LIN, Lingjing Kong, Sebastian Stich, Martin Jaggi

Keywords Paper

Deep Learning - Algorithms

0

0

0

0

13:21

06/12/2020

Gradient-EM Bayesian Meta-Learning

Yayi Zou, Xiaoqi Lu

Keywords Paper

0

0

0

0

3:23

06/12/2021

Can we have it all? On the Trade-off between Spatial and Adversarial Robustness of Neural Networks

Sandesh Kamath, Amit Deshpande, Subrahmanyam Kambhampati Venkata, Vineeth N Balasubramanian

Keywords Paper

deep learning, robustness, adversarial robustness and security

0

0

0

0

14:57

18/07/2021

Uniform Convergence, Adversarial Spheres and a Simple Remedy

Gregor Bachmann, Seyed Moosavi, Thomas Hofmann

Keywords Paper

Theory, Deep learning Theory

0

2

0

0

5:52

06/12/2021

End-to-end reconstruction meets data-driven regularization for inverse problems

Subhadip Mukherjee, Marcello Carioni, Ozan Öktem, Carola-Bibiane Schönlieb

Keywords Paper

deep learning, graph learning

0

0

0

0

13:12

14/06/2020

Fixed-Point Back-Propagation Training

Xishan Zhang, Shaoli Liu, Rui Zhang and
Chang Liu, Di Huang, Shiyi Zhou, Jiaming Guo, Qi Guo, Zidong Du, Tian Zhi, Yunji Chen

Keywords Paper

network quantization, fixed-point training, deep learning, neural network

1

0

0

0

1:01

06/12/2021

Boost Neural Networks by Checkpoints

Feng Wang, Guoyizhe Wei, Qiao Liu and
Jinxiang Ou, xian wei, Hairong Lv

Keywords Paper

deep learning

1

0

0

0

4:45

07/09/2020

Transferring Pretrained Networks to Small Data via Category Decorrelation

Ying Jin, Zhangjie Cao, Mingsheng Long, Jianmin Wang

Keywords Paper

Category Decorrelation, Under Transfer

1

1

0

0

8:39

12/07/2020

Adversarial Filters of Dataset Biases

Ronan Le Bras, Swabha Swayamdipta, Chandra Bhagavatula and
Rowan Zellers, Matthew Peters, Ashish Sabharwal, Yejin Choi

Keywords Paper

Deep Learning - Algorithms

0

0

0

0

15:25

12/07/2020

Fast and Three-rious: Speeding Up Weak Supervision with Triplet Methods

Dan Fu, Mayee Chen, Frederic Sala and
Sarah Hooper, Kayvon Fatahalian, Christopher Re

Keywords Paper

Probabilistic Inference - Models and Probabilistic Programming

0

0

0

0

15:01

26/04/2020

On Generalization Error Bounds of Noisy Gradient Methods for Non-Convex Learning

Jian Li, Xuanyuan Luo, Mingda Qiao

Keywords Paper

learning theory, generalization, nonconvex learning, stochastic gradient descent, Langevin dynamics

0

0

0

0

4:50

02/02/2021

Fast and Scalable Adversarial Training of Kernel SVM via Doubly Stochastic Gradients

Huimin Wu, Zhengmian Hu, Bin Gu

Keywords Paper

0

0

0

0

14:04

06/12/2020

Temporal Spike Sequence Learning via Backpropagation for Deep Spiking Neural Networks

Wenrui Zhang, Peng Li

Keywords Paper

0

0

0

0

3:06

18/07/2021

RATT: Leveraging Unlabeled Data to Guarantee Generalization

Saurabh Garg, Sivaraman Balakrishnan, Zico Kolter, Zachary Lipton

Keywords Paper

Probabilistic Methods, Graphical Models, Theory, Computational Complexity, Theory, Models of Learning and Generalization

0

0

0

1

17:27

18/07/2021

Uncertainty Weighted Actor-Critic for Offline Reinforcement Learning

Yue Wu, Shuangfei Zhai, Nitish Srivastava and
Josh Susskind, Jian Zhang, Russ Salakhutdinov, Hanlin Goh

Keywords Paper

Reinforcement Learning and Planning, Deep RL

0

0

0

0

5:01

14/06/2020

Single-Step Adversarial Training With Dropout Scheduling

Vivek B.S., R. Venkatesh Babu

Keywords Paper

adversarial training, robustness, efficient training, representation learning, generalization, supervised learning, recognition, classification, neural networks, deep learning

0

0

0

0

1:01

06/12/2020

Understanding and Improving Fast Adversarial Training

Maksym Andriushchenko, Nicolas Flammarion

Keywords Paper

0

0

0

0

3:23

18/07/2021

Nondeterminism and Instability in Neural Network Optimization

Cecilia Summers, Michael J Dinneen

Keywords Paper

Deep Learning, Optimization for Deep Networks

0

0

0

0

5:12

03/05/2021

Robust Curriculum Learning: from clean label detection to noisy label self-correction

Tianyi Zhou, Shengjie Wang, Jeff Bilmes

Keywords Paper

neural networks, curriculum learning, training dynamics, robust learning, noisy label

0

0

0

0

5:02

13/04/2021

On the generalization properties of adversarial training

Yue Xing, Qifan Song, Guang Cheng

Keywords Paper

0

0

0

0

3:05

06/12/2021

COMBO: Conservative Offline Model-Based Policy Optimization

Tianhe Yu, Aviral Kumar, Rafael Rafailov and
Aravind Rajeswaran, Sergey Levine, Chelsea Finn

Keywords Paper

deep learning, optimization, reinforcement learning and planning

0

0

0

0

12:35

06/12/2020

Adversarial Self-Supervised Contrastive Learning

Minseon Kim, Jihoon Tack, Sung Ju Hwang

Keywords Paper

0

0

0

0

3:19

02/02/2021

Improving Sample Efficiency in Model-Free Reinforcement Learning from Images

Denis Yarats, Amy Zhang, Ilya Kostrikov and
Brandon Amos, Joelle Pineau, Rob Fergus

Keywords Paper

0

0

0

0

12:19

03/05/2021

How Neural Networks Extrapolate: From Feedforward to Graph Neural Networks

Keyulu Xu, Mozhi Zhang, Jingling Li and
Simon Du, Ken-Ichi Kawarabayashi, Stefanie Jegelka

Keywords Paper

graph neural networks, out-of-distribution, deep learning, extrapolation, deep learning theory

0

0

0

1

17:06

14/06/2020

Towards Unified INT8 Training for Convolutional Neural Network

Feng Zhu, Ruihao Gong, Fengwei Yu and
Xianglong Liu, Yanfei Wang, Zhelong Li, Xiuqi Yang, Junjie Yan

Keywords Paper

int8 training, gradient quantization, direction sensitive gradient clipping, learning rate scaling, gradient distribution

0

0

0

0

1:01

19/08/2021

Contrastive Model Invertion for Data-Free Knolwedge Distillation

Gongfan Fang, Jie Song, Xinchao Wang and
Chengchao Shen, Xingen Wang, Mingli Song

Keywords Paper

Machine Learning, Deep Learning, Explainable/Interpretable Machine Learning, Transfer, Adaptation, Multi-task Learning

0

0

0

0

5:51

16/11/2020

Multi-document Summarization with Maximal Marginal Relevance-guided Reinforcement Learning

Yuning Mao, Yanru Qu, Yiqing Xie and
Xiang Ren, Jiawei Han

Keywords Paper

single-document summarization, single-document sds, multi-document summarization, multi-document mds

0

0

0

0

10:58

26/04/2020

SpikeGrad: An ANN-equivalent Computation Model for Implementing Backpropagation with Spikes

Johannes C. Thiele, Olivier Bichler, Antoine Dupret

Keywords Paper

spiking neural network, neuromorphic engineering, backpropagation

0

0

0

0

5:21

14/06/2020

Proxy Anchor Loss for Deep Metric Learning

Sungyeon Kim, Dongwon Kim, Minsu Cho, Suha Kwak

Keywords Paper

deep metric learning, proxy-based loss, image retrieval

0

0

0

0

1:01

03/05/2021

Removing Undesirable Feature Contributions Using Out-of-Distribution Data

Saehyung Lee, Changhwa Park, Hyungyu Lee and
Jihun Yi, Jonghyun Lee, Sungroh Yoon

Keywords Paper

adversarial training, generalization, out-of-distribution, adversarial robustness

0

1

0

1

5:05

14/06/2020

Learning to Forget for Meta-Learning

Sungyong Baik, Seokil Hong, Kyoung Mu Lee

Keywords Paper

meta learning, few-shot learning, reinforcement learning

0

0

0

0

1:01

18/07/2021

Towards Better Robust Generalization with Shift Consistency Regularization

Shufei Zhang, Zhuang Qian, Kaizhu Huang and
Qiufeng Wang, Rui Zhang, Xinping Yi

Keywords Paper

Algorithms, Adversarial Examples

0

0

0

0

5:44