Stochastic Weight Averaging in Parallel: Large-Batch Training That Generalizes Well

Abstract: We propose Stochastic Weight Averaging in Parallel (SWAP), an algorithm to accelerate DNN training. Our algorithm uses large mini-batches to compute an approximate solution quickly and then refines it by averaging the weights of multiple models computed independently and in parallel. The resulting models generalize as well as those trained with small mini-batches but are produced in substantially less time. We demonstrate the reduction in training time and the good generalization performance of the resulting models on the computer vision datasets CIFAR10, CIFAR100, and ImageNet.
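
The three phases named in the abstract (a fast large-batch phase, independent refinement by several workers, and a final weight average) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a toy linear-regression problem in NumPy, simulates the parallel workers sequentially, and uses hypothetical function names (sgd, swap) and illustrative batch sizes, learning rates, and step counts that do not come from the paper.

```python
import numpy as np


def mse(w, X, y):
    """Mean squared error of the linear model X @ w against targets y."""
    return float(np.mean((X @ w - y) ** 2))


def sgd(w, X, y, batch_size, lr, steps, rng):
    """Plain mini-batch SGD on the MSE loss; returns a new weight vector."""
    w = w.copy()
    n = X.shape[0]
    for _ in range(steps):
        idx = rng.choice(n, size=batch_size, replace=False)
        residual = X[idx] @ w - y[idx]
        grad = 2.0 * X[idx].T @ residual / batch_size
        w -= lr * grad
    return w


def swap(X, y, num_workers=4, seed=0):
    """Toy three-phase procedure in the spirit of SWAP (all values are assumptions)."""
    w0 = np.zeros(X.shape[1])

    # Phase 1: large mini-batches give an approximate solution quickly.
    w_large = sgd(w0, X, y, batch_size=512, lr=0.1, steps=50,
                  rng=np.random.default_rng(seed))

    # Phase 2: each worker refines that solution independently with small
    # mini-batches and its own random stream (run sequentially here; in a
    # real setting these would run in parallel on separate devices).
    worker_weights = [
        sgd(w_large, X, y, batch_size=32, lr=0.02, steps=100,
            rng=np.random.default_rng(seed + 1 + k))
        for k in range(num_workers)
    ]

    # Phase 3: average the workers' weights to obtain the final model.
    w_avg = np.mean(worker_weights, axis=0)
    return w_large, worker_weights, w_avg


# Synthetic data for the toy problem.
data_rng = np.random.default_rng(42)
X = data_rng.normal(size=(2048, 10))
w_true = data_rng.normal(size=10)
y = X @ w_true + 0.1 * data_rng.normal(size=2048)

w_large, workers, w_avg = swap(X, y)
print("loss after large-batch phase:", mse(w_large, X, y))
print("loss of averaged model:      ", mse(w_avg, X, y))
```

On this convex toy loss the averaged model is guaranteed to be no worse than the mean of the individual workers. For deep networks no such guarantee exists, and the abstract's claims rest on the experiments it reports rather than on this sketch.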

26/04/2020

distributed optimization, decentralized training methods, communication-efficient distributed training with momentum, large-scale parallel SGD

26/08/2020

Vipul Gupta, Santiago Akle Serrano, Dennis DeCoste

Similar Papers

SlowMo: Improving Communication-Efficient Distributed SGD with Slow Momentum

Jianyu Wang, Vinayak Tantia, Nicolas Ballas, Michael Rabbat

distributed optimization, decentralized training methods, communication-efficient distributed training with momentum, large-scale parallel SGD

Efficient Distributed Hessian Free Algorithm for Large-scale Empirical Risk Minimization via Accumulating Sample Strategy

Majid Jahani, Xi He, Chenxin Ma, Aryan Mokhtari, Dheevatsa Mudigere, Alejandro Ribeiro, Martin Takac

A Multigrid Method for Efficiently Training Video Models

Chao-Yuan Wu, Ross Girshick, Kaiming He, Christoph Feichtenhofer, Philipp Krähenbühl

efficient training, video understanding, video modeling, action recognition

GCN meets GPU: Decoupling “When to Sample” from “How to Sample”

Morteza Ramezani, Weilin Cong, Mehrdad Mahdavi, Anand Sivasubramaniam, Mahmut Kandemir

CompOFA – Compound Once-For-All Networks for Faster Multi-Platform Deployment

Manas Sahni, Shreya Varshini, Alind Khare, Alexey Tumanov

AutoML, Latency-aware Neural Architecture Search, Efficient Deep Learning

Unsupervised Learning of Visual Features by Contrasting Cluster Assignments

Mathilde Caron, Ishan Misra, Julien Mairal, Priya Goyal, Piotr Bojanowski, Armand Joulin

Sparse Weight Activation Training

Md Aamir Raihan, Tor Aamodt

EfficientNetV2: Smaller Models and Faster Training

Mingxing Tan, Quoc Le

Applications, Computer Vision

Dual-Free Stochastic Decentralized Optimization with Variance Reduction

Hadrien Hendrikx, Francis Bach, Laurent Massoulié

Efficient Learning of Generative Models via Finite-Difference Score Matching

Tianyu Pang, Kun Xu, Chongxuan Li, Yang Song, Stefano Ermon, Jun Zhu

Memory Efficient Meta-Learning with Large Images

John Bronskill, Daniela Massiceti, Massimiliano Patacchiola, Katja Hofmann, Sebastian Nowozin, Richard Turner

optimization, machine learning, vision, meta learning, transfer learning, few shot learning

Distributed Online Optimization over a Heterogeneous Network

Nima Eshraghi, Ben Liang

Optimization - Convex

On the Loss Landscape of Adversarial Training: Identifying Challenges and How to Overcome Them

Chen Liu, Mathieu Salzmann, Tao Lin, Ryota Tomioka, Sabine Süsstrunk

Algorithms -> Representation Learning, Applications -> Dialog- or Communication-Based Learning

Score-based Generative Modeling in Latent Space

Arash Vahdat, Karsten Kreis, Jan Kautz

generative model

Adaptive Gradient Quantization for Data-Parallel SGD

Fartash Faghri, Iman Tabrizian, Ilia Markov, Dan Alistarh, Dan Roy, Ali Ramezani-Kebrya

Efficient Training of Retrieval Models using Negative Cache

Erik Lindgren, Sashank Reddi, Ruiqi Guo, Sanjiv Kumar

deep learning, machine learning

Perturb-and-max-product: Sampling and learning in discrete energy-based models

Miguel Lazaro-Gredilla, Antoine Dedieu, Dileep George

generative model, graph learning

Critical parameters for scalable distributed learning with large batches and asynchronous updates

Sebastian Stich, Amirkeivan Mohtashami, Martin Jaggi

Hierarchical Multiple Kernel Clustering

Jiyuan Liu, Xinwang Liu, Siwei Wang, Sihang Zhou, Yuexiang Yang

RETRIEVE: Coreset Selection for Efficient and Robust Semi-Supervised Learning

Krishnateja Killamsetty, Xujiang Zhao, Feng Chen, Rishabh Iyer

optimization, semi-supervised learning

Random Reshuffling is Not Always Better

Christopher De Sa

Don't Waste Your Bits! Squeeze Activations and Gradients for Deep Neural Networks via TinyScript

Fangcheng Fu, Yuzheng Hu, Yihan He, Jiawei Jiang, Yingxia Shao, Ce Zhang, Bin Cui

