14/06/2020

Augment Your Batch: Improving Generalization Through Instance Repetition

Elad Hoffer, Tal Ben-Nun, Itay Hubara, Niv Giladi, Torsten Hoefler, Daniel Soudry

Keywords: generalization, augmentation, regularization, large-batch, deep-learning, convolutional-networks

Abstract: Large-batch SGD is important for scaling training of deep neural networks. However, without fine-tuning hyperparameter schedules, the generalization of the model may be hampered. We propose to use batch augmentation: replicating instances of samples within the same batch with different data augmentations. Batch augmentation acts as a regularizer and an accelerator, increasing both generalization and performance scaling for a fixed budget of optimization steps. We analyze the effect of batch augmentation on gradient variance and show that it empirically improves convergence for a wide variety of networks and datasets. Our results show that batch augmentation reduces the number of necessary SGD updates to achieve the same accuracy as the state-of-the-art. Overall, this simple yet effective method enables faster training and better generalization by allowing more computational resources to be used concurrently.
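The core idea can be illustrated with a minimal PyTorch-style sketch, shown below under stated assumptions: each sample in a batch is replicated M times, each copy receiving an independent random augmentation, while labels are simply repeated. The names (`batch_augment`, `augment`, `m`) are illustrative and not taken from the paper or its released code.

```python
import torch

def batch_augment(inputs, targets, augment, m):
    """Batch augmentation sketch: replicate every sample in the batch m times,
    applying an independent random augmentation to each copy.

    inputs:  tensor of shape [N, ...] (e.g. [N, C, H, W])
    targets: tensor of shape [N]
    augment: hypothetical per-sample transform (e.g. random crop / flip)
    m:       number of augmented copies per sample
    """
    # Each pass over the batch draws fresh random augmentations,
    # so the m copies of a sample differ only in their augmentation.
    copies = [torch.stack([augment(x) for x in inputs]) for _ in range(m)]
    inputs_aug = torch.cat(copies, dim=0)   # shape: [m * N, ...]
    targets_aug = targets.repeat(m)         # labels are unchanged
    return inputs_aug, targets_aug
```

In this sketch, a logical batch of N samples becomes an effective batch of m·N examples processed in a single optimization step, which is how batch augmentation trades additional concurrent computation for fewer SGD updates.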

[Embedded video: conference talk. The talk and paper were presented at the CVPR 2020 virtual conference.]
