06/12/2020

Stochastic Normalization

Zhi Kou, Kaichao You, Mingsheng Long, Jianmin Wang


Abstract: Fine-tuning pre-trained deep networks on a small dataset is an important component of the deep learning pipeline. A critical problem in fine-tuning is how to avoid over-fitting when data are limited. Existing efforts address this from two directions: (1) imposing regularization on parameters or features; (2) transferring prior knowledge to fine-tuning by reusing pre-trained parameters. In this paper, we take an alternative approach by refactoring the widely used Batch Normalization (BN) module to mitigate over-fitting. We propose a two-branch design in which one branch is normalized by mini-batch statistics and the other by moving statistics. During training, the two branches are stochastically selected to avoid over-depending on particular sample statistics, resulting in a strong regularization effect, which we interpret as "architecture regularization." The resulting method is dubbed Stochastic Normalization (StochNorm). With the two-branch architecture, StochNorm naturally incorporates the pre-trained moving statistics of BN layers during fine-tuning, exploiting more prior knowledge from pre-trained networks. Extensive experiments show that StochNorm is a powerful tool for avoiding over-fitting when fine-tuning on small datasets. Moreover, StochNorm is readily pluggable into modern CNN backbones, and it is complementary to other fine-tuning methods, with which it can be combined to achieve a stronger regularization effect.
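To make the two-branch mechanism in the abstract concrete, the following is a minimal PyTorch-style sketch, not the authors' implementation: the class name StochNorm2d, the selection probability p, the per-channel Bernoulli selection, and the moving-statistics update rule are illustrative assumptions; the exact scheme is specified in the paper.

import torch
import torch.nn as nn


class StochNorm2d(nn.Module):
    """Sketch of a two-branch normalization layer: one branch normalizes
    with mini-batch statistics, the other with moving (running) statistics.
    During training, one branch is stochastically selected per channel."""

    def __init__(self, num_features, eps=1e-5, momentum=0.1, p=0.5):
        super().__init__()
        self.eps = eps
        self.momentum = momentum
        self.p = p  # probability of picking the moving-statistics branch (assumed)
        self.weight = nn.Parameter(torch.ones(num_features))
        self.bias = nn.Parameter(torch.zeros(num_features))
        # Moving statistics; for fine-tuning these would presumably be
        # initialized from the pre-trained BN layer (based on the abstract).
        self.register_buffer("running_mean", torch.zeros(num_features))
        self.register_buffer("running_var", torch.ones(num_features))

    def forward(self, x):
        if self.training:
            # Branch 1: normalize with mini-batch statistics
            batch_mean = x.mean(dim=(0, 2, 3))
            batch_var = x.var(dim=(0, 2, 3), unbiased=False)
            # Update moving statistics as in standard BN
            with torch.no_grad():
                self.running_mean.mul_(1 - self.momentum).add_(self.momentum * batch_mean)
                self.running_var.mul_(1 - self.momentum).add_(self.momentum * batch_var)
            x_batch = (x - batch_mean[None, :, None, None]) / torch.sqrt(
                batch_var[None, :, None, None] + self.eps)
            # Branch 2: normalize with moving statistics
            x_moving = (x - self.running_mean[None, :, None, None]) / torch.sqrt(
                self.running_var[None, :, None, None] + self.eps)
            # Stochastic per-channel selection between the two branches
            # (assumed selection granularity; see the paper for details)
            mask = torch.bernoulli(
                torch.full((1, x.size(1), 1, 1), self.p, device=x.device))
            x_hat = mask * x_moving + (1 - mask) * x_batch
        else:
            # At inference, fall back to moving statistics as in standard BN
            x_hat = (x - self.running_mean[None, :, None, None]) / torch.sqrt(
                self.running_var[None, :, None, None] + self.eps)
        return self.weight[None, :, None, None] * x_hat + self.bias[None, :, None, None]

In a fine-tuning setup, one would presumably replace each BN layer of the pre-trained backbone with such a layer and copy over the pre-trained affine parameters and moving statistics; this is how the two-branch design reuses the prior knowledge of the pre-trained network described in the abstract.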
