How to Characterize The Landscape of Overparameterized Convolutional Neural Networks

Abstract: For many initialization schemes, the parameters of two randomly initialized deep neural networks (DNNs) can be quite different, yet the feature distributions of the hidden nodes are similar at each layer. Using a new technique called neural network grafting, we demonstrate that the feature distributions of differently initialized networks remain similar at each layer throughout the entire training process. In this paper, we present an explanation of this phenomenon. Specifically, we consider the loss landscape of an overparameterized convolutional neural network (CNN) in the continuous limit, where the number of channels/hidden nodes in each hidden layer goes to infinity. Although the landscape of the overparameterized CNN is still non-convex with respect to the trainable parameters, we show that, surprisingly, it can be reformulated as a convex function of the feature distributions in the hidden layers. Therefore, by reparameterizing neural networks in terms of feature distributions, we obtain a much simpler characterization of the landscape of overparameterized CNNs. We further argue that training with respect to network parameters leads to a fixed trajectory in the feature distributions.
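The opening claim of the abstract (different parameters, similar feature distributions at initialization) can be illustrated with a small sketch. This is not the paper's experiment and does not implement neural network grafting; the single fully connected ReLU layer, the Gaussian initialization with scale 1/sqrt(dim), the all-ones input, and the names `init_layer`, `hidden_features`, and `moments` are all assumptions chosen for illustration.

```python
import math
import random


def init_layer(width, dim, seed):
    """Draw a (width x dim) weight matrix with i.i.d. N(0, 1/dim) entries."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(dim)
    return [[rng.gauss(0.0, scale) for _ in range(dim)] for _ in range(width)]


def hidden_features(weights, x):
    """ReLU activation of each hidden node for a single input x."""
    return [max(0.0, sum(w_i * x_i for w_i, x_i in zip(w, x))) for w in weights]


def moments(values):
    """First and second empirical moments, a crude summary of a distribution."""
    n = len(values)
    mean = sum(values) / n
    second = sum(v * v for v in values) / n
    return mean, second


dim, width = 16, 4000          # hypothetical sizes; wider layers agree more closely
x = [1.0] * dim                # one fixed input shared by both networks

w_a = init_layer(width, dim, seed=0)   # network A
w_b = init_layer(width, dim, seed=1)   # network B, independent initialization

# Average entrywise gap between the two weight matrices (parameter space).
param_gap = sum(
    abs(a - b) for row_a, row_b in zip(w_a, w_b) for a, b in zip(row_a, row_b)
) / (width * dim)

# Summary statistics of the hidden-node features (distribution space).
mean_a, second_a = moments(hidden_features(w_a, x))
mean_b, second_b = moments(hidden_features(w_b, x))

print("mean |W_a - W_b|       :", param_gap)
print("feature mean   (A, B)  :", mean_a, mean_b)
print("feature 2nd mom (A, B) :", second_a, second_b)
```

The per-entry parameter gap is comparable to the weight scale itself, while the feature moments of the two networks nearly coincide. Comparing only two moments is a simplification; a fuller check would compare the empirical distributions directly.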

18/07/2021


Yihong Gu, Weizhong Zhang, Cong Fang, Jason Lee, Tong Zhang


Similar Papers

Tensor Programs IV: Feature Learning in Infinite-Width Neural Networks

Greg Yang, Edward Hu

Keywords: Theory, Deep learning Theory

Deeply Shared Filter Bases for Parameter-Efficient Convolutional Neural Networks

Woochul Kang, Daeyeon Kim

Keywords: deep learning, machine learning, vision

Multi-Proxy Wasserstein Classifier for Image Classification

Benlin Liu, Yongming Rao, Jiwen Lu, Jie Zhou, Cho-Jui Hsieh


A self consistent theory of Gaussian Processes captures feature learning effects in finite CNNs

Gadi Naveh, Zohar Ringel

Keywords: theory, deep learning, optimization, kernel methods

Gradient Descent on Two-layer Nets: Margin Maximization and Simplicity Bias

Kaifeng Lyu, Zhiyuan Li, Runzhe Wang, Sanjeev Arora

Keywords: deep learning, optimization, machine learning

Enhancing Transformation-Based Defenses Against Adversarial Attacks with a Distribution Classifier

Connie Kou, Hwee Kuan Lee, Ee-Chien Chang, Teck Khim Ng

Keywords: adversarial attack, transformation defenses, distribution classifier

Discrete Model Compression With Resource Constraint for Deep Neural Networks

Shangqian Gao, Feihu Huang, Jian Pei, Heng Huang

Keywords: convolutional neural networks, model compression, channel pruning, discrete optimization

Phase-Wise Parameter Aggregation for Improving SGD Optimization

Takumi Kobayashi


Large Norms of CNN Layers Do Not Hurt Adversarial Robustness

Youwei Liang, Dong Huang


Group Knowledge Transfer: Federated Learning of Large CNNs at the Edge

Chaoyang He, Murali Annavaram, Salman Avestimehr


Group Softmax Loss With Discriminative Feature Grouping

Takumi Kobayashi


Adversarial Examples in Multi-Layer Random ReLU Networks

Peter Bartlett, Sebastien Bubeck, Yeshwanth Cherapanamjeri

Keywords: theory, adversarial robustness and security

Meta-Transfer Learning for Zero-Shot Super-Resolution

Jae Woong Soh, Sunwoo Cho, Nam Ik Cho

Keywords: zero-shot super-resolution, meta learning, transfer learning

Associative convolutional layers

Hamed Omidvar, Vahideh Akhlaghi, Hao Su, Massimo Franceschetti, Rajesh Gupta


Interactive Multi-Label CNN Learning With Partial Labels

Dat Huynh, Ehsan Elhamifar

Keywords: multi-label learning, partial label learning, large scale dataset, end-to-end training

UWC: Unit-wise Calibration Towards Rapid Network Compression

Chen Lin, Zheyang Li, Bo Peng, Wenming Tan, Ye Ren, Shiliang Pu

Keywords: post training quantization

Regularizing CNN Transfer Learning With Randomised Regression

Yang Zhong, Atsuto Maki

Keywords: transfer learning, network regularization, randomised regression, pseudo task regularization, limited samples

HRank: Filter Pruning Using High-Rank Feature Map

Mingbao Lin, Rongrong Ji, Yan Wang, Yichen Zhang, Baochang Zhang, Yonghong Tian, Ling Shao

Keywords: network pruning, neural network compression and acceleration, high-rank feature map, efficient deep learning computing

Near Lossless Transfer Learning for Spiking Neural Networks

Zhanglu Yan, Jun Zhou, Weng-Fai Wong


When Are Solutions Connected in Deep Networks?

Quynh Nguyen, Pierre Bréchet, Marco Mondelli

Keywords: theory, deep learning, optimization

Why Do Better Loss Functions Lead to Less Transferable Features?

Simon Kornblith, Ting Chen, Honglak Lee, Mohammad Norouzi
