03/05/2021

Generalization bounds via distillation

Daniel Hsu, Ziwei Ji, Matus Telgarsky, Lan Wang

Keywords: distillation, theory, statistical learning theory, generalization

Abstract: This paper theoretically investigates the following empirical phenomenon: given a high-complexity network with poor generalization bounds, one can distill it into a network with nearly identical predictions but low complexity and vastly smaller generalization bounds. The main contribution is an analysis showing that the original network inherits this good generalization bound from its distillation, assuming the use of well-behaved data augmentation. This bound is presented both in an abstract and in a concrete form, the latter complemented by a reduction technique to handle modern computation graphs featuring convolutional layers, fully-connected layers, and skip connections, to name a few. To round out the story, a (looser) classical uniform convergence analysis of compression is also presented, as well as a variety of experiments on CIFAR and MNIST demonstrating similar generalization performance between the original network and its distillation.
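For readers unfamiliar with the setup the abstract refers to, the following is a minimal sketch of soft-label knowledge distillation in PyTorch. It is a generic illustration, not the authors' analysis or training procedure; the temperature `T`, mixing weight `alpha`, and the toy teacher/student architectures are assumptions for the example only.

```python
# Minimal knowledge-distillation sketch (hypothetical; not the paper's exact
# procedure or bound computation). A high-complexity "teacher" network is
# distilled into a low-complexity "student" by matching the teacher's
# softened output distribution.

import torch
import torch.nn as nn
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Soft-label distillation objective.

    T     -- softmax temperature (assumed hyperparameter)
    alpha -- weight on the soft (teacher-matching) term vs. the hard labels
    """
    # KL divergence between softened teacher and student distributions;
    # the T**2 factor keeps gradient magnitudes comparable across temperatures.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    # Ordinary cross-entropy against the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

if __name__ == "__main__":
    # Toy teacher (larger) and student (smaller) on random data, for illustration.
    teacher = nn.Sequential(nn.Linear(32, 256), nn.ReLU(), nn.Linear(256, 10))
    student = nn.Sequential(nn.Linear(32, 16), nn.ReLU(), nn.Linear(16, 10))
    x, y = torch.randn(8, 32), torch.randint(0, 10, (8,))
    with torch.no_grad():
        teacher_logits = teacher(x)
    loss = distillation_loss(student(x), teacher_logits, y)
    loss.backward()
    print(float(loss))
```

In the paper's framing, the point of such a procedure is that the student has low complexity while agreeing closely with the teacher, which lets the teacher inherit the student's smaller generalization bound.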

[Embedded video] The talk and the paper were presented at the ICLR 2021 virtual conference.

