Self-Distillation Amplifies Regularization in Hilbert Space

Abstract: Knowledge distillation, introduced in the deep learning context, is a method for transferring knowledge from one architecture to another. When the two architectures are identical, the procedure is called self-distillation. The idea is to feed the predictions of the trained model back in as new target values for retraining, and possibly to iterate this loop a few times. It has been empirically observed that the self-distilled model often achieves higher accuracy on held-out data. Why this happens, however, has been a mystery: the self-distillation dynamics receive no new information about the task and evolve solely by looping over training. To the best of our knowledge, there is no rigorous understanding of this phenomenon. This work provides the first theoretical analysis of self-distillation. We focus on fitting a nonlinear function to training data, where the model space is a Hilbert space and fitting is subject to L2 regularization in this function space. We show that self-distillation iterations modify regularization by progressively limiting the number of basis functions that can be used to represent the solution. This implies (as we also verify empirically) that while a few rounds of self-distillation may reduce over-fitting, further rounds may lead to under-fitting and thus worse performance.
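
The loop described in the abstract can be illustrated with a minimal sketch, assuming kernel ridge regression as a stand-in for L2-regularized fitting in a Hilbert space. The Gaussian kernel, the synthetic data, the fixed ridge coefficient, the 1% threshold, and all function names are assumptions made for illustration; the abstract does not specify them, and this is not presented as the authors' exact procedure. With a fixed coefficient, each round multiplies the component of the targets along an eigenvector of the Gram matrix by d / (d + ridge), where d is the corresponding eigenvalue, so components with small eigenvalues fade fastest.

```python
import numpy as np


def rbf_gram(x, length_scale=0.5):
    """Gram matrix of a Gaussian (RBF) kernel on 1-D inputs (assumed kernel)."""
    diff = x[:, None] - x[None, :]
    return np.exp(-(diff ** 2) / (2.0 * length_scale ** 2))


def self_distill(gram, y, ridge, rounds):
    """Refit kernel ridge regression on its own training predictions."""
    n = gram.shape[0]
    targets = y.copy()
    history = []
    for _ in range(rounds):
        # Solve the ridge system for the current targets.
        alpha = np.linalg.solve(gram + ridge * np.eye(n), targets)
        # Training-set predictions become the targets of the next round.
        targets = gram @ alpha
        history.append(targets.copy())
    return history


def shrink_factors(gram, ridge, rounds):
    """Per-eigendirection scaling after `rounds` rounds: (d / (d + ridge)) ** rounds."""
    eigvals = np.clip(np.linalg.eigvalsh(gram), 0.0, None)
    return (eigvals / (eigvals + ridge)) ** rounds


def active_components(gram, ridge, rounds, tol=0.01):
    """Count directions whose scaling is at least `tol` times the largest one."""
    factors = shrink_factors(gram, ridge, rounds)
    return int(np.sum(factors >= tol * factors.max()))


# Synthetic 1-D regression problem (hypothetical data).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 20)
y = np.sin(2.0 * np.pi * x) + 0.3 * rng.standard_normal(x.size)

gram = rbf_gram(x)
history = self_distill(gram, y, ridge=0.1, rounds=5)
norms = [float(np.linalg.norm(h)) for h in history]
counts = [active_components(gram, 0.1, t) for t in range(1, 6)]

print("prediction norms per round:", np.round(norms, 3))
print("active components per round:", counts)
```

In this simplified setting the count of active components can only stay the same or drop as rounds accumulate, which mirrors the claim that self-distillation progressively limits the basis functions available to the solution.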

03/05/2021

Hossein Mobahi, Mehrdad Farajtabar, Peter Bartlett

Similar Papers

Meta-learning with negative learning rates

Alberto Bernacchia

Meta-learning

Imitating Deep Learning Dynamics via Locally Elastic Stochastic Differential Equations

Jiayao Zhang, Hua Wang, Weijie Su

deep learning, optimization

A theoretical characterization of semi-supervised learning with self-training for gaussian mixture models

Samet Oymak, Talha Cihad Gulcu

Dash: Semi-Supervised Learning with Dynamic Thresholding

Yi Xu, Lei Shang, Jinxing Ye, Qi Qian, Yufeng Li, Baigui Sun, Hao Li, Rong Jin

Algorithms, Semi-Supervised Learning

Leveraging Recursive Gumbel-Max Trick for Approximate Inference in Combinatorial Spaces

Kirill Struminsky, Artyom Gadetsky, Denis Rakitin, Danil Karpushkin, Dmitry Vetrov

deep learning, optimization

The Autoencoding Variational Autoencoder

Taylan Cemgil, Sumedh Ghaisas, Krishnamurthy Dvijotham, Sven Gowal, Pushmeet Kohli

Exploratory Machine Learning with Unknown Unknowns

Peng Zhao, Yu-Jie Zhang, Zhi-Hua Zhou

Few-Shot Bayesian Optimization with Deep Kernel Surrogates

Martin Wistuba, Josif Grabocka

automl, bayesian optimization, metalearning, few-shot learning

Reinforcement Learning of Implicit and Explicit Control Flow Instructions

Ethan Brooks, Janarthanan Rajendran, Richard Lewis, Satinder Singh

Optimization, Combinatorial Optimization, Reinforcement Learning and Planning, Deep RL

Training Data Subset Selection for Regression with Controlled Generalization Error

Durga S, Rishabh Iyer, Ganesh Ramakrishnan, Abir De

Algorithms, Online Learning, Supervised Learning

Linear Mode Connectivity in Multitask and Continual Learning

Seyed Iman Mirzadeh, Mehrdad Farajtabar, Dilan Gorur, Razvan Pascanu, Hassan Ghasemzadeh

multitask learning, mode connectivity, continual learning, catastrophic forgetting

Meta-Learning with Neural Tangent Kernels

Yufan Zhou, Zhenyi Wang, Jiayi Xian, Changyou Chen, Jinhui Xu

neural tangent kernel, meta-learning

Self-Supervised Self-Supervision by Combining Deep Learning and Probabilistic Logic

Hunter Lang, Hoifung Poon

DDPNOpt: Differential Dynamic Programming Neural Optimizer

Guan-Horng Liu, Tianrong Chen, Evangelos Theodorou

differential dynamic programming, trajectory optimization, deep learning training, optimal control

Meta-Learning with Warped Gradient Descent

Sebastian Flennerhag, Andrei A. Rusu, Razvan Pascanu, Francesco Visin, Hujun Yin, Raia Hadsell

meta-learning, transfer learning

Any-Precision Deep Neural Networks

Haichao Yu, Haoxiang Li, Humphrey Shi, Thomas S. Huang, Gang Hua

One Solution is Not All You Need: Few-Shot Extrapolation via Structured MaxEnt RL

Saurabh Kumar, Aviral Kumar, Sergey Levine, Chelsea Finn

Revisiting Self-Training for Neural Sequence Generation

Junxian He, Jiatao Gu, Jiajun Shen, Marc'Aurelio Ranzato

self-training, semi-supervised learning, neural sequence generation

No Fear of Heterogeneity: Classifier Calibration for Federated Learning with Non-IID Data

Mi Luo, Fei Chen, Dapeng Hu, Yifan Zhang, Jian Liang, Jiashi Feng

optimization, machine learning, federated learning

Offline Meta-Reinforcement Learning with Advantage Weighting

Eric Mitchell, Rafael Rafailov, Xue Bin Peng, Sergey Levine, Chelsea Finn

Algorithms, Multitask, Transfer, and Meta Learning

DisCor: Corrective Feedback in Reinforcement Learning via Distribution Correction

Aviral Kumar, Abhishek Gupta, Sergey Levine

