On Learnability via Gradient Method for Two-Layer ReLU Neural Networks in Teacher-Student Setting

Abstract: Deep learning empirically achieves high performance in many applications, but its training dynamics have not been fully understood theoretically. In this paper, we theoretically analyze the training of two-layer ReLU neural networks in a teacher-student regression model, in which a student network learns an unknown teacher network through its outputs. We show that, with a specific regularization and sufficient over-parameterization, the student network can identify the parameters of the teacher network with high probability via gradient descent with a norm-dependent step size, even though the objective function is highly non-convex. The key theoretical tools are the measure representation of the neural networks and a novel application of a dual certificate argument for sparse estimation on a measure space. We analyze the global minima and the global convergence properties in the measure space.
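To make the teacher-student regression setup concrete, here is a minimal NumPy sketch, assuming a noiseless teacher with `k` ReLU neurons, an over-parameterized student with `m > k` neurons, squared loss, and full-batch gradient descent. Everything here is illustrative: the function names, sizes, and initialization are hypothetical, and the plain L2 weight decay (`lam`) and fixed step size (`lr`) are generic stand-ins, not the paper's specific regularization or norm-dependent step size, which the abstract does not spell out.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def two_layer(X, W, a):
    # X: (n, d) inputs, W: (m, d) hidden weights, a: (m,) output weights
    return relu(X @ W.T) @ a

def train_student(n=200, d=5, k=3, m=20, lr=0.02, steps=1000, lam=1e-3, seed=0):
    rng = np.random.default_rng(seed)
    # Hypothetical teacher: k neurons with unit-norm hidden weights, +/-1 output weights
    W_t = rng.standard_normal((k, d))
    W_t /= np.linalg.norm(W_t, axis=1, keepdims=True)
    a_t = rng.choice([-1.0, 1.0], size=k)
    X = rng.standard_normal((n, d))
    y = two_layer(X, W_t, a_t)  # the student only sees the teacher's outputs

    # Over-parameterized student (m > k) with small random initialization
    W = 0.1 * rng.standard_normal((m, d))
    a = 0.1 * rng.standard_normal(m)
    losses = []  # losses[t] is the objective before the t-th update
    for _ in range(steps):
        H = X @ W.T              # (n, m) pre-activations
        A = relu(H)              # (n, m) hidden activations
        r = A @ a - y            # (n,) residuals
        loss = 0.5 * np.mean(r ** 2) + 0.5 * lam * (np.sum(W ** 2) + np.sum(a ** 2))
        losses.append(loss)
        # Gradient w.r.t. output weights
        grad_a = A.T @ r / n + lam * a
        # Gradient w.r.t. hidden weights: mean_i r_i * a_j * 1[H_ij > 0] * x_i
        G = (r[:, None] * (H > 0)) * a[None, :]  # (n, m)
        grad_W = G.T @ X / n + lam * W
        a -= lr * grad_a
        W -= lr * grad_W
    return losses

losses = train_student()
print(f"initial objective {losses[0]:.4f}, final objective {losses[-1]:.4f}")
```

The student's output is a finite sum over neurons, so it can equivalently be read as integrating the ReLU feature against the discrete signed measure that places mass `a_j` at each hidden weight `w_j`; lifting the problem to such measures is the kind of measure representation the abstract refers to.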

18/07/2021

Algorithms -> Model Selection and Structure Learning; Algorithms -> Representation Learning; Theory -> Computational Complexity; Reinforcement Learning and Planning -> Markov Decision Processes

06/12/2020

Shunta Akiyama, Taiji Suzuki

Similar Papers

Zero-Shot Knowledge Distillation from a Decision-Based Black-Box Model

Zi Wang

Deep Learning

Learning curves of generic features maps for realistic datasets with a teacher-student model

Bruno Loureiro, Cedric Gerbelot, Hugo Cui, Sebastian Goldt, Florent Krzakala, Marc Mezard, Lenka Zdeborová

deep learning, machine learning, kernel methods

Stochastic Precision Ensemble: Self-Knowledge Distillation for Quantized Deep Neural Networks

Yoonho Boo, Sungho Shin, Jungwook Choi, Wonyong Sung

SEED: Self-supervised Distillation For Visual Representation

Jacob Zhiyuan Fang, Jianfeng Wang, Lijuan Wang, Lei Zhang, 'YZ' Yezhou Yang, Zicheng Liu

Representation Learning, Self Supervised Learning, Knowledge Distillation

Introspective Learning by Distilling Knowledge from Online Self-explanation

Jindong Gu, Zhiliang Wu, Volker Tresp

Understanding robustness in teacher-student setting: A new perspective

Zhuolin Yang, Zhaoxi Chen, Tiffany Cai, Xinyun Chen, Bo Li, Yuandong Tian

Optimization and Generalization of Shallow Neural Networks with Quadratic Activation Functions

Stefano Sarao Mannelli, Eric Vanden-Eijnden, Lenka Zdeborová

Algorithms -> Model Selection and Structure Learning; Algorithms -> Representation Learning; Theory -> Computational Complexity; Reinforcement Learning and Planning -> Markov Decision Processes

Knowledge Distillation in Wide Neural Networks: Risk Bound, Data Efficiency and Imperfect Teacher

Guangda Ji, Zhanxing Zhu

Black-Box Ripper: Copying black-box models using generative evolutionary algorithms

Antonio Barbalau, Adrian Cosma, Radu Tudor Ionescu, Marius Popescu

Data-Free Knowledge Distillation with Soft Targeted Transfer Set Synthesis

Zi Wang

Unsupervised Representation Transfer for Small Networks: I Believe I Can Distill On-the-Fly

Hee Min Choi, Hyoa Kang, Dokwan Oh

self-supervised learning, representation learning

Effectiveness of Arbitrary Transfer Sets for Data-Free Knowledge Distillation

Gaurav Kumar Nayak, Konda Reddy Mopuri, Anirban Chakraborty

Few Sample Knowledge Distillation for Efficient Network Compression

Tianhong Li, Jianguo Li, Zhuang Liu, Changshui Zhang

efficient network compression, few samples, knowledge distillation

Heterogeneous Knowledge Distillation Using Information Flow Modeling

Nikolaos Passalis, Maria Tzelepi, Anastasios Tefas

neural network distillation, lightweight learning, information flow

Teacher Guided Neural Architecture Search for Face Recognition

Xiaobo Wang

Iterative Teacher-Aware Learning

Luyao Yuan, Dongruo Zhou, Junhong Shen, Jingdong Gao, Jeffrey L Chen, Quanquan Gu, Ying Nian Wu, Song-Chun Zhu

theory, optimization, reinforcement learning and planning, machine learning

Semi-Online Knowledge Distillation

Zhiqiang Liu, Yanxia Liu, Chengkai Huang

Knowledge Distillation, Model Compression

Amalgamating Knowledge from Two Teachers for Task-oriented Dialogue System with Adversarial Training

Wanwei He, Min Yang, Rui Yan, Chengming Li, Ying Shen, Ruifeng Xu

task completion, generating responses, task-oriented dialogue, task-oriented systems

The Sample Complexity of Teaching by Reinforcement on Q-Learning

Xuezhou Zhang, Shubham Bharti, Yuzhe Ma, Adish Singla, Xiaojin Zhu

Query Distillation: BERT-based Distillation for Ensemble Ranking

Wangshu Zhang, Junhong Liu, Zujie Wen, Yafang Wang, Gerard de Melo

Learning to Reweight with Deep Interactions

Yang Fan, Yingce Xia, Lijun Wu, Shufang Xie, Weiqing Liu, Jiang Bian, Tao Qin, Xiang-Yang Li

Knowledge Transfer in Multi-Task Deep Reinforcement Learning for Continuous Control

Zhiyuan Xu, Kun Wu, Zhengping Che, Jian Tang, Jieping Ye
