Learning to Mutate with Hypergradient Guided Population

06/12/2020

Zhiqiang Tao, Yaliang Li, Bolin Ding, Ce Zhang, Jingren Zhou, Yun Fu

Abstract: Computing the gradient of model hyperparameters, i.e., hypergradient, enables a promising and natural way to solve the hyperparameter optimization task. However, gradient-based methods could lead to suboptimal solutions due to the non-convex nature of optimization in a complex hyperparameter space. In this study, we propose a hyperparameter mutation (HPM) algorithm to explicitly consider a learnable trade-off between using global and local search, where we adopt a population of student models to simultaneously explore the hyperparameter space guided by hypergradient and leverage a teacher model to mutate the underperforming students by exploiting the top ones. The teacher model is implemented with an attention mechanism and is used to learn a mutation schedule for different hyperparameters on the fly. Empirical evidence on synthetic functions is provided to show that HPM outperforms hypergradient significantly. Experiments on two benchmark datasets are also conducted to validate the effectiveness of the proposed HPM algorithm for training deep neural networks compared with several strong baselines.
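The population idea in the abstract can be illustrated with a minimal sketch. It makes several loud assumptions. There is a single scalar hyperparameter. A hypothetical non-convex synthetic function stands in for validation loss, so its derivative plays the role of the hypergradient. The paper's learned attention-based teacher is replaced by a fixed softmax weighting over the top students. The names `loss`, `hypergrad`, and `hpm_sketch` and every constant are illustrative and do not come from the paper. Each student takes hypergradient steps (local search). Periodically, the worst students are replaced by noisy copies of a weighted blend of the best ones (global search).

```python
import math
import random


def loss(h):
    """Hypothetical non-convex stand-in for validation loss over one hyperparameter."""
    return math.sin(3.0 * h) + 0.1 * h * h


def hypergrad(h):
    """Analytic derivative of loss(); in real HPO the hypergradient would come
    from differentiating validation loss through the training procedure."""
    return 3.0 * math.cos(3.0 * h) + 0.2 * h


def softmax(scores, temperature):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp((s - m) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def hpm_sketch(init, steps=200, lr=0.01, exploit_every=20,
               frac=0.25, temperature=0.1, noise=0.3, seed=0):
    """Population of 'students' mixing hypergradient descent (local search)
    with periodic exploit-and-mutate rounds (global search)."""
    if len(init) < 2:
        raise ValueError("need at least two students")
    rng = random.Random(seed)
    students = list(init)
    n = len(students)
    k = min(max(1, int(n * frac)), n - 1)  # students replaced per round
    history = []  # best loss observed at each exploit round
    for t in range(1, steps + 1):
        # Local search: every student follows its own hypergradient.
        students = [h - lr * hypergrad(h) for h in students]
        if t % exploit_every == 0:
            ranked = sorted(students, key=loss)
            top = ranked[:k]
            # Teacher stand-in: fixed softmax over negative loss of the top-k
            # students (NOT the learned attention module from the paper).
            weights = softmax([-loss(h) for h in top], temperature)
            anchor = sum(w * h for w, h in zip(weights, top))
            # Global search: replace the k worst students with mutated copies.
            mutants = [anchor + rng.gauss(0.0, noise) for _ in range(k)]
            students = ranked[:-k] + mutants
            history.append(loss(ranked[0]))
    best = min(students, key=loss)
    return best, history
```

For example, `hpm_sketch([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0, 3.5])` returns the best hyperparameter found and the per-round best losses. The best student is never replaced, and the step size here is small relative to the curvature of `loss`. As a result, the recorded best loss cannot increase from one round to the next. The mutation step is what lets students stuck in poor local basins jump toward better regions.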

Talk and paper published at the NeurIPS 2020 virtual conference.

Similar Papers

06/12/2021

Teaching an Active Learner with Contrastive Examples

Chaoqi Wang, Adish Singla, Yuxin Chen

Keywords: optimization, active learning

03/05/2021

SEED: Self-supervised Distillation For Visual Representation

Jacob Zhiyuan Fang, Jianfeng Wang, Lijuan Wang, Lei Zhang, 'YZ' Yezhou Yang, Zicheng Liu

Keywords: Representation Learning, Self-Supervised Learning, Knowledge Distillation

06/12/2021

Iterative Teacher-Aware Learning

Luyao Yuan, Dongruo Zhou, Junhong Shen, Jingdong Gao, Jeffrey L Chen, Quanquan Gu, Ying Nian Wu, Song-Chun Zhu

Keywords: theory, optimization, reinforcement learning and planning, machine learning

14/06/2020

Online Knowledge Distillation via Collaborative Learning

Qiushan Guo, Xinjiang Wang, Yichao Wu, Zhipeng Yu, Ding Liang, Xiaolin Hu, Ping Luo

Keywords: knowledge distillation, collaborative learning, transfer learning, deep neural network

12/07/2020

Teaching with Limited Information on the Learner's Behaviour

Ferdinando Cicalese, Francisco Sergio de Freitas Filho, Eduardo Laber, Marco Molinaro

Keywords: Learning Theory

13/04/2021

Understanding robustness in teacher-student setting: A new perspective

Zhuolin Yang, Zhaoxi Chen, Tiffany Cai, Xinyun Chen, Bo Li, Yuandong Tian

02/02/2021

Adaptive Teaching of Temporal Logic Formulas to Preference-based Learners

Zhe Xu, Yuxin Chen, Ufuk Topcu

12/07/2020

A Sample Complexity Separation between Non-Convex and Convex Meta-Learning

Nikunj Umesh Saunshi, Yi Zhang, Mikhail Khodak, Sanjeev Arora

Keywords: Deep Learning - Theory

14/06/2020

Neural Networks Are More Productive Teachers Than Human Raters: Active Mixup for Data-Efficient Knowledge Distillation From a Blackbox Model

Dongdong Wang, Yandong Li, Liqiang Wang, Boqing Gong

Keywords: blackbox knowledge distillation, data-efficient learning, active learning, mixup

22/11/2021

Teacher-Class Network: A Neural Network Compression Mechanism

Shaiq Munir Malik, Fnu Mohbat, Muhammad Umair Haider, Muhammad Musab Rasheed, Murtaza Taj

Keywords: model compression, knowledge distillation, teacher-student network

06/12/2020

Optimization and Generalization of Shallow Neural Networks with Quadratic Activation Functions

Stefano Sarao Mannelli, Eric Vanden-Eijnden, Lenka Zdeborová

Keywords: Algorithms -> Model Selection and Structure Learning; Algorithms -> Representation Learning; Theory -> Computational Complexity; Reinforcement Learning and Planning -> Markov Decision Processes

03/05/2021

Few-Shot Bayesian Optimization with Deep Kernel Surrogates

Martin Wistuba, Josif Grabocka

Keywords: automl, bayesian optimization, metalearning, few-shot learning

06/12/2021

Unsupervised Representation Transfer for Small Networks: I Believe I Can Distill On-the-Fly

Hee Min Choi, Hyoa Kang, Dokwan Oh

Keywords: self-supervised learning, representation learning

12/07/2020

Generative Teaching Networks: Accelerating Neural Architecture Search by Learning to Generate Synthetic Training Data

Felipe Petroski Such, Aditya Rawal, Joel Lehman, Kenneth Stanley, Jeffrey Clune

Keywords: Transfer, Multitask and Meta-learning

14/09/2020

Companion Guided Soft Margin for Face Recognition

Yingcheng Su, Yichao Wu, Zhenmao Li, Ken Chen, Ding Liang, Xiaolin Hu, Junjie Yan

Keywords: face recognition, companion guided soft margin, sample-wise adaptive margin

13/04/2021

Curriculum learning by optimizing learning dynamics

Tianyi Zhou, Shengjie Wang, Jeff Bilmes

03/05/2021

Learning the Pareto Front with Hypernetworks

Aviv Navon, Aviv Shamsian, Ethan Fetaya, Gal Chechik

Keywords: multi-task learning, multi-objective optimization

02/02/2021

Learning to Augment for Data-scarce Domain BERT Knowledge Distillation

Lingyun Feng, Minghui Qiu, Yaliang Li, Hai-Tao Zheng, Ying Shen

02/02/2021

Stochastic Precision Ensemble: Self-Knowledge Distillation for Quantized Deep Neural Networks

Yoonho Boo, Sungho Shin, Jungwook Choi, Wonyong Sung

16/11/2020

Amalgamating Knowledge from Two Teachers for Task-oriented Dialogue System with Adversarial Training

Wanwei He, Min Yang, Rui Yan, Chengming Li, Ying Shen, Ruifeng Xu

Keywords: task completion, generating responses, task-oriented dialogue, task-oriented systems

06/12/2021

Iterative Teaching by Label Synthesis

Weiyang Liu, Zhen Liu, Hanchen Wang, Liam Paull, Bernhard Schölkopf, Adrian Weller

Keywords: reinforcement learning and planning

19/04/2021

Annealing knowledge distillation

Aref Jafari, Mehdi Rezagholizadeh, Pranav Sharma, Ali Ghodsi

06/12/2021

Dynamic Distillation Network for Cross-Domain Few-Shot Recognition with Unlabeled Data

Ashraful Islam, Chun-Fu (Richard) Chen, Rameswar Panda, Leonid Karlinsky, Rogerio Feris, Richard J. Radke

Keywords: machine learning, meta learning, few shot learning

14/06/2020

Uninformed Students: Student-Teacher Anomaly Detection With Discriminative Latent Embeddings

Paul Bergmann, Michael Fauser, David Sattlegger, Carsten Steger

Keywords: anomaly detection, unsupervised learning, defect segmentation, student-teacher learning, uncertainty, novelty detection

02/02/2021

Learning to Reweight with Deep Interactions

Yang Fan, Yingce Xia, Lijun Wu, Shufang Xie, Weiqing Liu, Jiang Bian, Tao Qin, Xiang-Yang Li

02/02/2021

The Sample Complexity of Teaching by Reinforcement on Q-Learning

Xuezhou Zhang, Shubham Bharti, Yuzhe Ma, Adish Singla, Xiaojin Zhu

12/07/2020

Self-PU: Self Boosted and Calibrated Positive-Unlabeled Training

Xuxi Chen, Wuyang Chen, Tianlong Chen, Ye Yuan, Chen Gong, Kewei Chen, Zhangyang Wang

Keywords: Supervised Learning

06/12/2021

Towards Enabling Meta-Learning from Target Models

Su Lu, Han-Jia Ye, Le Gan, De-Chuan Zhan

Keywords: meta learning, few shot learning

06/12/2021

Training Over-parameterized Models with Non-decomposable Objectives

Harikrishna Narasimhan, Aditya Menon

Keywords: optimization, machine learning, fairness

02/02/2021

Teaching Active Human Learners

Zizhe Wang, Hailong Sun

18/07/2021

Training Data Subset Selection for Regression with Controlled Generalization Error

Durga S, Rishabh Iyer, Ganesh Ramakrishnan, Abir De

Keywords: Algorithms, Online Learning, Algorithms, Supervised Learning

14/09/2020

A Taxonomy of Interactive Online Machine Learning Strategies

Agnes Tegen, Paul Davidsson, Jan A. Persson

Keywords: interactive machine learning, online learning, active learning

06/12/2021

Learning Student-Friendly Teacher Networks for Knowledge Distillation

Dae Young Park, Moon-Hyun Cha, Changwook Jeong, Daesin Kim, Bohyung Han

Keywords: deep learning, transfer learning

18/07/2021

On Learnability via Gradient Method for Two-Layer ReLU Neural Networks in Teacher-Student Setting

Shunta Akiyama, Taiji Suzuki

Keywords: Theory, Deep learning Theory

22/11/2021

Object Re-identification Using Teacher-Like and Light Students

Yi Xie, Hanxiao Wu, Fei Shen, Jianqing Zhu, Huanqiang Zeng

Keywords: object re-identification, knowledge distillation, pruning, re-parameterization

22/11/2021

Class-Balanced Distillation for Long-Tailed Visual Recognition

Ahmet Iscen, Andre Araujo, Boqing Gong, Cordelia Schmid

Keywords: long-tailed recognition, dataset imbalance

03/05/2021

Knowledge Distillation as Semiparametric Inference

Tri Dao, Govinda Kamath, Vasilis Syrgkanis, Lester Mackey

Keywords: generalization bounds, knowledge distillation, model compression, loss correction, orthogonal machine learning, cross-fitting, semiparametric inference

06/12/2020

Knowledge Transfer in Multi-Task Deep Reinforcement Learning for Continuous Control

Zhiyuan Xu, Kun Wu, Zhengping Che, Jian Tang, Jieping Ye

04/08/2021

A Local Convergence Theory for Mildly Over-Parameterized Two-Layer Neural Network

Mo Zhou, Rong Ge, Chi Jin

06/12/2021

Exponential Separation between Two Learning Models and Adversarial Robustness

Grzegorz Gluch, Ruediger Urbanke

Keywords: theory, robustness, adversarial robustness and security