Optimizing Millions of Hyperparameters by Implicit Differentiation

26/08/2020

Jonathan Lorraine, Paul Vicol, David Duvenaud

Abstract: We propose an algorithm for inexpensive gradient-based hyperparameter optimization that combines the implicit function theorem (IFT) with efficient inverse Hessian approximations. We present results about the relationship between the IFT and differentiating through optimization, motivating our algorithm. We use the proposed approach to train modern network architectures with millions of weights and millions of hyperparameters. For example, we learn a data-augmentation network, where every weight is a hyperparameter tuned for validation performance, that outputs augmented training examples. Jointly tuning weights and hyperparameters is only a few times more costly in memory and compute than standard training.
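
To make the idea in the abstract concrete, here is a minimal sketch of an IFT hypergradient on a toy ridge-regression problem. Everything in it is an illustrative assumption rather than the authors' implementation: the toy problem, the function names, the single log-regularization hyperparameter `lam`, and the choice of a truncated Neumann series as the inverse Hessian approximation. At the inner optimum w*(lam), the IFT gives dL_V/dlam = -(dL_V/dw) H^{-1} (d^2 L_T / dw dlam), where H is the Hessian of the training loss L_T with respect to the weights and L_V is the validation loss.

```python
import numpy as np


def neumann_inverse_hvp(hvp, v, alpha, num_terms):
    """Approximate H^{-1} v by alpha * sum_{j=0}^{num_terms} (I - alpha*H)^j v."""
    term = v.copy()
    total = v.copy()
    for _ in range(num_terms):
        term = term - alpha * hvp(term)  # apply (I - alpha*H) once more
        total = total + term
    return alpha * total


def ift_hypergradient(X, y, Xv, yv, lam, num_terms=200):
    """dL_V/dlam via the IFT, using the Neumann approximation of H^{-1}.

    Training loss (assumed): 0.5*||X w - y||^2 + 0.5*exp(lam)*||w||^2
    Validation loss (assumed): 0.5*||Xv w - yv||^2
    """
    d = X.shape[1]
    H = X.T @ X + np.exp(lam) * np.eye(d)   # Hessian of the training loss in w
    w = np.linalg.solve(H, X.T @ y)         # inner optimum, closed form for this toy
    grad_val = Xv.T @ (Xv @ w - yv)         # validation gradient at w
    mixed = np.exp(lam) * w                 # d/dlam of the training gradient at w
    alpha = 1.0 / np.linalg.norm(H, 2)      # keeps the spectrum of I - alpha*H in [0, 1)
    q = neumann_inverse_hvp(lambda u: H @ u, grad_val, alpha, num_terms)
    return -q @ mixed


def exact_hypergradient(X, y, Xv, yv, lam):
    """Same quantity with an exact linear solve, for comparison only."""
    d = X.shape[1]
    H = X.T @ X + np.exp(lam) * np.eye(d)
    w = np.linalg.solve(H, X.T @ y)
    grad_val = Xv.T @ (Xv @ w - yv)
    return -grad_val @ np.linalg.solve(H, np.exp(lam) * w)


# Small synthetic problem (hypothetical data, fixed seed for repeatability).
rng = np.random.default_rng(0)
X, Xv = rng.normal(size=(40, 5)), rng.normal(size=(20, 5))
w_true = rng.normal(size=5)
y = X @ w_true + 0.1 * rng.normal(size=40)
yv = Xv @ w_true + 0.1 * rng.normal(size=20)

approx = ift_hypergradient(X, y, Xv, yv, lam=0.0)
exact = exact_hypergradient(X, y, Xv, yv, lam=0.0)
print("Neumann:", approx, "exact solve:", exact)
```

In this sketch the Hessian is formed explicitly because the problem is tiny. The Neumann helper only needs a Hessian-vector product callable, so for a large network one would presumably pass a product computed by automatic differentiation instead of a dense matrix.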

The talk and the corresponding paper were published at the AISTATS 2020 virtual conference.

Similar Papers

06/12/2020

Efficient Learning of Generative Models via Finite-Difference Score Matching

Tianyu Pang, Kun Xu, Chongxuan LI, Yang Song, Stefano Ermon, Jun Zhu

2:59

12/07/2020

On the Iteration Complexity of Hypergradient Computations

Riccardo Grazzi, Saverio Salzo, Massimiliano Pontil, Luca Franceschi

Keywords: Transfer, Multitask and Meta-learning

15:10

02/02/2021

GLISTER: Generalization based Data Subset Selection for Efficient and Robust Learning

Krishnateja Killamsetty, Durga Sivasubramanian, Ganesh Ramakrishnan, Rishabh Iyer

19:14

18/07/2021

A Value-Function-based Interior-point Method for Non-convex Bi-level Optimization

Risheng Liu, Xuan Liu, Xiaoming Yuan, Shangzhi Zeng, Jin Zhang

Keywords: Optimization, Non-Convex Optimization

5:12

18/07/2021

Implicit rate-constrained optimization of non-decomposable objectives

Abhishek Kumar, Harikrishna Narasimhan, Andrew Cotter

Keywords: Algorithms, Supervised Learning

3:48

06/12/2020

Dual-Free Stochastic Decentralized Optimization with Variance Reduction

Hadrien Hendrikx, Francis Bach, Laurent Massoulié

3:28

06/12/2020

Bayesian Optimization for Iterative Learning

Vu Nguyen, Sebastian Schulze, Michael A Osborne

3:19

12/07/2020

Extrapolation for Large-batch Training in Deep Learning

Tao LIN, Lingjing Kong, Sebastian Stich, Martin Jaggi

Keywords: Deep Learning - Algorithms

13:21

26/04/2020

Learning to Guide Random Search

Ozan Sener, Vladlen Koltun

Keywords: Random search, Derivative-free optimization, Learning continuous control

4:58

06/12/2021

Dynamic Trace Estimation

Prathamesh Dharangutte, Christopher Musco

Keywords: theory, deep learning, optimization, machine learning, graph learning

14:13

06/12/2020

Delta-STN: Efficient Bilevel Optimization for Neural Networks using Structured Response Jacobians

Juhan Bae, Roger Grosse

3:20

26/08/2020

One Sample Stochastic Frank-Wolfe

Mingrui Zhang, Zebang Shen, Aryan Mokhtari, Hamed Hassani, Amin Karbasi

6:05

26/04/2020

Kernelized Wasserstein Natural Gradient

M Arbel, A Gretton, W Li, G Montufar

Keywords: kernel methods, natural gradient, information geometry, Wasserstein metric

4:56

12/07/2020

Improving Transformer Optimization Through Better Initialization

Xiao Shi Huang, Felipe Perez, Jimmy Ba, Maksims Volkovs

Keywords: Sequential, Network, and Time-Series Modeling

14:52

03/05/2021

Few-Shot Bayesian Optimization with Deep Kernel Surrogates

Martin Wistuba, Josif Grabocka

Keywords: automl, bayesian optimization, metalearning, few-shot learning

5:18

18/07/2021

Training Data Subset Selection for Regression with Controlled Generalization Error

Durga S, Rishabh Iyer, Ganesh Ramakrishnan, Abir De

Keywords: Algorithms, Online Learning, Algorithms, Supervised Learning

4:15

14/06/2020

Learning to Optimize on SPD Manifolds

Zhi Gao, Yuwei Wu, Yunde Jia, Mehrtash Harandi

Keywords: riemannian optimization, symmetric positive definite (spd) manifolds, optimization-based meta-learning, automatical spd optimizer design, learning to optimize, gradient-based spd optimization, optimization problems with spd constraints

0:50

26/08/2020

A Rule for Gradient Estimator Selection, with an Application to Variational Inference

Tomas Geffner, Justin Domke

8:36

26/04/2020

Meta-Learning Acquisition Functions for Transfer Learning in Bayesian Optimization

Michael Volpp, Lukas P. Fröhlich, Kirsten Fischer, Andreas Doerr, Stefan Falkner, Frank Hutter, Christian Daniel

Keywords: Transfer Learning, Meta Learning, Bayesian Optimization, Reinforcement Learning

5:05

06/12/2021

Model Selection for Bayesian Autoencoders

Ba-Hien Tran, Simone Rossi, Dimitrios Milios, Pietro Michiardi, Edwin Bonilla, Maurizio Filippone

Keywords: optimization, self-supervised learning, generative model, representation learning

10:49

06/12/2020

A Contour Stochastic Gradient Langevin Dynamics Algorithm for Simulations of Multi-modal Distributions

Wei Deng, Guang Lin, Faming Liang

3:26

12/07/2020

Conditional gradient methods for stochastically constrained convex minimization

Maria-Luiza Vladarean, Ahmet Alacaoglu, Ya-Ping Hsieh, Volkan Cevher

Keywords: Optimization - Convex

14:50

02/02/2021

Frugal Optimization for Cost-related Hyperparameters

Qingyun Wu, Chi Wang, Silu Huang

16:07

26/04/2020

Reinforced Genetic Algorithm Learning for Optimizing Computation Graphs

Aditya Paliwal, Felix Gimeno, Vinod Nair, Yujia Li, Miles Lubin, Pushmeet Kohli, Oriol Vinyals

Keywords: reinforcement learning, learning to optimize, combinatorial optimization, computation graphs, model parallelism, learning for systems

4:21

06/12/2020

Relative gradient optimization of the Jacobian term in unsupervised deep learning

Luigi Gresele, Giancarlo Fissore, Adrián Javaloy, Bernhard Schölkopf, Aapo Hyvarinen

3:15

06/12/2021

Learned Robust PCA: A Scalable Deep Unfolding Approach for High-Dimensional Outlier Detection

HanQin Cai, Jialin Liu, Wotao Yin

Keywords: deep learning, machine learning

8:07

06/12/2020

Dense Correspondences between Human Bodies via Learning Transformation Synchronization on Graphs

Xiangru Huang, Haitao Yang, Etienne Vouga, Qixing Huang

3:00

18/07/2021

Bilevel Optimization: Convergence Analysis and Enhanced Design

Kaiyi Ji, Junjie Yang, Yingbin LIANG

Keywords: Optimization, Non-Convex Optimization

5:02

18/11/2020

Deep-n-cheap: An automated search framework for low complexity deep learning

Sourya Dey, Saikrishna C. Kanala, Keith M. Chugg, Peter A. Beerel

11:59

06/12/2021

How Data Augmentation affects Optimization for Linear Regression

Boris Hanin, Yi Sun

Keywords: optimization, machine learning

13:38

26/08/2020

'Bring Your Own Greedy'+Max: Near-Optimal 1/2-Approximations for Submodular Knapsack

Grigory Yaroslavtsev, Samson Zhou, Dmitrii Avdiukhin

13:14

13/04/2021

Faster & more reliable tuning of neural networks: Bayesian optimization with importance sampling

Setareh Ariafar, Zelda Mariet, Dana Brooks, Jennifer Dy, Jasper Snoek

3:01

12/07/2020

Obtaining Adjustable Regularization for Free via Iterate Averaging

Jingfeng Wu, Vladimir Braverman, Lin Yang

Keywords: Optimization - General

12:07

06/12/2020

Interpolation Technique to Speed Up Gradients Propagation in Neural ODEs

Talgat Daulbaev, Alexandr Katrutsa, Larisa Markeeva, Julia Gusak, Andrzej Cichocki, Ivan Oseledets

3:18

26/08/2020

A Double Residual Compression Algorithm for Efficient Distributed Learning

Xiaorui Liu, Yao Li, Jiliang Tang, Ming Yan

10:47

06/12/2020

Adaptive Gradient Quantization for Data-Parallel SGD

Fartash Faghri, Iman Tabrizian, Ilia Markov, Dan Alistarh, Dan Roy, Ali Ramezani-Kebrya

3:20

02/02/2021

Harmonized Dense Knowledge Distillation Training for Multi-Exit Architectures

Xinglu Wang, Yingming Li

15:12

12/07/2020

Model Fusion with Kullback-Leibler Divergence

Sebastian Claici, Mikhail Yurochkin, Soumya Ghosh, Justin Solomon

Keywords: Probabilistic Inference - Approximate, Monte Carlo, and Spectral Methods

9:58

06/12/2021

Fast Training Method for Stochastic Compositional Optimization Problems

Hongchang Gao, Heng Huang

Keywords: optimization, machine learning, meta learning

14:00

06/12/2021

SUPER-ADAM: Faster and Universal Framework of Adaptive Gradients

Feihu Huang, Junyi Li, Heng Huang

Keywords: deep learning, optimization, machine learning

13:13