Improved Penalty Method via Doubly Stochastic Gradients for Bilevel Hyperparameter Optimization

Abstract: Hyperparameter optimization (HO) is an important problem in machine learning that is commonly formulated as a bilevel optimization problem. Gradient-based methods dominate bilevel optimization because they scale well with the number of hyperparameters, especially in deep learning. However, traditional gradient-based bilevel optimization methods need intermediate steps to obtain the exact or approximate gradient of the upper-level objective with respect to the hyperparameters, known as the hypergradient, and the cost of these steps is high, especially for high-dimensional datasets. Recently, a penalty method has been proposed that avoids computing the hypergradient, which speeds up gradient-based bilevel HO methods. However, the penalty method may produce a very large number of constraints, which greatly limits its efficiency, especially on high-dimensional data. To address this limitation, we propose a doubly stochastic gradient descent algorithm (DSGPHO) that improves the efficiency of the penalty method. Importantly, we not only prove that the proposed method converges to a point satisfying the KKT conditions of the original problem in a convex setting, but also provide the convergence rate of DSGPHO, which is, to the best of our knowledge, the first such result in the literature on gradient-based bilevel optimization. We compare our method with three state-of-the-art gradient-based methods on three tasks (data denoising, few-shot learning, and training data poisoning) using several large-scale benchmark datasets. The results demonstrate that our method outperforms or is comparable to the existing methods in terms of accuracy and efficiency.
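
The idea of sampling both data and constraints can be illustrated with a minimal sketch. This is not the authors' DSGPHO algorithm; it is a toy loop under assumptions that do not come from the abstract: a hypothetical lower-level problem g(w, lam) = 0.5 * ||w - lam||^2, whose stationarity condition yields one constraint c_j = w_j - lam_j = 0 per coordinate, a least-squares upper-level loss, a fixed penalty weight gamma, and a fixed step size. The function name and all parameters are illustrative.

```python
import random


def doubly_stochastic_penalty(xs, ys, dim, gamma=1.0, lr=0.05, steps=20000, seed=0):
    """Toy penalty-method loop: one data point and one constraint are sampled per step.

    Hypothetical bilevel problem (an assumption, not taken from the paper):
      upper level: f(w)      = mean_i 0.5 * (x_i . w - y_i)^2
      lower level: g(w, lam) = 0.5 * ||w - lam||^2, so stationarity in w gives
                   dim constraints c_j(w, lam) = w_j - lam_j = 0.
    Penalized objective: f(w) + (gamma / 2) * sum_j c_j(w, lam)^2.
    """
    rng = random.Random(seed)
    w = [0.0] * dim
    lam = [0.0] * dim
    n = len(xs)
    for _ in range(steps):
        i = rng.randrange(n)    # first source of randomness: a data point
        j = rng.randrange(dim)  # second source of randomness: a constraint
        residual = sum(a * b for a, b in zip(xs[i], w)) - ys[i]
        c_j = w[j] - lam[j]
        # Scaling by dim keeps the sampled penalty gradient unbiased for the full sum.
        pen = gamma * dim * c_j
        for k in range(dim):
            w[k] -= lr * residual * xs[i][k]  # sampled upper-level gradient
        w[j] -= lr * pen                      # sampled penalty gradient w.r.t. w_j
        lam[j] += lr * pen                    # sampled penalty gradient w.r.t. lam_j
    return w, lam


# Usage on noiseless synthetic data (all values here are illustrative).
data_rng = random.Random(1)
w_true = [1.0, -2.0, 0.5]
xs = [[data_rng.uniform(-1, 1) for _ in range(3)] for _ in range(20)]
ys = [sum(a * b for a, b in zip(x, w_true)) for x in xs]
w, lam = doubly_stochastic_penalty(xs, ys, dim=3)
violation = max(abs(a - b) for a, b in zip(w, lam))
lam_error = max(abs(a - b) for a, b in zip(lam, w_true))
```

Each step touches one data point and one constraint, so the per-step cost does not grow with the number of constraints. That is the efficiency argument the abstract makes, shown here only on a convex toy problem.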

13/04/2021

Wanli Shi, Bin Gu

Similar Papers

Convergence properties of stochastic hypergradients

Riccardo Grazzi, Massimiliano Pontil, Saverio Salzo

Non-convex Distributionally Robust Optimization: Non-asymptotic Analysis

Jikai Jin, Bohang Zhang, Haiyang Wang, Liwei Wang

optimization

Three Operator Splitting with Subgradients, Stochastic Gradients, and Adaptive Learning Rates

Alp Yurtsever, Alex Gu, Suvrit Sra

optimization, machine learning

Extrapolation for Large-batch Training in Deep Learning

Tao Lin, Lingjing Kong, Sebastian Stich, Martin Jaggi

Deep Learning - Algorithms

Escaping Saddle-Point Faster under Interpolation-like Conditions

Abhishek Roy, Krishnakumar Balasubramanian, Saeed Ghadimi, Prasant Mohapatra

MFES-HB: Efficient Hyperband with Multi-Fidelity Quality Measurements

Yang Li, Yu Shen, Jiawei Jiang, Jinyang Gao, Ce Zhang, Bin Cui

Self Normalizing Flows

T. Anderson Keller, Jorn Peters, Priyank Jaini, Emiel Hoogeboom, Patrick Forré, Max Welling

Deep Learning, Generative Models

Gradient Descent Averaging and Primal-dual Averaging for Strongly Convex Optimization

Wei Tao, Wei Li, Zhisong Pan, Qing Tao

High-Dimensional Gaussian Process Inference with Derivatives

Filip de Roos, Alexandra Gessner, Philipp Hennig

Probabilistic Methods, Gaussian Processes and Bayesian non-parametrics

One Ring to Rule Them All: Certifiably Robust Geometric Perception with Outliers

Heng Yang, Luca Carlone

GLISTER: Generalization based Data Subset Selection for Efficient and Robust Learning

Krishnateja Killamsetty, Durga Sivasubramanian, Ganesh Ramakrishnan, Rishabh Iyer

Implicit differentiation of Lasso-type models for hyperparameter optimization

Quentin Bertrand, Quentin Klopfenstein, Mathieu Blondel, Samuel Vaiter, Alexandre Gramfort, Joseph Salmon

Optimization - General

Reusing Combinatorial Structure: Faster Iterative Projections over Submodular Base Polytopes

Jai Moondra, Hassan Mortagy, Swati Gupta

optimization, online learning

High-dimensional Bayesian optimization using low-dimensional feature spaces

Riccardo Moriconi, Marc Deisenroth, K. S. Sesh Kumar

Leveraging Non-uniformity in First-order Non-convex Optimization

Jincheng Mei, Yue Gao, Bo Dai, Csaba Szepesvari, Dale Schuurmans

Theory, RL, Decisions and Control Theory

Large-Scale Methods for Distributionally Robust Optimization

Daniel Levy, Yair Carmon, John Duchi, Aaron Sidford

CAQL: Continuous Action Q-Learning

Moonkyung Ryu, Yinlam Chow, Ross Anderson, Christian Tjandraatmadja, Craig Boutilier

Reinforcement learning (RL), DQN, Continuous control, Mixed-Integer Programming (MIP)

Faster Randomized Infeasible Interior Point Methods for Tall/Wide Linear Programs

Agniva Chowdhury, Palma London, Haim Avron, Petros Drineas

Ordered SGD: A New Stochastic Optimization Framework for Empirical Risk Minimization

Kenji Kawaguchi, Haihao Lu

A Riemannian Block Coordinate Descent Method for Computing the Projection Robust Wasserstein Distance

Minhui Huang, Shiqian Ma, Lifeng Lai

Algorithms, Optimal Transport

Stochastic Gradient Descent in Correlated Settings: A Study on Gaussian Processes

Hao Chen, Lili Zheng, Raed Al Kontar, Garvesh Raskutti

The Role of Momentum Parameters in the Optimal Convergence of Adaptive Polyak's Heavy-ball Methods

Wei Tao, Sheng Long, Gaowei Wu, Qing Tao
