Provable Model-based Nonlinear Bandit and Reinforcement Learning: Shelve Optimism, Embrace Virtual Curvature

06/12/2021

Provable Model-based Nonlinear Bandit and Reinforcement Learning: Shelve Optimism, Embrace Virtual Curvature

Kefan Dong, Jiaqi Yang, Tengyu Ma

Keywords: theory, reinforcement learning and planning, bandits, online learning

Abstract Paper Similar Papers

Abstract: This paper studies model-based bandit and reinforcement learning (RL) with nonlinear function approximations. We propose to study convergence to approximate local maxima because we show that global convergence is statistically intractable even for one-layer neural net bandit with a deterministic reward. For both nonlinear bandit and RL, the paper presents a model-based algorithm, Virtual Ascent with Online Model Learner (ViOlin), which provably converges to a local maximum with sample complexity that only depends on the sequential Rademacher complexity of the model class. Our results imply novel global or local regret bounds on several concrete settings such as linear bandit with finite or sparse model class, and two-layer neural net bandit. A key algorithmic insight is that optimism may lead to over-exploration even for two-layer neural net model class. On the other hand, for convergence to local maxima, it suffices to maximize the virtual return if the model can also reasonably predict the gradient and Hessian of the real return.

0

0

0

0

Share

This is an embedded video. Talk and the respective paper are published at NeurIPS 2021 virtual conference. If you are one of the authors of the paper and want to manage your upload, see the question "My papertalk has been externally embedded..." in the FAQ section.

Comments

Post Comment

no comments yet

Similar Papers

06/12/2021

Optimal Gradient-based Algorithms for Non-concave Bandit Optimization

Baihe Huang, Kaixuan Huang, Sham Kakade and
Jason Lee, Qi Lei, Runzhe Wang, Jiaqi Yang

Keywords Paper

theory, deep learning, optimization, generative model, bandits

0

0

0

0

10:53

14/06/2020

A Graduated Filter Method for Large Scale Robust Estimation

Huu Le, Christopher Zach

Keywords Paper

robust fitting, bundle adjustment, non-convex, poor local minima, non-linear least squares, graduated non-convexity.

0

0

0

0

1:01

06/12/2021

Efficient First-Order Contextual Bandits: Prediction, Allocation, and Triangular Discrimination

Dylan J Foster, Akshay Krishnamurthy

Keywords Paper

theory, reinforcement learning and planning, bandits, online learning

0

0

0

0

19:34

06/12/2020

POLY-HOOT: Monte-Carlo Planning in Continuous Space MDPs with Non-Asymptotic Analysis

Weichao Mao, Kaiqing Zhang, Qiaomin Xie, Tamer Basar

Keywords Paper

Algorithms -> Meta-Learning; Algorithms -> Multitask and Transfer Learning, Reinforcement Learning and Planning -> Reinforcement Learning

0

0

0

0

3:02

06/12/2021

Bayesian decision-making under misspecified priors with applications to meta-learning

Max Simchowitz, Christopher Tosh, Akshay Krishnamurthy and
Daniel Hsu, Thodoris Lykouris, Miro Dudik, Robert E Schapire

Keywords Paper

meta learning, bandits

0

0

0

0

14:58

06/12/2021

Understanding End-to-End Model-Based Reinforcement Learning Methods as Implicit Parameterization

Clement Gehring, Kenji Kawaguchi, Jiaoyang Huang, Leslie Kaelbling

Keywords Paper

theory, deep learning, optimization, reinforcement learning and planning

0

0

0

0

13:08

12/07/2020

The k-tied Normal Distribution: A Compact Parameterization of Gaussian Mean Field Posteriors in Bayesian Neural Networks

Jakub Swiatkowski, Kevin Roth, Bastiaan Veeling and
Linh Tran, Joshua Dillon, Jasper Snoek, Stephan Mandt, Tim Salimans, Rodolphe Jenatton, Sebastian Nowozin

Keywords Paper

Deep Learning - General

0

0

0

0

13:13

12/07/2020

Neural Contextual Bandits with UCB-based Exploration

Dongruo Zhou, Lihong Li, Quanquan Gu

Keywords Paper

Online Learning, Active Learning, and Bandits

0

0

0

0

15:12

06/12/2020

A Contour Stochastic Gradient Langevin Dynamics Algorithm for Simulations of Multi-modal Distributions

Wei Deng, Guang Lin, Faming Liang

Keywords Paper

0

0

0

0

3:26

02/02/2021

Disposable Linear Bandits for Online Recommendations

Melda Korkut, Andrew Li

Keywords Paper

0

0

0

0

17:20

06/12/2020

Markovian Score Climbing: Variational Inference with KL(p||q)

Christian Naesseth, Fredrik Lindsten, David Blei

Keywords Paper

0

0

0

0

2:30

03/05/2021

Exploring the Uncertainty Properties of Neural Networks’ Implicit Priors in the Infinite-Width Limit

Ben Adlam, Jaehoon Lee, Lechao Xiao and
Jeffrey Pennington, Jasper Snoek

Keywords Paper

Deep Learning, Bayesian Neural Networks, Neural Network Gaussian Process, Infinite-Width Limit, Uncertainty, Gaussian Process

0

0

0

0

4:34

12/07/2020

Structure Adaptive Algorithms for Stochastic Bandits

Rémy Degenne, Han Shao, Wouter Koolen

Keywords Paper

Online Learning, Active Learning, and Bandits

0

0

0

0

16:05

12/07/2020

Tightening Exploration in Upper Confidence Reinforcement Learning

Hippolyte Bourel, Odalric-Ambrym Maillard, Mohammad Sadegh Talebi

Keywords Paper

Reinforcement Learning - General

0

0

0

0

16:14

06/12/2021

Gradient Descent on Two-layer Nets: Margin Maximization and Simplicity Bias

Kaifeng Lyu, Zhiyuan Li, Runzhe Wang, Sanjeev Arora

Keywords Paper

deep learning, optimization, machine learning

0

0

0

0

14:56

06/12/2020

Provably Efficient Reinforcement Learning with Kernel and Neural Function Approximations

Zhuoran Yang, Chi Jin, Zhaoran Wang and
Mengdi Wang, Michael Jordan

Keywords Paper

0

0

0

0

3:42

12/07/2020

Fractional Underdamped Langevin Dynamics: Retargeting SGD with Momentum under Heavy-Tailed Gradient Noise

Umut Simsekli, Lingjiong Zhu, Yee Whye Teh, Mert Gurbuzbalaban

Keywords Paper

Deep Learning - Theory

0

0

0

0

15:37

06/12/2020

Robustness Analysis of Non-Convex Stochastic Gradient Descent using Biased Expectations

Kevin Scaman, Cedric Malherbe

Keywords Paper

0

0

0

0

3:09

13/04/2021

Experimental design for regret minimization in linear bandits

Andrew Wagenmaker, Julian Katz-Samuels, Kevin Jamieson

Keywords Paper

0

0

0

0

3:05

06/12/2021

Risk Minimization from Adaptively Collected Data: Guarantees for Supervised and Policy Learning

Aurelien Bibaut, Nathan Kallus, Maria Dimakopoulou and
Antoine Chambaz, Mark van der Laan

Keywords Paper

theory, reinforcement learning and planning, machine learning, bandits

0

0

0

0

16:07

12/07/2020

Dissecting Non-Vacuous Generalization Bounds based on the Mean-Field Approximation

Konstantinos Pitas

Keywords Paper

Deep Learning - Theory

0

0

0

0

12:37

06/12/2021

Breaking the Moments Condition Barrier: No-Regret Algorithm for Bandits with Super Heavy-Tailed Payoffs

Han Zhong, Jiayi Huang, Lin Yang, Liwei Wang

Keywords Paper

machine learning, bandits

0

0

0

0

3:25

06/12/2021

Last iterate convergence of SGD for Least-Squares in the Interpolation regime.

Aditya Vardhan Varre, Loucas Pillaud-Vivien, Nicolas Flammarion

Keywords Paper

deep learning, optimization

0

0

0

0

4:17

06/12/2021

MADE: Exploration via Maximizing Deviation from Explored Regions

Tianjun Zhang, Paria Rashidinejad, Jiantao Jiao and
Yuandong Tian, Joseph Gonzalez, Stuart Russell

Keywords Paper

reinforcement learning and planning

0

0

0

0

13:09

06/12/2020

SLIP: Learning to predict in unknown dynamical systems with long-term memory

Paria Rashidinejad, Jiantao Jiao, Stuart Russell

Keywords Paper

Algorithms -> Online Learning; Theory -> Learning Theory, Algorithms -> Bandit Algorithms

0

0

0

0

3:22

26/04/2020

Beyond Linearization: On Quadratic and Higher-Order Approximation of Wide Neural Networks

Yu Bai, Jason D. Lee

Keywords Paper

Neural Tangent Kernels, over-parametrized neural networks, deep learning theory

0

0

0

0

5:25

12/07/2020

Improved Optimistic Algorithms for Logistic Bandits

Louis Faury, Marc Abeille, Clément Calauzènes, Olivier Fercoq

Keywords Paper

Online Learning, Active Learning, and Bandits

0

0

0

0

15:22

18/07/2021

Wasserstein Distributional Normalization For Robust Distributional Certification of Noisy Labeled Data

Sung Woo Park, Junseok Kwon

Keywords Paper

Deep Learning, Generative Models, Algorithms, Representation Learning; Optimization, Submodular Optimization, Probabilistic Methods, Robust statistics

0

0

0

0

5:20

06/12/2020

Distributionally Robust Federated Averaging

Yuyang Deng, Mohammad Mahdi Kamani, Mehrdad Mahdavi

Keywords Paper

0

0

0

0

3:11

06/12/2020

Spike and slab variational Bayes for high dimensional logistic regression

Kolyan Ray, Botond Szabo, Gabriel Clara

Keywords Paper

0

0

0

0

3:16

06/12/2021

Implicit MLE: Backpropagating Through Discrete Exponential Family Distributions

Mathias Niepert, Pasquale Minervini, Luca Franceschi

Keywords Paper

deep learning, optimization

0

0

0

0

15:02

06/12/2020

Analysis and Design of Thompson Sampling for Stochastic Partial Monitoring

Taira Tsuchiya, Junya Honda, Masashi Sugiyama

Keywords Paper

0

0

0

0

3:21

12/07/2020

Low Bias Low Variance Gradient Estimates for Hierarchical Boolean Stochastic Networks

Adeel Pervez, Taco Cohen, Efstratios Gavves

Keywords Paper

Deep Learning - Generative Models and Autoencoders

0

0

0

0

14:28

06/12/2020

Least Squares Regression with Markovian Data: Fundamental Limits and Algorithms

Dheeraj Nagaraj, Xian Wu, Guy Bresler and
Prateek Jain, Praneeth Netrapalli

Keywords Paper

0

0

0

0

3:34

06/12/2021

Neural Active Learning with Performance Guarantees

Zhilei Wang, Pranjal Awasthi, Christoph Dann and
Ayush Sekhari, Claudio Gentile

Keywords Paper

deep learning, active learning

0

0

0

0

10:43

13/04/2021

One-round communication efficient distributed m-estimation

Yajie Bao, Weijia Xiong

Keywords Paper

0

0

0

0

3:00

06/12/2020

Flexible mean field variational inference using mixtures of non-overlapping exponential families

Jeffrey Spence

Keywords Paper

0

0

0

0

2:23

13/04/2021

Fundamental limits of ridge-regularized empirical risk minimization in high dimensions

Hossein Taheri, Ramtin Pedarsani, Christos Thrampoulidis

Keywords Paper

0

0

0

0

3:33

19/08/2021

Thompson Sampling for Bandits with Clustered Arms

Emil Carlsson, Devdatt Dubhashi, Fredrik D. Johansson

Keywords Paper

Machine Learning, Online Learning, Learning Theory, Reinforcement Learning

0

0

0

0

14:27

06/12/2021

Policy Optimization in Adversarial MDPs: Improved Exploration via Dilated Bonuses

Haipeng Luo, Chen-Yu Wei, Chung-Wei Lee

Keywords Paper

optimization, reinforcement learning and planning, bandits

0

0

0

0

15:17