12/07/2020

Model-Based Reinforcement Learning with Value-Targeted Regression

Zeyu Jia, Lin Yang, Csaba Szepesvari, Mengdi Wang, Alex Ayoub

Keywords: Reinforcement Learning - Theory

Abstract: Reinforcement learning (RL) applies to control problems with large state and action spaces, so it is natural to consider RL with a parametric model. In this paper we focus on finite-horizon episodic RL where the transition model admits a nonlinear parametrization $P_{\theta}$, a special case of which is the linear parametrization $P_{\theta} = \sum_{i=1}^{d} (\theta)_{i} P_{i}$. We propose an upper-confidence model-based RL algorithm with value-targeted model parameter estimation. The algorithm updates the estimate of $\theta$ by solving a nonlinear regression problem that uses the latest value estimate as the target. We demonstrate the efficiency of our algorithm by proving its expected regret bound, which, in the special case of linear parametrization, takes the form $\tilde{\mathcal{O}}(d\sqrt{H^{3}T})$, where $H$, $T$, and $d$ are the horizon, the total number of steps, and the dimension of $\theta$, respectively. This regret bound is independent of the total number of states or actions and is close to a lower bound $\Omega(\sqrt{HdT})$. In the general nonlinear case, we handle the regret analysis using the concept of Eluder dimension proposed by \citet{RuVR14}.
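For intuition, below is a minimal sketch of the value-targeted regression step in the linear case $P_{\theta} = \sum_{i} (\theta)_{i} P_{i}$, assuming a small tabular setting and a ridge-regression estimator. The function names and data layout are illustrative assumptions, not taken from the paper's implementation, and the optimistic planning / confidence-set construction of the full algorithm is omitted.

```python
import numpy as np

# Hypothetical setup: P_basis is a list of d known basis kernels, each an
# array of shape (S, A, S), so that P_theta(s'|s,a) = sum_i theta_i * P_i(s'|s,a).

def value_targeted_features(P_basis, V, s, a):
    """Feature vector x with x_i = sum_{s'} P_i(s'|s,a) * V(s')."""
    return np.array([P_i[s, a] @ V for P_i in P_basis])

def estimate_theta(history, P_basis, lam=1.0):
    """Ridge regression of observed value targets onto value-targeted features.

    history: list of (s, a, s_next, V) tuples collected so far, where V is the
    value-function estimate that was in use when the transition was observed.
    """
    d = len(P_basis)
    A = lam * np.eye(d)            # regularized Gram matrix
    b = np.zeros(d)
    for s, a, s_next, V in history:
        x = value_targeted_features(P_basis, V, s, a)
        y = V[s_next]              # regression target: value at the observed next state
        A += np.outer(x, x)
        b += y * x
    return np.linalg.solve(A, b)  # theta_hat minimizing the regularized squared error
```

The key point the sketch illustrates is that the regression target is the scalar value of the observed next state, rather than the full next-state distribution, which is what makes the parameter estimate "value-targeted".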
