12/07/2020

Model-Based Reinforcement Learning with Value-Targeted Regression

Zeyu Jia, Lin Yang, Csaba Szepesvari, Mengdi Wang, Alex Ayoub

Keywords: Reinforcement Learning - Theory

Abstract: Reinforcement learning (RL) applies to control problems with large state and action spaces, so it is natural to consider RL with a parametric model. In this paper we focus on finite-horizon episodic RL where the transition model admits a nonlinear parametrization $P_{\theta}$, a special case of which is the linear parametrization $P_{\theta} = \sum_{i=1}^{d} (\theta)_{i} P_{i}$. We propose an upper-confidence model-based RL algorithm with value-targeted model parameter estimation. The algorithm updates the estimate of $\theta$ by solving a nonlinear regression problem that uses the latest value estimate as the target. We demonstrate the efficiency of our algorithm by proving its expected regret bound, which, in the special case of linear parametrization, takes the form $\tilde{\mathcal{O}}(d\sqrt{H^{3}T})$, where $H$, $T$, and $d$ are the horizon, the total number of steps, and the dimension of $\theta$, respectively. This regret bound is independent of the total number of states or actions and is close to a lower bound $\Omega(\sqrt{HdT})$. In the general nonlinear case, we handle the regret analysis using the concept of Eluder dimension proposed by \citet{RuVR14}.
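For intuition, below is a minimal sketch of the value-targeted regression step in the linear case $P_{\theta} = \sum_{i} (\theta)_{i} P_{i}$, assuming a small tabular setting and a ridge-regression estimator. The function names and data layout are illustrative assumptions, not taken from the paper's implementation, and the optimistic planning / confidence-set construction of the full algorithm is omitted.

```python
import numpy as np

# Hypothetical setup: P_basis is a list of d known basis kernels, each an
# array of shape (S, A, S), so that P_theta(s'|s,a) = sum_i theta_i * P_i(s'|s,a).

def value_targeted_features(P_basis, V, s, a):
    """Feature vector x with x_i = sum_{s'} P_i(s'|s,a) * V(s')."""
    return np.array([P_i[s, a] @ V for P_i in P_basis])

def estimate_theta(history, P_basis, lam=1.0):
    """Ridge regression of observed value targets onto value-targeted features.

    history: list of (s, a, s_next, V) tuples collected so far, where V is the
    value-function estimate that was in use when the transition was observed.
    """
    d = len(P_basis)
    A = lam * np.eye(d)            # regularized Gram matrix
    b = np.zeros(d)
    for s, a, s_next, V in history:
        x = value_targeted_features(P_basis, V, s, a)
        y = V[s_next]              # regression target: value at the observed next state
        A += np.outer(x, x)
        b += y * x
    return np.linalg.solve(A, b)  # theta_hat minimizing the regularized squared error
```

The key point the sketch illustrates is that the regression target is the scalar value of the observed next state, rather than the full next-state distribution, which is what makes the parameter estimate "value-targeted".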
