09/07/2020

Model-Based Reinforcement Learning with a Generative Model is Minimax Optimal

Alekh Agarwal, Sham Kakade, Lin Yang

Keywords: Reinforcement learning, Sampling algorithms

Abstract: This work considers the sample and computational complexity of obtaining an $\epsilon$-optimal policy in a discounted Markov Decision Process (MDP), given only access to a generative model. In this model, the learner accesses the underlying transition model via a sampling oracle that provides a sample of the next state when given any state-action pair as input. We are interested in a basic and unresolved question in model-based planning: is the naïve "plug-in" approach, where we build the maximum likelihood estimate of the transition model in the MDP from observations and then find an optimal policy in this empirical MDP, non-asymptotically minimax optimal? Our main result answers this question positively. With regard to computation, our result provides a simpler approach towards minimax-optimal planning: in comparison to prior model-free results, we show that using any high-accuracy, black-box planning oracle in the empirical model suffices to obtain the minimax error rate. The key proof technique uses a leave-one-out analysis, in a novel "absorbing MDP" construction, to decouple the statistical dependency issues that arise in the analysis of model-based planning; this construction may be helpful more generally.
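
The plug-in approach described in the abstract can be illustrated with a short sketch: draw samples from the generative model for every state-action pair, form the maximum likelihood (empirical) transition model, and then plan in that empirical MDP. This is a minimal illustration only, assuming a tabular MDP with a known reward matrix and a hypothetical sampling oracle `sample_next_state(s, a)`; it is not the authors' implementation, and the value iteration step merely stands in for the "any black-box planner" of the paper.

```python
import numpy as np

def plug_in_planning(sample_next_state, r, num_states, num_actions,
                     gamma=0.99, n_samples=100, tol=1e-8):
    """Certainty-equivalence ("plug-in") planning with a generative model.

    sample_next_state(s, a) -> s'  : assumed generative-model oracle
    r[s, a]                        : assumed known reward matrix
    Returns a greedy policy for the empirical MDP.
    """
    # 1. Maximum likelihood estimate of the transition model, built from
    #    n_samples oracle calls per (state, action) pair.
    P_hat = np.zeros((num_states, num_actions, num_states))
    for s in range(num_states):
        for a in range(num_actions):
            for _ in range(n_samples):
                P_hat[s, a, sample_next_state(s, a)] += 1.0
    P_hat /= n_samples

    # 2. Plan in the empirical MDP. Per the paper, any high-accuracy
    #    black-box planner suffices; plain value iteration is used here.
    V = np.zeros(num_states)
    while True:
        Q = r + gamma * P_hat @ V      # Q has shape (num_states, num_actions)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            break
        V = V_new
    return Q.argmax(axis=1)            # greedy policy in the empirical MDP
```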

The talk and the accompanying paper were published at the COLT 2020 virtual conference.

