12/07/2020

Efficiently Solving MDPs with Stochastic Mirror Descent

Yujia Jin, Aaron Sidford

Keywords: Optimization - Convex

Abstract: In this paper we present a unified framework, based on primal-dual stochastic mirror descent, for approximately solving infinite-horizon Markov decision processes (MDPs) given a generative model. When applied to an average-reward MDP with $\mathcal{A}$ total actions and mixing time bound $t_{\mathrm{mix}}$, our method computes an $\epsilon$-optimal policy using an expected $\widetilde{O}(t_{\mathrm{mix}}^2 \mathcal{A} \epsilon^{-2})$ samples from the state-transition matrix, removing the ergodicity dependence of prior art. When applied to a $\gamma$-discounted MDP with $\mathcal{A}$ total actions, our method computes an $\epsilon$-optimal policy using an expected $\widetilde{O}((1-\gamma)^{-4} \mathcal{A} \epsilon^{-2})$ samples, improving over the best-known primal-dual methods and matching the state of the art up to a $(1-\gamma)^{-1}$ factor. Both methods are model-free, update state values and the policy simultaneously, and run in time linear in the number of samples taken.
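To make the primal-dual setup concrete, below is a minimal, illustrative Python sketch of stochastic mirror descent on the bilinear saddle-point formulation of a discounted MDP: Euclidean (projected-gradient) steps on the value vector and entropic (multiplicative-weight) steps on a dual variable over state-action pairs, with gradients estimated from generative-model samples. This is not the paper's exact algorithm (which uses carefully tuned step sizes, truncated iterates, and local-norm arguments); the function name `smd_discounted` and all parameter choices are assumptions for illustration only.

import numpy as np

def smd_discounted(P, r, q, gamma, T=20000, eta_v=0.01, eta_mu=0.01, seed=0):
    """Illustrative sketch: approximate the saddle point
        min_v max_{mu in simplex}  mu^T (r + gamma*P v - v) + (1-gamma) q^T v
    with sampled gradients.

    P: (S, A, S) transition tensor; r: (S, A) rewards in [0, 1];
    q: (S,) initial state distribution.
    """
    rng = np.random.default_rng(seed)
    S, A, _ = P.shape
    v = np.zeros(S)
    mu = np.full((S, A), 1.0 / (S * A))        # dual iterate on the simplex
    v_avg, mu_avg = np.zeros(S), np.zeros((S, A))

    for _ in range(T):
        # Stochastic gradient in v: sample (s, a) ~ mu, then s' ~ P(.|s, a);
        # E[gamma*e_{s'} - e_s] + (1-gamma)*q is the exact gradient.
        idx = rng.choice(S * A, p=mu.ravel())
        s, a = divmod(idx, A)
        s_next = rng.choice(S, p=P[s, a])
        g_v = (1 - gamma) * q.copy()
        g_v[s_next] += gamma
        g_v[s] -= 1.0
        # l2 mirror step = gradient step, then projection onto [0, 1/(1-gamma)]
        v = np.clip(v - eta_v * g_v, 0.0, 1.0 / (1 - gamma))

        # Stochastic gradient in mu: uniform coordinate (s, a), one sampled
        # next state, importance weight S*A keeps the estimate unbiased.
        idx = rng.integers(S * A)
        s, a = divmod(idx, A)
        s_next = rng.choice(S, p=P[s, a])
        adv = S * A * (r[s, a] + gamma * v[s_next] - v[s])
        mu[s, a] *= np.exp(eta_mu * adv)        # entropic (mult.-weights) step
        mu /= mu.sum()                          # project back onto the simplex

        v_avg += v / T
        mu_avg += mu / T

    # Extract a randomized policy from the averaged dual iterate.
    policy = mu_avg / mu_avg.sum(axis=1, keepdims=True)
    return v_avg, policy

# Toy usage on a random 5-state, 3-action MDP (hypothetical test data):
#   S, A = 5, 3
#   rng = np.random.default_rng(1)
#   P = rng.random((S, A, S)); P /= P.sum(axis=2, keepdims=True)
#   r = rng.random((S, A)); q = np.full(S, 1.0 / S)
#   v_hat, pi_hat = smd_discounted(P, r, q, gamma=0.9)

Averaging the iterates and reading the policy off the dual variable reflects the "update state values and policy simultaneously" property from the abstract; the step sizes above are placeholders, not the theoretically prescribed ones.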

The talk and the paper were presented at the ICML 2020 virtual conference.

