Navigating to the Best Policy in Markov Decision Processes

06/12/2021

Navigating to the Best Policy in Markov Decision Processes

Aymen Al Marjani, Aurélien Garivier, Alexandre Proutiere

Keywords: theory, reinforcement learning and planning

Abstract Paper Similar Papers

Abstract: We investigate the classical active pure exploration problem in Markov Decision Processes, where the agent sequentially selects actions and, from the resulting system trajectory, aims at identifying the best policy as fast as possible. We propose a problem-dependent lower bound on the average number of steps required before a correct answer can be given with probability at least $1-\delta$. We further provide the first algorithm with an instance-specific sample complexity in this setting. This algorithm addresses the general case of communicating MDPs; we also propose a variant with a reduced exploration rate (and hence faster convergence) under an additional ergodicity assumption. This work extends previous results relative to the \emph{generative setting}~\cite{pmlr-v139-marjani21a}, where the agent could at each step query the random outcome of any (state, action) pair. In contrast, we show here how to deal with the \emph{navigation constraints}, induced by the \emph{online setting}. Our analysis relies on an ergodic theorem for non-homogeneous Markov chains which we consider of wide interest in the analysis of Markov Decision Processes.

0

0

0

0

Share

This is an embedded video. Talk and the respective paper are published at NeurIPS 2021 virtual conference. If you are one of the authors of the paper and want to manage your upload, see the question "My papertalk has been externally embedded..." in the FAQ section.

Comments

Post Comment

no comments yet

Similar Papers

04/08/2021

Adaptivity in Adaptive Submodularity

Hossein Esfandiari, Amin Karbasi, Vahab Mirrokni

Keywords Paper

0

0

0

0

13:54

06/12/2021

Efficient Online Estimation of Causal Effects by Deciding What to Observe

Shantanu Gupta, Zachary Lipton, David Childers

Keywords Paper

reinforcement learning and planning, graph learning, causality

0

0

0

0

14:18

09/07/2020

Root-n-Regret for Learning in Markov Decision Processes with Function Approximation and Low Bellman Rank

Kefan Dong, Jian Peng, Yining Wang, Yuan Zhou

Keywords Paper

Reinforcement learning,

0

0

0

0

12:46

06/12/2021

Counterfactual Explanations in Sequential Decision Making Under Uncertainty

Stratis Tsirtsis, Abir De, Manuel Rodriguez

Keywords Paper

causality, interpretability

0

0

0

0

13:10

02/02/2021

Evolution Strategies for Approximate Solution of Bayesian Games

Zun Li, Michael P. Wellman

Keywords Paper

0

0

0

0

18:18

26/10/2020

Learning Neural Search Policies for Classical Planning

Pawel Gomoluch, Dalal Alrajeh, Alessandra Russo, Antonio Bucchiarone

Keywords Paper

classical planning, policy search, machine learning, cross-entropy method

0

0

0

0

9:55

26/08/2020

Finite-Time Error Bounds for Biased Stochastic Approximation with Applications to Q-Learning

Gang Wang, Georgios B. Giannakis

Keywords Paper

0

0

0

0

14:03

18/07/2021

A Sharp Analysis of Model-based Reinforcement Learning with Self-Play

Qinghua Liu, Tiancheng Yu, Yu Bai, Chi Jin

Keywords Paper

Algorithms, Multitask and Transfer Learning, Algorithms, Meta-Learning; Applications, Object Recognition; Data, Challenges, Implementations, and Software, Benchmarks;, Theory, RL, Decisions and Control Theory

0

0

0

0

4:49

04/08/2021

Minimax Regret for Stochastic Shortest Path with Adversarial Costs and Known Transition

Liyu Chen, Haipeng Luo, Chen-Yu Wei

Keywords Paper

0

0

0

0

14:48

06/12/2021

Adaptive Online Packing-guided Search for POMDPs

Chenyang Wu, Guoyu Yang, Zongzhang Zhang and
Yang Yu, Dong Li, Wulong Liu, Jianye Hao

Keywords Paper

0

0

0

0

13:30

18/07/2021

Provably Correct Optimization and Exploration with Non-linear Policies

Fei Feng, Wotao Yin, Alekh Agarwal, Lin Yang

Keywords Paper

Deep Learning, Adversarial Networks, Applications, Fairness, Accountability, and Transparency, Theory, RL, Decisions and Control Theory

0

0

0

0

5:03

12/07/2020

Reward-Free Exploration for Reinforcement Learning

Chi Jin, Akshay Krishnamurthy, Max Simchowitz, Tiancheng Yu

Keywords Paper

Reinforcement Learning - Theory

0

0

0

0

14:37

02/02/2021

Combinatorial Pure Exploration with Full-Bandit or Partial Linear Feedback

Yihan Du, Yuko Kuroki, Wei Chen

Keywords Paper

0

0

0

0

17:13

02/02/2021

An Efficient Algorithm for Deep Stochastic Contextual Bandits

Tan Zhu, Guannan Liang, Chunjiang Zhu and
Haining Li, Jinbo Bi

Keywords Paper

0

0

0

0

14:36

03/08/2020

No-regret Exploration in Contextual Reinforcement Learning

Aditya Modi, Ambuj Tewari

Keywords Paper

0

0

0

0

8:19

18/07/2021

Finding the Stochastic Shortest Path with Low Regret: the Adversarial Cost and Unknown Transition Case

Liyu Chen, Haipeng Luo

Keywords Paper

Theory, RL, Decisions and Control Theory

0

0

0

0

5:08

26/08/2020

Solving Discounted Stochastic Two-Player Games with Near-Optimal Time and Sample Complexity

Aaron Sidford, Mengdi Wang, Lin Yang, Yinyu Ye

Keywords Paper

0

0

0

0

14:51

18/07/2021

Towards Tight Bounds on the Sample Complexity of Average-reward MDPs

Yujia Jin, Aaron Sidford

Keywords Paper

Theory, RL, Decisions and Control Theory

0

0

0

0

5:05

06/12/2021

On the Convergence Theory of Debiased Model-Agnostic Meta-Reinforcement Learning

Alireza Fallah, Kristian Georgiev, Aryan Mokhtari, Asuman Ozdaglar

Keywords Paper

theory, optimization, reinforcement learning and planning, meta learning

0

1

1

0

12:25

18/07/2021

Provably Efficient Algorithms for Multi-Objective Competitive RL

Tiancheng Yu, Yi Tian, Jingzhao Zhang, Suvrit Sra

Keywords Paper

Theory, RL, Decisions and Control Theory

0

0

0

0

17:04

06/12/2021

Continuous Mean-Covariance Bandits

Yihan Du, Siwei Wang, Zhixuan Fang, Longbo Huang

Keywords Paper

bandits

0

0

0

0

11:33

06/12/2020

Geometric Exploration for Online Control

Orestis Plevrakis, Elad Hazan

Keywords Paper

0

0

0

0

3:21

18/07/2021

Robust Reinforcement Learning using Least Squares Policy Iteration with Provable Performance Guarantees

Kishan Panaganti, Dileep Kalathil

Keywords Paper

Theory, RL, Decisions and Control Theory

0

0

0

0

5:15

13/04/2021

Contextual blocking bandits

Soumya Basu, Orestis Papadigenopoulos, Constantine Caramanis, Sanjay Shakkottai

Keywords Paper

0

0

0

0

2:47

18/07/2021

Adaptive Sampling for Best Policy Identification in Markov Decision Processes

Aymen Al Marjani, Alexandre Proutiere

Keywords Paper

Theory, RL, Decisions and Control Theory

0

0

0

0

5:35

06/12/2020

Robust Multi-Agent Reinforcement Learning with Model Uncertainty

Kaiqing Zhang, TAO SUN, Yunzhe Tao and
Sahika Genc, Sunil Mallya, Tamer Basar

Keywords Paper

0

0

0

0

3:11

06/12/2021

Accommodating Picky Customers: Regret Bound and Exploration Complexity for Multi-Objective Reinforcement Learning

Jingfeng Wu, Vladimir Braverman, Lin Yang

Keywords Paper

theory, reinforcement learning and planning

0

0

0

0

14:22

06/12/2020

Is Long Horizon RL More Difficult Than Short Horizon RL?

Ruosong Wang, Simon Du, Lin Yang, Sham Kakade

Keywords Paper

0

0

0

0

3:20

06/12/2021

Taming Communication and Sample Complexities in Decentralized Policy Evaluation for Cooperative Multi-Agent Reinforcement Learning

Xin Zhang, Zhuqing Liu, Jia Liu and
Zhengyuan Zhu, Songtao Lu

Keywords Paper

theory, optimization, reinforcement learning and planning

0

0

0

0

14:54

18/07/2021

Online Selection Problems against Constrained Adversary

Zhihao Jiang, Pinyan Lu, Zhihao Gavin Tang, Yuhao Zhang

Keywords Paper

Algorithms

0

0

0

0

5:16

26/04/2020

SVQN: Sequential Variational Soft Q-Learning Networks

Shiyu Huang, Hang Su, Jun Zhu, Ting Chen

Keywords Paper

reinforcement learning, POMDP, variational inference, generative model

0

0

0

0

4:52

06/12/2020

Preference-based Reinforcement Learning with Finite-Time Guarantees

Yichong Xu, Ruosong Wang, Lin Yang and
Aarti Singh, Artur Dubrawski

Keywords Paper

0

0

0

0

3:04

06/12/2020

Variational Policy Gradient Method for Reinforcement Learning with General Utilities

Junyu Zhang, Alec Koppel, Amrit Bedi and
Csaba Szepesvari, Mengdi Wang

Keywords Paper

0

0

0

0

3:20

26/04/2020

Posterior sampling for multi-agent reinforcement learning: solving extensive games with imperfect information

Yichi Zhou, Jialian Li, Jun Zhu

Keywords Paper

0

0

0

0

12:55

06/12/2021

Regime Switching Bandits

Xiang Zhou, Yi Xiong, Ningyuan Chen, Xuefeng GAO

Keywords Paper

reinforcement learning and planning, bandits, online learning

0

0

0

0

13:47

18/11/2020

Constrained reinforcement learning via policy splitting

Haoxian Chen, Henry Lam, Fengpei Li, Amirhossein Meisami

Keywords Paper

0

0

0

0

10:34

06/12/2021

Local policy search with Bayesian optimization

Sarah Müller, Alexander von Rohr, Sebastian Trimpe

Keywords Paper

theory, optimization, reinforcement learning and planning, active learning

0

0

0

0

11:42

03/05/2021

Fidelity-based Deep Adiabatic Scheduling

Eli Ovits, Lior Wolf

Keywords Paper

0

0

0

0

9:45

18/07/2021

Dynamic Planning and Learning under Recovering Rewards

David Simchi-Levi, Zeyu Zheng, Feng Zhu

Keywords Paper

Reinforcement Learning and Planning, Bandits

0

0

0

0

4:53

03/05/2021

Variational Intrinsic Control Revisited

Taehwan Kwon

Keywords Paper

Information theory, Unsupervised reinforcement learning

0

0

0

0

5:56