09/07/2020

Root-n-Regret for Learning in Markov Decision Processes with Function Approximation and Low Bellman Rank

Kefan Dong, Jian Peng, Yining Wang, Yuan Zhou

Keywords: Reinforcement learning

Abstract: In this paper, we consider the problem of online learning of Markov decision processes (MDPs) with very large state spaces. Under the assumptions of realizable function approximation and low Bellman rank, we develop an online learning algorithm that learns the optimal value function while at the same time achieving very low cumulative regret during the learning process. Our learning algorithm, Adaptive Value-function Elimination (AVE), is inspired by the policy elimination algorithm proposed in Jiang et al. (2017), known as OLIVE. One of our key technical contributions in AVE is to formulate the elimination steps in OLIVE as contextual bandit problems. This technique enables us to apply the active elimination and expert weighting methods from Dudik et al. (2011), instead of the random action exploration scheme used in the original OLIVE algorithm, for more efficient exploration and better control of the regret incurred in each policy elimination step. To the best of our knowledge, this is the first root-n-regret result for stochastic MDPs with general value function approximation.
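To make the elimination idea concrete, below is a minimal Python sketch of the OLIVE-style loop that AVE refines. It is a schematic under stated assumptions, not the authors' implementation: the candidate class is assumed finite, `avg_bellman_error` and `predicted_value` stand in for estimators the algorithm would build from rollouts, and the contextual-bandit exploration that distinguishes AVE from OLIVE is only indicated in the comments.

```python
# Schematic sketch of OLIVE-style value-function elimination (Jiang et al., 2017),
# the procedure AVE builds on. All names here (eliminate, avg_bellman_error,
# predicted_value, epsilon) are illustrative assumptions. In AVE, the
# uniform-random action exploration normally used to estimate Bellman errors is
# replaced by a contextual-bandit scheme (active elimination and expert
# weighting in the style of Dudik et al., 2011); that substitution is what
# controls the regret of each elimination round.

from typing import Callable, List


def eliminate(
    value_class: List[object],                  # surviving candidate value functions f
    predicted_value: Callable[[object], float], # f -> its predicted value at the initial state
    avg_bellman_error: Callable[[object, object, int], float],
    # (f_roll_in, f_evaluated, level h) -> estimated average Bellman error of
    # f_evaluated at level h under the state distribution of f_roll_in's greedy policy
    horizon: int,
    epsilon: float,                             # elimination tolerance
    max_iters: int = 100,
) -> object:
    """Iteratively eliminate value functions with large average Bellman error.

    Returns a surviving, approximately Bellman-consistent optimistic candidate.
    """
    survivors = list(value_class)
    for _ in range(max_iters):
        # Optimism: act greedily with respect to the survivor that predicts
        # the highest value at the initial state.
        f_opt = max(survivors, key=predicted_value)

        # Estimate the average Bellman error of f_opt at every level under the
        # roll-in distribution of its own greedy policy.
        errors = [avg_bellman_error(f_opt, f_opt, h) for h in range(horizon)]
        worst_h = max(range(horizon), key=lambda h: errors[h])
        if errors[worst_h] <= epsilon:
            return f_opt  # f_opt is (approximately) Bellman-consistent

        # Eliminate every candidate whose estimated Bellman error at the worst
        # level exceeds the tolerance; the low-Bellman-rank assumption is what
        # keeps the number of such elimination rounds small.
        survivors = [
            f for f in survivors
            if abs(avg_bellman_error(f_opt, f, worst_h)) <= epsilon
        ]
        if not survivors:
            raise RuntimeError("All candidates eliminated; check realizability "
                               "or loosen epsilon.")
    return max(survivors, key=predicted_value)
```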

The talk and the corresponding paper were published at the COLT 2020 virtual conference.
