Learning Policies with Zero or Bounded Constraint Violation for Constrained MDPs

06/12/2021

Tao Liu, Ruida Zhou, Dileep Kalathil, Panganamala Kumar, Chao Tian

Keywords: reinforcement learning and planning

Abstract: We address the issue of safety in reinforcement learning. We pose the problem in an episodic framework of a constrained Markov decision process. Existing results have shown that it is possible to achieve a reward regret of $\tilde{\mathcal{O}}(\sqrt{K})$ while allowing an $\tilde{\mathcal{O}}(\sqrt{K})$ constraint violation in $K$ episodes. A critical question that arises is whether it is possible to keep the constraint violation even smaller. We show that when a strictly safe policy is known, one can confine the system to zero constraint violation with arbitrarily high probability while keeping the reward regret of order $\tilde{\mathcal{O}}(\sqrt{K})$. The algorithm that does so employs the principle of optimistic pessimism in the face of uncertainty to achieve safe exploration. When no strictly safe policy is known, though one is known to exist, it is possible to restrict the system to bounded constraint violation with arbitrarily high probability. This is achieved by a primal-dual algorithm with an optimistic primal estimate and a pessimistic dual update.
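
To make the two quantities in the abstract concrete, the following is a minimal Python sketch on the simplest possible CMDP: a single state with horizon 1, i.e. a constrained multi-armed bandit with Bernoulli rewards and costs and a constraint that expected cost stay at or below a threshold tau. This reduction, the function names optimal_value and run_primal_dual, and the parameters eps, eta and delta are all hypothetical choices for illustration. The sketch measures reward regret against the best mixed policy and cumulative constraint violation as the positive part of the summed cost excess, and runs a generic primal-dual loop whose primal step uses optimistic (UCB-style) reward and cost estimates. One plausible reading of a "pessimistic dual update", assumed here, is a dual step taken against the tightened threshold tau - eps; this is not the authors' algorithm and carries none of its guarantees.

```python
import math
import random


def optimal_value(r, c, tau):
    """Best expected reward of a (possibly mixed) policy with expected cost <= tau.

    With one state and one constraint, an optimal policy mixes at most two
    arms, so enumerating single arms and pairs of arms is sufficient.
    """
    best = -math.inf
    n = len(r)
    for i in range(n):
        if c[i] <= tau:
            best = max(best, r[i])
    for i in range(n):
        for j in range(n):
            if c[i] < tau < c[j]:
                # Weight p on the cheaper arm i makes the mixed cost exactly tau.
                p = (c[j] - tau) / (c[j] - c[i])
                best = max(best, p * r[i] + (1 - p) * r[j])
    return best


def run_primal_dual(r, c, tau, num_episodes, eps=0.05, eta=0.05, delta=0.05, seed=0):
    """Hypothetical primal-dual loop on a constrained bandit (one-state CMDP).

    Primal: pick the arm maximizing optimistic reward minus lam * optimistic cost.
    Dual (assumed pessimistic form): projected step against tau - eps.
    Returns (cumulative reward regret, cumulative constraint violation, final lam),
    both measured with the true means r and c.
    """
    rng = random.Random(seed)
    n_arms = len(r)
    counts = [0] * n_arms
    r_sum = [0.0] * n_arms
    c_sum = [0.0] * n_arms
    lam = 0.0
    v_star = optimal_value(r, c, tau)
    regret, cost_excess = 0.0, 0.0
    log_term = math.log(2 * n_arms * num_episodes / delta)
    for k in range(num_episodes):
        if k < n_arms:
            a = k  # initialization: play each arm once
        else:
            bonus = [math.sqrt(log_term / (2 * counts[i])) for i in range(n_arms)]
            # Optimism: inflate rewards, deflate costs, clipped to [0, 1].
            r_opt = [min(r_sum[i] / counts[i] + bonus[i], 1.0) for i in range(n_arms)]
            c_opt = [max(c_sum[i] / counts[i] - bonus[i], 0.0) for i in range(n_arms)]
            a = max(range(n_arms), key=lambda i: r_opt[i] - lam * c_opt[i])
            # Dual step against the tightened threshold; lam stays nonnegative.
            lam = max(0.0, lam + eta * (c_opt[a] - (tau - eps)))
        # Observe Bernoulli reward and cost drawn from the unknown true means.
        counts[a] += 1
        r_sum[a] += 1.0 if rng.random() < r[a] else 0.0
        c_sum[a] += 1.0 if rng.random() < c[a] else 0.0
        regret += v_star - r[a]
        cost_excess += c[a] - tau
    return regret, max(cost_excess, 0.0), lam
```

For example, with r = [0.2, 0.9], c = [0.1, 0.8] and tau = 0.45, optimal_value returns 0.55 (up to floating-point rounding), a 50/50 mix of the two arms; neither arm alone achieves this, which is why regret is measured against mixed policies. Because the primal step here is a deterministic argmax, the mixing happens across episodes as lam oscillates around its balancing value.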

The talk and the paper were published at the NeurIPS 2021 virtual conference.

Similar Papers

13/04/2021

Provably efficient safe exploration via primal-dual policy optimization

Dongsheng Ding, Xiaohan Wei, Zhuoran Yang, Zhaoran Wang, Mihailo Jovanovic

18/07/2021

Safe Reinforcement Learning with Linear Function Approximation

Sanae Amani, Christos Thrampoulidis, Lin Yang

Keywords: Reinforcement Learning and Planning

02/02/2021

WCSAC: Worst-Case Soft Actor Critic for Safety-Constrained Reinforcement Learning

Qisong Yang, Thiago D. Simão, Simon H Tindemans, Matthijs T. J. Spaan

06/12/2020

Stage-wise Conservative Linear Bandits

Ahmadreza Moradipari, Christos Thrampoulidis, Mahnoosh Alizadeh

19/08/2021

Model-Based Reinforcement Learning for Infinite-Horizon Discounted Constrained Markov Decision Processes

Aria HasanzadeZonuzy, Dileep Kalathil, Srinivas Shakkottai

Keywords: Machine Learning, Reinforcement Learning, Markov Decision Processes

12/07/2020

Constrained Markov Decision Processes via Backward Value Functions

Harsh Satija, Philip Amortila, Joelle Pineau

Keywords: Reinforcement Learning - General

18/07/2021

Safe Reinforcement Learning Using Advantage-Based Intervention

Nolan Wagener, Byron Boots, Ching-An Cheng

Keywords: Theory, RL, Decisions and Control Theory

06/12/2021

Safe Reinforcement Learning by Imagining the Near Future

Garrett Thomas, Yuping Luo, Tengyu Ma

Keywords: reinforcement learning and planning

12/07/2020

Reinforcement Learning for Non-Stationary Markov Decision Processes: The Blessing of (More) Optimism

Wang Chi Cheung, David Simchi-Levi, Ruihao Zhu

Keywords: Online Learning, Active Learning, and Bandits

06/12/2020

Risk-Sensitive Reinforcement Learning: Near-Optimal Risk-Sample Tradeoff in Regret

Yingjie Fei, Zhuoran Yang, Yudong Chen, Zhaoran Wang, Qiaomin Xie

06/12/2021

Stochastic Shortest Path: Minimax, Parameter-Free and Towards Horizon-Free Regret

Jean Tarbouriech, Runlong Zhou, Simon Du, Matteo Pirotta, Michal Valko, Alessandro Lazaric

Keywords: theory, reinforcement learning and planning

06/12/2021

Learning One Representation to Optimize All Rewards

Ahmed Touati, Yann Ollivier

Keywords: deep learning, reinforcement learning and planning, representation learning

09/07/2020

Learning Zero-Sum Simultaneous-Move Markov Games Using Function Approximation and Correlated Equilibrium

Qiaomin Xie, Yudong Chen, Zhaoran Wang, Zhuoran Yang

Keywords: Reinforcement learning, Planning and control

06/12/2021

Bayesian decision-making under misspecified priors with applications to meta-learning

Max Simchowitz, Christopher Tosh, Akshay Krishnamurthy, Daniel Hsu, Thodoris Lykouris, Miro Dudik, Robert E Schapire

Keywords: meta learning, bandits

06/12/2021

Time Discretization-Invariant Safe Action Repetition for Policy Gradient Methods

Seohong Park, Jaekyeom Kim, Gunhee Kim

Keywords: reinforcement learning and planning

18/07/2021

Shortest-Path Constrained Reinforcement Learning for Sparse Reward Tasks

Sungryull Sohn, Sungtae Lee, Jongwook Choi, Harm van Seijen, Mehdi Fatemi, Honglak Lee

Keywords: Reinforcement Learning and Planning, Deep RL

06/12/2021

Infinite Time Horizon Safety of Bayesian Neural Networks

Mathias Lechner, Đorđe Žikelić, Krishnendu Chatterjee, Thomas Henzinger

Keywords: deep learning, reinforcement learning and planning

06/12/2021

Online Selective Classification with Limited Feedback

Aditya Gangrade, Anil Kag, Ashok Cutkosky, Venkatesh Saligrama

Keywords: machine learning, online learning

06/12/2021

Adversarial Robustness with Semi-Infinite Constrained Learning

Alexander Robey, Luiz Chamon, George J. Pappas, Hamed Hassani, Alejandro Ribeiro

Keywords: theory, deep learning, optimization, robustness, adversarial robustness and security

13/04/2021

Provably safe PAC-MDP exploration using analogies

Melrose Roderick, Vaishnavh Nagarajan, Zico Kolter

06/12/2021

Learning in Non-Cooperative Configurable Markov Decision Processes

Giorgia Ramponi, Alberto Maria Metelli, Alessandro Concetti, Marcello Restelli

Keywords: reinforcement learning and planning, online learning

18/07/2021

Randomized Exploration in Reinforcement Learning with General Value Function Approximation

Haque Ishfaq, Qiwen Cui, Viet Nguyen, Alex Ayoub, Zhuoran Yang, Zhaoran Wang, Doina Precup, Lin Yang

Keywords: Theory, RL, Decisions and Control Theory

06/12/2020

Generalization Bound of Gradient Descent for Non-Convex Metric Learning

Mingzhi Dong, Xiaochen Yang, Rui Zhu, Yujiang Wang, Jing-Hao Xue

06/12/2021

Uncertainty-Based Offline Reinforcement Learning with Diversified Q-Ensemble

Gaon An, Seungyong Moon, Jang-Hyun Kim, Hyun Oh Song

Keywords: deep learning, reinforcement learning and planning

13/04/2021

Experimental design for regret minimization in linear bandits

Andrew Wagenmaker, Julian Katz-Samuels, Kevin Jamieson

26/04/2020

Optimistic Exploration even with a Pessimistic Initialisation

Tabish Rashid, Bei Peng, Wendelin Boehmer, Shimon Whiteson

Keywords: Reinforcement Learning, Exploration, Optimistic Initialisation

13/04/2021

Instance-wise minimax-optimal algorithms for logistic bandits

Marc Abeille, Louis Faury, Clement Calauzenes

06/12/2020

Certifiably Adversarially Robust Detection of Out-of-Distribution Data

Julian Bitterwolf, Alexander Meinke, Matthias Hein

06/12/2021

RL for Latent MDPs: Regret Guarantees and a Lower Bound

Jeongyeol Kwon, Yonathan Efroni, Constantine Caramanis, Shie Mannor

Keywords: reinforcement learning and planning

06/12/2020

Is Plug-in Solver Sample-Efficient for Feature-based Reinforcement Learning?

Qiwen Cui, Lin Yang

Keywords: Algorithms -> Semi-Supervised Learning; Deep Learning -> Deep Autoencoders; Deep Learning -> Generative Models, Probabilistic Methods -> Variational Inference

06/12/2021

An Efficient Pessimistic-Optimistic Algorithm for Stochastic Linear Bandits with General Constraints

Xin Liu, Bin Li, Pengyi Shi, Lei Ying

Keywords: optimization, bandits

18/07/2021

A Sharp Analysis of Model-based Reinforcement Learning with Self-Play

Qinghua Liu, Tiancheng Yu, Yu Bai, Chi Jin

Keywords: Algorithms, Multitask and Transfer Learning, Algorithms, Meta-Learning; Applications, Object Recognition; Data, Challenges, Implementations, and Software, Benchmarks; Theory, RL, Decisions and Control Theory

13/04/2021

Provably efficient actor-critic for risk-sensitive and robust adversarial RL: A linear-quadratic case

Yufeng Zhang, Zhuoran Yang, Zhaoran Wang

13/04/2021

Finite-sample regret bound for distributionally robust offline tabular reinforcement learning

Zhengqing Zhou, Zhengyuan Zhou, Qinxun Bai, Linhai Qiu, Jose Blanchet, Peter Glynn

02/02/2021

Decentralized Policy Gradient Descent Ascent for Safe Multi-Agent Reinforcement Learning

Songtao Lu, Kaiqing Zhang, Tianyi Chen, Tamer Başar, Lior Horesh

06/12/2020

Breaking the Sample Size Barrier in Model-Based Reinforcement Learning with a Generative Model

Gen Li, Yuting Wei, Yuejie Chi, Yuantao Gu, Yuxin Chen

06/12/2021

Relaxing Local Robustness

Klas Leino, Matt Fredrikson

Keywords: deep learning, optimization, machine learning, robustness, adversarial robustness and security

12/07/2020

Adaptive Reward-Poisoning Attacks against Reinforcement Learning

Xuezhou Zhang, Yuzhe Ma, Adish Singla, Jerry Zhu

Keywords: Trustworthy Machine Learning

26/08/2020

Asymptotically Efficient Off-Policy Evaluation for Tabular Reinforcement Learning

Ming Yin, Yu-Xiang Wang

02/02/2021

Robust Reinforcement Learning: A Case Study in Linear Quadratic Regulation

Bo Pang, Zhong-Ping Jiang
