Safe Reinforcement Learning Using Advantage-Based Intervention

Abstract: Many sequential decision problems involve finding a policy that maximizes total reward while obeying safety constraints. Although much recent research has focused on the development of safe reinforcement learning (RL) algorithms that produce a safe policy after training, ensuring safety during training as well remains an open problem. A fundamental challenge is performing exploration while still satisfying constraints in an unknown Markov decision process (MDP). In this work, we address this problem for the chance-constrained setting. We propose a new algorithm, SAILR, that uses an intervention mechanism based on advantage functions to keep the agent safe throughout training and optimizes the agent's policy using off-the-shelf RL algorithms designed for unconstrained MDPs. Our method comes with strong guarantees on safety during both training and deployment (i.e., after training and without the intervention mechanism) and on policy performance compared to the optimal safety-constrained policy. In our experiments, we show that SAILR violates constraints far less during training than standard safe RL and constrained MDP approaches and converges to a well-performing policy that can be deployed safely without intervention. Our code is available at https://github.com/nolanwagener/safe_rl.
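
The intervention idea in the abstract can be illustrated with a minimal sketch. The abstract does not state the rule itself, so everything below is an assumption made for illustration: the function names (`shielded_action`, `q_cost`, `v_cost`, `backup_policy`), the threshold value, and the toy one-dimensional environment are all hypothetical and are not taken from the paper or its code. The sketch assumes the agent's proposed action is overridden by a backup policy whenever its estimated safety-cost advantage, relative to the backup policy, exceeds a threshold.

```python
from typing import Callable, Tuple

State = int
Action = int


def shielded_action(
    state: State,
    proposed_action: Action,
    backup_policy: Callable[[State], Action],
    q_cost: Callable[[State, Action], float],
    v_cost: Callable[[State], float],
    threshold: float,
) -> Tuple[Action, bool]:
    """Return (action to execute, whether an intervention happened).

    Assumed rule: intervene when the safety-cost advantage of the
    proposed action over the backup policy exceeds `threshold`.
    """
    advantage = q_cost(state, proposed_action) - v_cost(state)
    if advantage > threshold:
        return backup_policy(state), True
    return proposed_action, False


# Hypothetical toy problem: integer positions, unsafe region at position >= 3.
UNSAFE_FROM = 3


def backup_policy(state: State) -> Action:
    # Assumed backup behavior: always step away from the unsafe region.
    return -1


def q_cost(state: State, action: Action) -> float:
    # Assumed cost estimate: 1.0 if the next position is unsafe, else 0.0.
    return 1.0 if state + action >= UNSAFE_FROM else 0.0


def v_cost(state: State) -> float:
    # Cost estimate of following the backup policy from this state.
    return q_cost(state, backup_policy(state))


# Near the boundary, stepping forward is overridden by the backup action.
near_edge = shielded_action(2, +1, backup_policy, q_cost, v_cost, 0.5)
# Far from the boundary, the proposed action is executed unchanged.
far_away = shielded_action(0, +1, backup_policy, q_cost, v_cost, 0.5)
```

In this toy setting `near_edge` is `(-1, True)` and `far_away` is `(1, False)`. The learner itself is untouched by the wrapper, which matches the abstract's description of pairing the intervention mechanism with off-the-shelf unconstrained RL algorithms.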

03/05/2021

Nolan Wagener, Byron Boots, Ching-An Cheng

Similar Papers

Conservative Safety Critics for Exploration

Homanga Bharadhwaj, Aviral Kumar, Nicholas Rhinehart, Sergey Levine, Florian Shkurti, Animesh Garg

Safe exploration, Reinforcement Learning

Constrained Markov Decision Processes via Backward Value Functions

Harsh Satija, Philip Amortila, Joelle Pineau

Provably safe PAC-MDP exploration using analogies

Melrose Roderick, Vaishnavh Nagarajan, Zico Kolter

WCSAC: Worst-Case Soft Actor Critic for Safety-Constrained Reinforcement Learning

Qisong Yang, Thiago D. Simão, Simon H Tindemans, Matthijs T. J. Spaan

Learning Barrier Certificates: Towards Safe Reinforcement Learning with Zero Training-time Violations

Yuping Luo, Tengyu Ma

reinforcement learning and planning, adversarial robustness and security

Safe Reinforcement Learning by Imagining the Near Future

Garrett Thomas, Yuping Luo, Tengyu Ma

Neurosymbolic Reinforcement Learning with Formally Verified Exploration

Greg Anderson, Abhinav Verma, Isil Dillig, Swarat Chaudhuri

Conservative Offline Distributional Reinforcement Learning

Yecheng Ma, Dinesh Jayaraman, Osbert Bastani

High Confidence Generalization for Reinforcement Learning

James Kostas, Yash Chandak, Scott Jordan, Georgios Theocharous, Philip Thomas

Algorithms, AutoML, Probabilistic Methods, Gaussian Processes, Reinforcement Learning and Planning

Learning Policies with Zero or Bounded Constraint Violation for Constrained MDPs

Tao Liu, Ruida Zhou, Dileep Kalathil, Panganamala Kumar, Chao Tian

Safe Policy Optimization with Local Generalized Linear Function Approximations

Akifumi Wachi, Yunyue Wei, Yanan Sui

theory, optimization, reinforcement learning and planning

Safe Reinforcement Learning with Linear Function Approximation

Sanae Amani, Christos Thrampoulidis, Lin Yang

Security Analysis of Safe and Seldonian Reinforcement Learning Algorithms

Pinar Ozisik, Philip Thomas

Deep Verifier Networks: Verification of Deep Discriminative Models with Deep Generative Models

Tong Che, Xiaofeng Liu, Site Li, Yubin Ge, Ruixiang Zhang, Caiming Xiong, Yoshua Bengio

PAC Confidence Predictions for Deep Neural Network Classifiers

Sangdon Park, Shuo Li, Insup Lee, Osbert Bastani

classification, fast DNN inference, probably approximately correct guarantee, calibration, safe planning

Provably efficient safe exploration via primal-dual policy optimization

Dongsheng Ding, Xiaohan Wei, Zhuoran Yang, Zhaoran Wang, Mihailo Jovanovic

Model-Based Reinforcement Learning for Infinite-Horizon Discounted Constrained Markov Decision Processes

Aria HasanzadeZonuzy, Dileep Kalathil, Srinivas Shakkottai

Machine Learning, Reinforcement Learning, Markov Decision Processes

Safe Policy Learning for Continuous Control

Yinlam Chow, Ofir Nachum, Aleksandra Faust, Edgar Dueñez-Guzman, Mohammad Ghavamzadeh

Multi-Objective SPIBB: Seldonian Offline Policy Improvement with Safety Constraints in Finite MDPs

Harsh Satija, Philip S. Thomas, Joelle Pineau, Romain Laroche

Infinite Time Horizon Safety of Bayesian Neural Networks

Mathias Lechner, Đorđe Žikelić, Krishnendu Chatterjee, Thomas Henzinger

deep learning, reinforcement learning and planning

Towards Safe Policy Improvement for Non-Stationary MDPs

Yash Chandak, Scott Jordan, Georgios Theocharous, Martha White, Philip Thomas

Applications -> Computer Vision; Deep Learning -> Attention Models, Deep Learning

Gaussian Process-Based Real-Time Learning for Safety Critical Applications

Armin Lederer, Alejandro Ordóñez Conejo, Korbinian Maier, Wenxin Xiao, Jonas Umlauft, Sandra Hirche

Probabilistic Methods, Gaussian Processes and Bayesian non-parametrics

Adversarial Robustness with Semi-Infinite Constrained Learning

Alexander Robey, Luiz Chamon, George J. Pappas, Hamed Hassani, Alejandro Ribeiro

Tsung-Yen Yang, Michael Y Hu, Yinlam Chow, Peter J Ramadge, Karthik Narasimhan

Chen Chen, Hongyao Tang, Jianye Hao, Wulong Liu, Zhaopeng Meng

Mansur Arief, Zhiyuan Huang, Guru Koushik Senthil Kumar, Yuanlu Bai, Shengyi He, Wenhao Ding, Henry Lam, Ding Zhao

Yue Wu, Shuangfei Zhai, Nitish Srivastava, Josh Susskind, Jian Zhang, Russ Salakhutdinov, Hanlin Goh

Zhengqing Zhou, Zhengyuan Zhou, Qinxun Bai, Linhai Qiu, Jose Blanchet, Peter Glynn
