Multi-Objective SPIBB: Seldonian Offline Policy Improvement with Safety Constraints in Finite MDPs

Abstract: We study the problem of Safe Policy Improvement (SPI) under constraints in the offline Reinforcement Learning (RL) setting. We consider the scenario where (i) we have a dataset collected under a known baseline policy, and (ii) the environment emits multiple reward signals, each inducing its own objective to optimize. We present an SPI formulation for this setting that takes into account the user's preferences for trading off the different reward signals, while ensuring that the new policy performs at least as well as the baseline policy along each individual objective. We build on traditional SPI algorithms and propose a novel method based on Safe Policy Iteration with Baseline Bootstrapping (SPIBB, Laroche et al., 2019) that provides high-probability guarantees on the performance of the agent in the true environment. We show the effectiveness of our method on a synthetic grid-world safety task, as well as in a real-world critical care context, where we learn a policy for the administration of IV fluids and vasopressors to treat sepsis.
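The baseline-bootstrapping idea that SPIBB builds on can be illustrated with a small sketch. This is a minimal, hypothetical Python sketch of the Π_b-SPIBB policy-improvement step of Laroche et al. (2019) for a single state, combined with an assumed linear scalarization of the per-objective Q-values through user preference weights. It is not the multi-objective algorithm of this paper, which additionally guarantees improvement on each objective separately. The function names `scalarize_q` and `spibb_greedy_step` and the parameters `weights`, `counts`, and `n_wedge` (the count threshold below which an action is considered uncertain) are illustrative assumptions.

```python
import numpy as np


def scalarize_q(q_per_objective, weights):
    """Combine per-objective Q-values, shape (n_objectives, n_actions),
    into a single vector using user preference weights.
    Assumption: a simple linear scalarization."""
    q = np.asarray(q_per_objective, dtype=float)
    w = np.asarray(weights, dtype=float)
    return w @ q


def spibb_greedy_step(q_values, baseline_probs, counts, n_wedge):
    """Pi_b-SPIBB improvement step for one state.

    Actions observed fewer than n_wedge times in the dataset are
    'bootstrapped': they keep exactly their baseline probability.
    The baseline mass on the remaining (well-estimated) actions is
    moved to the best of those actions according to q_values.
    """
    q = np.asarray(q_values, dtype=float)
    pi_b = np.asarray(baseline_probs, dtype=float)
    counts = np.asarray(counts)

    bootstrapped = counts < n_wedge
    # Uncertain actions copy the baseline; trusted ones start at zero.
    pi = np.where(bootstrapped, pi_b, 0.0)

    trusted = ~bootstrapped
    if trusted.any():
        # Only the baseline mass on trusted actions may be reallocated.
        free_mass = pi_b[trusted].sum()
        trusted_idx = np.flatnonzero(trusted)
        best = trusted_idx[np.argmax(q[trusted])]
        pi[best] += free_mass
    # If every action is bootstrapped, pi equals the baseline policy.
    return pi


if __name__ == "__main__":
    # Two objectives, three actions, equal preference weights.
    q = scalarize_q([[1.0, 2.0, 0.5], [0.0, -1.0, 3.0]], weights=[0.5, 0.5])
    pi = spibb_greedy_step(q, baseline_probs=[0.2, 0.3, 0.5],
                           counts=[50, 3, 40], n_wedge=10)
    print(pi)  # approximately [0.0, 0.3, 0.7]
```

In the usage example, action 1 is rarely observed, so its baseline probability of 0.3 is left untouched, while the mass of the two well-estimated actions (0.2 + 0.5) goes to action 2, which has the higher scalarized value. This is what keeps the new policy close to the baseline wherever the data are too sparse to trust the estimates.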

03/05/2021

Applied computing, Life and medical sciences, Health care information systems, Computing methodologies, Machine learning, Learning paradigms, Reinforcement learning, Sequential decision making, Learning settings, Batch learning

06/12/2021

Harsh Satija, Philip S. Thomas, Joelle Pineau, Romain Laroche

Similar Papers

Conservative Safety Critics for Exploration

Homanga Bharadhwaj, Aviral Kumar, Nicholas Rhinehart, Sergey Levine, Florian Shkurti, Animesh Garg

Safe exploration, Reinforcement Learning

Safe Policy Learning for Continuous Control

Yinlam Chow, Ofir Nachum, Aleksandra Faust, Edgar Dueñez-Guzman, Mohammad Ghavamzadeh

Security Analysis of Safe and Seldonian Reinforcement Learning Algorithms

Pinar Ozisik, Philip Thomas

Model-Based Reinforcement Learning for Infinite-Horizon Discounted Constrained Markov Decision Processes

Aria HasanzadeZonuzy, Dileep Kalathil, Srinivas Shakkottai

Machine Learning, Reinforcement Learning, Markov Decision Processes

Infinite Time Horizon Safety of Bayesian Neural Networks

Mathias Lechner, Đorđe Žikelić, Krishnendu Chatterjee, Thomas Henzinger

deep learning, reinforcement learning and planning

Provably efficient safe exploration via primal-dual policy optimization

Dongsheng Ding, Xiaohan Wei, Zhuoran Yang, Zhaoran Wang, Mihailo Jovanovic

Generalized Proximal Policy Optimization with Sample Reuse

James Queeney, Yannis Paschalidis, Christos G Cassandras

optimization, reinforcement learning and planning

Safe Reinforcement Learning Using Advantage-Based Intervention

Nolan Wagener, Byron Boots, Ching-An Cheng

Theory, RL, Decisions and Control Theory

First Order Constrained Optimization in Policy Space

Yiming Zhang, Quan Vuong, Keith Ross

Defining admissible rewards for high-confidence policy evaluation in batch reinforcement learning

Niranjani Prasad, Barbara Engelhardt, Finale Doshi-Velez

Applied computing, Life and medical sciences, Health care information systems, Computing methodologies, Machine learning, Learning paradigms, Reinforcement learning, Sequential decision making, Learning settings, Batch learning

Counterexample Guided RL Policy Refinement Using Bayesian Optimization

Briti Gangopadhyay, Pallab Dasgupta

optimization, reinforcement learning and planning

Provably safe PAC-MDP exploration using analogies

Melrose Roderick, Vaishnavh Nagarajan, Zico Kolter

High Confidence Generalization for Reinforcement Learning

James Kostas, Yash Chandak, Scott Jordan, Georgios Theocharous, Philip Thomas

Algorithms, AutoML, Probabilistic Methods, Gaussian Processes, Reinforcement Learning and Planning

WCSAC: Worst-Case Soft Actor Critic for Safety-Constrained Reinforcement Learning

Qisong Yang, Thiago D. Simão, Simon H Tindemans, Matthijs T. J. Spaan

CRPO: A New Approach for Safe Reinforcement Learning with Convergence Guarantee

Tengyu Xu, Yingbin Liang, Guanghui Lan

Theory, RL, Decisions and Control Theory

Balancing Learning Speed and Stability in Policy Gradient via Adaptive Exploration

Matteo Papini, Andrea Battistello, Marcello Restelli

Safe Policy Optimization with Local Generalized Linear Function Approximations

Akifumi Wachi, Yunyue Wei, Yanan Sui

theory, optimization, reinforcement learning and planning

Towards Safe Policy Improvement for Non-Stationary MDPs

Yash Chandak, Scott Jordan, Georgios Theocharous, Martha White, Philip Thomas

Applications -> Computer Vision; Deep Learning -> Attention Models, Deep Learning

Guaranteeing Safety of Learned Perception Modules via Measurement-Robust Control Barrier Functions

Sarah Dean, Andrew Taylor, Ryan Cosner, Benjamin Recht, Aaron Ames

Neurosymbolic Reinforcement Learning with Formally Verified Exploration

Greg Anderson, Abhinav Verma, Isil Dillig, Swarat Chaudhuri

Off-Dynamics Reinforcement Learning: Training for Transfer with Domain Classifiers

Ben Eysenbach, Shreyas Chaudhari, Swapnil Asawa, Sergey Levine, Ruslan Salakhutdinov

reinforcement learning, domain adaptation, transfer learning

Efficient Planning under Partial Observability with Unnormalized Q Functions and Spectral Learning