Accelerating Safe Reinforcement Learning with Constraint-mismatched Baseline Policies

Abstract: We consider the problem of reinforcement learning when provided with (1) a baseline control policy and (2) a set of constraints that the learner must satisfy. The baseline policy can arise from demonstration data or a teacher agent and may provide useful cues for learning, but it might also be sub-optimal for the task at hand, and is not guaranteed to satisfy the specified constraints, which might encode safety, fairness or other application-specific requirements. In order to safely learn from baseline policies, we propose an iterative policy optimization algorithm that alternates between maximizing expected return on the task, minimizing distance to the baseline policy, and projecting the policy onto the constraint-satisfying set. We analyze our algorithm theoretically and provide a finite-time convergence guarantee. In our experiments on five different control tasks, our algorithm consistently outperforms several state-of-the-art baselines, achieving 10 times fewer constraint violations and 40% higher reward on average.
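The three alternating steps named in the abstract can be illustrated with a toy sketch. This is not the authors' algorithm: it assumes a two-dimensional parameter vector in place of a policy, squared Euclidean distance in place of a policy divergence, and a single linear constraint in place of the paper's constraint set. All names (`project_halfspace`, `alternating_update`, `run`, `lr`, `pull`) are hypothetical.

```python
import numpy as np

def project_halfspace(theta, a, b):
    """Euclidean projection of theta onto the set {x : a @ x <= b}."""
    violation = a @ theta - b
    if violation <= 0:
        return theta  # already feasible, nothing to do
    return theta - violation * a / (a @ a)

def alternating_update(theta, reward_grad, theta_base, a, b, lr=0.1, pull=0.1):
    """One round of the three steps (toy stand-ins, see assumptions above)."""
    # Step 1: gradient ascent on the toy task objective.
    theta = theta + lr * reward_grad(theta)
    # Step 2: move a fraction of the way toward the baseline parameters.
    theta = theta + pull * (theta_base - theta)
    # Step 3: project back onto the constraint-satisfying set.
    return project_halfspace(theta, a, b)

def run(steps=200):
    target = np.array([2.0, 2.0])      # maximiser of the toy reward -||theta - target||^2
    theta_base = np.array([1.5, 0.0])  # baseline that violates the constraint theta[0] <= 1
    a, b = np.array([1.0, 0.0]), 1.0   # constraint: a @ theta <= b
    reward_grad = lambda th: -2.0 * (th - target)
    theta = np.zeros(2)
    for _ in range(steps):
        theta = alternating_update(theta, reward_grad, theta_base, a, b)
    return theta
```

In this toy setting the baseline itself is infeasible, yet every iterate returned by `alternating_update` satisfies the constraint because the projection is applied last.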

26/04/2020

Jimmy Yang, Justinian Rosca, Karthik Narasimhan, Peter Ramadge

Similar Papers

Maxmin Q-learning: Controlling the Estimation Bias of Q-learning

Qingfeng Lan, Yangchen Pan, Alona Fyshe, Martha White

Keywords: reinforcement learning, bias and variance reduction

Density Constrained Reinforcement Learning

Zengyi Qin, Yuxiao Chen, Chuchu Fan

Keywords: Reinforcement Learning and Planning

Adaptive Discretization for Model-Based Reinforcement Learning

Sean Sinclair, Tianyu Wang, Gauri Jain, Sid Banerjee, Christina Yu

Adversarial Intrinsic Motivation for Reinforcement Learning

Ishan Durugkar, Mauricio Tec, Scott Niekum, Peter Stone

Keywords: reinforcement learning and planning, generative model

Model-based Adversarial Meta-Reinforcement Learning

Zichuan Lin, Garrett Thomas, Guangwen Yang, Tengyu Ma

Finite-sample regret bound for distributionally robust offline tabular reinforcement learning

Zhengqing Zhou, Zhengyuan Zhou, Qinxun Bai, Linhai Qiu, Jose Blanchet, Peter Glynn

Learning Routines for Effective Off-Policy Reinforcement Learning

Edoardo Cetin, Oya Celiktutan

Keywords: Reinforcement Learning and Planning, Deep RL

Leveraging Recursive Gumbel-Max Trick for Approximate Inference in Combinatorial Spaces

Kirill Struminsky, Artyom Gadetsky, Denis Rakitin, Danil Karpushkin, Dmitry Vetrov

Keywords: deep learning, optimization

MURAL: Meta-Learning Uncertainty-Aware Rewards for Outcome-Driven Reinforcement Learning

Kevin Li, Abhishek Gupta, Ashwin D Reddy, Vitchyr Pong, Aurick Zhou, Justin Yu, Sergey Levine

Keywords: Reinforcement Learning and Planning, Deep RL

Time-Consistent Self-Supervision for Semi-Supervised Learning

Tianyi Zhou, Shengjie Wang, Jeff Bilmes

Keywords: Unsupervised and Semi-Supervised Learning

Provably Efficient Reward-Agnostic Navigation with Linear Value Iteration

Andrea Zanette, Alessandro Lazaric, Mykel J Kochenderfer, Emma Brunskill

The Advantage of Conditional Meta-Learning for Biased Regularization and Fine Tuning

Giulia Denevi, Massimiliano Pontil, Carlo Ciliberto

Sequential Transfer in Reinforcement Learning with a Generative Model

Andrea Tirinzoni, Riccardo Poiani, Marcello Restelli

Keywords: Reinforcement Learning - General

Efficient Training of Retrieval Models using Negative Cache

Erik Lindgren, Sashank Reddi, Ruiqi Guo, Sanjiv Kumar

Keywords: deep learning, machine learning

Adaptive Sampling for Minimax Fair Classification

Shubhanshu Shekhar, Greg Fields, Mohammad Ghavamzadeh, Tara Javidi

Keywords: deep learning, machine learning, fairness

Inverse Reinforcement Learning from a Gradient-based Learner

Giorgia Ramponi, Gianluca Drappo, Marcello Restelli

Temporal-Logic-Based Reward Shaping for Continuing Reinforcement Learning Tasks

Yuqian Jiang, Suda Bharadwaj, Bo Wu, Rishi Shah, Ufuk Topcu, Peter Stone

Structured Prediction for Conditional Meta-Learning

Ruohan Wang, Yiannis Demiris, Carlo Ciliberto

Submodular Meta-Learning

Arman Adibi, Aryan Mokhtari, Hamed Hassani

An Identifiable Double VAE For Disentangled Representations

Graziano Mita, Maurizio Filippone, Pietro Michiardi

Keywords: Deep Learning, Adversarial Networks, Deep Learning, Generative Models

Hierarchical Reinforcement Learning with Timed Subgoals

Nico Gürtler, Dieter Büchler, Georg Martius

Keywords: reinforcement learning and planning

Time-series Generation by Contrastive Imitation
