Differentiable Trust Region Layers for Deep Reinforcement Learning

03/05/2021

Fabian Otto, Philipp Becker, Vien A Ngo, Hanna Ziesche, Gerhard Neumann

Keywords: reinforcement learning, Wasserstein distance, Frobenius norm, Kullback-Leibler divergence, trust region, policy gradient, projection

Abstract: Trust region methods are a popular tool in reinforcement learning as they yield robust policy updates in continuous and discrete action spaces. However, enforcing such trust regions in deep reinforcement learning is difficult. Hence, many approaches, such as Trust Region Policy Optimization (TRPO) and Proximal Policy Optimization (PPO), are based on approximations. Due to those approximations, they violate the constraints or fail to find the optimal solution within the trust region. Moreover, they are difficult to implement, often lack sufficient exploration, and have been shown to depend on seemingly unrelated implementation choices. In this work, we propose differentiable neural network layers to enforce trust regions for deep Gaussian policies via closed-form projections. Unlike existing methods, those layers formalize trust regions for each state individually and can complement existing reinforcement learning algorithms. We derive trust region projections based on the Kullback-Leibler divergence, the Wasserstein L2 distance, and the Frobenius norm for Gaussian distributions. We empirically demonstrate that those projection layers achieve similar or better results than existing methods while being almost agnostic to specific implementation choices. The code is available at https://git.io/Jthb0.
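The abstract describes the projections only at a high level. The sketch below illustrates what a closed-form, per-state trust region projection can look like, under several assumptions:

- The policy is a diagonal Gaussian.
- Mean and covariance are constrained separately: the mean by a squared Mahalanobis bound `eps_mean`, the covariance by a squared Frobenius-norm bound `eps_cov`.
- A violating parameter is pulled back along the straight line toward the old parameter, just far enough to land on the bound. For a ball-shaped constraint, that point is the closest one on the boundary in the same distance.

All function names and both bounds are hypothetical. This is not the paper's reference implementation, and the KL and Wasserstein L2 projections are not shown. It is written in plain Python so it runs standalone. Ported to a deep learning framework, the same arithmetic would be differentiable almost everywhere, which is presumably what allows such a projection to act as a network layer.

```python
import math
from typing import List, Tuple

Vector = List[float]


def mahalanobis_sq(mean: Vector, old_mean: Vector, old_var: Vector) -> float:
    """Squared Mahalanobis distance of `mean` from `old_mean` under diag(old_var)."""
    return sum((m - mo) ** 2 / v for m, mo, v in zip(mean, old_mean, old_var))


def frobenius_sq(var: Vector, old_var: Vector) -> float:
    """Squared Frobenius norm of diag(var) - diag(old_var)."""
    return sum((s - so) ** 2 for s, so in zip(var, old_var))


def project_mean(mean: Vector, old_mean: Vector, old_var: Vector,
                 eps_mean: float) -> Vector:
    """Hypothetical mean projection: interpolate toward old_mean onto the bound."""
    dist = mahalanobis_sq(mean, old_mean, old_var)
    if dist <= eps_mean:
        return list(mean)  # inside the trust region: the projection is the identity
    # (mean + omega * old_mean) / (1 + omega) scales the offset by 1 / (1 + omega),
    # so the squared distance becomes dist / (1 + omega)**2 == eps_mean.
    omega = math.sqrt(dist / eps_mean) - 1.0
    return [(m + omega * mo) / (1.0 + omega) for m, mo in zip(mean, old_mean)]


def project_var(var: Vector, old_var: Vector, eps_cov: float) -> Vector:
    """Hypothetical covariance projection under a Frobenius-norm bound."""
    dist = frobenius_sq(var, old_var)
    if dist <= eps_cov:
        return list(var)
    eta = math.sqrt(dist / eps_cov) - 1.0
    # A convex combination of two positive variances stays positive,
    # so the projected covariance remains a valid (diagonal) covariance.
    return [(s + eta * so) / (1.0 + eta) for s, so in zip(var, old_var)]


def project_policy(mean: Vector, var: Vector, old_mean: Vector, old_var: Vector,
                   eps_mean: float, eps_cov: float) -> Tuple[Vector, Vector]:
    """Apply both projections to the Gaussian predicted for a single state."""
    return (project_mean(mean, old_mean, old_var, eps_mean),
            project_var(var, old_var, eps_cov))


if __name__ == "__main__":
    # Old policy N([0, 0], diag([1, 1])); the update proposes N([3, 4], diag([3, 1])).
    new_mean, new_var = project_policy([3.0, 4.0], [3.0, 1.0],
                                       [0.0, 0.0], [1.0, 1.0],
                                       eps_mean=1.0, eps_cov=1.0)
    print(new_mean, new_var)
```

In this example, the proposed mean lies at squared Mahalanobis distance 25 from the old one, so it is pulled back to roughly `[0.6, 0.8]`, exactly on the bound of 1. The variances `[3, 1]` are at squared Frobenius distance 4 and become `[2, 1]`. Because the check is done per state, each state's Gaussian is projected independently, matching the abstract's point that trust regions are enforced for each state individually.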

The talk and the paper were published at the ICLR 2021 virtual conference.

Similar Papers

12/07/2020

Representations for Stable Off-Policy Reinforcement Learning

Dibya Ghosh, Marc Bellemare

Keywords: Reinforcement Learning - General

26/08/2020

Nested-Wasserstein Self-Imitation Learning for Sequence Generation

Ruiyi Zhang, Changyou Chen, Zhe Gan, Zheng Wen, Wenlin Wang, Lawrence Carin

03/08/2020

Stable Policy Optimization via Off-Policy Divergence Regularization

Ahmed Touati, Amy Zhang, Joelle Pineau, Pascal Vincent

06/12/2021

Uncertainty-Based Offline Reinforcement Learning with Diversified Q-Ensemble

Gaon An, Seungyong Moon, Jang-Hyun Kim, Hyun Oh Song

Keywords: deep learning, reinforcement learning and planning

06/12/2021

When Is Generalizable Reinforcement Learning Tractable?

Dhruv Malik, Yuanzhi Li, Pradeep Ravikumar

Keywords: reinforcement learning and planning, generative model, representation learning

06/12/2021

COMBO: Conservative Offline Model-Based Policy Optimization

Tianhe Yu, Aviral Kumar, Rafael Rafailov, Aravind Rajeswaran, Sergey Levine, Chelsea Finn

Keywords: deep learning, optimization, reinforcement learning and planning

06/12/2020

Robust Deep Reinforcement Learning against Adversarial Perturbations on State Observations

Huan Zhang, Hongge Chen, Chaowei Xiao, Bo Li, Mingyan Liu, Duane Boning, Cho-Jui Hsieh

02/02/2021

Uncertainty-Aware Policy Optimization: A Robust, Adaptive Trust Region Approach

James Queeney, Ioannis Ch. Paschalidis, Christos G. Cassandras

06/12/2021

Deep Extended Hazard Models for Survival Analysis

Qixian Zhong, Jonas Mueller, Jane-Ling Wang

Keywords: deep learning

02/02/2021

Stabilizing Q Learning Via Soft Mellowmax Operator

Yaozhong Gan, Zhe Zhang, Xiaoyang Tan

06/12/2020

Efficient Model-Based Reinforcement Learning through Optimistic Policy Search and Planning

Sebastian Curi, Felix Berkenkamp, Andreas Krause

12/07/2020

Gradient Temporal-Difference Learning with Regularized Corrections

Sina Ghiassian, Andrew Patterson, Shivam Garg, Dhawal Gupta, Adam White, Martha White

Keywords: Reinforcement Learning - General

06/12/2021

Learning to Predict Trustworthiness with Steep Slope Loss

Yan Luo, Yongkang Wong, Mohan Kankanhalli, Qi Zhao

Keywords: deep learning, machine learning, transformers

03/05/2021

What are the Statistical Limits of Offline RL with Linear Function Approximation?

Ruosong Wang, Dean Foster, Sham M Kakade

Keywords: batch reinforcement learning, representation, function approximation, lower bound

06/12/2020

PC-PG: Policy Cover Directed Exploration for Provable Policy Gradient Learning

Alekh Agarwal, Mikael Henaff, Sham Kakade, Wen Sun

06/12/2020

Provably Good Batch Off-Policy Reinforcement Learning Without Great Exploration

Yao Liu, Adith Swaminathan, Alekh Agarwal, Emma Brunskill

06/12/2021

Time-series Generation by Contrastive Imitation

Daniel Jarrett, Ioana Bica, Mihaela van der Schaar

Keywords: generative model

06/12/2021

Adversarial Robustness with Semi-Infinite Constrained Learning

Alexander Robey, Luiz Chamon, George J. Pappas, Hamed Hassani, Alejandro Ribeiro

Keywords: theory, deep learning, optimization, robustness, adversarial robustness and security

12/07/2020

Constrained Markov Decision Processes via Backward Value Functions

Harsh Satija, Philip Amortila, Joelle Pineau

Keywords: Reinforcement Learning - General

03/05/2021

FOCAL: Efficient Fully-Offline Meta-Reinforcement Learning via Distance Metric Learning and Behavior Regularization

Lanqing Li, Rui Yang, Dijun Luo

Keywords: distance metric learning, offline/batch reinforcement learning, meta-reinforcement learning, contrastive learning, multi-task reinforcement learning

14/09/2020

Escaping Saddle Points of Empirical Risk Privately and Scalably via DP-Trust Region Method

Di Wang, Jinhui Xu

Keywords: differential privacy, empirical risk minimization, private machine learning

12/07/2020

Reinforcement Learning for Non-Stationary Markov Decision Processes: The Blessing of (More) Optimism

Wang Chi Cheung, David Simchi-Levi, Ruihao Zhu

Keywords: Online Learning, Active Learning, and Bandits

06/12/2021

Locally Valid and Discriminative Prediction Intervals for Deep Learning Models

Zhen Lin, Shubhendu Trivedi, Jimeng Sun

Keywords: deep learning

18/07/2021

Combining Pessimism with Optimism for Robust and Efficient Model-Based Deep Reinforcement Learning

Sebastian Curi, Ilija Bogunovic, Andreas Krause

Keywords: Reinforcement Learning and Planning

06/12/2020

Bayes Consistency vs. H-Consistency: The Interplay between Surrogate Loss Functions and the Scoring Function Class

Mingyuan Zhang, Shivani Agarwal

13/04/2021

Online model selection for reinforcement learning with function approximation

Jonathan Lee, Aldo Pacchiano, Vidya Muthukumar, Weihao Kong, Emma Brunskill

02/02/2021

Improving Sample Efficiency in Model-Free Reinforcement Learning from Images

Denis Yarats, Amy Zhang, Ilya Kostrikov, Brandon Amos, Joelle Pineau, Rob Fergus

02/02/2021

Classification with Strategically Withheld Data

Anilesh K. Krishnaswamy, Haoming Li, David Rein, Hanrui Zhang, Vincent Conitzer

18/07/2021

Offline Contextual Bandits with Overparameterized Models

David Brandfonbrener, Will Whitney, Rajesh Ranganath, Joan Bruna

Keywords: Optimization, Non-Convex Optimization, Reinforcement Learning and Planning, Optimization, Stochastic Optimization

17/08/2020

Learning temporal coherence via self-supervision for GAN-based video generation

Mengyu Chu, You Xie, Jonas Mayer, Laura Leal-Taixé, Nils Thuerey

Keywords: self-supervision, temporal cycle-consistency, video super-resolution, generative adversarial network, unpaired video translation

06/12/2021

Robust Predictable Control

Ben Eysenbach, Russ Salakhutdinov, Sergey Levine

Keywords: reinforcement learning and planning, robustness

06/12/2021

An Analysis of Constant Step Size SGD in the Non-convex Regime: Asymptotic Normality and Bias

Lu Yu, Krishnakumar Balasubramanian, Stanislav Volgushev, Murat Erdogdu

Keywords: optimization, machine learning

06/12/2020

Byzantine Resilient Distributed Multi-Task Learning

Jiani Li, Waseem Abbas, Xenofon Koutsoukos

02/02/2021

Adversarial Robustness through Disentangled Representations

Shuo Yang, Tianyu Guo, Yunhe Wang, Chang Xu

06/12/2021

An Exact Characterization of the Generalization Error for the Gibbs Algorithm

Gholamali Aminian, Yuheng Bu, Laura Toni, Miguel Rodrigues, Gregory Wornell

30/11/2020

Bridging Adversarial and Statistical Domain Transfer via Spectral Adaptation Networks

Christoph Raab, Philipp Väth, Peter Meier, Frank-Michael Schleif

06/12/2021

Relaxing Local Robustness

Klas Leino, Matt Fredrikson

Keywords: deep learning, optimization, machine learning, robustness, adversarial robustness and security

06/12/2020

Promoting Stochasticity for Expressive Policies via a Simple and Efficient Regularization Method

Qi Zhou, Yufei Kuang, Zherui Qiu, Houqiang Li, Jie Wang

26/04/2020

CAQL: Continuous Action Q-Learning

Moonkyung Ryu, Yinlam Chow, Ross Anderson, Christian Tjandraatmadja, Craig Boutilier

Keywords: Reinforcement learning (RL), DQN, Continuous control, Mixed-Integer Programming (MIP)

06/12/2021

Leveraging Recursive Gumbel-Max Trick for Approximate Inference in Combinatorial Spaces

Kirill Struminsky, Artyom Gadetsky, Denis Rakitin, Danil Karpushkin, Dmitry Vetrov

Keywords: deep learning, optimization
