Mean-Variance Policy Iteration for Risk-Averse Reinforcement Learning

02/02/2021

Shangtong Zhang, Bo Liu, Shimon Whiteson

Abstract: We present a mean-variance policy iteration (MVPI) framework for risk-averse control in a discounted infinite-horizon MDP, optimizing the variance of a per-step reward random variable. MVPI is flexible in that any policy evaluation method and any risk-neutral control method can be dropped in for risk-averse control off the shelf, in both on- and off-policy settings. This flexibility narrows the gap between risk-neutral control and risk-averse control and is achieved by working directly on a novel augmented MDP. We propose risk-averse TD3 as an example instantiation of MVPI, which outperforms vanilla TD3 and many previous risk-averse control methods in challenging MuJoCo robot simulation tasks under a risk-aware performance metric. This risk-averse TD3 is the first to introduce deterministic policies and off-policy learning into risk-averse reinforcement learning, both of which are key to the performance boost we show in MuJoCo domains.
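
The abstract does not spell out the objective or the augmented MDP, so the following is only a minimal sketch under stated assumptions: it assumes a mean-variance objective of the form E[R] - lam * Var(R) over per-step rewards, and it uses the standard identity mu^2 = max_y (2*mu*y - y^2) to rewrite that objective as a plain expected value of a modified reward. The function names, the weight `lam`, and the scalar `y` are hypothetical and are not taken from the paper. The point of the sketch is that, once `y` is fixed, the objective is risk-neutral in the modified reward, which is the kind of property that would let an off-the-shelf risk-neutral method be reused.

```python
from statistics import fmean


def mean_variance_objective(rewards, lam):
    """Empirical E[R] - lam * Var(R) over per-step rewards (population variance)."""
    mu = fmean(rewards)
    var = fmean([(r - mu) ** 2 for r in rewards])
    return mu - lam * var


def augmented_rewards(rewards, lam, y):
    """Hypothetical augmented reward r - lam*r^2 + 2*lam*y*r for a fixed scalar y."""
    return [r - lam * r * r + 2.0 * lam * y * r for r in rewards]


def surrogate_objective(rewards, lam, y):
    """Mean augmented reward minus lam*y^2; an ordinary expectation once y is fixed."""
    return fmean(augmented_rewards(rewards, lam, y)) - lam * y * y


# Illustrative data only; not results from the paper.
rewards = [1.0, 0.0, 2.0, 1.0]
lam = 0.5

# The surrogate is maximized over y at the mean reward, where it equals
# the mean-variance objective; any other y gives a lower bound.
y_star = fmean(rewards)
print(mean_variance_objective(rewards, lam))
print(surrogate_objective(rewards, lam, y_star))
print(surrogate_objective(rewards, lam, 0.0))
```

Under these assumptions, alternating between setting `y` to the current mean reward and improving the policy on the augmented reward would give a policy-iteration-style loop; whether this matches the authors' exact construction should be checked against the paper.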

The video of this talk cannot be embedded. You can watch it here:

https://slideslive.com/38949046

The talk and the corresponding paper were published at the AAAI 2021 virtual conference.

Similar Papers

06/12/2021

Conservative Offline Distributional Reinforcement Learning

Yecheng Ma, Dinesh Jayaraman, Osbert Bastani

Keywords: reinforcement learning and planning

13:54

06/12/2020

Efficient Exploration of Reward Functions in Inverse Reinforcement Learning via Bayesian Optimization

Sreejith Balakrishnan, Quoc Phong Nguyen, Bryan Kian Hsiang Low, Harold Soh

3:22

18/07/2021

A Deep Reinforcement Learning Approach to Marginalized Importance Sampling with the Successor Representation

Scott Fujimoto, David Meger, Doina Precup

Keywords: Reinforcement Learning and Planning, Deep RL

3:50

14/06/2020

QEBA: Query-Efficient Boundary-Based Blackbox Attack

Huichen Li, Xiaojun Xu, Xiaolu Zhang, Shuang Yang, Bo Li

Keywords: adversarial machine learning, black-box attack, boundary-based attack, attacking public api

1:01

18/07/2021

APS: Active Pretraining with Successor Features

Hao Liu, Pieter Abbeel

Keywords: Reinforcement Learning and Planning, Deep RL

14:29

18/07/2021

DriftSurf: Stable-State / Reactive-State Learning under Concept Drift

Ashraf Tahmasbi, Ellango Jothimurugesan, Srikanta Tirthapura, Phil Gibbons

Keywords: Algorithms, Online Learning Algorithms

5:07

26/04/2020

Robust Reinforcement Learning for Continuous Control with Model Misspecification

Daniel J. Mankowitz, Nir Levine, Rae Jeong, Abbas Abdolmaleki, Jost Tobias Springenberg, Yuanyuan Shi, Jackie Kay, Todd Hester, Timothy Mann, Martin Riedmiller

Keywords: reinforcement learning, robustness

5:24

02/02/2021

Sequential Generative Exploration Model for Partially Observable Reinforcement Learning

Haiyan Yin, Jianda Chen, Sinno Jialin Pan, Sebastian Tschiatschek

14:40

03/05/2021

Autoregressive Dynamics Models for Offline Policy Evaluation and Optimization

Michael Zhang, Tom Paine, Ofir Nachum, Cosmin Paduraru, George Tucker, Ziyu Wang, Mohammad Norouzi

Keywords: offline reinforcement learning, autoregressive models, off-policy policy evaluation, policy optimization

4:49

18/07/2021

Learning and Planning in Average-Reward Markov Decision Processes

Yi Wan, Abhishek Naik, Richard Sutton

Keywords: Reinforcement Learning and Planning

5:05

18/07/2021

Average-Reward Off-Policy Policy Evaluation with Function Approximation

Shangtong Zhang, Yi Wan, Richard Sutton, Shimon Whiteson

Keywords: Theory

5:14

03/05/2021

Targeted Attack against Deep Neural Networks via Flipping Limited Weight Bits

Jiawang Bai, Baoyuan Wu, Yong Zhang, Yiming Li, Zhifeng Li, Shu-Tao Xia

Keywords: weight attack, bit-flip, targeted attack

5:00

16/11/2020

Positive-Unlabeled Reward Learning

Danfei Xu, Misha Denil

5:04

06/12/2021

A Max-Min Entropy Framework for Reinforcement Learning

Seungyul Han, Youngchul Sung

Keywords: optimization, reinforcement learning and planning

14:35

26/10/2020

Real Time Crowd Navigation from First Principles of Probability Theory

Peter Trautman, Karankumar Patel

Keywords: Human robot interaction, crowd navigation, machine learning for pedestrian prediction

9:40

06/12/2020

Munchausen Reinforcement Learning

Nino Vieillard, Olivier Pietquin, Matthieu Geist

3:19

26/04/2020

Keep Doing What Worked: Behavior Modelling Priors for Offline Reinforcement Learning

Noah Siegel, Jost Tobias Springenberg, Felix Berkenkamp, Abbas Abdolmaleki, Michael Neunert, Thomas Lampe, Roland Hafner, Nicolas Heess, Martin Riedmiller

Keywords: Reinforcement Learning, Off-policy, Multitask, Continuous Control

5:04

26/08/2020

Mixed Strategies for Robust Optimization of Unknown Objectives

Pier Giuseppe Sessa, Ilija Bogunovic, Maryam Kamgarpour, Andreas Krause

14:13

19/08/2021

Hindsight Trust Region Policy Optimization

Hanbo Zhang, Site Bai, Xuguang Lan, David Hsu, Nanning Zheng

Keywords: Machine Learning, Deep Reinforcement Learning, Reinforcement Learning

13:14

19/08/2021

Inferring Time-delayed Causal Relations in POMDPs from the Principle of Independence of Cause and Mechanism

Junchi Liang, Abdeslam Boularias

Keywords: Knowledge Representation and Reasoning, Action, Change and Causality, Cognitive Robotics

13:50

06/12/2020

Non-Crossing Quantile Regression for Distributional Reinforcement Learning

Fan Zhou, Jianing Wang, Xingdong Feng

3:11

03/05/2021

Data-Efficient Reinforcement Learning with Self-Predictive Representations

Max Schwarzer, Ankesh Anand, Rishab Goel, R Devon Hjelm, Aaron Courville, Philip Bachman

Keywords: Representation Learning, Self-Supervised Learning, Reinforcement Learning, Sample Efficiency

10:04

06/12/2021

Learning State Representations from Random Deep Action-conditional Predictions

Zeyu Zheng, Vivek Veeriah, Risto Vuorio, Richard L Lewis, Satinder Singh

Keywords: reinforcement learning and planning, representation learning

11:44

18/07/2021

PODS: Policy Optimization via Differentiable Simulation

Miguel Angel Zamora Mora, Momchil Peychev, Sehoon Ha, Martin Vechev, Stelian Coros

Keywords: Reinforcement Learning and Planning, Deep RL

4:28

03/05/2021

HalentNet: Multimodal Trajectory Forecasting with Hallucinative Intents

Deyao Zhu, Mohamed Zahran, Li Erran Li, Mohamed Elhoseiny

5:18

02/02/2021

DeepSynth: Automata Synthesis for Automatic Task Segmentation in Deep Reinforcement Learning

Mohammadhosein Hasanbeig, Natasha Yogananda Jeppu, Alessandro Abate, Tom Melham, Daniel Kroening

15:45

06/12/2021

Tactical Optimism and Pessimism for Deep Reinforcement Learning

Ted Moskovitz, Jack Parker-Holder, Aldo Pacchiano, Michael Arbel, Michael Jordan

Keywords: reinforcement learning and planning, bandits

6:30

18/07/2021

Shortest-Path Constrained Reinforcement Learning for Sparse Reward Tasks

Sungryull Sohn, Sungtae Lee, Jongwook Choi, Harm van Seijen, Mehdi Fatemi, Honglak Lee

Keywords: Reinforcement Learning and Planning, Deep RL

5:19

06/12/2021

Replay-Guided Adversarial Environment Design

Minqi Jiang, Michael Dennis, Jack Parker-Holder, Jakob Foerster, Edward Grefenstette, Tim Rocktäschel

Keywords: theory, reinforcement learning and planning, robustness, self-supervised learning, continual learning

12:04

14/06/2020

BiDet: An Efficient Binarized Object Detector

Ziwei Wang, Ziyi Wu, Jiwen Lu, Jie Zhou

Keywords: binary neural networks, object detection, information bottleneck, sparse object priors, false positive elimination

1:00

18/07/2021

Quantum algorithms for reinforcement learning with a generative model

Daochen Wang, Aarthi Sundaram, Robin Kothari, Ashish Kapoor, Martin Roetteler

Keywords: Optimization, Non-Convex Optimization, Algorithms, Collaborative Filtering; Applications, Information Retrieval; Applications, Matrix and Tensor Factorization; Theory, RL, Decisions and Control Theory

4:55

06/12/2020

Multifaceted Uncertainty Estimation for Label-Efficient Deep Learning

Weishi Shi, Xujiang Zhao, Feng Chen, Qi Yu

3:15

18/07/2021

DFAC Framework: Factorizing the Value Function via Quantile Mixture for Multi-Agent Distributional Q-Learning

Wei-Fang Sun, Cheng-Kuang Lee, Chun-Yi Lee

Keywords: Reinforcement Learning and Planning, Multi-Agent RL

5:43

06/12/2021

Adversarial Intrinsic Motivation for Reinforcement Learning

Ishan Durugkar, Mauricio Tec, Scott Niekum, Peter Stone

Keywords: reinforcement learning and planning, generative model

13:11

06/12/2021

OpenMatch: Open-Set Semi-supervised Learning with Open-set Consistency Regularization

Kuniaki Saito, Donghyun Kim, Kate Saenko

Keywords: semi-supervised learning

11:12

02/02/2021

Exploration by Maximizing Renyi Entropy for Reward-Free RL Framework

Chuheng Zhang, Yuanying Cai, Longbo Huang, Jian Li

16:03

06/12/2021

Control Variates for Slate Off-Policy Evaluation

Nikos Vlassis, Ashok Chandrashekar, Fernando Amat, Nathan Kallus

Keywords: optimization, bandits

12:25

06/12/2020

Causal Shapley Values: Exploiting Causal Knowledge to Explain Individual Predictions of Complex Models

Tom Heskes, Evi Sijben, Ioan Gabriel Bucur, Tom Claassen

3:07

06/12/2021

Is Bang-Bang Control All You Need? Solving Continuous Control with Bernoulli Policies

Tim Seyde, Igor Gilitschenski, Wilko Schwarting, Bartolomeo Stellato, Martin Riedmiller, Markus Wulfmeier, Daniela Rus

Keywords: reinforcement learning and planning

6:48

06/12/2021

Model Selection for Bayesian Autoencoders

Ba-Hien Tran, Simone Rossi, Dimitrios Milios, Pietro Michiardi, Edwin Bonilla, Maurizio Filippone

Keywords: optimization, self-supervised learning, generative model, representation learning

10:49