Bayesian Distributional Policy Gradients

Abstract: Distributional Reinforcement Learning (RL) maintains the entire probability distribution of the reward-to-go, i.e. the return, providing more learning signals that account for the uncertainty associated with policy performance, which may be beneficial for trading off exploration and exploitation and policy learning in general. Previous works in distributional RL focused mainly on computing the state-action-return distributions, here we model the state-return distributions. This enables us to translate successful conventional RL algorithms that are based on state values into distributional RL. We formulate the distributional Bellman operation as an inference-based auto-encoding process that minimises Wasserstein metrics between target/model return distributions. The proposed algorithm, BDPG (Bayesian Distributional Policy Gradients), uses adversarial training in joint-contrastive learning to estimate a variational posterior from the returns. Moreover, we can now interpret the return prediction uncertainty as an information gain, which allows to obtain a new curiosity measure that helps BDPG steer exploration actively and efficiently. We demonstrate in a suite of Atari 2600 games and MuJoCo tasks, including well known hard-exploration challenges, how BDPG learns generally faster and with higher asymptotic performance than reference distributional RL algorithms.

06/12/2021

Bayesian Distributional Policy Gradients

Luchen Li, A. Aldo Faisal

Comments

Similar Papers

Regret Minimization Experience Replay in Off-Policy Reinforcement Learning

Xu-Hui Liu, Zhenghai Xue, Jingcheng Pang and Shengyi Jiang, Feng Xu, Yang Yu

Keywords Abstract Paper

theory, reinforcement learning and planning

Value-driven Hindsight Modelling

Arthur Guez, Fabio Viola, Theophane Weber and Lars Buesing, Steven Kapturowski, Doina Precup, David Silver, Nicolas Heess

Keywords Abstract Paper

Minimax Weight and Q-Function Learning for Off-Policy Evaluation

Masatoshi Uehara, Jiawei Huang, Nan Jiang

Keywords Abstract Paper

Reinforcement Learning - Theory

Distributional Reinforcement Learning via Moment Matching

Thanh Nguyen-Tang, Sunil Gupta, Svetha Venkatesh

Keywords Abstract Paper

The Value-Improvement Path: Towards Better Representations for Reinforcement Learning

Will Dabney, André Barreto, Mark Rowland and Robert Dadashi, John Quan, Marc G. Bellemare, David Silver

Keywords Abstract Paper

APS: Active Pretraining with Successor Features

Hao Liu, Pieter Abbeel

Keywords Abstract Paper

Reinforcement Learning and Planning, Deep RL

GaussianPath:A Bayesian Multi-Hop Reasoning Framework for Knowledge Graph Reasoning

Guojia Wan, Bo Du

Keywords Abstract Paper

Joint Inference of Reward Machines and Policies for Reinforcement Learning

Zhe Xu, Ivan Gavran, Yousef Ahmad and Rupak Majumdar, Daniel Neider, Ufuk Topcu, Bo Wu

Keywords Abstract Paper

Reward Machines, Automata Learning, Reinforcement Learning

Conservative Offline Distributional Reinforcement Learning

Yecheng Ma, Dinesh Jayaraman, Osbert Bastani

Keywords Abstract Paper

reinforcement learning and planning

Adversarial Intrinsic Motivation for Reinforcement Learning

Ishan Durugkar, Mauricio Tec, Scott Niekum, Peter Stone

Keywords Abstract Paper

reinforcement learning and planning, generative model

A Local Temporal Difference Code for Distributional Reinforcement Learning

Pablo Tano, Peter Dayan, Alexandre Pouget

Keywords Abstract Paper

Data Valuation using Reinforcement Learning

Jinsung Yoon, Sercan Arik, Tomas Pfister

Keywords Abstract Paper

Deep Learning - Algorithms

Provably Efficient Algorithms for Multi-Objective Competitive RL

Tiancheng Yu, Yi Tian, Jingzhao Zhang, Suvrit Sra

Keywords Abstract Paper

Theory, RL, Decisions and Control Theory

Deep Reinforcement and InfoMax Learning

Bogdan Mazoure, Remi Tachet des Combes, Thang Doan and Philip Bachman, R Devon Hjelm

Keywords Abstract Paper

A Deep Reinforcement Learning Approach to Marginalized Importance Sampling with the Successor Representation

Scott Fujimoto, David Meger, Doina Precup

Keywords Abstract Paper

Reinforcement Learning and Planning, Deep RL

C-Learning: Learning to Achieve Goals via Recursive Classification

Ben Eysenbach, Ruslan Salakhutdinov, Sergey Levine

Keywords Abstract Paper

reinforcement learning, goal reaching, density estimation, hindsight relabeling, Q-learning

Robust Deep Reinforcement Learning through Adversarial Loss

Tuomas Oikarinen, Wang Zhang, Alexandre Megretski and Luca Daniel, Tsui-Wei Weng

Keywords Abstract Paper

reinforcement learning and planning, robustness, adversarial robustness and security

Sequential Generative Exploration Model for Partially Observable Reinforcement Learning

Haiyan Yin, Jianda Chen, Sinno Jialin Pan, Sebastian Tschiatschek

Keywords Abstract Paper

A New Representation of Successor Features for Transfer across Dissimilar Environments

Majid Abdolshah, Hung Le, Thommen Karimpanal George and Sunil Gupta, Santu Rana, Svetha Venkatesh

Keywords Abstract Paper

Reinforcement Learning and Planning

Local policy search with Bayesian optimization

Sarah Müller, Alexander von Rohr, Sebastian Trimpe

Keywords Abstract Paper

theory, optimization, reinforcement learning and planning, active learning

Correcting experience replay for multi-agent communication

Sanjeevan Ahilan, Peter Dayan

Keywords Abstract Paper

Xu-Hui Liu, Zhenghai Xue, Jingcheng Pang and
Shengyi Jiang, Feng Xu, Yang Yu

Keywords Paper

Arthur Guez, Fabio Viola, Theophane Weber and
Lars Buesing, Steven Kapturowski, Doina Precup, David Silver, Nicolas Heess

Keywords Paper

Keywords Paper

Keywords Paper

Will Dabney, André Barreto, Mark Rowland and
Robert Dadashi, John Quan, Marc G. Bellemare, David Silver

Keywords Paper

Keywords Paper

Keywords Paper

Zhe Xu, Ivan Gavran, Yousef Ahmad and
Rupak Majumdar, Daniel Neider, Ufuk Topcu, Bo Wu

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Bogdan Mazoure, Remi Tachet des Combes, Thang Doan and
Philip Bachman, R Devon Hjelm

Keywords Paper

Keywords Paper

Keywords Paper

Tuomas Oikarinen, Wang Zhang, Alexandre Megretski and
Luca Daniel, Tsui-Wei Weng

Keywords Paper

Keywords Paper

Majid Abdolshah, Hung Le, Thommen Karimpanal George and
Sunil Gupta, Santu Rana, Svetha Venkatesh

Keywords Paper

Keywords Paper

Keywords Paper

Max Schwarzer, Ankesh Anand, Rishab Goel and
R Devon Hjelm, Aaron Courville, Philip Bachman

Keywords Paper

Philip Ball, Jack Parker-Holder, Aldo Pacchiano and
Krzysztof Choromanski, Stephen Roberts

Keywords Paper

Keywords Paper

Keywords Paper

Lili Chen, Kevin Lu, Aravind Rajeswaran and
Kimin Lee, Aditya Grover, Misha Laskin, Pieter Abbeel, Aravind Srinivas, Igor Mordatch

Keywords Paper

Keywords Paper

Hanbo Zhang, Site Bai, Xuguang Lan and
David Hsu, Nanning Zheng

Keywords Paper

Keywords Paper

Keywords Paper

Guy Lorberbom, Chris J. Maddison, Nicolas Heess and
Tamir Hazan, Daniel Tarlow

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Zichuan Lin, Derek Yang, Li Zhao and
Tao Qin, Guangwen Yang, Tie-Yan Liu

Keywords Paper