Regret Minimization Experience Replay in Off-Policy Reinforcement Learning

06/12/2021

Xu-Hui Liu, Zhenghai Xue, Jingcheng Pang, Shengyi Jiang, Feng Xu, Yang Yu

Keywords: theory, reinforcement learning and planning

Abstract: In reinforcement learning, experience replay stores past samples for later reuse, and prioritized sampling is a promising technique for making better use of them. Previous prioritization criteria, including TD error, recentness, and corrective feedback, are mostly heuristic. In this work, we start from the regret minimization objective and derive an optimal prioritization strategy for the Bellman update that directly maximizes the return of the policy. The theory suggests that data with higher hindsight TD error, better on-policiness, and more accurate Q values should be assigned higher weights during sampling, so most previous criteria capture this strategy only partially. We not only provide theoretical justification for previous criteria but also propose two new methods for computing the prioritization weight, ReMERN and ReMERT. ReMERN learns an error network, while ReMERT exploits the temporal ordering of states. Both methods outperform previous prioritized sampling algorithms on challenging RL benchmarks, including MuJoCo, Atari, and Meta-World.
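
To make the three signals named in the abstract concrete, the following is a minimal sketch of weighted replay sampling. It is not the authors' ReMERN or ReMERT method: the multiplicative form of the weight, the function names `priority_weights` and `sample_batch`, and the `temperature` parameter are all assumptions introduced for illustration. It only shows how a weight that grows with TD error and on-policiness, and shrinks with the estimated Q-value error, could drive sampling.

```python
import math
import random


def priority_weights(td_errors, on_policy_scores, q_errors, temperature=1.0):
    """Combine three per-sample signals into normalized sampling weights.

    Hypothetical form (an assumption, not taken from the paper): the weight
    grows with |TD error| and with the on-policiness score, and decays
    exponentially with the estimated Q-value error.
    """
    if not td_errors:
        raise ValueError("at least one sample is required")
    raw = [
        abs(td) * on_pol * math.exp(-q_err / temperature)
        for td, on_pol, q_err in zip(td_errors, on_policy_scores, q_errors)
    ]
    total = sum(raw)
    if total <= 0.0:
        # Fall back to uniform sampling when every raw weight is zero.
        return [1.0 / len(raw)] * len(raw)
    return [w / total for w in raw]


def sample_batch(buffer, weights, batch_size, rng=random):
    """Draw a batch with replacement, proportionally to the given weights."""
    return rng.choices(buffer, weights=weights, k=batch_size)


# Usage: the second transition has a large TD error and is on-policy,
# so it receives the largest share of the sampling probability.
buffer = ["t0", "t1", "t2"]
weights = priority_weights([0.5, 2.0, 2.0], [1.0, 1.0, 0.1], [0.0, 0.0, 0.0])
batch = sample_batch(buffer, weights, batch_size=4, rng=random.Random(0))
```

In this sketch the third transition has the same TD error as the second but a low on-policiness score, so it is sampled least often; how the three signals are actually estimated is exactly what the two proposed methods address.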

The talk and the respective paper were published at the NeurIPS 2021 virtual conference.

Similar Papers

02/02/2021

Bayesian Distributional Policy Gradients

Luchen Li, A. Aldo Faisal

18:06

18/07/2021

Dynamic Balancing for Model Selection in Bandits and RL

Ashok Cutkosky, Christoph Dann, Abhimanyu Das, Claudio Gentile, Aldo Pacchiano, Manish Purohit

Keywords: Reinforcement Learning and Planning, Bandits

5:18

12/07/2020

Revisiting Fundamentals of Experience Replay

William Fedus, Prajit Ramachandran, Rishabh Agarwal, Yoshua Bengio, Hugo Larochelle, Mark Rowland, Will Dabney

Keywords: Reinforcement Learning - Deep RL

13:35

06/12/2020

Rewriting History with Inverse RL: Hindsight Inference for Policy Improvement

Benjamin Eysenbach, Xinyang Geng, Sergey Levine, Russ Salakhutdinov

Keywords: Optimization -> Non-Convex Optimization, Theory -> Statistical Physics of Learning

3:19

12/07/2020

Minimax Weight and Q-Function Learning for Off-Policy Evaluation

Masatoshi Uehara, Jiawei Huang, Nan Jiang

Keywords: Reinforcement Learning - Theory

14:20

18/07/2021

A New Representation of Successor Features for Transfer across Dissimilar Environments

Majid Abdolshah, Hung Le, Thommen Karimpanal George, Sunil Gupta, Santu Rana, Svetha Venkatesh

Keywords: Reinforcement Learning and Planning

5:43

18/07/2021

Representation Matters: Offline Pretraining for Sequential Decision Making

Mengjiao Yang, Ofir Nachum

Keywords: Reinforcement Learning and Planning

5:06

12/09/2020

Temporal Logic Monitoring Rewards via Transducers

Giuseppe De Giacomo, Marco Favorito, Luca Iocchi, Fabio Patrizi, Alessandro Ronca

Keywords: Symbolic reinforcement learning-General, Reasoning about actions and change, action languages-General

12:40

26/10/2020

Joint Inference of Reward Machines and Policies for Reinforcement Learning

Zhe Xu, Ivan Gavran, Yousef Ahmad, Rupak Majumdar, Daniel Neider, Ufuk Topcu, Bo Wu

Keywords: Reward Machines, Automata Learning, Reinforcement Learning

9:57

18/07/2021

Improved Corruption Robust Algorithms for Episodic Reinforcement Learning

Yifang Chen, Simon Du, Kevin Jamieson

Keywords: Optimization, Non-Convex Optimization, Theory, Online Learning Theory

5:20

02/02/2021

Theoretically Principled Deep RL Acceleration via Nearest Neighbor Function Approximation

Junhong Shen, Lin F. Yang

19:12

06/12/2021

Offline Reinforcement Learning as One Big Sequence Modeling Problem

Michael Janner, Qiyang Li, Sergey Levine

Keywords: reinforcement learning and planning, transformers, language

9:48

12/07/2020

Data Valuation using Reinforcement Learning

Jinsung Yoon, Sercan Arik, Tomas Pfister

Keywords: Deep Learning - Algorithms

14:35

02/02/2021

The Value-Improvement Path: Towards Better Representations for Reinforcement Learning

Will Dabney, André Barreto, Mark Rowland, Robert Dadashi, John Quan, Marc G. Bellemare, David Silver

20:06

06/12/2021

Local policy search with Bayesian optimization

Sarah Müller, Alexander von Rohr, Sebastian Trimpe

Keywords: theory, optimization, reinforcement learning and planning, active learning

11:42

06/12/2020

A Local Temporal Difference Code for Distributional Reinforcement Learning

Pablo Tano, Peter Dayan, Alexandre Pouget

3:24

26/08/2020

Revisiting Stochastic Extragradient

Konstantin Mishchenko, Dmitry Kovalev, Egor Shulgin, Peter Richtarik, Yura Malitsky

11:24

06/12/2020

DisCor: Corrective Feedback in Reinforcement Learning via Distribution Correction

Aviral Kumar, Abhishek Gupta, Sergey Levine

3:25

06/12/2021

Reward is enough for convex MDPs

Tom Zahavy, Brendan O'Donoghue, Guillaume Desjardins, Satinder Singh

Keywords: reinforcement learning and planning

14:12

18/07/2021

Spectral Normalisation for Deep Reinforcement Learning: An Optimisation Perspective

Florin Gogianu, Tudor Berariu, Mihaela Rosca, Claudia Clopath, Lucian Busoniu, Razvan Pascanu

Keywords: Reinforcement Learning and Planning, Deep RL

5:04

03/05/2021

Learning to Sample with Local and Global Contexts in Experience Replay Buffer

Youngmin Oh, Kimin Lee, Jinwoo Shin, Eunho Yang, Sung Ju Hwang

Keywords: reinforcement learning, off-policy RL, experience replay buffer

5:20

02/02/2021

Expected Eligibility Traces

Hado van Hasselt, Sephora Madjiheurem, Matteo Hessel, David Silver, André Barreto, Diana Borsa

18:39

19/04/2021

Keep learning: Self-supervised meta-learning for learning from inference

Akhil Kedia, Sai Chetan Chinthakindi

11:27

19/08/2021

Average-Reward Reinforcement Learning with Trust Region Methods

Xiaoteng Ma, Xiaohang Tang, Li Xia, Jun Yang, Qianchuan Zhao

Keywords: Machine Learning, Deep Reinforcement Learning, Reinforcement Learning, Markov Decision Processes

14:41

03/08/2020

Learning by Repetition: Stochastic Multi-armed Bandits under Priming Effect

Priyank Agrawal, Theja Tulabandhula

7:29

26/08/2020

Discrete Action On-Policy Learning with Action-Value Critic

Yuguang Yue, Yunhao Tang, Mingzhang Yin, Mingyuan Zhou

14:23

12/07/2020

Ready Policy One: World Building Through Active Learning

Philip Ball, Jack Parker-Holder, Aldo Pacchiano, Krzysztof Choromanski, Stephen Roberts

Keywords: Reinforcement Learning - Deep RL

15:31

18/07/2021

Offline Reinforcement Learning with Fisher Divergence Critic Regularization

Ilya Kostrikov, Rob Fergus, Jonathan Tompson, Ofir Nachum

Keywords: Reinforcement Learning and Planning, Deep RL

4:49

26/04/2020

CAQL: Continuous Action Q-Learning

Moonkyung Ryu, Yinlam Chow, Ross Anderson, Christian Tjandraatmadja, Craig Boutilier

Keywords: Reinforcement learning (RL), DQN, Continuous control, Mixed-Integer Programming (MIP)

5:36

06/12/2020

An Equivalence between Loss Functions and Non-Uniform Sampling in Experience Replay

Scott Fujimoto, David Meger, Doina Precup

2:53

06/12/2021

Is Bang-Bang Control All You Need? Solving Continuous Control with Bernoulli Policies

Tim Seyde, Igor Gilitschenski, Wilko Schwarting, Bartolomeo Stellato, Martin Riedmiller, Markus Wulfmeier, Daniela Rus

Keywords: reinforcement learning and planning

6:48

06/12/2021

A Minimalist Approach to Offline Reinforcement Learning

Scott Fujimoto, Shixiang (Shane) Gu

Keywords: reinforcement learning and planning, generative model

8:31

06/12/2020

Direct Policy Gradients: Direct Optimization of Policies in Discrete Action Spaces

Guy Lorberbom, Chris J. Maddison, Nicolas Heess, Tamir Hazan, Daniel Tarlow

3:16

06/12/2021

Co-Adaptation of Algorithmic and Implementational Innovations in Inference-based Deep Reinforcement Learning

Hiroki Furuta, Tadashi Kozuno, Tatsuya Matsushima, Yutaka Matsuo, Shixiang (Shane) Gu

Keywords: reinforcement learning and planning

10:00

26/08/2020

Finite-Time Error Bounds for Biased Stochastic Approximation with Applications to Q-Learning

Gang Wang, Georgios B. Giannakis

14:03

03/05/2021

Parameter-Based Value Functions

Francesco Faccio, Louis Kirsch, Jürgen Schmidhuber

Keywords: Off-Policy Reinforcement Learning, Reinforcement Learning

2:45

18/07/2021

Recomposing the Reinforcement Learning Building Blocks with Hypernetworks

Elad Sarafian, Shai Keynan, Sarit Kraus

Keywords: Reinforcement Learning and Planning, Deep RL

5:16

06/12/2021

Time-series Generation by Contrastive Imitation

Daniel Jarrett, Ioana Bica, Mihaela van der Schaar

Keywords: generative model

8:47

14/06/2020

Learning Selective Self-Mutual Attention for RGB-D Saliency Detection

Nian Liu, Ni Zhang, Junwei Han

Keywords: rgb-d saliency detection, middle fusion, self-attention, mutual-attention, non-local network, two-stream cnn

1:01

06/12/2020

Multi-task Batch Reinforcement Learning with Metric Learning

Jiachen Li, Quan Vuong, Shuang Liu, Minghua Liu, Kamil Ciosek, Henrik Christensen, Hao Su

Keywords: Algorithms -> Multitask and Transfer Learning; Algorithms -> Representation Learning; Data, Challenges, Implementations, and So, Applications -> Natural Language Processing

3:15