Offline Reinforcement Learning with Fisher Divergence Critic Regularization

Abstract: Many modern approaches to offline Reinforcement Learning (RL) utilize behavior regularization, typically augmenting a model-free actor critic algorithm with a penalty measuring divergence of the policy from the offline data. In this work, we propose an alternative approach to encouraging the learned policy to stay close to the data, namely parameterizing the critic as the log-behavior-policy, which generated the offline data, plus a state-action value offset term, which can be learned using a neural network. Behavior regularization then corresponds to an appropriate regularizer on the offset term. We propose using a gradient penalty regularizer for the offset term and demonstrate its equivalence to Fisher divergence regularization, suggesting connections to the score matching and generative energy-based model literature. We thus term our resulting algorithm Fisher-BRC (Behavior Regularized Critic). On standard offline RL benchmarks, Fisher-BRC achieves both improved performance and faster convergence over existing state-of-the-art methods.

19/08/2021

Offline Reinforcement Learning with Fisher Divergence Critic Regularization

Ilya Kostrikov, Rob Fergus, Jonathan Tompson, Ofir Nachum

Comments

Similar Papers

Variational Model-based Policy Optimization

Yinlam Chow, Brandon Cui, Moonkyung Ryu, Mohammad Ghavamzadeh

Keywords Abstract Paper

Machine Learning, Reinforcement Learning

Logistic q-learning

Joan Bas-Serrano, Sebastian Curi, Andreas Krause, Gergely Neu

Keywords Abstract Paper

Local policy search with Bayesian optimization

Sarah Müller, Alexander von Rohr, Sebastian Trimpe

Keywords Abstract Paper

theory, optimization, reinforcement learning and planning, active learning

Representation Matters: Offline Pretraining for Sequential Decision Making

Mengjiao Yang, Ofir Nachum

Keywords Abstract Paper

Direct Policy Gradients: Direct Optimization of Policies in Discrete Action Spaces

Guy Lorberbom, Chris J. Maddison, Nicolas Heess and Tamir Hazan, Daniel Tarlow

Keywords Abstract Paper

Minimax Weight and Q-Function Learning for Off-Policy Evaluation

Masatoshi Uehara, Jiawei Huang, Nan Jiang

Keywords Abstract Paper

Variance-Aware Off-Policy Evaluation with Linear Function Approximation

Yifei Min, Tianhao Wang, Dongruo Zhou, Quanquan Gu

Keywords Abstract Paper

theory, reinforcement learning and planning

Towards Robust Bisimulation Metric Learning

Mete Kemertas, Tristan Aumentado-Armstrong

Keywords Abstract Paper

reinforcement learning and planning, robustness, representation learning

Recomposing the Reinforcement Learning Building Blocks with Hypernetworks

Elad Sarafian, Shai Keynan, Sarit Kraus

Keywords Abstract Paper

Reinforcement Learning and Planning, Deep RL

Generalized Proximal Policy Optimization with Sample Reuse

James Queeney, Yannis Paschalidis, Christos G Cassandras

Keywords Abstract Paper

optimization, reinforcement learning and planning

The Value Equivalence Principle for Model-Based Reinforcement Learning

Christopher Grimm, Andre Barreto, Satinder Singh, David Silver

Keywords Abstract Paper

COMBO: Conservative Offline Model-Based Policy Optimization

Tianhe Yu, Aviral Kumar, Rafael Rafailov and Aravind Rajeswaran, Sergey Levine, Chelsea Finn

Keywords Abstract Paper

deep learning, optimization, reinforcement learning and planning

Rate-Distortion Analysis of Minimum Excess Risk in Bayesian Learning

Hassan Hafez-Kolahi, Behrad Moniri, Shohreh Kasaei, Mahdieh Soleymani Baghshah

Keywords Abstract Paper

Theory, Statistical Learning Theory

Diversity Actor-Critic: Sample-Aware Entropy Regularization for Sample-Efficient Exploration

Seungyul Han, Youngchul Sung

Keywords Abstract Paper

Reinforcement Learning and Planning, Deep RL

Characterizing the Gap Between Actor-Critic and Policy Gradient

Junfeng Wen, Saurabh Kumar, Ramki Gummadi, Dale Schuurmans

Keywords Abstract Paper

Doubly Robust Off-Policy Actor-Critic: Convergence and Optimality

Tengyu Xu, Zhuoran Yang, Zhaoran Wang, Yingbin LIANG

Keywords Abstract Paper

Theory, RL, Decisions and Control Theory

Model-based Multi-agent Policy Optimization with Adaptive Opponent-wise Rollouts

Weinan Zhang, Xihuai Wang, Jian Shen, Ming Zhou

Keywords Abstract Paper

Machine Learning, Deep Reinforcement Learning, Multi-agent Learning

Improving Deep Learning Interpretability by Saliency Guided Training

Aya Abdelsalam Ismail, Hector Corrada Bravo, Soheil Feizi

Keywords Abstract Paper

deep learning, transformers, vision, language, interpretability

Ranking Policy Gradient

Kaixiang Lin, Jiayu Zhou

Keywords Abstract Paper

Sample-efficient reinforcement learning, off-policy learning.

Bridging Explicit and Implicit Deep Generative Models via Neural Stein Estimators

Qitian Wu, Rui Gao, Hongyuan Zha

Keywords Abstract Paper

Learning to Reweight Imaginary Transitions for Model-Based Reinforcement Learning

Wenzhen Huang, Qiyue Yin, Junge Zhang, Kaiqi Huang

Keywords Abstract Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Guy Lorberbom, Chris J. Maddison, Nicolas Heess and
Tamir Hazan, Daniel Tarlow

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Tianhe Yu, Aviral Kumar, Rafael Rafailov and
Aravind Rajeswaran, Sergey Levine, Chelsea Finn

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Youngsuk Park, Ryan Rossi, Zheng Wen and
Gang Wu, Handong Zhao

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Aldo Pacchiano, Jack Parker-Holder, Yunhao Tang and
Krzysztof Choromanski, Anna Choromanska, Michael Jordan

Keywords Paper

Keywords Paper

Keywords Paper

Kenta Niwa, Guoqiang Zhang, W. Bastiaan Kleijn and
Noboru Harada, Hiroshi Sawada, Akinori Fujino

Keywords Paper

Keywords Paper