Foresee then Evaluate: Decomposing Value Estimation with Latent Future Prediction

Abstract: The value function is the central notion of Reinforcement Learning (RL). Value estimation, especially with function approximation, can be challenging because it involves the stochasticity of environment dynamics and reward signals that can be sparse and delayed. A typical model-free RL algorithm estimates the values of a policy with Temporal Difference (TD) or Monte Carlo (MC) methods directly from rewards, without explicitly taking dynamics into consideration. In this paper, we propose Value Decomposition with Future Prediction (VDFP), which provides an explicit two-step view of the value estimation process: 1) first foresee the latent future, and 2) then evaluate it. We analytically decompose the value function into a latent future dynamics part and a policy-independent trajectory return part, which induces a way to model latent dynamics and returns separately in value estimation. Further, we derive a practical deep RL algorithm consisting of a convolutional model that learns a compact trajectory representation from past experiences, a conditional variational auto-encoder that predicts the latent future dynamics, and a convex return model that evaluates the trajectory representation. In experiments, we empirically demonstrate the effectiveness of our approach for both off-policy and on-policy RL in several OpenAI Gym continuous control tasks as well as a few challenging variants with delayed rewards.
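
The two-step view in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the hand-written `trajectory_representation`, the `linear_return` model, and the Gaussian toy rewards are all hypothetical stand-ins for the learned convolutional, generative, and return models. The sketch only shows that when the return model is linear in the trajectory representation, evaluating the expected representation gives the same value as averaging returns over trajectories.

```python
import random

def trajectory_representation(rewards, gamma=0.99):
    """Hypothetical stand-in for a learned trajectory representation:
    (discounted reward sum, discounted step count)."""
    disc_sum = sum(gamma ** t * r for t, r in enumerate(rewards))
    disc_len = sum(gamma ** t for t in range(len(rewards)))
    return (disc_sum, disc_len)

def linear_return(m, w=(1.0, 0.0)):
    """Hypothetical policy-independent return model U(m) = w . m."""
    return sum(wi * mi for wi, mi in zip(w, m))

def sample_trajectory(rng, horizon=20):
    # Toy stochastic rewards standing in for rollouts of a fixed policy
    # from a single start state.
    return [rng.gauss(1.0, 0.5) for _ in range(horizon)]

rng = random.Random(0)
reps = [trajectory_representation(sample_trajectory(rng)) for _ in range(1000)]

# Monte Carlo value: average the return over sampled trajectories.
mc_value = sum(linear_return(m) for m in reps) / len(reps)

# Two-step value: first "foresee" the expected representation,
# then "evaluate" it with the return model.
mean_rep = tuple(sum(m[i] for m in reps) / len(reps) for i in range(2))
two_step_value = linear_return(mean_rep)

print(abs(mc_value - two_step_value) < 1e-9)
```

The agreement relies on linearity of expectation. For a return model U that is convex but not linear, Jensen's inequality gives U(E[m]) ≤ E[U(m)], so the two-step value would only be a lower bound on the Monte Carlo value.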

06/12/2020

Hongyao Tang, Zhaopeng Meng, Guangyong Chen, Pengfei Chen, Chen Chen, Yaodong Yang, Luo Zhang, Wulong Liu, Jianye Hao

Similar Papers

Learning Guidance Rewards with Trajectory-space Smoothing

Tanmay Gangwani, Yuan Zhou, Jian Peng

A Provably Efficient Sample Collection Strategy for Reinforcement Learning

Jean Tarbouriech, Matteo Pirotta, Michal Valko, Alessandro Lazaric

theory, reinforcement learning and planning, generative model

Reinforcement Learning with a Disentangled Universal Value Function for Item Recommendation

Kai Wang, Zhene Zou, Qilin Deng, Jianrong Tao, Runze Wu, Changjie Fan, Liang Chen, Peng Cui

Outcome-Driven Reinforcement Learning via Variational Inference

Tim G. J. Rudner, Vitchyr Pong, Rowan McAllister, Yarin Gal, Sergey Levine

reinforcement learning and planning, generative model

Risk-Aware Transfer in Reinforcement Learning using Successor Features

Michael Gimelfarb, Andre Barreto, Scott Sanner, Chi-Guhn Lee

reinforcement learning and planning, representation learning, transfer learning

Dueling Posterior Sampling for Preference-Based Reinforcement Learning

Ellen Novoseller, Yibing Wei, Yanan Sui, Yisong Yue, Joel Burdick

Local policy search with Bayesian optimization

Sarah Müller, Alexander von Rohr, Sebastian Trimpe

theory, optimization, reinforcement learning and planning, active learning

Deep active inference agents using Monte-Carlo methods

Zafeirios Fountas, Noor Sajid, Pedro Mediano, Karl Friston

The Value Equivalence Principle for Model-Based Reinforcement Learning

Christopher Grimm, Andre Barreto, Satinder Singh, David Silver

There Is No Turning Back: A Self-Supervised Approach for Reversibility-Aware Reinforcement Learning

Nathan Grinsztajn, Johan Ferret, Olivier Pietquin, Philippe Preux, Matthieu Geist

Active Learning of Conditional Mean Embeddings via Bayesian Optimisation

Sayak Ray Chowdhury, Rafael Oliveira, Fabio Ramos

Explicable Reward Design for Reinforcement Learning Agents

Rati Devidze, Goran Radanovic, Parameswaran Kamalaruban, Adish Singla

optimization, reinforcement learning and planning, interpretability

C-Learning: Learning to Achieve Goals via Recursive Classification

Ben Eysenbach, Ruslan Salakhutdinov, Sergey Levine

reinforcement learning, goal reaching, density estimation, hindsight relabeling, Q-learning

Variational Policy Gradient Method for Reinforcement Learning with General Utilities

Junyu Zhang, Alec Koppel, Amrit Bedi, Csaba Szepesvari, Mengdi Wang

Reward-Free Model-Based Reinforcement Learning with Linear Function Approximation

Weitong Zhang, Dongruo Zhou, Quanquan Gu

Learning in Non-Cooperative Configurable Markov Decision Processes

Giorgia Ramponi, Alberto Maria Metelli, Alessandro Concetti, Marcello Restelli

reinforcement learning and planning, online learning

Model-based Policy Optimization with Unsupervised Model Adaptation

Jian Shen, Han Zhao, Weinan Zhang, Yong Yu

Efficient Planning under Partial Observability with Unnormalized Q Functions and Spectral Learning

Tianyu Li, Bogdan Mazoure, Doina Precup, Guillaume Rabusseau

Hindsight Value Function for Variance Reduction in Stochastic Dynamic Environment

Jiaming Guo, Rui Zhang, Xishan Zhang, Shaohui Peng, Qi Yi, Zidong Du, Xing Hu, Qi Guo, Yunji Chen

Machine Learning, Deep Learning, Deep Reinforcement Learning, Sequential Decision Making

Conservative Offline Distributional Reinforcement Learning

Yecheng Ma, Dinesh Jayaraman, Osbert Bastani

Reward is enough for convex MDPs

Tom Zahavy, Brendan O'Donoghue, Guillaume Desjardins, Satinder Singh

Variational Model-based Policy Optimization

Yinlam Chow, Brandon Cui, Moonkyung Ryu, Mohammad Ghavamzadeh

Machine Learning, Reinforcement Learning

Learning to Select Exogenous Events for Marked Temporal Point Process

Ping Zhang, Rishabh Iyer, Ashish Tendulkar, Gaurav Aggarwal, Abir De
