Sequential Generative Exploration Model for Partially Observable Reinforcement Learning

Abstract: Many challenging partially observable reinforcement learning problems have sparse rewards and most existing model-free algorithms struggle with such reward sparsity. In this paper, we propose a novel reward shaping approach to infer the intrinsic rewards for the agent from a sequential generative model. Specifically, the sequential generative model processes a sequence of partial observations and actions from the agent's historical transitions to compile a belief state for performing forward dynamics prediction. Then we utilize the error of the dynamics prediction task to infer the intrinsic rewards for the agent. Our proposed method is able to derive intrinsic rewards that could better reflect the agent's surprise or curiosity over its ground-truth state by taking a sequential inference procedure. Furthermore, we formulate the inference procedure for dynamics prediction as a multi-step forward prediction task, where the time abstraction that has been incorporated could effectively help to increase the expressiveness of the intrinsic reward signals. To evaluate our method, we conduct extensive experiments on challenging 3D navigation tasks in ViZDoom and DeepMind Lab. Empirical evaluation results show that our proposed exploration method could lead to significantly faster convergence than various state-of-the-art exploration approaches in the testified navigation domains.

06/12/2020

Sequential Generative Exploration Model for Partially Observable Reinforcement Learning

Haiyan Yin, Jianda Chen, Sinno Jialin Pan, Sebastian Tschiatschek

Comments

Similar Papers

Efficient Exploration of Reward Functions in Inverse Reinforcement Learning via Bayesian Optimization

Sreejith Balakrishnan, Quoc Phong Nguyen, Bryan Kian Hsiang Low, Harold Soh

Keywords Abstract Paper

Learning to Utilize Shaping Rewards: A New Approach of Reward Shaping

Yujing Hu, Weixun Wang, Hangtian Jia and Yixiang Wang, Yingfeng Chen, Jianye Hao, Feng Wu, Changjie Fan

Keywords Abstract Paper

Nested-Wasserstein Self-Imitation Learning for Sequence Generation

Ruiyi Zhang, Changyou Chen, Zhe Gan and Zheng Wen, Wenlin Wang, Lawrence Carin

Keywords Abstract Paper

Explicable Reward Design for Reinforcement Learning Agents

Rati Devidze, Goran Radanovic, Parameswaran Kamalaruban, Adish Singla

Keywords Abstract Paper

optimization, reinforcement learning and planning, interpretability

GaussianPath:A Bayesian Multi-Hop Reasoning Framework for Knowledge Graph Reasoning

Guojia Wan, Bo Du

Keywords Abstract Paper

Efficient Model-Based Reinforcement Learning through Optimistic Policy Search and Planning

Sebastian Curi, Felix Berkenkamp, Andreas Krause

Keywords Abstract Paper

Provably Efficient Algorithms for Multi-Objective Competitive RL

Tiancheng Yu, Yi Tian, Jingzhao Zhang, Suvrit Sra

Keywords Abstract Paper

Theory, RL, Decisions and Control Theory

Robust Deep Reinforcement Learning through Adversarial Loss

Tuomas Oikarinen, Wang Zhang, Alexandre Megretski and Luca Daniel, Tsui-Wei Weng

Keywords Abstract Paper

reinforcement learning and planning, robustness, adversarial robustness and security

PlanGAN: Model-based Planning With Sparse Rewards and Multiple Goals

Henry Charlesworth, Giovanni Montana

Keywords Abstract Paper

Episodic Multi-agent Reinforcement Learning with Curiosity-driven Exploration

Lulu Zheng, Jiarui Chen, Jianhao Wang and Jiamin He, Yujing Hu, Yingfeng Chen, Changjie Fan, Yang Gao, Chongjie Zhang

Keywords Abstract Paper

reinforcement learning and planning

Information Directed Reward Learning for Reinforcement Learning

David Lindner, Matteo Turchetta, Sebastian Tschiatschek and Kamil Ciosek, Andreas Krause

Keywords Abstract Paper

reinforcement learning and planning, active learning

Continuous Mean-Covariance Bandits

Yihan Du, Siwei Wang, Zhixuan Fang, Longbo Huang

Keywords Abstract Paper

bandits

Hindsight Trust Region Policy Optimization

Hanbo Zhang, Site Bai, Xuguang Lan and David Hsu, Nanning Zheng

Keywords Abstract Paper

Machine Learning, Deep Reinforcement Learning, Reinforcement Learning

Observational Overfitting in Reinforcement Learning

Xingyou Song, Yiding Jiang, Stephen Tu and Yilun Du, Behnam Neyshabur

Keywords Abstract Paper

observational, overfitting, reinforcement, learning, generalization, implicit, regularization, overparametrization

Reward-Free Exploration for Reinforcement Learning

Chi Jin, Akshay Krishnamurthy, Max Simchowitz, Tiancheng Yu

Keywords Abstract Paper

Reinforcement Learning - Theory

Policy Gradient Bayesian Robust Optimization for Imitation Learning

Zaynah Javed, Daniel Brown, Satvik Sharma and Jerry Zhu, Ashwin Balakrishna, Marek Petrik, Anca Dragan, Ken Goldberg

Keywords Abstract Paper

Social Aspects of Machine Learning, AI Safety

DORB: Dynamically Optimizing Multiple Rewards with Bandits

Ramakanth Pasunuru, Han Guo, Mohit Bansal

Keywords Abstract Paper

language tasks, optimization rewards, nlg tasks, question generation

Rank the Episodes: A Simple Approach for Exploration in Procedurally-Generated Environments

Daochen Zha, Wenye Ma, Lei Yuan and Xia Hu, Ji Liu

Keywords Abstract Paper

Exploration, Reinforcement Learning, Self-Imitation, Generalization of Reinforcement Learning

On Efficiency in Hierarchical Reinforcement Learning

Zheng Wen, Doina Precup, Morteza Ibrahimi and Andre Barreto, Benjamin Van Roy, Satinder Singh

Keywords Abstract Paper

Q-value Path Decomposition for Deep Multiagent Reinforcement Learning

Yaodong Yang, Jianye Hao, Guangyong Chen and Hongyao Tang, Yingfeng Chen, Yujing Hu, Changjie Fan, Zhongyu Wei

Keywords Abstract Paper

Planning, Control, and Multiagent Learning

Inverse Reinforcement Learning in a Continuous State Space with Formal Guarantees

Gregory Dexter, Kevin Bello, Jean Honorio

Keywords Abstract Paper

Keywords Paper

Yujing Hu, Weixun Wang, Hangtian Jia and
Yixiang Wang, Yingfeng Chen, Jianye Hao, Feng Wu, Changjie Fan

Keywords Paper

Ruiyi Zhang, Changyou Chen, Zhe Gan and
Zheng Wen, Wenlin Wang, Lawrence Carin

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Tuomas Oikarinen, Wang Zhang, Alexandre Megretski and
Luca Daniel, Tsui-Wei Weng

Keywords Paper

Keywords Paper

Lulu Zheng, Jiarui Chen, Jianhao Wang and
Jiamin He, Yujing Hu, Yingfeng Chen, Changjie Fan, Yang Gao, Chongjie Zhang

Keywords Paper

David Lindner, Matteo Turchetta, Sebastian Tschiatschek and
Kamil Ciosek, Andreas Krause

Keywords Paper

Keywords Paper

Hanbo Zhang, Site Bai, Xuguang Lan and
David Hsu, Nanning Zheng

Keywords Paper

Xingyou Song, Yiding Jiang, Stephen Tu and
Yilun Du, Behnam Neyshabur

Keywords Paper

Keywords Paper

Zaynah Javed, Daniel Brown, Satvik Sharma and
Jerry Zhu, Ashwin Balakrishna, Marek Petrik, Anca Dragan, Ken Goldberg

Keywords Paper

Keywords Paper

Daochen Zha, Wenye Ma, Lei Yuan and
Xia Hu, Ji Liu

Keywords Paper

Zheng Wen, Doina Precup, Morteza Ibrahimi and
Andre Barreto, Benjamin Van Roy, Satinder Singh

Keywords Paper

Yaodong Yang, Jianye Hao, Guangyong Chen and
Hongyao Tang, Yingfeng Chen, Yujing Hu, Changjie Fan, Zhongyu Wei

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Joey Hong, Branislav Kveton, Manzil Zaheer and
Yinlam Chow, Amr Ahmed, Craig Boutilier

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Philip Ball, Jack Parker-Holder, Aldo Pacchiano and
Krzysztof Choromanski, Stephen Roberts

Keywords Paper

Pushi Zhang, Xiaoyu Chen, Li Zhao and
Wei Xiong, Tao Qin, Tie-Yan Liu

Keywords Paper

Keywords Paper

Alvaro Velasquez, Brett Bissey, Lior Barak and
Andre Beckus, Ismail Alkhouri, Daniel Melcer, George Atia

Keywords Paper

Keywords Paper

Tianjun Zhang, Paria Rashidinejad, Jiantao Jiao and
Yuandong Tian, Joseph Gonzalez, Stuart Russell

Keywords Paper