Hindsight Trust Region Policy Optimization

Abstract: Reinforcement Learning (RL) with sparse rewards is a major challenge. We pro- pose Hindsight Trust Region Policy Optimization (HTRPO), a new RL algorithm that extends the highly successful TRPO algorithm with hindsight to tackle the challenge of sparse rewards. Hindsight refers to the algorithm’s ability to learn from information across goals, including past goals not intended for the current task. We derive the hindsight form of TRPO, together with QKL, a quadratic approximation to the KL divergence constraint on the trust region. QKL reduces variance in KL divergence estimation and improves stability in policy updates. We show that HTRPO has similar convergence property as TRPO. We also present Hindsight Goal Filtering (HGF), which further improves the learning performance for suitable tasks. HTRPO has been evaluated on various sparse-reward tasks, including Atari games and simulated robot control. Experimental results show that HTRPO consistently outperforms TRPO, as well as HPG, a state-of-the-art policy 14 gradient algorithm for RL with sparse rewards.

02/02/2021

Hindsight Trust Region Policy Optimization

Hanbo Zhang, Site Bai, Xuguang Lan, David Hsu, Nanning Zheng

Comments

Similar Papers

Reinforcement Learning Based Multi-Agent Resilient Control: From Deep Neural Networks to an Adaptive Law

Jian Hou, Fangyuan Wang, Lili Wang, Zhiyong Chen

Keywords Abstract Paper

Provably Efficient Algorithms for Multi-Objective Competitive RL

Tiancheng Yu, Yi Tian, Jingzhao Zhang, Suvrit Sra

Keywords Abstract Paper

Theory, RL, Decisions and Control Theory

Tactical Optimism and Pessimism for Deep Reinforcement Learning

Ted Moskovitz, Jack Parker-Holder, Aldo Pacchiano and Michael Arbel, Michael Jordan

Keywords Abstract Paper

reinforcement learning and planning, bandits

On the Convergence of Smooth Regularized Approximate Value Iteration Schemes

Elena Smirnova, Elvis Dohmatob

Keywords Abstract Paper

Shortest-Path Constrained Reinforcement Learning for Sparse Reward Tasks

Sungryull Sohn, Sungtae Lee, Jongwook Choi and Harm van Seijen, Mehdi Fatemi, Honglak Lee

Keywords Abstract Paper

Reinforcement Learning and Planning, Deep RL

Adaptive Discretization for Model-Based Reinforcement Learning

Sean Sinclair, Tianyu Wang, Gauri Jain and Sid Banerjee, Christina Yu

Keywords Abstract Paper

A Max-Min Entropy Framework for Reinforcement Learning

Seungyul Han, Youngchul Sung

Keywords Abstract Paper

optimization, reinforcement learning and planning

Efficient Exploration of Reward Functions in Inverse Reinforcement Learning via Bayesian Optimization

Sreejith Balakrishnan, Quoc Phong Nguyen, Bryan Kian Hsiang Low, Harold Soh

Keywords Abstract Paper

Improved Corruption Robust Algorithms for Episodic Reinforcement Learning

Yifang Chen, Simon Du, Kevin Jamieson

Keywords Abstract Paper

, Optimization, Non-Convex Optimization, Theory, Online Learning Theory

SQIL: Imitation Learning via Reinforcement Learning with Sparse Rewards

Siddharth Reddy, Anca D. Dragan, Sergey Levine

Keywords Abstract Paper

Imitation Learning, Reinforcement Learning

Theoretically Principled Deep RL Acceleration via Nearest Neighbor Function Approximation

Junhong Shen, Lin F. Yang

Keywords Abstract Paper

Model-free Reinforcement Learning in Infinite-horizon Average-reward Markov Decision Processes

Chen-Yu Wei, Mehdi Jafarnia, Haipeng Luo and Hiteshi Sharma, Rahul Jain

Keywords Abstract Paper

Reinforcement Learning - Theory

POMO: Policy Optimization with Multiple Optima for Reinforcement Learning

Yeong-Dae Kwon, Jinho Choo, Byoungjip Kim and Iljoo Yoon, Youngjune Gwon, Seungjai Min

Keywords Abstract Paper

CAQL: Continuous Action Q-Learning

Moonkyung Ryu, Yinlam Chow, Ross Anderson and Christian Tjandraatmadja, Craig Boutilier

Keywords Abstract Paper

Reinforcement learning (RL), DQN, Continuous control, Mixed-Integer Programming (MIP)

Convex Regularization in Monte-Carlo Tree Search

Tuan Q Dam, Carlo D'Eramo, Jan Peters, Joni Pajarinen

Keywords Abstract Paper

Reinforcement Learning and Planning

Sequential Generative Exploration Model for Partially Observable Reinforcement Learning

Haiyan Yin, Jianda Chen, Sinno Jialin Pan, Sebastian Tschiatschek

Keywords Abstract Paper

Is Reinforcement Learning More Difficult Than Bandits? A Near-optimal Algorithm Escaping the Curse of Horizon

Zihan Zhang, Xiangyang Ji, Simon Du

Keywords Abstract Paper

Memory Based Trajectory-conditioned Policies for Learning from Sparse Rewards

Yijie Guo, Jongwook Choi, Marcin Moczulski and Shengyu Feng, Samy Bengio, Mohammad Norouzi, Honglak Lee

Keywords Abstract Paper

Average-Reward Reinforcement Learning with Trust Region Methods

Xiaoteng Ma, Xiaohang Tang, Li Xia and Jun Yang, Qianchuan Zhao

Keywords Abstract Paper

Machine Learning, Deep Reinforcement Learning, Reinforcement Learning, Markov Decision Processes

Self-correcting Q-learning

Rong Zhu, Mattia Rigotti

Keywords Abstract Paper

MADE: Exploration via Maximizing Deviation from Explored Regions

Tianjun Zhang, Paria Rashidinejad, Jiantao Jiao and Yuandong Tian, Joseph Gonzalez, Stuart Russell

Keywords Abstract Paper

reinforcement learning and planning

Responsive Safety in Reinforcement Learning

Adam Stooke, Joshua Achiam, Pieter Abbeel

Keywords Paper

Keywords Paper

Ted Moskovitz, Jack Parker-Holder, Aldo Pacchiano and
Michael Arbel, Michael Jordan

Keywords Paper

Keywords Paper

Sungryull Sohn, Sungtae Lee, Jongwook Choi and
Harm van Seijen, Mehdi Fatemi, Honglak Lee

Keywords Paper

Sean Sinclair, Tianyu Wang, Gauri Jain and
Sid Banerjee, Christina Yu

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Chen-Yu Wei, Mehdi Jafarnia, Haipeng Luo and
Hiteshi Sharma, Rahul Jain

Keywords Paper

Yeong-Dae Kwon, Jinho Choo, Byoungjip Kim and
Iljoo Yoon, Youngjune Gwon, Seungjai Min

Keywords Paper

Moonkyung Ryu, Yinlam Chow, Ross Anderson and
Christian Tjandraatmadja, Craig Boutilier

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Yijie Guo, Jongwook Choi, Marcin Moczulski and
Shengyu Feng, Samy Bengio, Mohammad Norouzi, Honglak Lee

Keywords Paper

Xiaoteng Ma, Xiaohang Tang, Li Xia and
Jun Yang, Qianchuan Zhao

Keywords Paper

Keywords Paper

Tianjun Zhang, Paria Rashidinejad, Jiantao Jiao and
Yuandong Tian, Joseph Gonzalez, Stuart Russell

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Divyansh Garg, Shuvam Chakraborty, Chris Cundy and
Jiaming Song, Stefano Ermon

Keywords Paper

Zheng Wen, Doina Precup, Morteza Ibrahimi and
Andre Barreto, Benjamin Van Roy, Satinder Singh

Keywords Paper

Keywords Paper

Keywords Paper

Max Schwarzer, Ankesh Anand, Rishab Goel and
R Devon Hjelm, Aaron Courville, Philip Bachman

Keywords Paper

Ling Pan, Tabish Rashid, Bei Peng and
Longbo Huang, Shimon Whiteson

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper