Exploration by Maximizing Renyi Entropy for Reward-Free RL Framework

Abstract: Exploration is essential for reinforcement learning (RL). To address the challenges of exploration, we consider a reward-free RL framework that completely separates exploration from exploitation and poses new challenges for exploration algorithms. In the exploration phase, the agent learns an exploratory policy by interacting with a reward-free environment and collects a dataset of transitions by executing that policy. In the planning phase, the agent computes a good policy for any reward function from the dataset, without further interaction with the environment. This framework suits the meta-RL setting, where there are many reward functions of interest. For the exploration phase, we propose maximizing the Renyi entropy over the state-action space and justify this objective theoretically. Renyi entropy succeeds as an objective because it encourages exploring hard-to-reach state-actions. We further derive a policy gradient formulation for this objective and design a practical exploration algorithm that can handle complex environments. In the planning phase, we solve for good policies given arbitrary reward functions using a batch RL algorithm. Empirically, we show that our exploration algorithm is effective and sample efficient, and that it yields superior policies for arbitrary reward functions in the planning phase.
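
To make the objective in the abstract concrete, the following is a minimal sketch, assuming a small tabular state-action distribution given as a list of probabilities. It uses the standard definition of the Renyi entropy of order alpha, H_alpha(d) = log(sum_i d_i^alpha) / (1 - alpha); the function names and the example distributions are hypothetical and are not taken from the paper, and the sketch does not reproduce the paper's policy gradient algorithm.

```python
import math


def renyi_entropy(dist, alpha):
    """Renyi entropy of order alpha (alpha > 0, alpha != 1) of a discrete distribution."""
    if alpha <= 0 or alpha == 1:
        raise ValueError("alpha must be positive and different from 1")
    if not math.isclose(sum(dist), 1.0, abs_tol=1e-9):
        raise ValueError("dist must sum to 1")
    # Zero-probability entries contribute nothing to the sum.
    power_sum = sum(p ** alpha for p in dist if p > 0)
    return math.log(power_sum) / (1.0 - alpha)


def shannon_entropy(dist):
    """Shannon entropy, the alpha -> 1 limit of the Renyi entropy."""
    return -sum(p * math.log(p) for p in dist if p > 0)


# Hypothetical state-action visitation distributions over four state-actions.
uniform = [0.25, 0.25, 0.25, 0.25]   # every state-action visited equally
skewed = [0.97, 0.01, 0.01, 0.01]    # three hard-to-reach state-actions

for name, dist in [("uniform", uniform), ("skewed", skewed)]:
    print(name, renyi_entropy(dist, 0.5), shannon_entropy(dist))
```

Both entropies are largest for the uniform distribution, so maximizing either pushes the policy toward covering rarely visited state-actions. The order alpha = 0.5 is only an illustrative choice here.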

18/07/2021

Chuheng Zhang, Yuanying Cai, Longbo Huang, Jian Li

Similar Papers

On Reward-Free RL with Kernel and Neural Function Approximations: Single-Agent MDP and Markov Game

Shuang Qiu, Jieping Ye, Zhaoran Wang, Zhuoran Yang

Reward-Free Exploration for Reinforcement Learning

Chi Jin, Akshay Krishnamurthy, Max Simchowitz, Tiancheng Yu

Regularized policies are reward robust

Hisham Husain, Kamil Ciosek, Ryota Tomioka

A Max-Min Entropy Framework for Reinforcement Learning

Seungyul Han, Youngchul Sung

optimization, reinforcement learning and planning

MADE: Exploration via Maximizing Deviation from Explored Regions

Tianjun Zhang, Paria Rashidinejad, Jiantao Jiao, Yuandong Tian, Joseph Gonzalez, Stuart Russell

Adversarial Intrinsic Motivation for Reinforcement Learning

Ishan Durugkar, Mauricio Tec, Scott Niekum, Peter Stone

reinforcement learning and planning, generative model

On Reward-Free Reinforcement Learning with Linear Function Approximation

Ruosong Wang, Simon Du, Lin Yang, Russ Salakhutdinov

DORB: Dynamically Optimizing Multiple Rewards with Bandits

Ramakanth Pasunuru, Han Guo, Mohit Bansal

language tasks, optimization rewards, nlg tasks, question generation

PC-PG: Policy Cover Directed Exploration for Provable Policy Gradient Learning

Alekh Agarwal, Mikael Henaff, Sham Kakade, Wen Sun

Learning Human Objectives by Evaluating Hypothetical Behavior

Siddharth Reddy, Anca Dragan, Sergey Levine, Shane Legg, Jan Leike

Provably efficient safe exploration via primal-dual policy optimization

Dongsheng Ding, Xiaohan Wei, Zhuoran Yang, Zhaoran Wang, Mihailo Jovanovic

Towards Robust Bisimulation Metric Learning

Mete Kemertas, Tristan Aumentado-Armstrong

reinforcement learning and planning, robustness, representation learning

Near Optimal Reward-Free Reinforcement Learning

Zihan Zhang, Simon Du, Xiangyang Ji

Theory, RL, Decisions and Control Theory

Exploring supervised and unsupervised rewards in machine translation

Julia Ive, Zixu Wang, Marina Fomicheva, Lucia Specia

Episodic Multi-agent Reinforcement Learning with Curiosity-driven Exploration

Lulu Zheng, Jiarui Chen, Jianhao Wang, Jiamin He, Yujing Hu, Yingfeng Chen, Changjie Fan, Yang Gao, Chongjie Zhang

Explicable Reward Design for Reinforcement Learning Agents

Rati Devidze, Goran Radanovic, Parameswaran Kamalaruban, Adish Singla

optimization, reinforcement learning and planning, interpretability

Learning to Utilize Shaping Rewards: A New Approach of Reward Shaping

Yujing Hu, Weixun Wang, Hangtian Jia, Yixiang Wang, Yingfeng Chen, Jianye Hao, Feng Wu, Changjie Fan

Positive-Unlabeled Reward Learning

Danfei Xu, Misha Denil

Corruption-robust exploration in episodic reinforcement learning

Thodoris Lykouris, Max Simchowitz, Alex Slivkins, Wen Sun

Learning Fair Policies in Multi-Objective (Deep) Reinforcement Learning with Average and Discounted Rewards

Umer Siddique, Paul Weng, Matthieu Zimmer

Rank the Episodes: A Simple Approach for Exploration in Procedurally-Generated Environments

Daochen Zha, Wenye Ma, Lei Yuan, Xia Hu, Ji Liu

Exploration, Reinforcement Learning, Self-Imitation, Generalization of Reinforcement Learning

Influence-Based Multi-Agent Exploration

Tonghan Wang*, Jianhao Wang*, Yi Wu, Chongjie Zhang

Multi-agent reinforcement learning, Exploration

Implicit Generative Modeling for Efficient Exploration

Neale Ratzlaff, Qinxun Bai, Fuxin Li, Wei Xu
