MADE: Exploration via Maximizing Deviation from Explored Regions

Abstract: In online reinforcement learning (RL), efficient exploration remains particularly challenging in high-dimensional environments with sparse rewards. In low-dimensional environments, where tabular parameterization is possible, count-based upper confidence bound (UCB) exploration methods achieve minimax near-optimal rates. However, it remains unclear how to efficiently implement UCB in realistic RL tasks that involve non-linear function approximation. To address this, we propose a new exploration approach via *maximizing* the deviation of the occupancy of the next policy from the explored regions. We add this term as an adaptive regularizer to the standard RL objective to balance exploration vs. exploitation. We pair the new objective with a provably convergent algorithm, giving rise to a new intrinsic reward that adjusts existing bonuses. The proposed intrinsic reward is easy to implement and combine with other existing RL algorithms to conduct exploration. As a proof of concept, we evaluate the new intrinsic reward on tabular examples across a variety of model-based and model-free algorithms, showing improvements over count-only exploration strategies. When tested on navigation and locomotion tasks from MiniGrid and DeepMind Control Suite benchmarks, our approach significantly improves sample efficiency over state-of-the-art methods.
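As a point of reference for the count-based UCB exploration the abstract mentions, the following is a minimal sketch of a classical tabular count bonus, assuming the common `scale / sqrt(N(s, a))` form. It is not the MADE intrinsic reward, whose exact expression is not given here; the names `CountBonus`, `scale`, and `shaped_reward` are hypothetical and chosen only for illustration.

```python
import math
from collections import defaultdict


class CountBonus:
    """Classical tabular count-based bonus (assumed form: scale / sqrt(N(s, a))).

    This is an illustrative baseline, not the MADE bonus itself.
    """

    def __init__(self, scale=1.0):
        self.scale = scale
        self.counts = defaultdict(int)  # visit counts N(s, a)

    def update(self, state, action):
        # Record one more visit to the (state, action) pair.
        self.counts[(state, action)] += 1

    def bonus(self, state, action):
        # Unvisited pairs are treated as having count 1 to avoid division by zero.
        n = self.counts.get((state, action), 0)
        return self.scale / math.sqrt(max(n, 1))


def shaped_reward(extrinsic, bonus_fn, state, action):
    # Total reward seen by the learner: task reward plus exploration bonus.
    return extrinsic + bonus_fn.bonus(state, action)


if __name__ == "__main__":
    cb = CountBonus(scale=1.0)
    for _ in range(4):
        cb.update("s0", "left")
    print(shaped_reward(0.0, cb, "s0", "left"))   # rarely rewarded, visited 4 times
    print(shaped_reward(0.0, cb, "s0", "right"))  # never visited
```

The bonus shrinks as a state-action pair is revisited, so the learner is nudged toward less-explored regions; an approach such as the one described above would adjust a bonus of this kind rather than use it unchanged.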

06/12/2021

Tianjun Zhang, Paria Rashidinejad, Jiantao Jiao, Yuandong Tian, Joseph Gonzalez, Stuart Russell

Similar Papers

A Max-Min Entropy Framework for Reinforcement Learning

Seungyul Han, Youngchul Sung

optimization, reinforcement learning and planning

Bayesian Reinforcement Learning via Deep, Sparse Sampling

Divya Grover, Debabrota Basu, Christos Dimitrakakis

Implicit Generative Modeling for Efficient Exploration

Neale Ratzlaff, Qinxun Bai, Fuxin Li, Wei Xu

Reinforcement Learning - Deep RL

Exploration by Maximizing Renyi Entropy for Reward-Free RL Framework

Chuheng Zhang, Yuanying Cai, Longbo Huang, Jian Li

Reward-Free Exploration for Reinforcement Learning

Chi Jin, Akshay Krishnamurthy, Max Simchowitz, Tiancheng Yu

Reinforcement Learning - Theory

Kinematic State Abstraction and Provably Efficient Rich-Observation Reinforcement Learning

Dipendra Misra, Mikael Henaff, Akshay Krishnamurthy, John Langford

Reinforcement Learning - Theory

Episodic Multi-agent Reinforcement Learning with Curiosity-driven Exploration

Lulu Zheng, Jiarui Chen, Jianhao Wang, Jiamin He, Yujing Hu, Yingfeng Chen, Changjie Fan, Yang Gao, Chongjie Zhang

reinforcement learning and planning

Breaking the Sample Complexity Barrier to Regret-Optimal Model-Free Reinforcement Learning

Gen Li, Laixi Shi, Yuxin Chen, Yuantao Gu, Yuejie Chi

theory, reinforcement learning and planning

Provably Efficient Algorithms for Multi-Objective Competitive RL

Tiancheng Yu, Yi Tian, Jingzhao Zhang, Suvrit Sra

Theory, RL, Decisions and Control Theory

On Reward-Free RL with Kernel and Neural Function Approximations: Single-Agent MDP and Markov Game

Shuang Qiu, Jieping Ye, Zhaoran Wang, Zhuoran Yang

Reinforcement Learning and Planning

PC-PG: Policy Cover Directed Exploration for Provable Policy Gradient Learning

Alekh Agarwal, Mikael Henaff, Sham Kakade, Wen Sun

Rank the Episodes: A Simple Approach for Exploration in Procedurally-Generated Environments

Daochen Zha, Wenye Ma, Lei Yuan, Xia Hu, Ji Liu

Exploration, Reinforcement Learning, Self-Imitation, Generalization of Reinforcement Learning

Hindsight Trust Region Policy Optimization

Hanbo Zhang, Site Bai, Xuguang Lan, David Hsu, Nanning Zheng

Machine Learning, Deep Reinforcement Learning, Reinforcement Learning

Regularized policies are reward robust

Hisham Husain, Kamil Ciosek, Ryota Tomioka

Learning Space Partitions for Path Planning

Kevin Yang, Tianjun Zhang, Chris Cummins, Brandon Cui, Benoit Steiner, Linnan Wang, Joseph Gonzalez, Dan Klein, Yuandong Tian

optimization, reinforcement learning and planning

Provably Efficient Reward-Agnostic Navigation with Linear Value Iteration

Andrea Zanette, Alessandro Lazaric, Mykel J Kochenderfer, Emma Brunskill

MetaCURE: Meta Reinforcement Learning with Empowerment-Driven Exploration

Jin Zhang, Jianhao Wang, Hao Hu, Tong Chen, Yingfeng Chen, Changjie Fan, Chongjie Zhang

Algorithms, Multitask, Transfer, and Meta Learning

Exploration in Reinforcement Learning with Deep Covering Options

Yuu Jinnai, Jee Won Park, Marlos C. Machado, George Konidaris

Reinforcement learning, temporal abstraction, exploration

Deep Bandits Show-Off: Simple and Efficient Exploration with Deep Networks

Rong Zhu, Mattia Rigotti

theory, deep learning, reinforcement learning and planning, bandits

Fast active learning for pure exploration in reinforcement learning

Pierre MENARD, Omar Darwiche Domingues, Anders Jonsson, Emilie Kaufmann, Edouard Leurent, Michal Valko

Theory, RL, Decisions and Control Theory

Principled Exploration via Optimistic Bootstrapping and Backward Induction

Chenjia Bai, Lingxiao Wang, Lei Han, Jianye Hao, Animesh Garg, Peng Liu, Zhaoran Wang
