Principled Exploration via Optimistic Bootstrapping and Backward Induction

Abstract: One principled approach to provably efficient exploration is to incorporate the upper confidence bound (UCB) into the value function as a bonus. However, UCB is designed for linear and tabular settings and is incompatible with Deep Reinforcement Learning (DRL). In this paper, we propose a principled exploration method for DRL through Optimistic Bootstrapping and Backward Induction (OB2I). OB2I constructs a general-purpose UCB-bonus through non-parametric bootstrap in DRL. The UCB-bonus estimates the epistemic uncertainty of state-action pairs for optimistic exploration. We build theoretical connections between the proposed UCB-bonus and LSVI-UCB in the linear setting. We propagate future uncertainty in a time-consistent manner through episodic backward update, which exploits the theoretical advantage and empirically improves sample efficiency. Our experiments on the MNIST maze and the Atari suite suggest that OB2I outperforms several state-of-the-art exploration approaches.
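
To make the idea of a bootstrap-based UCB-bonus concrete, here is a minimal sketch in plain Python. It assumes the bonus for each action is the standard deviation of K bootstrapped Q-value estimates, and that the agent acts greedily on the mean estimate plus a scaled bonus. The abstract does not spell out either choice, so the function names `ucb_bonus` and `optimistic_action`, the weight `alpha`, and the use of the standard deviation are illustrative assumptions rather than the authors' implementation.

```python
import statistics


def ucb_bonus(q_heads):
    """Return one bonus per action: the population standard deviation
    of that action's Q-value across the K bootstrapped heads.

    q_heads is a list of K lists; q_heads[k][a] is head k's estimate
    for action a. (Assumed bonus form, not taken from the abstract.)
    """
    num_actions = len(q_heads[0])
    return [
        statistics.pstdev([head[a] for head in q_heads])
        for a in range(num_actions)
    ]


def optimistic_action(q_heads, alpha=1.0):
    """Pick the action maximizing mean Q plus alpha times the bonus.

    alpha is a hypothetical exploration weight; alpha=0 recovers
    greedy selection on the ensemble mean.
    """
    num_actions = len(q_heads[0])
    means = [
        statistics.fmean([head[a] for head in q_heads])
        for a in range(num_actions)
    ]
    bonus = ucb_bonus(q_heads)
    scores = [m + alpha * b for m, b in zip(means, bonus)]
    return max(range(num_actions), key=lambda a: scores[a])


# Three heads agree on action 0 but disagree on action 1.
q_heads = [[1.0, 0.0], [1.0, 2.0], [1.0, 1.0]]
chosen = optimistic_action(q_heads, alpha=1.0)
```

In this toy example both actions have the same mean estimate, so the disagreement among heads on action 1 is what tips the choice toward it. The episodic backward update mentioned in the abstract is not covered by this sketch.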

18/07/2021

Chenjia Bai, Lingxiao Wang, Lei Han, Jianye Hao, Animesh Garg, Peng Liu, Zhaoran Wang

Similar Papers

APS: Active Pretraining with Successor Features

Hao Liu, Pieter Abbeel

Keywords: Reinforcement Learning and Planning, Deep RL

A Max-Min Entropy Framework for Reinforcement Learning

Seungyul Han, Youngchul Sung

Keywords: optimization, reinforcement learning and planning

MADE: Exploration via Maximizing Deviation from Explored Regions

Tianjun Zhang, Paria Rashidinejad, Jiantao Jiao, Yuandong Tian, Joseph Gonzalez, Stuart Russell

Minimax Weight and Q-Function Learning for Off-Policy Evaluation

Masatoshi Uehara, Jiawei Huang, Nan Jiang

Regularized policies are reward robust

Hisham Husain, Kamil Ciosek, Ryota Tomioka

On Reward-Free RL with Kernel and Neural Function Approximations: Single-Agent MDP and Markov Game

Shuang Qiu, Jieping Ye, Zhaoran Wang, Zhuoran Yang

Learning infinite-horizon average-reward MDPs with linear function approximation

Chen-Yu Wei, Mehdi Jafarnia Jahromi, Haipeng Luo, Rahul Jain

Learning Bounds for Risk-sensitive Learning

Jaeho Lee, Sejun Park, Jinwoo Shin

CRPO: A New Approach for Safe Reinforcement Learning with Convergence Guarantee

Tengyu Xu, Yingbin Liang, Guanghui Lan

Keywords: Theory, RL, Decisions and Control Theory

Positive-Negative Momentum: Manipulating Stochastic Gradient Noise to Improve Generalization

Zeke Xie, Li Yuan, Zhanxing Zhu, Masashi Sugiyama

Keywords: Optimization, Stochastic Optimization

Provably efficient safe exploration via primal-dual policy optimization

Dongsheng Ding, Xiaohan Wei, Zhuoran Yang, Zhaoran Wang, Mihailo Jovanovic

Tightening Exploration in Upper Confidence Reinforcement Learning

Hippolyte Bourel, Odalric-Ambrym Maillard, Mohammad Sadegh Talebi

Variational Model-based Policy Optimization

Yinlam Chow, Brandon Cui, Moonkyung Ryu, Mohammad Ghavamzadeh

Keywords: Machine Learning, Reinforcement Learning

Conservative Offline Distributional Reinforcement Learning

Yecheng Ma, Dinesh Jayaraman, Osbert Bastani

Hindsight Trust Region Policy Optimization

Hanbo Zhang, Site Bai, Xuguang Lan, David Hsu, Nanning Zheng

Keywords: Machine Learning, Deep Reinforcement Learning, Reinforcement Learning

Deep Bandits Show-Off: Simple and Efficient Exploration with Deep Networks

Rong Zhu, Mattia Rigotti

Keywords: theory, deep learning, reinforcement learning and planning, bandits

Hindsight Value Function for Variance Reduction in Stochastic Dynamic Environment

Jiaming Guo, Rui Zhang, Xishan Zhang, Shaohui Peng, Qi Yi, Zidong Du, Xing Hu, Qi Guo, Yunji Chen

Keywords: Machine Learning, Deep Learning, Deep Reinforcement Learning, Sequential Decision Making

Provably Efficient Reinforcement Learning with Kernel and Neural Function Approximations

Zhuoran Yang, Chi Jin, Zhaoran Wang, Mengdi Wang, Michael Jordan

Density Constrained Reinforcement Learning

Zengyi Qin, Yuxiao Chen, Chuchu Fan

Improved Confidence Bounds for the Linear Logistic Model and Applications to Bandits

Kwang-Sung Jun, Lalit Jain, Houssam Nassif, Blake Mason

Keywords: Reinforcement Learning and Planning, Bandits

Stochastically Dominant Distributional Reinforcement Learning

John Martin, Michal Lyskawinski, Xiaohu Li, Brendan Englot

A Contour Stochastic Gradient Langevin Dynamics Algorithm for Simulations of Multi-modal Distributions

Wei Deng, Guang Lin, Faming Liang

Efficient Exploration of Reward Functions in Inverse Reinforcement Learning via Bayesian Optimization
