Tightening Exploration in Upper Confidence Reinforcement Learning

Abstract: The upper confidence reinforcement learning (UCRL2) strategy introduced in \citep{jaksch2010near} is a popular method to perform regret minimization in unknown discrete Markov Decision Processes under the average-reward criterion. Despite its nice and generic theoretical regret guarantees, this strategy and its variants have remained until now mostly theoretical as numerical experiments on simple environments exhibit long burn-in phases before the learning takes place. Motivated by practical efficiency, we present UCRL3, following the lines of UCRL2, but with two key modifications: First, it uses state-of-the-art time-uniform concentration inequalities to compute confidence sets on the reward and transition distributions for each state-action pair. To further tighten exploration, we introduce an adaptive computation of the support of each transition distributions. This enables to revisit the extended value iteration procedure to optimize over distributions with reduced support by disregarding low probability transitions, while still ensuring near-optimism. We demonstrate, through numerical experiments on standard environments, that reducing exploration this way yields a substantial numerical improvement compared to UCRL2 and its variants. On the theoretical side, these key modifications enable to derive a regret bound for UCRL3 improving on UCRL2, that for the first time makes appear a notion of local diameter and effective support, thanks to variance-aware concentration bounds.

26/04/2020

Tightening Exploration in Upper Confidence Reinforcement Learning

Hippolyte Bourel, Odalric-Ambrym Maillard, Mohammad Sadegh Talebi

Comments

Similar Papers

Ranking Policy Gradient

Kaixiang Lin, Jiayu Zhou

Keywords Abstract Paper

Sample-efficient reinforcement learning, off-policy learning.

Deep Inverse Q-learning with Constraints

Gabriel Kalweit, Maria Huegle, Moritz Werling, Joschka Boedecker

Keywords Abstract Paper

Stage-wise Conservative Linear Bandits

Ahmadreza Moradipari, Christos Thrampoulidis, Mahnoosh Alizadeh

Keywords Abstract Paper

Reinforcement Learning for Non-Stationary Markov Decision Processes: The Blessing of (More) Optimism

Wang Chi Cheung, David Simchi-Levi, Ruihao Zhu

Keywords Abstract Paper

Online Learning, Active Learning, and Bandits

Responsive Safety in Reinforcement Learning

Adam Stooke, Joshua Achiam, Pieter Abbeel

Keywords Abstract Paper

Reinforcement Learning - Deep RL

Average-Reward Reinforcement Learning with Trust Region Methods

Xiaoteng Ma, Xiaohang Tang, Li Xia and Jun Yang, Qianchuan Zhao

Keywords Abstract Paper

Machine Learning, Deep Reinforcement Learning, Reinforcement Learning, Markov Decision Processes

Online model selection for reinforcement learning with function approximation

Jonathan Lee, Aldo Pacchiano, Vidya Muthukumar and Weihao Kong, Emma Brunskill

Keywords Abstract Paper

Efficient Bandit Convex Optimization: Beyond Linear Losses

Arun Sai Suggala, Pradeep Ravikumar, Praneeth Netrapalli

Keywords Abstract Paper

Near-Optimal Model-Free Reinforcement Learning in Non-Stationary Episodic MDPs

Weichao Mao, Kaiqing Zhang, Ruihao Zhu and David Simchi-Levi, Tamer Basar

Keywords Abstract Paper

Theory, RL, Decisions and Control Theory

Bayesian decision-making under misspecified priors with applications to meta-learning

Max Simchowitz, Christopher Tosh, Akshay Krishnamurthy and Daniel Hsu, Thodoris Lykouris, Miro Dudik, Robert E Schapire

Keywords Abstract Paper

meta learning, bandits

Dynamic Regret of Policy Optimization in Non-Stationary Environments

Yingjie Fei, Zhuoran Yang, Zhaoran Wang, Qiaomin Xie

Keywords Abstract Paper

Risk Minimization from Adaptively Collected Data: Guarantees for Supervised and Policy Learning

Aurelien Bibaut, Nathan Kallus, Maria Dimakopoulou and Antoine Chambaz, Mark van der Laan

Keywords Abstract Paper

theory, reinforcement learning and planning, machine learning, bandits

Efficient Model-Based Reinforcement Learning through Optimistic Policy Search and Planning

Sebastian Curi, Felix Berkenkamp, Andreas Krause

Keywords Abstract Paper

Generalization Bound of Gradient Descent for Non-Convex Metric Learning

MINGZHI DONG, Xiaochen Yang, Rui Zhu and Yujiang Wang, Jing-Hao Xue

Keywords Abstract Paper

Stochastic Shortest Path: Minimax, Parameter-Free and Towards Horizon-Free Regret

Jean Tarbouriech, Runlong Zhou, Simon Du and Matteo Pirotta, Michal Valko, Alessandro Lazaric

Keywords Abstract Paper

theory, reinforcement learning and planning

Learning Near Optimal Policies with Low Inherent Bellman Error

Andrea Zanette, Alessandro Lazaric, Mykel Kochenderfer, Emma Brunskill

Keywords Abstract Paper

Reinforcement Learning - Theory

Experimental design for regret minimization in linear bandits

Andrew Wagenmaker, Julian Katz-Samuels, Kevin Jamieson

Keywords Abstract Paper

Fast Rates for the Regret of Offline Reinforcement Learning

Yichun Hu, Nathan Kallus, Masatoshi Uehara

Keywords Abstract Paper

Extrapolation for Large-batch Training in Deep Learning

Tao LIN, Lingjing Kong, Sebastian Stich, Martin Jaggi

Keywords Abstract Paper

Deep Learning - Algorithms

Scaling Ensemble Distribution Distillation to Many Classes with Proxy Targets

Max Ryabinin, Andrey Malinin, Mark Gales

Keywords Abstract Paper

machine learning

Task-Robust Model-Agnostic Meta-Learning

Liam Collins, Aryan Mokhtari, Sanjay Shakkottai

Keywords Abstract Paper

CAQL: Continuous Action Q-Learning

Moonkyung Ryu, Yinlam Chow, Ross Anderson and Christian Tjandraatmadja, Craig Boutilier

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Xiaoteng Ma, Xiaohang Tang, Li Xia and
Jun Yang, Qianchuan Zhao

Keywords Paper

Jonathan Lee, Aldo Pacchiano, Vidya Muthukumar and
Weihao Kong, Emma Brunskill

Keywords Paper

Keywords Paper

Weichao Mao, Kaiqing Zhang, Ruihao Zhu and
David Simchi-Levi, Tamer Basar

Keywords Paper

Max Simchowitz, Christopher Tosh, Akshay Krishnamurthy and
Daniel Hsu, Thodoris Lykouris, Miro Dudik, Robert E Schapire

Keywords Paper

Keywords Paper

Aurelien Bibaut, Nathan Kallus, Maria Dimakopoulou and
Antoine Chambaz, Mark van der Laan

Keywords Paper

Keywords Paper

MINGZHI DONG, Xiaochen Yang, Rui Zhu and
Yujiang Wang, Jing-Hao Xue

Keywords Paper

Jean Tarbouriech, Runlong Zhou, Simon Du and
Matteo Pirotta, Michal Valko, Alessandro Lazaric

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Moonkyung Ryu, Yinlam Chow, Ross Anderson and
Christian Tjandraatmadja, Craig Boutilier

Keywords Paper

Keywords Paper

Tianhe Yu, Aviral Kumar, Rafael Rafailov and
Aravind Rajeswaran, Sergey Levine, Chelsea Finn

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Dongsheng Ding, Xiaohan Wei, Zhuoran Yang and
Zhaoran Wang, Mihailo Jovanovic

Keywords Paper

Keywords Paper

Chen-Yu Wei, Mehdi Jafarnia, Haipeng Luo and
Hiteshi Sharma, Rahul Jain

Keywords Paper

Keywords Paper

Haque Ishfaq, Qiwen Cui, Viet Nguyen and
Alex Ayoub, Zhuoran Yang, Zhaoran Wang, Doina Precup, Lin Yang

Keywords Paper

Keywords Paper