Variational Bayesian Reinforcement Learning with Regret Bounds

Abstract: We consider the exploration-exploitation trade-off in reinforcement learning and show that an agent endowed with an exponential epistemic-risk-seeking utility function explores efficiently, as measured by regret. The state-action values induced by the exponential utility satisfy a Bellman recursion, so we can use dynamic programming to compute them. We call the resulting algorithm K-learning (for knowledge). The risk-seeking utility ensures that the associated state-action values (K-values) are optimistic for the expected optimal Q-values under the posterior. The exponential utility function induces a Boltzmann exploration policy whose 'temperature' parameter is equal to the risk-seeking parameter and is carefully controlled to yield a Bayes regret bound of $\tilde O(L^{3/2} \sqrt{S A T})$, where $L$ is the time horizon, $S$ is the number of states, $A$ is the number of actions, and $T$ is the total number of elapsed timesteps. We conclude with a numerical example demonstrating that K-learning is competitive with other state-of-the-art algorithms in practice.
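
To make the Boltzmann exploration policy mentioned in the abstract concrete, here is a minimal sketch of how action probabilities could be computed from a vector of K-values for a single state. It assumes the standard softmax form, with probabilities proportional to exp(K / temperature); the abstract does not state the exact expression or how the temperature is scheduled, so the function name `boltzmann_policy`, its arguments, and the example values are all hypothetical.

```python
import numpy as np


def boltzmann_policy(k_values, temperature):
    """Softmax over the last axis of k_values / temperature.

    Assumption: the policy takes the standard Boltzmann form; the exact
    expression and temperature schedule are not given in the abstract.
    """
    k = np.asarray(k_values, dtype=float)
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    logits = k / temperature
    # Subtract the maximum before exponentiating for numerical stability.
    logits = logits - logits.max(axis=-1, keepdims=True)
    weights = np.exp(logits)
    return weights / weights.sum(axis=-1, keepdims=True)


# Hypothetical K-values for one state with three actions.
example_k = [1.0, 2.0, 0.5]
probs_warm = boltzmann_policy(example_k, temperature=100.0)  # close to uniform
probs_cold = boltzmann_policy(example_k, temperature=0.01)   # close to greedy
```

In this sketch a large temperature spreads probability almost evenly across actions (more exploration), while a small temperature concentrates it on the action with the largest K-value.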

06/12/2020

Similar Papers

Risk-Sensitive Reinforcement Learning: Near-Optimal Risk-Sample Tradeoff in Regret

Yingjie Fei, Zhuoran Yang, Yudong Chen, Zhaoran Wang, Qiaomin Xie

Model-based Reinforcement Learning for Continuous Control with Posterior Sampling

Ying Fan, Yifei Ming

Provably efficient safe exploration via primal-dual policy optimization

Dongsheng Ding, Xiaohan Wei, Zhuoran Yang, Zhaoran Wang, Mihailo Jovanovic

Stochastic Shortest Path: Minimax, Parameter-Free and Towards Horizon-Free Regret

Jean Tarbouriech, Runlong Zhou, Simon Du, Matteo Pirotta, Michal Valko, Alessandro Lazaric

theory, reinforcement learning and planning

Accommodating Picky Customers: Regret Bound and Exploration Complexity for Multi-Objective Reinforcement Learning

Jingfeng Wu, Vladimir Braverman, Lin Yang

theory, reinforcement learning and planning

Adaptivity in Adaptive Submodularity

Hossein Esfandiari, Amin Karbasi, Vahab Mirrokni

Perturbation-based Regret Analysis of Predictive Control in Linear Time Varying Systems

Yiheng Lin, Yang Hu, Guanya Shi, Haoyuan Sun, Guannan Qu, Adam Wierman

Nearly Minimax Optimal Reinforcement Learning for Linear Mixture MDPs

Dongruo Zhou, Quanquan Gu, Csaba Szepesvari

Randomized Exploration in Reinforcement Learning with General Value Function Approximation

Haque Ishfaq, Qiwen Cui, Viet Nguyen, Alex Ayoub, Zhuoran Yang, Zhaoran Wang, Doina Precup, Lin Yang

Theory, RL, Decisions and Control Theory

Contextual Combinatorial Volatile Multi-armed Bandit with Adaptive Discretization

Andi Nika, Sepehr Elahi, Cem Tekin

Adaptive Reward-Poisoning Attacks against Reinforcement Learning

Xuezhou Zhang, Yuzhe Ma, Adish Singla, Jerry Zhu

Frequentist Regret Bounds for Randomized Least-Squares Value Iteration

Andrea Zanette, David Brandfonbrener, Emma Brunskill, Matteo Pirotta, Alessandro Lazaric

Dynamic Regret of Policy Optimization in Non-Stationary Environments

Yingjie Fei, Zhuoran Yang, Zhaoran Wang, Qiaomin Xie

Adaptive Online Estimation of Piecewise Polynomial Trends

Dheeraj Baby, Yu-Xiang Wang

Asymptotically Efficient Off-Policy Evaluation for Tabular Reinforcement Learning

Ming Yin, Yu-Xiang Wang

Variational Policy Gradient Method for Reinforcement Learning with General Utilities

Junyu Zhang, Alec Koppel, Amrit Bedi, Csaba Szepesvari, Mengdi Wang

Combinatorial Pure Exploration with Full-Bandit or Partial Linear Feedback

Yihan Du, Yuko Kuroki, Wei Chen

UCB Momentum Q-learning: Correcting the bias without forgetting

Pierre MENARD, Omar Darwiche Domingues, Xuedong Shang, Michal Valko

Theory, RL, Decisions and Control Theory

Nearly Horizon-Free Offline Reinforcement Learning

Tongzheng Ren, Jialian Li, Bo Dai, Simon Du, Sujay Sanghavi

theory, optimization, reinforcement learning and planning

Geometric Exploration for Online Control

Orestis Plevrakis, Elad Hazan

Exploration by Maximizing Renyi Entropy for Reward-Free RL Framework

Chuheng Zhang, Yuanying Cai, Longbo Huang, Jian Li

Stage-wise Conservative Linear Bandits

Ahmadreza Moradipari, Christos Thrampoulidis, Mahnoosh Alizadeh

Reinforcement learning in parametric MDPs with exponential families

Sayak Ray Chowdhury, Aditya Gopalan, Odalric-Ambrym Maillard

Provably Efficient Reinforcement Learning with Kernel and Neural Function Approximations

Zhuoran Yang, Chi Jin, Zhaoran Wang, Mengdi Wang, Michael Jordan
