Reinforcement Learning in Linear MDPs: Constant Regret and Representation Selection

Abstract: We study the role of the representation of state-action value functions in regret minimization in finite-horizon Markov Decision Processes (MDPs) with linear structure. We first derive a necessary condition on the representation, called universally spanning optimal features (UNISOFT), to achieve constant regret in any MDP with linear reward function. This result encompasses the well-known settings of low-rank MDPs and, more generally, zero inherent Bellman error (also known as the Bellman closure assumption). We then demonstrate that this condition is also sufficient for these classes of problems by deriving a constant regret bound for two optimistic algorithms (LSVI-UCB and ELEANOR). Finally, we propose an algorithm for representation selection and we prove that it achieves constant regret when one of the given representations, or a suitable combination of them, satisfies the UNISOFT condition.

13/04/2021

Reinforcement Learning in Linear MDPs: Constant Regret and Representation Selection

Matteo Papini, Andrea Tirinzoni, Aldo Pacchiano, Marcello Restelli, Alessandro Lazaric, Matteo Pirotta

Comments

Similar Papers

Improved exploration in factored average-reward MDPs

Mohammad Sadegh Talebi, Anders Jonsson, Odalric Maillard

Keywords Abstract Paper

Bandit algorithms: Letting go of logarithmic regret for statistical robustness

Kumar Ashutosh, Jayakrishnan Nair, Anmol Kagrecha, Krishna Jagannathan

Keywords Abstract Paper

Smooth bandit optimization: Generalization to holder space

Yusha Liu, Yining Wang, Aarti Singh

Keywords Abstract Paper

Near-Optimal Representation Learning for Linear Bandits and Linear RL

Jiachen Hu, Xiaoyu Chen, Chi Jin and Lihong Li, Liwei Wang

Keywords Abstract Paper

Theory, Online Learning Theory

Nearly Minimax Optimal Reinforcement Learning for Discounted MDPs

jiafan he, Dongruo Zhou, Quanquan Gu

Keywords Abstract Paper

Regret-optimal filtering

Oron Sabag, Babak Hassibi

Keywords Abstract Paper

Thompson Sampling for Linearly Constrained Bandits

Vidit Saxena, Joakim Jalden, Joseph Gonzalez

Keywords Abstract Paper

Sequential prediction under log-loss and misspecification

Meir Feder, Yury Polyanskiy

Keywords Abstract Paper

Dynamic Regret of Policy Optimization in Non-Stationary Environments

Yingjie Fei, Zhuoran Yang, Zhaoran Wang, Qiaomin Xie

Keywords Abstract Paper

Improved Worst-Case Regret Bounds for Randomized Least-Squares Value Iteration

Priyank Agrawal, Jinglin Chen, Nan Jiang

Keywords Abstract Paper

Neural Regret-Matching for Distributed Constraint Optimization Problems

Yanchen Deng, Runsheng Yu, Xinrun Wang, Bo An

Keywords Abstract Paper

Agent-based and Multi-agent Systems, Coordination and Cooperation, Constraint Optimization, Distributed Constraints

Optimizing Optimizers: Regret-optimal gradient descent algorithms

Philippe Casgrain, Anastasis Kratsios

Keywords Abstract Paper

Leveraging Good Representations in Linear Contextual Bandits

Matteo Papini, Andrea Tirinzoni, Marcello Restelli and Alessandro Lazaric, Matteo Pirotta

Keywords Abstract Paper

Reinforcement Learning and Planning, Bandits

Minimax Regret Optimisation for Robust Planning in Uncertain Markov Decision Processes

Marc Rigter, Bruno Lacerda, Nick Hawes

Keywords Abstract Paper

Model-based Reinforcement Learning for Continuous Control with Posterior Sampling

Ying Fan, Yifei Ming

Keywords Abstract Paper

Regret Bounds without Lipschitz Continuity: Online Learning with Relative-Lipschitz Losses

Yihan Zhou, Victor Sanches Portella, Mark Schmidt, Nick Harvey

Keywords Abstract Paper

RL for Latent MDPs: Regret Guarantees and a Lower Bound

Jeongyeol Kwon, Yonathan Efroni, Constantine Caramanis, Shie Mannor

Keywords Abstract Paper

Regret Bounds for Batched Bandits

Hossein Esfandiari, Amin Karbasi, Abbas Mehrabian, Vahab Mirrokni

Keywords Abstract Paper

Smooth Contextual Bandits: Bridging the Parametric and Non-differentiable Regret Regimes

YICHUN HU, Nathan Kallus, Xiaojie Mao

Keywords Abstract Paper

Bandit problems,

Lenient Regret for Multi-Armed Bandits

Nadav Merlis, Shie Mannor

Keywords Abstract Paper

On Lower Bounds for Standard and Robust Gaussian Process Bandit Optimization

Xu Cai, Jonathan Scarlett

Keywords Abstract Paper

Applications, Natural Language Processing, Applications, Network Analysis, Reinforcement Learning and Planning, Bandits

Logistic Regression Regret: What’s the Catch?

Keywords Abstract Paper

Online learning, Convex optimization, Information theory, Regression

Beyond $log^2(T)$ regret for decentralized bandits in matching markets

Soumya Basu, Karthik Abinav Sankararaman, Abishek Sankararaman

Keywords Abstract Paper

Reinforcement Learning and Planning, Bandits

Fine-Grained Gap-Dependent Bounds for Tabular MDPs via Adaptive Multi-Step Bootstrap

Keywords Paper

Keywords Paper

Keywords Paper

Jiachen Hu, Xiaoyu Chen, Chi Jin and
Lihong Li, Liwei Wang

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Matteo Papini, Andrea Tirinzoni, Marcello Restelli and
Alessandro Lazaric, Matteo Pirotta

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Udaya Ghai, Holden Lee, Karan Singh and
Cyril Zhang, Yi Zhang

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper