Sublinear Optimal Policy Value Estimation in Contextual Bandits

Abstract: We study the problem of estimating the expected reward of the optimal policy in the stochastic disjoint linear bandit setting. We prove that in certain settings it is possible to obtain an accurate estimate of the optimal policy value even with a sublinear number of samples, whereas a linear number of samples would be needed to reliably estimate the reward that can be obtained by any policy. We establish nearly matching information-theoretic lower bounds, showing that our algorithm achieves near-optimal estimation error. Finally, we demonstrate the effectiveness of our algorithm on joke recommendation and cancer inhibition dosage selection problems using real datasets.

13/04/2021

Weihao Kong, Emma Brunskill, Gregory Valiant

Similar Papers

Stochastic bandits with linear constraints

Aldo Pacchiano, Mohammad Ghavamzadeh, Peter Bartlett, Heinrich Jiang

Empirical Likelihood for Contextual Bandits

Nikos Karampatziakis, John Langford, Paul Mineiro

Minimax-Optimal Off-Policy Evaluation with Linear Function Approximation

Yaqi Duan, Zeyu Jia, Mengdi Wang

Keywords: Learning Theory

Neural Contextual Bandits with UCB-based Exploration

Dongruo Zhou, Lihong Li, Quanquan Gu

Keywords: Online Learning, Active Learning, and Bandits

An Efficient Algorithm for Deep Stochastic Contextual Bandits

Tan Zhu, Guannan Liang, Chunjiang Zhu, Haining Li, Jinbo Bi

Sparse Feature Selection Makes Batch Reinforcement Learning More Sample Efficient

Botao Hao, Yaqi Duan, Tor Lattimore, Csaba Szepesvari, Mengdi Wang

Keywords: Theory, Statistical Learning Theory

Provably efficient safe exploration via primal-dual policy optimization

Dongsheng Ding, Xiaohan Wei, Zhuoran Yang, Zhaoran Wang, Mihailo Jovanovic

Adversarial Intrinsic Motivation for Reinforcement Learning

Ishan Durugkar, Mauricio Tec, Scott Niekum, Peter Stone

Keywords: reinforcement learning and planning, generative model

Variational Bayesian Optimistic Sampling

Brendan O'Donoghue, Tor Lattimore

Keywords: optimization, reinforcement learning and planning, generative model, bandits, online learning

Reinforcement Learning with Trajectory Feedback

Yonathan Efroni, Nadav Merlis, Shie Mannor

Smooth bandit optimization: Generalization to Hölder space

Yusha Liu, Yining Wang, Aarti Singh

Information Directed Reward Learning for Reinforcement Learning

David Lindner, Matteo Turchetta, Sebastian Tschiatschek, Kamil Ciosek, Andreas Krause

Keywords: reinforcement learning and planning, active learning

Disposable Linear Bandits for Online Recommendations

Melda Korkut, Andrew Li

A Lower Bound for the Sample Complexity of Inverse Reinforcement Learning

Abi Komanduru, Jean Honorio

Keywords: Theory, Statistical Learning Theory

Settling the Variance of Multi-Agent Policy Gradients

Jakub Grudzien Kuba, Muning Wen, Linghui Meng, Shangding Gu, Haifeng Zhang, David Mguni, Jun Wang, Yaodong Yang

Keywords: deep learning, reinforcement learning and planning

Doubly Robust Off-Policy Actor-Critic: Convergence and Optimality

Tengyu Xu, Zhuoran Yang, Zhaoran Wang, Yingbin Liang

Keywords: Theory, RL, Decisions and Control Theory

Variational Policy Gradient Method for Reinforcement Learning with General Utilities

Junyu Zhang, Alec Koppel, Amrit Bedi, Csaba Szepesvari, Mengdi Wang

On-Policy Deep Reinforcement Learning for the Average-Reward Criterion

Yiming Zhang, Keith Ross

Keywords: Reinforcement Learning and Planning

A Bandit Learning Algorithm and Applications to Auction Design

Kim Thang Nguyen

Online A-Optimal Design and Active Linear Regression

Xavier Fontaine, Pierre Perrault, Michal Valko, Vianney Perchet

Keywords: Algorithms, Online Learning Algorithms

Variance-Aware Off-Policy Evaluation with Linear Function Approximation

Yifei Min, Tianhao Wang, Dongruo Zhou, Quanquan Gu

Keywords: theory, reinforcement learning and planning

On Reward-Free Reinforcement Learning with Linear Function Approximation
