Empirical Likelihood for Contextual Bandits

Abstract: We propose an estimator and confidence interval for computing the value of a policy from off-policy data in the contextual bandit setting. To this end, we apply empirical likelihood techniques to formulate our estimator and confidence interval as simple convex optimization problems. Using the lower bound of our confidence interval, we then propose an off-policy policy optimization algorithm that searches for policies with a large lower bound on reward. We empirically find that both our estimator and confidence interval improve over previous proposals in finite-sample regimes. Finally, the policy optimization algorithm we propose outperforms a strong baseline system for learning from off-policy data.
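
The abstract says the estimator is posed as a simple convex optimization problem. As a rough illustration only, the sketch below uses the textbook empirical likelihood construction under an assumption of ours: the single moment constraint is that the importance weights w_i = pi(a_i|x_i) / mu(a_i|x_i) average to one under the reweighted sample. Maximizing sum_i log q_i subject to sum_i q_i = 1 and sum_i q_i w_i = 1 gives q_i = 1 / (n (1 + beta (w_i - 1))), where beta maximizes the concave dual sum_i log(1 + beta (w_i - 1)). Here beta is found by bisection on the dual's derivative, which is strictly decreasing on the feasible interval, and the value estimate is sum_i q_i w_i r_i. The function name `el_value_estimate` is hypothetical, and the paper's actual formulation (and its confidence interval) may include constraints not shown here.

```python
def el_value_estimate(weights, rewards):
    """Sketch (hypothetical helper): empirical-likelihood point estimate of a policy value.

    weights[i]: importance weight pi(a_i | x_i) / mu(a_i | x_i), assumed known and >= 0
    rewards[i]: reward observed for the logged action a_i
    Returns (value_estimate, q), where q are the EL probabilities on the logged samples.
    """
    n = len(weights)
    if n == 0 or n != len(rewards):
        raise ValueError("weights and rewards must be non-empty and equally sized")
    if any(w < 0.0 for w in weights):
        raise ValueError("importance weights must be non-negative")
    w_min, w_max = min(weights), max(weights)
    if not (w_min < 1.0 < w_max):
        # Without weights on both sides of 1, this simplified dual has no interior maximizer.
        raise ValueError("this sketch requires min(weights) < 1 < max(weights)")

    z = [w - 1.0 for w in weights]

    def dual_grad(beta):
        # Derivative of the concave dual objective sum_i log(1 + beta * z_i).
        return sum(zi / (1.0 + beta * zi) for zi in z)

    # Feasible beta keeps every 1 + beta * z_i strictly positive.
    lo = -1.0 / (w_max - 1.0)
    hi = 1.0 / (1.0 - w_min)
    for _ in range(500):
        if hi - lo <= 1e-13 * max(1.0, abs(lo), abs(hi)):
            break
        mid = 0.5 * (lo + hi)
        if dual_grad(mid) > 0.0:
            lo = mid  # derivative is decreasing, so the root lies to the right
        else:
            hi = mid
    beta = 0.5 * (lo + hi)

    # Primal solution: EL probabilities on the logged samples.
    q = [1.0 / (n * (1.0 + beta * zi)) for zi in z]
    value = sum(qi * wi * ri for qi, wi, ri in zip(q, weights, rewards))
    return value, q


if __name__ == "__main__":
    v, q = el_value_estimate([0.2, 0.5, 3.0], [1.0, 0.0, 1.0])
    print(v, q)
```

Because the constraints force sum_i q_i w_i = 1 with every term non-negative, the estimate is a convex combination of the observed rewards and stays within their range. On the example data above, plain inverse propensity scoring gives (0.2 + 3.0) / 3, about 1.07, which lies outside the reward range [0, 1], whereas this sketch returns a value inside it.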

12/07/2020

Nikos Karampatziakis, John Langford, Paul Mineiro

Similar Papers

Minimax-Optimal Off-Policy Evaluation with Linear Function Approximation
Yaqi Duan, Zeyu Jia, Mengdi Wang
Keywords: Learning Theory

Learning prediction intervals for regression: Generalization and calibration
Haoxian Chen, Ziyi Huang, Henry Lam, Huajie Qian, Haofeng Zhang

Doubly Robust Off-Policy Actor-Critic: Convergence and Optimality
Tengyu Xu, Zhuoran Yang, Zhaoran Wang, Yingbin Liang
Keywords: Theory, RL, Decisions and Control Theory

Risk Bounds and Calibration for a Smart Predict-then-Optimize Method
Heyuan Liu, Paul Grigas
Keywords: theory, optimization, machine learning

Sublinear Optimal Policy Value Estimation in Contextual Bandits
Weihao Kong, Emma Brunskill, Gregory Valiant

Ranking Policy Gradient
Kaixiang Lin, Jiayu Zhou
Keywords: Sample-efficient reinforcement learning, off-policy learning

Variational Bayesian Optimistic Sampling
Brendan O'Donoghue, Tor Lattimore
Keywords: optimization, reinforcement learning and planning, generative model, bandits, online learning

Variance Penalized On-Policy and Off-Policy Actor-Critic
Arushi Jain, Gandharv Patil, Ayush Jain, Khimya Khetarpal, Doina Precup

Exploiting Higher Order Smoothness in Derivative-free Optimization and Continuous Bandits
Arya Akhavan, Massimiliano Pontil, Alexandre Tsybakov
Keywords: Reinforcement Learning and Planning -> Reinforcement Learning; Applications -> Privacy, Anonymity, and Security

Distributionally Robust Optimization with Markovian Data
Mengmeng Li, Tobias Sutter, Daniel Kuhn
Keywords: Optimization

Wasserstein Distributionally Robust Inverse Multiobjective Optimization
Chaosheng Dong, Bo Zeng

Why Non-myopic Bayesian Optimization is Promising and How Far Should We Look-ahead? A Study via Rollout
Xubo Yue, Raed AL Kontar

Ordered SGD: A New Stochastic Optimization Framework for Empirical Risk Minimization
Kenji Kawaguchi, Haihao Lu

Global Concavity and Optimization in a Class of Dynamic Discrete Choice Models
Yiding Feng, Ekaterina Khmelnitskaya, Denis Nekipelov
Keywords: Applications - Other

Distributionally Robust Federated Averaging
Yuyang Deng, Mohammad Mahdi Kamani, Mehrdad Mahdavi

Objective Bound Conditional Gaussian Process for Bayesian Optimization
Taewon Jeong, Heeyoung Kim
Keywords: Probabilistic Methods, Gaussian Processes and Bayesian non-parametrics

Variance-Aware Off-Policy Evaluation with Linear Function Approximation
Yifei Min, Tianhao Wang, Dongruo Zhou, Quanquan Gu
Keywords: theory, reinforcement learning and planning

Beyond Variance Reduction: Understanding the True Impact of Baselines on Policy Optimization
Wes Chung, Valentin Thomas, Marlos C. Machado, Nicolas Le Roux
Keywords: Reinforcement Learning and Planning

Risk Minimization from Adaptively Collected Data: Guarantees for Supervised and Policy Learning
Aurelien Bibaut, Nathan Kallus, Maria Dimakopoulou, Antoine Chambaz, Mark van der Laan
Keywords: theory, reinforcement learning and planning, machine learning, bandits

Distributionally Robust Bayesian Quadrature Optimization
Thanh Nguyen, Sunil Gupta, Huong Ha, Santu Rana, Svetha Venkatesh

CoinDICE: Off-Policy Confidence Interval Estimation
Bo Dai, Ofir Nachum, Yinlam Chow, Lihong Li, Csaba Szepesvari, Dale Schuurmans

Logistic q-learning