Off-Policy Interval Estimation with Lipschitz Value Iteration

Abstract: Off-policy evaluation is an essential tool for assessing the effects of different policies or treatments using only observed data. In high-stakes scenarios such as medical diagnosis or financial decision-making, end users need provably correct upper and lower bounds on the expected reward, not just a classical point estimate, because executing a poor policy can be very costly. In this work, we propose a provably correct method for obtaining interval bounds for off-policy evaluation in a general continuous setting. The idea is to search for the maximum and minimum values of the expected reward among all the Lipschitz Q-functions that are consistent with the observations, which amounts to solving a constrained optimization problem on a Lipschitz function space. We then introduce a Lipschitz value iteration method that monotonically tightens the interval and is simple, efficient, and provably convergent. We demonstrate the practical efficiency of our method on a range of benchmarks.
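The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on assumptions the abstract does not state: transitions are deterministic, the target policy is deterministic (so each next state is already paired with the policy's action in x_next), rewards lie in a known range [r_min, r_max], the metric is Euclidean, and the Lipschitz constant eta of the true Q-function is known. The function names and parameters are hypothetical.

```python
import numpy as np


def lipschitz_value_iteration(x, r, x_next, eta, gamma, r_min, r_max, n_iters=200):
    """Return (lower, upper) bounds on Q at the observed points x.

    x      : (n, d) observed state-action pairs
    r      : (n,)   observed rewards, assumed to lie in [r_min, r_max]
    x_next : (n, d) next state paired with the target policy's action
    eta    : assumed Lipschitz constant of the true Q-function
    gamma  : discount factor in [0, 1)
    """
    x = np.asarray(x, dtype=float)
    r = np.asarray(r, dtype=float)
    x_next = np.asarray(x_next, dtype=float)

    # Pairwise Euclidean distances d(x'_i, x_j) and d(x_i, x_j).
    d_next = np.linalg.norm(x_next[:, None, :] - x[None, :, :], axis=-1)
    d_self = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)

    # Trivial initial bounds implied by bounded rewards.
    upper = np.full(len(r), r_max / (1.0 - gamma))
    lower = np.full(len(r), r_min / (1.0 - gamma))

    for _ in range(n_iters):
        # Lipschitz extension of the current bounds, evaluated at x'_i.
        q_up_next = np.min(upper[None, :] + eta * d_next, axis=1)
        q_lo_next = np.max(lower[None, :] - eta * d_next, axis=1)

        # Bellman backup at each observed point.
        backup_up = r + gamma * q_up_next
        backup_lo = r + gamma * q_lo_next

        # Propagate the backups between data points through the Lipschitz
        # constraint; min/max with the old bounds keeps the update monotone.
        upper = np.minimum(upper, np.min(backup_up[None, :] + eta * d_self, axis=1))
        lower = np.maximum(lower, np.max(backup_lo[None, :] - eta * d_self, axis=1))

    return lower, upper


def bounds_at(query, x, lower, upper, eta):
    """Extend the bounds from the data points x to arbitrary query points."""
    query = np.asarray(query, dtype=float)
    x = np.asarray(x, dtype=float)
    dist = np.linalg.norm(query[:, None, :] - x[None, :, :], axis=-1)
    q_lo = np.max(lower[None, :] - eta * dist, axis=1)
    q_up = np.min(upper[None, :] + eta * dist, axis=1)
    return q_lo, q_up
```

Under these assumptions, averaging the output of bounds_at over samples of initial state-action pairs gives an interval for the expected reward. The interval is only as trustworthy as the assumed eta: a value smaller than the true Lipschitz constant can make the bounds invalid, while a larger value makes them looser.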

03/05/2021

Ziyang Tang, Yihao Feng, Na Zhang, Jian Peng, Qiang Liu

Similar Papers

Non-asymptotic Confidence Intervals of Off-policy Evaluation: Primal and Dual Bounds

Yihao Feng, Ziyang Tang, Na Zhang, Qiang Liu

Reinforcement Learning, Off-Policy Evaluation, Non-asymptotic Confidence Intervals

Offline Constrained Multi-Objective Reinforcement Learning via Pessimistic Dual Value Iteration

Runzhe Wu, Yufeng Zhang, Zhuoran Yang, Zhaoran Wang

reinforcement learning and planning

Doubly Robust Off-Policy Actor-Critic: Convergence and Optimality

Tengyu Xu, Zhuoran Yang, Zhaoran Wang, Yingbin Liang

Theory, RL, Decisions and Control Theory

Lenient Regret and Good-Action Identification in Gaussian Process Bandits

Xu Cai, Selwyn Gomes, Jonathan Scarlett

Probabilistic Methods, Gaussian Processes and Bayesian non-parametrics

Tightening Exploration in Upper Confidence Reinforcement Learning

Hippolyte Bourel, Odalric-Ambrym Maillard, Mohammad Sadegh Talebi

Reinforcement Learning - General

The Power of Adaptivity for Stochastic Submodular Cover

Rohan Ghuge, Anupam Gupta, Viswanath Nagarajan

Optimization, Stochastic Optimization

High-Dimensional Sparse Linear Bandits

Botao Hao, Tor Lattimore, Mengdi Wang

Optimality and Approximation with Policy Gradient Methods in Markov Decision Processes

Alekh Agarwal, Sham Kakade, Jason Lee, Gaurav Mahajan

Reinforcement learning, Non-convex optimization

Dynamic Regret of Convex and Smooth Functions

Peng Zhao, Yu-Jie Zhang, Lijun Zhang, Zhi-Hua Zhou

Stage-wise Conservative Linear Bandits

Ahmadreza Moradipari, Christos Thrampoulidis, Mahnoosh Alizadeh

Near-Optimal Offline Reinforcement Learning via Double Variance Reduction

Ming Yin, Yu Bai, Yu-Xiang Wang

theory, optimization, reinforcement learning and planning

Adaptive Sampling for Stochastic Risk-Averse Learning

Sebastian Curi, Kfir Y. Levy, Stefanie Jegelka, Andreas Krause

On Local Optimizers of Acquisition Functions in Bayesian Optimization

Jungtaek Kim, Seungjin Choi

global optimization, bayesian optimization, acquisition function optimization, instantaneous regret analysis

Risk Bounds and Calibration for a Smart Predict-then-Optimize Method

Heyuan Liu, Paul Grigas

theory, optimization, machine learning

Reanalysis of Variance Reduced Temporal Difference Learning

Tengyu Xu, Zhe Wang, Yi Zhou, Yingbin Liang

Reinforcement Learning, TD learning, Markovian sample, Variance Reduction

Evaluating State-of-the-Art Classification Models Against Bayes Optimality

Ryan Theisen, Huan Wang, Lav Varshney, Caiming Xiong, Richard Socher

machine learning, generative model

Confounding-Robust Policy Evaluation in Infinite-Horizon Reinforcement Learning

Nathan Kallus, Angela Zhou

Optimal Uniform OPE and Model-based Offline Reinforcement Learning in Time-Homogeneous, Reward-Free and Task-Agnostic Settings

Ming Yin, Yu-Xiang Wang

theory, reinforcement learning and planning

Dynamic Regret of Policy Optimization in Non-Stationary Environments

Yingjie Fei, Zhuoran Yang, Zhaoran Wang, Qiaomin Xie

Leveraging Predictions in Smoothed Online Convex Optimization via Gradient-based Algorithms

Yingying Li, Na Li

Deep Learning -> Generative Models, Deep Learning -> Attention Models

Blending MPC & Value Function Approximation for Efficient Reinforcement Learning

Mohak Bhardwaj, Sanjiban Choudhury, Byron Boots
