Doubly Robust Bias Reduction in Infinite Horizon Off-Policy Estimation

26/04/2020

Ziyang Tang, Yihao Feng, Lihong Li, Dengyong Zhou, Qiang Liu

Keywords: off-policy evaluation, infinite horizon, doubly robust, reinforcement learning

Abstract: Infinite horizon off-policy policy evaluation is a highly challenging task due to the excessively large variance of typical importance sampling (IS) estimators. Recently, Liu et al. (2018) proposed an approach that significantly reduces the variance of infinite-horizon off-policy evaluation by estimating the stationary density ratio, but at the cost of introducing potentially high risks due to the error in density ratio estimation. In this paper, we develop a bias-reduced augmentation of their method, which can take advantage of a learned value function to obtain higher accuracy. Our method is doubly robust in that the bias vanishes when either the density ratio or value function estimation is perfect. In general, when either of them is accurate, the bias can also be reduced. Both theoretical and empirical results show that our method yields significant advantages over previous methods.
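The double robustness claim in the abstract can be checked numerically on a small tabular problem. The sketch below is not the authors' implementation. It assumes a common form of doubly robust estimator for the discounted setting: a value-function term plus a density-ratio-weighted Bellman residual. It evaluates all expectations exactly rather than from samples. The random MDP and the helper names `occupancy` and `dr_estimate` are hypothetical and chosen only for illustration.

```python
import numpy as np

def occupancy(P_pol, mu0, gamma):
    """Normalized discounted state occupancy: d = (1 - gamma) (I - gamma P^T)^{-1} mu0."""
    n = len(mu0)
    return (1 - gamma) * np.linalg.solve(np.eye(n) - gamma * P_pol.T, mu0)

def dr_estimate(w, V, d_b, r_pi, P_pi, mu0, gamma):
    """Population-level doubly robust value (assumed form, see lead-in).

    The action-level importance ratio pi/pi0 is already integrated out,
    which is why r_pi and P_pi (target-policy quantities) appear here.
    """
    residual = r_pi + gamma * P_pi @ V - V          # Bellman residual of V
    return (1 - gamma) * mu0 @ V + d_b @ (w * residual)

rng = np.random.default_rng(0)
nS, nA, gamma = 5, 3, 0.9
P = rng.dirichlet(np.ones(nS), size=(nS, nA))       # P[s, a, s']
r = rng.uniform(size=(nS, nA))                      # reward r[s, a]
pi = rng.dirichlet(np.ones(nA), size=nS)            # target policy
pi0 = rng.dirichlet(np.ones(nA), size=nS)           # behavior policy
mu0 = np.full(nS, 1.0 / nS)                         # initial distribution

P_pi = np.einsum("sa,sat->st", pi, P)               # state transitions under pi
P_pi0 = np.einsum("sa,sat->st", pi0, P)             # state transitions under pi0
r_pi = (pi * r).sum(axis=1)                         # expected reward under pi

d_pi = occupancy(P_pi, mu0, gamma)                  # target occupancy
d_b = occupancy(P_pi0, mu0, gamma)                  # behavior (data) occupancy
w_true = d_pi / d_b                                 # exact density ratio
V_true = np.linalg.solve(np.eye(nS) - gamma * P_pi, r_pi)
R_true = d_pi @ r_pi                                # normalized expected reward

# Deliberately corrupted nuisance estimates.
w_bad = w_true * rng.uniform(0.5, 1.5, size=nS)
V_bad = V_true + rng.normal(size=nS)

est_ratio_exact = dr_estimate(w_true, V_bad, d_b, r_pi, P_pi, mu0, gamma)
est_value_exact = dr_estimate(w_bad, V_true, d_b, r_pi, P_pi, mu0, gamma)
est_both_wrong = dr_estimate(w_bad, V_bad, d_b, r_pi, P_pi, mu0, gamma)

print("true value:          ", R_true)
print("exact ratio, bad V:  ", est_ratio_exact)
print("bad ratio, exact V:  ", est_value_exact)
print("both corrupted:      ", est_both_wrong)
```

Under these assumptions, the first two estimates match the true value up to numerical error, while the estimate with both inputs corrupted generally does not. Its error is the behavior-weighted product of the ratio error and the Bellman residual, which is why the bias shrinks when either input improves.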

The talk and the accompanying paper were published at the ICLR 2020 virtual conference.

Similar Papers

26/08/2020

A Unified Statistically Efficient Estimation Framework for Unnormalized Models

Masatoshi Uehara, Takafumi Kanamori, Takashi Takenouchi, Takeru Matsuda

13:58

12/07/2020

Understanding the Curse of Horizon in Off-Policy Evaluation via Conditional Importance Sampling

Yao Liu, Pierre-Luc Bacon, Emma Brunskill

Keywords: Reinforcement Learning - Theory

14:45

06/12/2020

Robust and Heavy-Tailed Mean Estimation Made Simple, via Regret Minimization

Sam Hopkins, Jerry Li, Fred Zhang

3:34

06/12/2020

Task-Robust Model-Agnostic Meta-Learning

Liam Collins, Aryan Mokhtari, Sanjay Shakkottai

3:17

14/06/2020

A Graduated Filter Method for Large Scale Robust Estimation

Huu Le, Christopher Zach

Keywords: robust fitting, bundle adjustment, non-convex, poor local minima, non-linear least squares, graduated non-convexity

1:01

06/12/2020

Asymptotic Guarantees for Generative Modeling Based on the Smooth Wasserstein Distance

Ziv Goldfeld, Kristjan Greenewald, Kengo Kato

3:16

06/12/2020

Minimax Value Interval for Off-Policy Evaluation and Policy Optimization

Nan Jiang, Jiawei Huang

Keywords: Algorithms -> Classification, Algorithms -> Semi-Supervised Learning

2:56

18/07/2021

Wasserstein Distributional Normalization For Robust Distributional Certification of Noisy Labeled Data

Sung Woo Park, Junseok Kwon

Keywords: Deep Learning, Generative Models, Algorithms, Representation Learning; Optimization, Submodular Optimization, Probabilistic Methods, Robust statistics

5:20

06/12/2020

Demystifying Orthogonal Monte Carlo and Beyond

Han Lin, Haoxian Chen, Krzysztof M Choromanski, Tianyi Zhang, Clement Laroche

3:19

02/02/2021

Practical and Rigorous Uncertainty Bounds for Gaussian Process Regression

Christian Fiedler, Carsten W. Scherer, Sebastian Trimpe

18:49

03/05/2021

Rao-Blackwellizing the Straight-Through Gumbel-Softmax Gradient Estimator

Max B Paulus, Chris Maddison, Andreas Krause

Keywords: softmax, gumbel, rao-blackwell, rao, straightthrough, straight-through, gumbel-softmax

13:25

06/12/2020

Stability of Stochastic Gradient Descent on Nonsmooth Convex Losses

Raef Bassily, Vitaly Feldman, Cristóbal Guzmán, Kunal Talwar

3:11

19/08/2021

Independence-aware Advantage Estimation

Pushi Zhang, Li Zhao, Guoqing Liu, Jiang Bian, Minlie Huang, Tao Qin, Tie-Yan Liu

Keywords: Machine Learning, Reinforcement Learning, Deep Reinforcement Learning

14:58

18/07/2021

Improved Confidence Bounds for the Linear Logistic Model and Applications to Bandits

Kwang-Sung Jun, Lalit Jain, Houssam Nassif, Blake Mason

Keywords: Reinforcement Learning and Planning, Bandits

5:11

06/12/2021

Conformal Bayesian Computation

Edwin Fong, Chris C Holmes

Keywords: machine learning

14:54

18/07/2021

Reinforcement Learning for Cost-Aware Markov Decision Processes

Wesley Suttle, Kaiqing Zhang, Zhuoran Yang, Ji Liu, David N Kraemer

Keywords: Reinforcement Learning and Planning, Reinforcement Learning, Applications, Robotics, Reinforcement Learning and Planning

5:25

13/04/2021

Fundamental limits of ridge-regularized empirical risk minimization in high dimensions

Hossein Taheri, Ramtin Pedarsani, Christos Thrampoulidis

3:33

18/07/2021

Doubly Robust Off-Policy Actor-Critic: Convergence and Optimality

Tengyu Xu, Zhuoran Yang, Zhaoran Wang, Yingbin Liang

Keywords: Theory, RL, Decisions and Control Theory

4:23

06/12/2021

Continuous Latent Process Flows

Ruizhi Deng, Marcus Brubaker, Greg Mori, Andreas M Lehrmann

Keywords: generative model

14:54

06/12/2021

Trustworthy Multimodal Regression with Mixture of Normal-inverse Gamma Distributions

Huan Ma, Zongbo Han, Changqing Zhang, Huazhu Fu, Joey Tianyi Zhou, Qinghua Hu

5:37

12/07/2020

Doubly robust off-policy evaluation with shrinkage

Yi Su, Maria Dimakopoulou, Akshay Krishnamurthy, Miroslav Dudik

Keywords: Online Learning, Active Learning, and Bandits

15:08

06/12/2020

Markovian Score Climbing: Variational Inference with KL(p||q)

Christian Naesseth, Fredrik Lindsten, David Blei

2:30

12/07/2020

Structure Adaptive Algorithms for Stochastic Bandits

Rémy Degenne, Han Shao, Wouter Koolen

Keywords: Online Learning, Active Learning, and Bandits

16:05

13/04/2021

Direct loss minimization for sparse gaussian processes

Yadi Wei, Rishit Sheth, Roni Khardon

3:24

06/12/2021

Differentiable Annealed Importance Sampling and the Perils of Gradient Noise

Guodong Zhang, Kyle Hsu, Jianing Li, Chelsea Finn, Roger Grosse

Keywords: optimization, generative model

15:30

06/12/2020

Projection Robust Wasserstein Distance and Riemannian Optimization

Darren Lin, Chenyou Fan, Nhat Ho, Marco Cuturi, Michael Jordan

Keywords: Optimization -> Non-Convex Optimization; Optimization -> Stochastic Optimization, Deep Learning -> Optimization for Deep Networks

3:01

26/04/2020

On Generalization Error Bounds of Noisy Gradient Methods for Non-Convex Learning

Jian Li, Xuanyuan Luo, Mingda Qiao

Keywords: learning theory, generalization, nonconvex learning, stochastic gradient descent, Langevin dynamics

4:50

06/12/2021

Loss function based second-order Jensen inequality and its application to particle variational inference

Futoshi Futami, Tomoharu Iwata, Naonori Ueda, Issei Sato, Masashi Sugiyama

Keywords: optimization, generative model

14:09

26/08/2020

Distributionally Robust Bayesian Quadrature Optimization

Thanh Nguyen, Sunil Gupta, Huong Ha, Santu Rana, Svetha Venkatesh

11:54

03/08/2020

Joint Stochastic Approximation and Its Application to Learning Discrete Latent Variable Models

Zhijian Ou, Yunfu Song

8:24

12/07/2020

Fast and Consistent Learning of Hidden Markov Models by Incorporating Non-Consecutive Correlations

Robert Mattila, Cristian Rojas, Eric Moulines, Vikram Krishnamurthy, Bo Wahlberg

Keywords: Sequential, Network, and Time-Series Modeling

13:37

06/12/2021

Preconditioned Gradient Descent for Over-Parameterized Nonconvex Matrix Factorization

Jialun Zhang, Salar Fattahi, Richard Y Zhang

Keywords: optimization

8:36

18/07/2021

Robust Inference for High-Dimensional Linear Models via Residual Randomization

Y. Samuel Wang, Si Kai Lee, Panos Toulis, Mladen Kolar

Keywords: Theory, Statistical Learning Theory

3:45

06/12/2021

On Density Estimation with Diffusion Models

Diederik Kingma, Tim Salimans, Ben Poole, Jonathan Ho

Keywords: optimization, generative model

9:53

06/12/2020

Noise-Contrastive Estimation for Multivariate Point Processes

Hongyuan Mei, Tom Wan, Jason Eisner

3:20

06/12/2020

Recursive Inference for Variational Autoencoders

Minyoung Kim, Vladimir Pavlovic

3:24

18/07/2021

Quantization Algorithms for Random Fourier Features

Xiaoyun Li, Ping Li

Keywords: Deep Learning, Adversarial Networks, Deep Learning, Generative Models, Algorithms, Large Scale Learning

5:14

04/08/2021

Cautiously Optimistic Policy Optimization and Exploration with Linear Function Approximation

Andrea Zanette, Ching-An Cheng, Alekh Agarwal

15:11

06/12/2021

A Closer Look at the Worst-case Behavior of Multi-armed Bandit Algorithms

Anand Kalvit, Assaf Zeevi

Keywords: bandits

15:13

06/12/2020

Robustness Analysis of Non-Convex Stochastic Gradient Descent using Biased Expectations

Kevin Scaman, Cedric Malherbe

3:09