Black-box Off-policy Estimation for Infinite-Horizon Reinforcement Learning

26/04/2020

Ali Mousavi, Lihong Li, Qiang Liu, Denny Zhou

Keywords: reinforcement learning, off-policy estimation, importance sampling, propensity score

Abstract: Off-policy estimation for long-horizon problems is important in many real-life applications such as healthcare and robotics, where high-fidelity simulators may not be available and on-policy evaluation is expensive or impossible. Recently, Liu et al. (2018) proposed an approach that avoids the curse of horizon suffered by typical importance-sampling-based methods. While showing promising results, this approach is limited in practice because it requires the data to be collected by a known behavior policy. In this work, we propose a novel approach that eliminates this limitation. In particular, we formulate the problem as solving for the fixed point of a "backward flow" operator and show that the fixed-point solution gives the desired importance ratios of stationary distributions between the target and behavior policies. We analyze its asymptotic consistency and finite-sample generalization. Experiments on benchmarks verify the effectiveness of our proposed approach.
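
The fixed-point idea in the abstract can be illustrated on a tiny tabular Markov chain. The sketch below is not the paper's estimator: it assumes the transition matrices are known (which the black-box setting does not require), and the matrices `P_behavior` and `P_target` and the helper names are hypothetical. It only checks that the ratio of stationary distributions, w(s) = d_target(s) / d_behavior(s), is left unchanged by one plausible tabular form of a backward-flow map.

```python
# Illustrative tabular sketch only; all numbers and names are made up.

def stationary(P, iters=2000):
    """Power iteration for the stationary distribution of a row-stochastic matrix P."""
    n = len(P)
    d = [1.0 / n] * n
    for _ in range(iters):
        # d_next(t) = sum_s d(s) * P(t | s)
        d = [sum(d[s] * P[s][t] for s in range(n)) for t in range(n)]
    return d

# Hypothetical state-transition matrices induced by the behavior and target policies.
P_behavior = [[0.9, 0.1, 0.0],
              [0.1, 0.8, 0.1],
              [0.0, 0.1, 0.9]]
P_target = [[0.5, 0.5, 0.0],
            [0.1, 0.4, 0.5],
            [0.0, 0.2, 0.8]]

d_b = stationary(P_behavior)  # stationary distribution under the behavior policy
d_t = stationary(P_target)    # stationary distribution under the target policy
w = [d_t[s] / d_b[s] for s in range(3)]  # importance ratios of stationary distributions

def backward_flow(w):
    """Assumed tabular operator: (B w)(t) = sum_s d_b(s) w(s) P_target(t | s) / d_b(t)."""
    n = len(w)
    return [sum(d_b[s] * w[s] * P_target[s][t] for s in range(n)) / d_b[t]
            for t in range(n)]

# The true ratio should be (numerically) a fixed point of the map above.
residual = max(abs(a - b) for a, b in zip(backward_flow(w), w))
print("ratios:", w)
print("fixed-point residual:", residual)
```

With such ratios, an expectation under the target policy's stationary distribution can be rewritten as a w-weighted expectation under the behavior data distribution, which is why the ratios are the quantity of interest.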

The talk and the respective paper were published at the ICLR 2020 virtual conference.

Similar Papers

18/07/2021

Monotonic Robust Policy Optimization with Model Discrepancy

Yuankun Jiang, Chenglin Li, Wenrui Dai, Junni Zou, Hongkai Xiong

Keywords: Reinforcement Learning and Planning

5:17

03/05/2021

Offline Model-Based Optimization via Normalized Maximum Likelihood Estimation

Justin Fu, Sergey Levine

Keywords: model-based optimization, normalized maximum likelihood

7:37

12/07/2020

Forecasting sequential data using Consistent Koopman Autoencoders

Omri Azencot, N. Benjamin Erichson, Vanessa Lin, Michael Mahoney

Keywords: Sequential, Network, and Time-Series Modeling

14:06

26/08/2020

A Nonparametric Off-Policy Policy Gradient

Samuele Tosatto, Joao Carvalho, Hany Abdulsamad, Jan Peters

12:19

03/05/2021

Representation Balancing Offline Model-based Reinforcement Learning

Byung-Jun Lee, Jongmin Lee, Kee-Eung Kim

Keywords: Off-policy policy evaluation, Batch Reinforcement Learning, Offline Reinforcement Learning, Model-based Reinforcement Learning, Reinforcement Learning

5:45

18/07/2021

Outside the Echo Chamber: Optimizing the Performative Risk

John Miller, Juan Perdomo, Tijana Zrnic

Keywords: Theory, Statistical Learning Theory

5:05

18/07/2021

Continuous-time Model-based Reinforcement Learning

Cagatay Yildiz, Markus Heinonen, Harri Lähdesmäki

Keywords: Reinforcement Learning and Planning

5:00

26/04/2020

Short and Sparse Deconvolution --- A Geometric Approach

Yenson Lau, Qing Qu, Han-Wen Kuo, Pengcheng Zhou, Yuqian Zhang, John Wright

7:18

18/07/2021

Provably Correct Optimization and Exploration with Non-linear Policies

Fei Feng, Wotao Yin, Alekh Agarwal, Lin Yang

Keywords: Deep Learning, Adversarial Networks, Applications, Fairness, Accountability, and Transparency, Theory, RL, Decisions and Control Theory

5:03

03/05/2021

Enforcing robust control guarantees within neural network policies

Priya Donti, Melrose Roderick, Mahyar Fazlyab, Zico Kolter

Keywords: reinforcement learning, differentiable optimization, robust control

5:09

04/08/2021

Group testing and local search: is there a computational-statistical gap?

Fotis Iliopoulos, Ilias Zadik

17:50

06/12/2020

Reinforcement Learning for Control with Multiple Frequencies

Jongmin Lee, Byung-Jun Lee, Kee-Eung Kim

Keywords: Algorithms -> Multitask and Transfer Learning; Deep Learning -> Supervised Deep Networks; Theory -> Learning Theory; Theory -> , Deep Learning

3:21

06/12/2021

Integrated Latent Heterogeneity and Invariance Learning in Kernel Space

Jiashuo Liu, Zheyuan Hu, Peng Cui, Bo Li, Zheyan Shen

Keywords: deep learning, reinforcement learning and planning, machine learning

11:11

14/09/2020

Escaping Saddle Points of Empirical Risk Privately and Scalably via DP-Trust Region Method

Di Wang, Jinhui Xu

Keywords: differential privacy, empirical risk minimization, private machine learning

15:13

18/07/2021

Value Iteration in Continuous Actions, States and Time

Michael Lutter, Shie Mannor, Jan Peters, Dieter Fox, Animesh Garg

Keywords: Reinforcement Learning and Planning, Planning and Control

5:09

06/12/2020

Neural Controlled Differential Equations for Irregular Time Series

Patrick Kidger, James Morrill, James Foster, Terry Lyons

3:09

06/12/2020

Task-Agnostic Amortized Inference of Gaussian Process Hyperparameters

Sulin Liu, Xingyuan Sun, Peter J Ramadge, Ryan Adams

3:46

03/05/2021

Non-asymptotic Confidence Intervals of Off-policy Evaluation: Primal and Dual Bounds

Yihao Feng, Ziyang Tang, Na Zhang, Qiang Liu

Keywords: Reinforcement Learning, Off Policy Evaluation, Non-asymptotic Confidence Intervals

4:26

06/12/2021

Truncated Marginal Neural Ratio Estimation

Benjamin K Miller, Alex Cole, Patrick Forré, Gilles Louppe, Christoph Weniger

Keywords: robustness

10:11

06/12/2021

Model-Based Domain Generalization

Alexander Robey, George J. Pappas, Hamed Hassani

Keywords: theory, deep learning, optimization, robustness, domain adaptation

15:08

18/07/2021

EMaQ: Expected-Max Q-Learning Operator for Simple Yet Effective Offline and Online RL

Seyed Kamyar Seyed Ghasemipour, Dale Schuurmans, Shixiang Gu

Keywords: Reinforcement Learning and Planning

5:54

06/12/2021

SurvITE: Learning Heterogeneous Treatment Effects from Time-to-Event Data

Alicia Curth, Changhee Lee, Mihaela van der Schaar

Keywords: deep learning, machine learning, domain adaptation, causality

13:43

06/12/2021

Determinantal point processes based on orthogonal polynomials for sampling minibatches in SGD

Rémi Bardenet, Subhroshekhar Ghosh, Meixia LIN

Keywords: optimization, machine learning

14:51

22/11/2021

Neighborhood-Aware Neural Architecture Search

Xiaofang Wang, Shengcao Cao, Mengtian Li, Kris Kitani

Keywords: Neural Architecture Search, Generalization, Flat Minima

2:45

06/12/2021

Cross-modal Domain Adaptation for Cost-Efficient Visual Reinforcement Learning

Xiong-Hui Chen, Shengyi Jiang, Feng Xu, Zongzhang Zhang, Yang Yu

Keywords: reinforcement learning and planning, domain adaptation

10:55

02/02/2021

Any-Precision Deep Neural Networks

Haichao Yu, Haoxiang Li, Humphrey Shi, Thomas S. Huang, Gang Hua

14:26

26/04/2020

On Robustness of Neural Ordinary Differential Equations

Hanshu YAN, Jiawei DU, Vincent TAN, Jiashi FENG

Keywords: Neural ODE

5:09

26/08/2020

The Area of the Convex Hull of Sampled Curves: a Robust Functional Statistical Depth measure

Guillaume Staerman, Pavlo Mozharovskyi, Stéphan Clémençon

14:59

06/12/2020

Adversarial Robustness of Supervised Sparse Coding

Jeremias Sulam, Ramchandran Muthukumar, Raman Arora

3:08

18/07/2021

Meta-Cal: Well-controlled Post-hoc Calibration by Ranking

Xingchen Ma, Matthew B Blaschko

Keywords: Algorithms, Supervised Learning

4:28

06/12/2021

Structured Dropout Variational Inference for Bayesian Neural Networks

Son Nguyen, Duong Nguyen, Khai Nguyen, Khoat Than, Hung Bui, Nhat Ho

Keywords: deep learning, generative model

11:28

03/05/2021

Modeling the Second Player in Distributionally Robust Optimization

Paul Michel, Tatsunori Hashimoto, Graham Neubig

Keywords: adversarial learning, deep learning, robustness, distributionally robust optimization

5:09

18/07/2021

Multiplicative Noise and Heavy Tails in Stochastic Optimization

Liam Hodgkinson, Michael Mahoney

Keywords: Optimization, Stochastic Optimization

5:16

06/12/2021

Loss function based second-order Jensen inequality and its application to particle variational inference

Futoshi Futami, Tomoharu Iwata, Naonori Ueda, Issei Sato, Masashi Sugiyama

Keywords: optimization, generative model

14:09

06/12/2020

Robust, Accurate Stochastic Optimization for Variational Inference

Akash Kumar Dhaka, Alejandro Catalina, Michael Andersen, Måns Magnusson, Jonathan Huggins, Aki Vehtari

3:23

02/02/2021

Counterfactual Explanations for Oblique Decision Trees: Exact, Efficient Algorithms

Miguel Á. Carreira-Perpiñán, Suryabhan Singh Hada

16:16

18/07/2021

DORO: Distributional and Outlier Robust Optimization

Runtian Zhai, Chen Dan, Zico Kolter, Pradeep Ravikumar

Keywords: Probabilistic Methods, Robust statistics

5:06

18/07/2021

Nondeterminism and Instability in Neural Network Optimization

Cecilia Summers, Michael J Dinneen

Keywords: Deep Learning, Optimization for Deep Networks

5:12

06/12/2020

Goal-directed Generation of Discrete Structures with Conditional Generative Models

Amina Mollaysa, Brooks Paige, Alexandros Kalousis

3:10

18/07/2021

Density Constrained Reinforcement Learning

Zengyi Qin, Yuxiao Chen, Chuchu Fan

Keywords: Reinforcement Learning and Planning

4:50