GenDICE: Generalized Offline Estimation of Stationary Values

Abstract: An important problem that arises in reinforcement learning and Monte Carlo methods is estimating quantities defined by the stationary distribution of a Markov chain. In many real-world applications, access to the underlying transition operator is limited to a fixed set of data that has already been collected, without additional interaction with the environment being available. We show that consistent estimation remains possible in this scenario, and that effective estimation can still be achieved in important applications. Our approach is based on estimating a ratio that corrects for the discrepancy between the stationary and empirical distributions, derived from fundamental properties of the stationary distribution, and exploiting constraint reformulations based on variational divergence minimization. The resulting algorithm, GenDICE, is straightforward and effective. We prove the consistency of the method under general conditions, provide a detailed error analysis, and demonstrate strong empirical performance on benchmark tasks, including off-line PageRank and off-policy policy evaluation.

12/07/2020

distance metric learning, offline/batch reinforcement learning, meta-reinforcement learning, contrastive learning, multi-task reinforcement learning

6:21

02/02/2021

GenDICE: Generalized Offline Estimation of Stationary Values

Ruiyi Zhang*, Bo Dai*, Lihong Li, Dale Schuurmans

Comments

Similar Papers

The continuous categorical: a novel simplex-valued exponential family

Elliott Gordon-Rodriguez, Gabriel Loaiza-Ganem, John Cunningham

Keywords Abstract Paper

Probabilistic Inference - Models and Probabilistic Programming

An Analysis of Constant Step Size SGD in the Non-convex Regime: Asymptotic Normality and Bias

Lu Yu, Krishnakumar Balasubramanian, Stanislav Volgushev, Murat Erdogdu

Keywords Abstract Paper

optimization, machine learning

Consistent Structured Prediction with Max-Min Margin Markov Networks

Alex Nowak, Francis Bach, Alessandro Rudi

Keywords Abstract Paper

Sequential, Network, and Time-Series Modeling

Transfer Learning via $\ell_1$ Regularization

Masaaki Takada, Hironori Fujisawa

Keywords Abstract Paper

Adaptive, Distribution-Free Prediction Intervals for Deep Networks

Danijel Kivaranovic, Kory D. Johnson, Hannes Leeb

Keywords Abstract Paper

Group testing and local search: is there a computational-statistical gap?

Fotis Iliopoulos, Ilias Zadik

Keywords Abstract Paper

A nonasymptotic law of iterated logarithm for general M-estimators

Nicolas Schreuder, Victor-Emmanuel Brunel, Arnak Dalalyan,

Keywords Abstract Paper

Confounding-Robust Policy Evaluation in Infinite-Horizon Reinforcement Learning

Nathan Kallus, Angela Zhou

Keywords Abstract Paper

FOCAL: Efficient Fully-Offline Meta-Reinforcement Learning via Distance Metric Learning and Behavior Regularization

Lanqing Li, Rui Yang, Dijun Luo

Keywords Abstract Paper

distance metric learning, offline/batch reinforcement learning, meta-reinforcement learning, contrastive learning, multi-task reinforcement learning

Variance Penalized On-Policy and Off-Policy Actor-Critic

Arushi Jain, Gandharv Patil, Ayush Jain and Khimya Khetarpal, Doina Precup

Keywords Abstract Paper

Explicit Mean-Square Error Bounds for Monte-Carlo and Linear Stochastic Approximation

Shuhang Chen, Adithya Devraj, Ana Busic, Sean Meyn

Keywords Abstract Paper

Asymptotically Exact Error Characterization of Offline Policy Evaluation with Misspecified Linear Models

Kohei Miyaguchi

Keywords Abstract Paper

reinforcement learning and planning

Adversarially robust estimate and risk analysis in linear regression

Yue Xing, Ruizhi Zhang, Guang Cheng

Keywords Abstract Paper

Robust Unsupervised Learning via L-statistic Minimization

Andreas Maurer, Daniela Angela Parletta, Andrea Paudice, Massimiliano Pontil

Keywords Abstract Paper

Theory, Statistical Learning Theory

Batch Stationary Distribution Estimation

Junfeng Wen, Bo Dai, Lihong Li, Dale Schuurmans

Keywords Abstract Paper

Probabilistic Inference - Approximate, Monte Carlo, and Spectral Methods

Online Robust Reinforcement Learning with Model Uncertainty

Yue Wang, Shaofeng Zou

Keywords Abstract Paper

reinforcement learning and planning, robustness

Bayesian Optimization of Function Networks

Raul Astudillo, Peter Frazier

Keywords Abstract Paper

optimization, reinforcement learning and planning, kernel methods

A Theoretical Case Study of Structured Variational Inference for Community Detection

Mingzhang Yin, Y. X. Rachel Wang, Purnamrita Sarkar

Keywords Abstract Paper

Hamiltonian Monte Carlo using an adjoint-differentiated Laplace approximation: Bayesian inference for latent Gaussian models and beyond

Charles Margossian, Aki Vehtari, Daniel Simpson, Raj Agrawal

Keywords Abstract Paper

Finite Time Analysis of Linear Two-timescale Stochastic Approximation with Markovian Noise

Maksim Kaledin, Eric Moulines, Alexey Naumov and Vladislav Tadic, Hoi-To Wai

Keywords Abstract Paper

Stochastic optimization, Reinforcement learning

Linear Convergence of Gradient Methods for Estimating Structured Transition Matrices in High-dimensional Vector Autoregressive Models

Xiao Lv, Wei Cui, Yulong Liu

Keywords Abstract Paper

theory, optimization

Serving DNNs like Clockwork: Performance Predictability from the Bottom Up

Arpan Gujarati, Reza Karimi, Safya Alzayat and Wei Hao, Antoine Kaufmann, Ymir Vigfusson, Jonathan Mace

Ruiyi Zhang, Bo Dai, Lihong Li, Dale Schuurmans

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Arushi Jain, Gandharv Patil, Ayush Jain and
Khimya Khetarpal, Doina Precup

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Maksim Kaledin, Eric Moulines, Alexey Naumov and
Vladislav Tadic, Hoi-To Wai

Keywords Paper

Keywords Paper

Arpan Gujarati, Reza Karimi, Safya Alzayat and
Wei Hao, Antoine Kaufmann, Ymir Vigfusson, Jonathan Mace

Keywords Paper

Fan Bao, Guoqiang Wu, Chongxuan LI and
Jun Zhu, Bo Zhang

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Kashif Rasul, Abdul-Saboor Sheikh, Ingmar Schuster and
Urs Bergmann, Roland Vollgraf

Keywords Paper

Mingrui Zhang, Zebang Shen, Aryan Mokhtari and
Hamed Hassani, Amin Karbasi

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper