A Maximum-Entropy Approach to Off-Policy Evaluation in Average-Reward MDPs

Abstract: This work focuses on off-policy evaluation (OPE) with function approximation in infinite-horizon undiscounted Markov decision processes (MDPs). For MDPs that are ergodic and linear (i.e. where rewards and dynamics are linear in some known features), we provide the first finite-sample OPE error bound, extending the existing results beyond the episodic and discounted cases. In a more general setting, when the feature dynamics are approximately linear and for arbitrary rewards, we propose a new approach for estimating stationary distributions with function approximation. We formulate this problem as finding the maximum-entropy distribution subject to matching feature expectations under empirical dynamics. We show that this results in an exponential-family distribution whose sufficient statistics are the features, paralleling maximum-entropy approaches in supervised learning. We demonstrate the effectiveness of the proposed OPE approaches in multiple environments.

06/12/2020

A Maximum-Entropy Approach to Off-Policy Evaluation in Average-Reward MDPs

Nevena Lazic, Dong Yin, Mehrdad Farajtabar, Nir Levine, DILAN Gorur, Chris Harris, Dale Schuurmans

Comments

Similar Papers

Distributionally Robust Federated Averaging

Yuyang Deng, Mohammad Mahdi Kamani, Mehrdad Mahdavi

Keywords Abstract Paper

Relative Deviation Margin Bounds

Corinna Cortes, Mehryar Mohri, Ananda Theertha Suresh

Keywords Abstract Paper

Theory, Models of Learning and Generalization

Fast Rates for Structured Prediction

Vivien A Cabannnes, Francis Bach, Alessandro Rudi

Keywords Abstract Paper

Estimating weighted areas under the ROC curve

Andreas Maurer, Massimiliano Pontil

Keywords Abstract Paper

Kernel Conditional Density Operators

Ingmar Schuster, Mattes Mollenhauer, Stefan Klus, Krikamol Muandet

Keywords Abstract Paper

Divergence-Based Motivation for Online EM and Combining Hidden Variable Models

Ehsan Amid, Manfred K. Warmuth

Keywords Abstract Paper

Global optimality of softmax policy gradient with single hidden layer neural networks in the mean-field regime

Andrea Agazzi, Jianfeng Lu

Keywords Abstract Paper

policy gradient, mean-field dynamics, entropy regularization, neural networks

Gradient Descent Maximizes the Margin of Homogeneous Neural Networks

Kaifeng Lyu, Jian Li

Keywords Abstract Paper

margin, homogeneous, gradient descent

A Contour Stochastic Gradient Langevin Dynamics Algorithm for Simulations of Multi-modal Distributions

Wei Deng, Guang Lin, Faming Liang

Keywords Abstract Paper

Doubly Robust Off-Policy Actor-Critic: Convergence and Optimality

Tengyu Xu, Zhuoran Yang, Zhaoran Wang, Yingbin LIANG

Keywords Abstract Paper

Theory, RL, Decisions and Control Theory

Slice Sampling Reparameterization Gradients

David M Zoltowski, Diana Cai, Ryan Adams

Keywords Abstract Paper

optimization, machine learning, generative model

Confounding-Robust Policy Evaluation in Infinite-Horizon Reinforcement Learning

Nathan Kallus, Angela Zhou

Keywords Abstract Paper

Minimax-Optimal Off-Policy Evaluation with Linear Function Approximation

Yaqi Duan, Zeyu Jia, Mengdi Wang

Keywords Abstract Paper

Learning Theory

Learning prediction intervals for regression: Generalization and calibration

Haoxian Chen, Ziyi Huang, Henry Lam and Huajie Qian, Haofeng Zhang

Keywords Abstract Paper

Optimal Rates for Averaged Stochastic Gradient Descent under Neural Tangent Kernel Regime

Atsushi Nitanda, Taiji Suzuki

Keywords Abstract Paper

stochastic gradient descent, neural tangent kernel, over-parameterization, two-layer neural network

Towards a Unified Information-Theoretic Framework for Generalization

Mahdi Haghifam, Gintare Karolina Dziugaite, Shay Moran, Dan Roy

Keywords Abstract Paper

graph learning

On Convergence of Gradient Expected Sarsa(λ)

Long Yang, Gang Zheng, Yu Zhang and Qian Zheng, Pengfei Li, Gang Pan

Keywords Abstract Paper

Learning infinite-horizon average-reward MDPs with linear function approximation

Chen-Yu Wei, Mehdi Jafarnia Jahromi, Haipeng Luo, Rahul Jain

Keywords Abstract Paper

On the Global Convergence Rates of Softmax Policy Gradient Methods

Jincheng Mei, Chenjun Xiao, Csaba Szepesvari, Dale Schuurmans

Keywords Abstract Paper

Reinforcement Learning - Theory

Logistic q-learning

Joan Bas-Serrano, Sebastian Curi, Andreas Krause, Gergely Neu

Keywords Abstract Paper

Sparse and Continuous Attention Mechanisms

André Martins, António Farinhas, Marcos Treviso and Vlad Niculae, Pedro Aguiar, Mario Figueiredo

Keywords Abstract Paper

Finite-Time Error Bounds for Biased Stochastic Approximation with Applications to Q-Learning

Gang Wang, Georgios B. Giannakis

Keywords Abstract Paper

Fourier Sparse Leverage Scores and Approximate Kernel Learning

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Haoxian Chen, Ziyi Huang, Henry Lam and
Huajie Qian, Haofeng Zhang

Keywords Paper

Keywords Paper

Keywords Paper

Long Yang, Gang Zheng, Yu Zhang and
Qian Zheng, Pengfei Li, Gang Pan

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

André Martins, António Farinhas, Marcos Treviso and
Vlad Niculae, Pedro Aguiar, Mario Figueiredo

Keywords Paper

Keywords Paper

Keywords Paper

Richard Kurle, Syama Sundar Rangapuram, Emmanuel de Bézenac and
Stephan Günnemann, Jan Gasthaus

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Futoshi Futami, Tomoharu Iwata, naonori ueda and
Issei Sato, Masashi Sugiyama

Keywords Paper

Ping Zhang, Rishabh Iyer, Ashish Tendulkar and
Gaurav Aggarwal, Abir De

Keywords Paper

Keywords Paper

Alain Durmus, Eric Moulines, Alexey Naumov and
Sergey Samsonov, Hoi-To Wai

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper