Minimax Weight and Q-Function Learning for Off-Policy Evaluation

Abstract: We provide theoretical investigations into off-policy evaluation in reinforcement learning using function approximators for (marginalized) importance weights and value functions. Our contributions include: (1) A new estimator, MWL, that directly estimates importance ratios over the state-action distributions, removing the reliance on knowledge of the behavior policy as in prior work (Liu et.al, 2018), (2) Another new estimator, MQL, obtained by swapping the roles of importance weights and value-functions in MWL. MQL has an intuitive interpretation of minimizing average Bellman errors and can be combined with MWL in a doubly robust manner, (3) Several additional results that offer further insights, including the sample complexities of MWL and MQL, their asymptotic optimality in the tabular setting, how the learned importance weights depend the choice of the discriminator class, and how our methods provide a unified view of some old and new algorithms in RL.

12/07/2020

Minimax Weight and Q-Function Learning for Off-Policy Evaluation

Masatoshi Uehara, Jiawei Huang, Nan Jiang

Comments

Similar Papers

Representations for Stable Off-Policy Reinforcement Learning

Dibya Ghosh, Marc Bellemare

Keywords Abstract Paper

Reinforcement Learning - General

Structured Dropout Variational Inference for Bayesian Neural Networks

Son Nguyen, Duong Nguyen, Khai Nguyen and Khoat Than, Hung Bui, Nhat Ho

Keywords Abstract Paper

deep learning, generative model

Towards Robust Bisimulation Metric Learning

Mete Kemertas, Tristan Aumentado-Armstrong

Keywords Abstract Paper

reinforcement learning and planning, robustness, representation learning

Variational Model-based Policy Optimization

Yinlam Chow, Brandon Cui, Moonkyung Ryu, Mohammad Ghavamzadeh

Keywords Abstract Paper

Machine Learning, Reinforcement Learning

Tesseract: Tensorised Actors for Multi-Agent Reinforcement Learning

Anuj Mahajan, Mikayel Samvelyan, Lei Mao and Viktor Makoviychuk, Animesh Garg, Jean Kossaifi, Shimon Whiteson, Yuke Zhu, Anima Anandkumar

Keywords Abstract Paper

Reinforcement Learning and Planning, Multi-Agent RL

Offline Reinforcement Learning with Fisher Divergence Critic Regularization

Ilya Kostrikov, Rob Fergus, Jonathan Tompson, Ofir Nachum

Keywords Abstract Paper

Reinforcement Learning and Planning, Deep RL

A Local Temporal Difference Code for Distributional Reinforcement Learning

Pablo Tano, Peter Dayan, Alexandre Pouget

Keywords Abstract Paper

A New Representation of Successor Features for Transfer across Dissimilar Environments

Majid Abdolshah, Hung Le, Thommen Karimpanal George and Sunil Gupta, Santu Rana, Svetha Venkatesh

Keywords Abstract Paper

Reinforcement Learning and Planning

Adversarially robust estimate and risk analysis in linear regression

Yue Xing, Ruizhi Zhang, Guang Cheng

Keywords Abstract Paper

Bayesian Distributional Policy Gradients

Luchen Li, A. Aldo Faisal

Keywords Abstract Paper

Bayesian Bellman Operators

Mattie Fellows, Kristian Hartikainen, Shimon Whiteson

Keywords Abstract Paper

reinforcement learning and planning

Logistic q-learning

Joan Bas-Serrano, Sebastian Curi, Andreas Krause, Gergely Neu

Keywords Abstract Paper

Boosting with Multiple Sources

Corinna Cortes, Mehryar Mohri, Dmitry Storcheus, Ananda Theertha Suresh

Keywords Abstract Paper

federated learning

CAQL: Continuous Action Q-Learning

Moonkyung Ryu, Yinlam Chow, Ross Anderson and Christian Tjandraatmadja, Craig Boutilier

Keywords Abstract Paper

Reinforcement learning (RL), DQN, Continuous control, Mixed-Integer Programming (MIP)

Active Learning of Continuous-time Bayesian Networks through Interventions

Dominik Linzner, Heinz Koeppl

Keywords Abstract Paper

Probabilistic Methods, Graphical Models

Regret Minimization Experience Replay in Off-Policy Reinforcement Learning

Xu-Hui Liu, Zhenghai Xue, Jingcheng Pang and Shengyi Jiang, Feng Xu, Yang Yu

Keywords Abstract Paper

theory, reinforcement learning and planning

Offline Reinforcement Learning as One Big Sequence Modeling Problem

Michael Janner, Qiyang Li, Sergey Levine

Keywords Abstract Paper

reinforcement learning and planning, transformers, language

Latent derivative bayesian last layer networks

Joe Watson, Jihao Andreas Lin, Pascal Klink and Joni Pajarinen, Jan Peters

Keywords Abstract Paper

Loss function based second-order Jensen inequality and its application to particle variational inference

Futoshi Futami, Tomoharu Iwata, naonori ueda and Issei Sato, Masashi Sugiyama

Keywords Abstract Paper

optimization, generative model

Independent Subspace Analysis for Unsupervised Learning of Disentangled Representations

Jan Stuehmer, Richard Turner, Sebastian Nowozin

Keywords Abstract Paper

Variance-Aware Off-Policy Evaluation with Linear Function Approximation

Yifei Min, Tianhao Wang, Dongruo Zhou, Quanquan Gu

Keywords Paper

Son Nguyen, Duong Nguyen, Khai Nguyen and
Khoat Than, Hung Bui, Nhat Ho

Keywords Paper

Keywords Paper

Keywords Paper

Anuj Mahajan, Mikayel Samvelyan, Lei Mao and
Viktor Makoviychuk, Animesh Garg, Jean Kossaifi, Shimon Whiteson, Yuke Zhu, Anima Anandkumar

Keywords Paper

Keywords Paper

Keywords Paper

Majid Abdolshah, Hung Le, Thommen Karimpanal George and
Sunil Gupta, Santu Rana, Svetha Venkatesh

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Moonkyung Ryu, Yinlam Chow, Ross Anderson and
Christian Tjandraatmadja, Craig Boutilier

Keywords Paper

Keywords Paper

Xu-Hui Liu, Zhenghai Xue, Jingcheng Pang and
Shengyi Jiang, Feng Xu, Yang Yu

Keywords Paper

Keywords Paper

Joe Watson, Jihao Andreas Lin, Pascal Klink and
Joni Pajarinen, Jan Peters

Keywords Paper

Futoshi Futami, Tomoharu Iwata, naonori ueda and
Issei Sato, Masashi Sugiyama

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Haotong Qin, Ruihao Gong, Xianglong Liu and
Mingzhu Shen, Ziran Wei, Fengwei Yu, Jingkuan Song

Keywords Paper

Tianhe Yu, Aviral Kumar, Rafael Rafailov and
Aravind Rajeswaran, Sergey Levine, Chelsea Finn

Keywords Paper

Fan Bao, Guoqiang Wu, Chongxuan LI and
Jun Zhu, Bo Zhang

Keywords Paper

Keywords Paper

Ba-Hien Tran, Simone Rossi, Dimitrios Milios and
Pietro Michiardi, Edwin Bonilla, Maurizio Filippone

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Ashok Cutkosky, Christoph Dann, Abhimanyu Das and
Claudio Gentile, Aldo Pacchiano, Manish Purohit

Keywords Paper

Keywords Paper