Adaptive Experimental Design with Temporal Interference: A Maximum Likelihood Approach

06/12/2020

Adaptive Experimental Design with Temporal Interference: A Maximum Likelihood Approach

Peter W Glynn, Ramesh Johari, Mohammad Rasouli

Keywords:

Abstract Paper Similar Papers

Abstract: Suppose an online platform wants to compare a treatment and control policy (e.g., two different matching algorithms in a ridesharing system, or two different inventory management algorithms in an online retail site). Standard experimental approaches to this problem are biased (due to temporal interference between the policies), and not sample efficient. We study optimal experimental design for this setting. We view testing the two policies as the problem of estimating the steady state difference in reward between two unknown Markov chains (i.e., policies). We assume estimation of the steady state reward for each chain proceeds via nonparametric maximum likelihood, and search for consistent (i.e., asymptotically unbiased) experimental designs that are efficient (i.e., asymptotically minimum variance). Characterizing such designs is equivalent to a Markov decision problem with a minimum variance objective; such problems generally do not admit tractable solutions. Remarkably, in our setting, using a novel application of classical martingale analysis of Markov chains via Poisson's equation, we characterize efficient designs via a succinct convex optimization problem. We use this characterization to propose a consistent, efficient online experimental design that adaptively samples the two Markov chains.

0

0

0

0

Share

This is an embedded video. Talk and the respective paper are published at NeurIPS 2020 virtual conference. If you are one of the authors of the paper and want to manage your upload, see the question "My papertalk has been externally embedded..." in the FAQ section.

Comments

Post Comment

no comments yet

Similar Papers

06/12/2020

Algorithmic recourse under imperfect causal knowledge: a probabilistic approach

Amir Karimi, Julius von Kügelgen, Bernhard Schölkopf, Isabel Valera

Keywords Paper

0

0

0

0

3:55

06/12/2020

Leveraging Predictions in Smoothed Online Convex Optimization via Gradient-based Algorithms

Yingying Li, Na Li

Keywords Paper

Deep Learning -> Generative Models, Deep Learning -> Attention Models

0

0

0

0

3:19

06/12/2020

Stage-wise Conservative Linear Bandits

Ahmadreza Moradipari, Christos Thrampoulidis, Mahnoosh Alizadeh

Keywords Paper

0

0

0

0

3:18

06/12/2020

Correlation Robust Influence Maximization

Louis Chen, Divya Padmanabhan, Chee Chin Lim, Karthik Natarajan

Keywords Paper

0

0

0

0

3:19

06/12/2020

Least Squares Regression with Markovian Data: Fundamental Limits and Algorithms

Dheeraj Nagaraj, Xian Wu, Guy Bresler and
Prateek Jain, Praneeth Netrapalli

Keywords Paper

0

0

0

0

3:34

06/12/2021

Bayesian decision-making under misspecified priors with applications to meta-learning

Max Simchowitz, Christopher Tosh, Akshay Krishnamurthy and
Daniel Hsu, Thodoris Lykouris, Miro Dudik, Robert E Schapire

Keywords Paper

meta learning, bandits

0

0

0

0

14:58

18/07/2021

Beyond Variance Reduction: Understanding the True Impact of Baselines on Policy Optimization

Wes Chung, Valentin Thomas, Marlos C. Machado, Nicolas Le Roux

Keywords Paper

Reinforcement Learning and Planning

0

0

0

0

5:13

26/04/2020

Reanalysis of Variance Reduced Temporal Difference Learning

Tengyu Xu, Zhe Wang, Yi Zhou, Yingbin Liang

Keywords Paper

Reinforcement Learning, TD learning, Markovian sample, Variance Reduction

0

0

0

0

4:29

19/10/2020

Tolerant markov boundary discovery for feature selection

Xingyu Wu, Bingbing Jiang, Yan Zhong, Huanhuan Chen

Keywords Paper

feature selection, markov boundary, kernel method

0

0

0

0

6:01

02/02/2021

Disposable Linear Bandits for Online Recommendations

Melda Korkut, Andrew Li

Keywords Paper

0

0

0

0

17:20

06/12/2021

Bridging Offline Reinforcement Learning and Imitation Learning: A Tale of Pessimism

Paria Rashidinejad, Banghua Zhu, Cong Ma and
Jiantao Jiao, Stuart Russell

Keywords Paper

theory, reinforcement learning and planning, bandits

0

0

0

0

12:21

12/07/2020

Black-Box Variational Inference as a Parametric Approximation to Langevin Dynamics

Matthew Hoffman, Yian Ma

Keywords Paper

Probabilistic Inference - Approximate, Monte Carlo, and Spectral Methods

0

0

0

0

14:38

03/05/2021

Blending MPC & Value Function Approximation for Efficient Reinforcement Learning

Mohak Bhardwaj, Sanjiban Choudhury, Byron Boots

Keywords Paper

reinforcement learning, model-predictive control

0

0

0

0

5:09

13/04/2021

Stochastic bandits with linear constraints

Aldo Pacchiano, Mohammad Ghavamzadeh, Peter Bartlett, Heinrich Jiang

Keywords Paper

0

0

0

0

3:02

18/07/2021

Decision-Making Under Selective Labels: Optimal Finite-Domain Policies and Beyond

Dennis Wei

Keywords Paper

Applications, Computer Vision, Deep Learning, Adversarial Networks; Deep Learning, Generative Models, Social Aspects of Machine Learning

0

0

0

0

5:13

18/07/2021

Distributionally Robust Optimization with Markovian Data

Mengmeng Li, Tobias Sutter, Daniel Kuhn

Keywords Paper

Optimization

0

0

0

0

5:18

06/12/2021

Policy Finetuning: Bridging Sample-Efficient Offline and Online Reinforcement Learning

Tengyang Xie, Nan Jiang, Huan Wang and
Caiming Xiong, Yu Bai

Keywords Paper

theory, optimization, reinforcement learning and planning

1

0

0

0

10:57

02/02/2021

A Primal-Dual Online Algorithm for Online Matching Problem in Dynamic Environments

Yu-Hang Zhou, Peng Hu, Chen Liang and
Huan Xu, Guangda Huzhang, Yinfu Feng, Qing Da, Xinshang Wang, An-Xiang Zeng

Keywords Paper

0

0

0

0

18:32

06/12/2021

Deep Bandits Show-Off: Simple and Efficient Exploration with Deep Networks

Rong Zhu, Mattia Rigotti

Keywords Paper

theory, deep learning, reinforcement learning and planning, bandits

0

0

0

0

8:45

06/12/2021

Variational Bayesian Optimistic Sampling

Brendan O'Donoghue, Tor Lattimore

Keywords Paper

optimization, reinforcement learning and planning, generative model, bandits, online learning

0

0

0

0

15:13

13/04/2021

Off-policy evaluation in infinite-horizon reinforcement learning with latent confounders

Andrew Bennett, Nathan Kallus, Lihong Li, Ali Mousavi

Keywords Paper

0

0

0

0

2:33

06/12/2020

Off-Policy Imitation Learning from Observations

Zhuangdi Zhu, Kaixiang Lin, Bo Dai, Jiayu Zhou

Keywords Paper

0

0

0

1

3:24

18/07/2021

OptiDICE: Offline Policy Optimization via Stationary Distribution Correction Estimation

Jongmin Lee, Wonseok Jeon, Byung-Jun Lee and
Joelle Pineau, Kee-Eung Kim

Keywords Paper

Reinforcement Learning and Planning

1

0

0

1

5:15

18/07/2021

Shortest-Path Constrained Reinforcement Learning for Sparse Reward Tasks

Sungryull Sohn, Sungtae Lee, Jongwook Choi and
Harm van Seijen, Mehdi Fatemi, Honglak Lee

Keywords Paper

Reinforcement Learning and Planning, Deep RL

0

0

0

0

5:19

02/02/2021

Loop Estimator for Discounted Values in Markov Reward Processes

Falcon Z. Dai, Matthew R. Walter

Keywords Paper

0

0

0

0

21:51

06/12/2021

COMBO: Conservative Offline Model-Based Policy Optimization

Tianhe Yu, Aviral Kumar, Rafael Rafailov and
Aravind Rajeswaran, Sergey Levine, Chelsea Finn

Keywords Paper

deep learning, optimization, reinforcement learning and planning

0

0

0

0

12:35

06/12/2021

Fast Doubly-Adaptive MCMC to Estimate the Gibbs Partition Function with Weak Mixing Time Bounds

Shahrzad Haddadan, Yue Zhuang, Cyrus Cousins, Eli Upfal

Keywords Paper

generative model, graph learning

0

0

0

0

14:01

18/07/2021

A Sharp Analysis of Model-based Reinforcement Learning with Self-Play

Qinghua Liu, Tiancheng Yu, Yu Bai, Chi Jin

Keywords Paper

Algorithms, Multitask and Transfer Learning, Algorithms, Meta-Learning; Applications, Object Recognition; Data, Challenges, Implementations, and Software, Benchmarks;, Theory, RL, Decisions and Control Theory

0

0

0

0

4:49

26/08/2020

A Hybrid Stochastic Policy Gradient Algorithm for Reinforcement Learning

Nhan Pham, Lam Nguyen, Dzung Phan and
PHUONG HA NGUYEN, Marten van Dijk, Quoc Tran-Dinh

Keywords Paper

0

0

0

0

15:49

18/07/2021

Online A-Optimal Design and Active Linear Regression

Xavier Fontaine, Pierre Perrault, Michal Valko, Vianney Perchet

Keywords Paper

Algorithms, Online Learning Algorithms

0

0

0

0

5:21

06/12/2021

Accommodating Picky Customers: Regret Bound and Exploration Complexity for Multi-Objective Reinforcement Learning

Jingfeng Wu, Vladimir Braverman, Lin Yang

Keywords Paper

theory, reinforcement learning and planning

0

0

0

0

14:22

13/04/2021

Direct loss minimization for sparse gaussian processes

Yadi Wei, Rishit Sheth, Roni Khardon

Keywords Paper

0

0

0

0

3:24

09/07/2020

Optimality and Approximation with Policy Gradient Methods in Markov Decision Processes

Alekh Agarwal, Sham Kakade, Jason Lee, Gaurav Mahajan

Keywords Paper

Reinforcement learning, Non-convex optimization

0

0

0

0

11:00

06/12/2020

Confounding-Robust Policy Evaluation in Infinite-Horizon Reinforcement Learning

Nathan Kallus, Angela Zhou

Keywords Paper

0

0

0

0

4:51

12/07/2020

Adversarial Risk via Optimal Transport and Optimal Couplings

Muni Sreenivas Pydi, Varun Jog

Keywords Paper

Adversarial Examples

0

0

0

0

12:34

19/08/2021

Hindsight Value Function for Variance Reduction in Stochastic Dynamic Environment

Jiaming Guo, Rui Zhang, Xishan Zhang and
Shaohui Peng, Qi Yi, Zidong Du, Xing Hu, Qi Guo, Yunji Chen

Keywords Paper

Machine Learning, Deep Learning, Deep Reinforcement Learning, Sequential Decision Making

0

0

0

0

14:36

06/12/2020

Doubly Robust Off-Policy Value and Gradient Estimation for Deterministic Policies

Nathan Kallus, Masatoshi Uehara

Keywords Paper

0

0

0

0

3:11

12/07/2020

Minimax-Optimal Off-Policy Evaluation with Linear Function Approximation

Yaqi Duan, Zeyu Jia, Mengdi Wang

Keywords Paper

Learning Theory

0

0

0

0

14:10

18/07/2021

Doubly Robust Off-Policy Actor-Critic: Convergence and Optimality

Tengyu Xu, Zhuoran Yang, Zhaoran Wang, Yingbin LIANG

Keywords Paper

Theory, RL, Decisions and Control Theory

0

0

0

0

4:23

06/12/2021

Recurrent Submodular Welfare and Matroid Blocking Semi-Bandits

Orestis Papadigenopoulos, Constantine Caramanis

Keywords Paper

bandits

0

0

0

0

12:28