Regret Analysis of Bandit Problems with Causal Background Knowledge

03/08/2020

Yangyi Lu, Amirhossein Meisami, Ambuj Tewari, William Yan

Abstract: We study how to learn optimal interventions sequentially given causal information represented as a causal graph along with associated conditional distributions. Causal modeling is useful in real-world problems like online advertising, where complex causal mechanisms underlie the relationship between interventions and outcomes. We propose two algorithms, causal upper confidence bound (C-UCB) and causal Thompson Sampling (C-TS), that enjoy improved cumulative regret bounds compared with algorithms that do not use causal information. We thus resolve an open problem posed by Lattimore et al. (2016). Further, we extend C-UCB and C-TS to the linear bandit setting and propose causal linear UCB (CL-UCB) and causal linear TS (CL-TS) algorithms. These algorithms enjoy a cumulative regret bound that scales only with the feature dimension. Our experiments show the benefit of using causal information. For example, we observe that even with a few hundred iterations, the regret of causal algorithms is less than that of standard algorithms by a factor of three. We also show that under certain causal structures, our algorithms scale better than the standard bandit algorithms as the number of interventions increases.
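
To make the idea in the abstract concrete, here is a minimal Python sketch of how a UCB-style index could exploit known conditional distributions. It is an illustration under stated assumptions, not the authors' exact C-UCB: the reward is assumed to be Bernoulli and to depend on the intervention only through a discrete parent variable Z, the distribution P(Z | do(a)) is assumed known, and the names `c_ucb`, `P_Z_GIVEN_A`, and `REWARD_MEAN_Z` are hypothetical. Confidence bounds are kept per value of Z rather than per intervention, so every pull informs all interventions.

```python
import math
import random


def c_ucb(p_z_given_a, reward_mean_z, horizon, seed=0):
    """Simulate a UCB-style learner that keeps statistics per parent value z.

    p_z_given_a[a][z]: assumed-known P(Z = z | do(a)), the causal side information.
    reward_mean_z[z]:  true E[Y | Z = z]; hidden from the learner, used only to simulate.
    Returns the cumulative (pseudo-)regret after `horizon` rounds.
    """
    rng = random.Random(seed)
    n_arms, n_z = len(p_z_given_a), len(reward_mean_z)
    counts = [0] * n_z    # number of times each z has been observed
    sums = [0.0] * n_z    # total reward observed under each z

    def arm_mean(row):
        # Expected reward of an intervention: sum_z P(z | do(a)) * E[Y | z].
        return sum(p * m for p, m in zip(row, reward_mean_z))

    best = max(arm_mean(row) for row in p_z_given_a)
    regret = 0.0
    for t in range(1, horizon + 1):
        # Optimistic estimate of E[Y | Z = z], clipped to [0, 1].
        ucb_z = [
            1.0 if counts[z] == 0
            else min(1.0, sums[z] / counts[z] + math.sqrt(2 * math.log(t) / counts[z]))
            for z in range(n_z)
        ]
        # Index of each intervention: average the per-z bounds under P(z | do(a)).
        index = [sum(p * u for p, u in zip(row, ucb_z)) for row in p_z_given_a]
        a = max(range(n_arms), key=index.__getitem__)

        # Environment step: sample Z given the intervention, then a Bernoulli reward.
        z = rng.choices(range(n_z), weights=p_z_given_a[a])[0]
        y = 1.0 if rng.random() < reward_mean_z[z] else 0.0
        counts[z] += 1
        sums[z] += y
        regret += best - arm_mean(p_z_given_a[a])
    return regret


# Hypothetical toy instance: three interventions, binary parent variable.
P_Z_GIVEN_A = [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]]
REWARD_MEAN_Z = [0.2, 0.8]
print(c_ucb(P_Z_GIVEN_A, REWARD_MEAN_Z, horizon=2000, seed=1))
```

In this sketch the number of quantities to estimate is the number of values of Z, not the number of interventions, which is one way to read the abstract's claim that causal algorithms can scale better as the number of interventions grows.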

The talk and the corresponding paper were published at the UAI 2020 virtual conference.

Similar Papers

13/04/2021

Budgeted and non-budgeted causal bandits

Vineet Nair, Vishakha Patil, Gaurav Sinha

3:02

06/12/2021

Bayesian decision-making under misspecified priors with applications to meta-learning

Max Simchowitz, Christopher Tosh, Akshay Krishnamurthy, Daniel Hsu, Thodoris Lykouris, Miro Dudik, Robert E Schapire

Keywords: meta learning, bandits

14:58

02/02/2021

Learning from eXtreme Bandit Feedback

Romain Lopez, Inderjit S. Dhillon, Michael I. Jordan

19:29

19/08/2021

Thompson Sampling for Bandits with Clustered Arms

Emil Carlsson, Devdatt Dubhashi, Fredrik D. Johansson

Keywords: Machine Learning, Online Learning, Learning Theory, Reinforcement Learning

14:27

12/07/2020

Thompson Sampling Algorithms for Mean-Variance Bandits

Qiuyu Zhu, Vincent Tan

Keywords: Online Learning, Active Learning, and Bandits

14:31

04/08/2021

Efficient Bandit Convex Optimization: Beyond Linear Losses

Arun Sai Suggala, Pradeep Ravikumar, Praneeth Netrapalli

20:29

06/12/2021

Stochastic Online Linear Regression: the Forward Algorithm to Replace Ridge

Reda Ouhamma, Odalric-Ambrym Maillard, Vianney Perchet

Keywords: robustness, bandits

11:30

06/12/2020

Latent Bandits Revisited

Joey Hong, Branislav Kveton, Manzil Zaheer, Yinlam Chow, Amr Ahmed, Craig Boutilier

3:11

06/12/2020

Stage-wise Conservative Linear Bandits

Ahmadreza Moradipari, Christos Thrampoulidis, Mahnoosh Alizadeh

3:18

06/12/2020

Adaptive Importance Sampling for Finite-Sum Optimization and Sampling with Decreasing Step-Sizes

Ayoub El Hanchi, David Stephens

3:33

06/12/2021

Hybrid Regret Bounds for Combinatorial Semi-Bandits and Adversarial Linear Bandits

Shinji Ito

Keywords: bandits

10:49

06/12/2020

Contextual Games: Multi-Agent Learning with Side Information

Pier Giuseppe Sessa, Ilija Bogunovic, Andreas Krause, Maryam Kamgarpour

3:30

22/09/2020

Doubly robust estimator for ranking metrics with post-click conversions

Yuta Saito

Keywords: inverse propensity score, post-click conversions, ranking metrics, selection bias, doubly robust

3:19

04/08/2021

Adaptive Discretization for Adversarial Lipschitz Bandits

Chara Podimata, Alex Slivkins

18:13

13/04/2021

Experimental design for regret minimization in linear bandits

Andrew Wagenmaker, Julian Katz-Samuels, Kevin Jamieson

3:05

26/08/2020

Stochastic Linear Contextual Bandits with Diverse Contexts

Weiqiang Wu, Jing Yang, Cong Shen

15:23

13/04/2021

Stochastic bandits with linear constraints

Aldo Pacchiano, Mohammad Ghavamzadeh, Peter Bartlett, Heinrich Jiang

3:02

18/07/2021

Improved Corruption Robust Algorithms for Episodic Reinforcement Learning

Yifang Chen, Simon Du, Kevin Jamieson

Keywords: Optimization, Non-Convex Optimization, Theory, Online Learning Theory

5:20

26/04/2020

Causal Discovery with Reinforcement Learning

Shengyu Zhu, Ignavier Ng, Zhitang Chen

Keywords: causal discovery, structure learning, reinforcement learning, directed acyclic graph

12:51

13/04/2021

Learning user preferences in non-stationary environments

Wasim Huleihel, Soumyabrata Pal, Ofer Shayevitz

3:14

06/12/2021

Optimal Gradient-based Algorithms for Non-concave Bandit Optimization

Baihe Huang, Kaixuan Huang, Sham Kakade, Jason Lee, Qi Lei, Runzhe Wang, Jiaqi Yang

Keywords: theory, deep learning, optimization, generative model, bandits

10:53

12/07/2020

Kernel Methods for Cooperative Multi-Agent Learning with Delays

Abhimanyu Dubey, Alex 'Sandy' Pentland

Keywords: Planning, Control, and Multiagent Learning

12:57

12/07/2020

Exploration Through Bias: Revisiting Biased Maximum Likelihood Estimation in Stochastic Multi-Armed Bandits

Xi Liu, Ping-Chun Hsieh, Yu Heng Hung, Anirban Bhattacharya, P. Kumar

Keywords: Online Learning, Active Learning, and Bandits

14:46

04/08/2021

Corruption-robust exploration in episodic reinforcement learning

Thodoris Lykouris, Max Simchowitz, Alex Slivkins, Wen Sun

18:27

12/07/2020

Meta-learning with Stochastic Linear Bandits

Leonardo Cella, Alessandro Lazaric, Massimiliano Pontil

Keywords: Transfer, Multitask and Meta-learning

13:17

02/02/2021

Decentralized Multi-Agent Linear Bandits with Safety Constraints

Sanae Amani, Christos Thrampoulidis

19:13

06/12/2020

Temporal Variability in Implicit Online Learning

Nicolò Campolongo, Francesco Orabona

3:11

18/07/2021

Adapting to misspecification in contextual bandits with offline regression oracles

Sanath Kumar Krishnamurthy, Vitor Hadad, Susan Athey

Keywords: Reinforcement Learning and Planning, Bandits

4:17

06/12/2021

Continuous Mean-Covariance Bandits

Yihan Du, Siwei Wang, Zhixuan Fang, Longbo Huang

Keywords: bandits

11:33

02/02/2021

Reward-Biased Maximum Likelihood Estimation for Linear Stochastic Bandits

Yu-Heng Hung, Ping-Chun Hsieh, Xi Liu, P. R. Kumar

19:35

19/10/2020

U-rank: Utility-oriented learning to rank with implicit feedback

Xinyi Dai, Jiawei Hou, Qing Liu, Yunjia Xi, Ruiming Tang, Weinan Zhang, Xiuqiang He, Jun Wang, Yong Yu

Keywords: implicit feedback, learning to rank, utility maximization, position bias

9:07

04/08/2021

Parameter-Free Multi-Armed Bandit Algorithms with Hybrid Data-Dependent Regret Bounds

Shinji Ito

15:29

12/07/2020

Structured Linear Contextual Bandits: A Sharp and Geometric Smoothed Analysis

Vidyashankar Sivakumar, Steven Wu, Arindam Banerjee

Keywords: Online Learning, Active Learning, and Bandits

17:56

06/12/2020

Locally Differentially Private (Contextual) Bandits Learning

Kai Zheng, Tianle Cai, Weiran Huang, Zhenguo Li, Liwei Wang

Keywords: Reinforcement Learning and Planning -> Markov Decision Processes; Reinforcement Learning and Planning -> Reinforcement Learning, Reinforcement Learning and Planning

3:03

03/08/2020

On the design of consequential ranking algorithms

Behzad Tabibian, Vicenç Gómez, Abir De, Bernhard Schölkopf, Manuel Gomez Rodriguez

9:21

25/07/2020

Fairness-aware explainable recommendation over knowledge graphs

Zuohui Fu, Yikun Xian, Ruoyuan Gao, Jieyu Zhao, Qiaoying Huang, Yingqiang Ge, Shuyuan Xu, Shijie Geng, Chirag Shah, Yongfeng Zhang, Gerard Melo

Keywords: explainable recommendation, knowledge graphs, fairness

21:26

18/07/2021

Parametric Graph for Unimodal Ranking Bandit

CamilleS GAUTHIER, Romaric Gaudel, Elisa Fromont, Boammani Aser Lompo

Keywords: Reinforcement Learning and Planning, Bandits

4:49

26/08/2020

Thompson Sampling for Linearly Constrained Bandits

Vidit Saxena, Joakim Jalden, Joseph Gonzalez

13:06

18/07/2021

Fairness of Exposure in Stochastic Bandits

Lequn Wang, Yiwei Bai, Wen Sun, Thorsten Joachims

Keywords: Social Aspects of Machine Learning, Fairness, Accountability, and Transparency

5:20

19/10/2020

Tolerant Markov boundary discovery for feature selection

Xingyu Wu, Bingbing Jiang, Yan Zhong, Huanhuan Chen

Keywords: feature selection, markov boundary, kernel method

6:01