Linear bandits with Stochastic Delayed Feedback

Abstract: Stochastic linear bandits are a natural and well-studied model for structured exploration/exploitation problems and are widely used in applications such as on-line marketing and recommendation. One of the main challenges faced by practitioners hoping to apply existing algorithms is that usually the feedback is randomly delayed and delays are only partially observable. For example, while a purchase is usually observable some time after the display, the decision of not buying is never explicitly sent to the system. In other words, the learner only observes delayed positive events. We formalize this problem as a novel stochastic delayed linear bandit and propose OTFLinUCB and OTFLinTS, two computationally efficient algorithms able to integrate new information as it becomes available and to deal with the permanently censored feedback. We prove optimal O(d\sqrt{T}) bounds on the regret of the first algorithm and study the dependency on delay-dependent parameters. Our model, assumptions and results are validated by experiments on simulated and real data.

12/07/2020

04/08/2021

Claire Vernade, Alexandra Carpentier, Tor Lattimore, Giovanni Zappella, Beyza Ermis, Michael Brueckner

Similar Papers

Non-Stationary Bandits with Intermediate Observations

Claire Vernade, András György, Timothy Mann

Online Learning, Active Learning, and Bandits

Deep bayesian bandits: Exploring in online personalized recommendations

Dalin Guo, Sofia Ira Ktena, Pranay Kumar Myana, Ferenc Huszar, Wenzhe Shi, Alykhan Tejani, Michael Kneier, Sourav Das

Contextual bandit, Recommender Systems, Algorithmic bias

Learning the truth from only one side of the story

Heinrich Jiang, Qijia Jiang, Aldo Pacchiano

FOCAL: Efficient Fully-Offline Meta-Reinforcement Learning via Distance Metric Learning and Behavior Regularization

Lanqing Li, Rui Yang, Dijun Luo

distance metric learning, offline/batch reinforcement learning, meta-reinforcement learning, contrastive learning, multi-task reinforcement learning

Efficient Bandit Convex Optimization: Beyond Linear Losses

Arun Sai Suggala, Pradeep Ravikumar, Praneeth Netrapalli

Efficient Model-Based Reinforcement Learning through Optimistic Policy Search and Planning

Sebastian Curi, Felix Berkenkamp, Andreas Krause

Bias no more: high-probability data-dependent regret bounds for adversarial bandits and MDPs

Chung-Wei Lee, Haipeng Luo, Chen-Yu Wei, Mengxiao Zhang

Experimental design for regret minimization in linear bandits

Andrew Wagenmaker, Julian Katz-Samuels, Kevin Jamieson

The Impact of Record Linkage on Learning from Feature Partitioned Data

Richard Nock, Stephen J Hardy, Wilko Henecka, Hamish Ivey-Law, Jakub Nabaglo, Giorgio Patrini, Guillaume Smith, Brian Thorne

Theory, Statistical Learning Theory

Federated Multi-Armed Bandits

Chengshuai Shi, Cong Shen

DisCor: Corrective Feedback in Reinforcement Learning via Distribution Correction

Aviral Kumar, Abhishek Gupta, Sergey Levine

Online Markov Decision Processes with Aggregate Bandit Feedback

Alon Cohen, Haim Kaplan, Tomer Koren, Yishay Mansour

A Distribution-dependent Analysis of Meta Learning

Mikhail Konobeev, Ilja Kuzborskij, Csaba Szepesvari

Theory, Statistical Learning Theory

MOPO: Model-based Offline Policy Optimization

Tianhe (Kevin) Yu, Garrett Thomas, Lantao Yu, Stefano Ermon, James Zou, Sergey Levine, Chelsea Finn, Tengyu Ma

Learning from eXtreme Bandit Feedback

Romain Lopez, Inderjit S. Dhillon, Michael I. Jordan

Acting in Delayed Environments with Non-Stationary Markov Policies

Esther Derman, Gal Dalal, Shie Mannor

reinforcement learning, delay

Tackling Instance-Dependent Label Noise via a Universal Probabilistic Model

Qizhou Wang, Bo Han, Tongliang Liu, Gang Niu, Jian Yang, Chen Gong

Online Sign Identification: Minimization of the Number of Errors in Thresholding Bandits

Reda Ouhamma, Rémy Degenne, Vianney Perchet, Pierre Gaillard

bandits, online learning

Achieving Near Instance-Optimality and Minimax-Optimality in Stochastic and Adversarial Linear Bandits Simultaneously

Chung-Wei Lee, Haipeng Luo, Chen-Yu Wei, Mengxiao Zhang, Xiaojin Zhang

Reinforcement Learning and Planning, Bandits

BooVAE: Boosting Approach for Continual Learning of VAE

Evgenii Egorov, Anna Kuzina, Evgeny Burnaev

self-supervised learning, generative model, continual learning

Tactical Optimism and Pessimism for Deep Reinforcement Learning

Ted Moskovitz, Jack Parker-Holder, Aldo Pacchiano, Michael Arbel, Michael Jordan

reinforcement learning and planning, bandits

Off-Policy Imitation Learning from Observations

Zhuangdi Zhu, Kaixiang Lin, Bo Dai, Jiayu Zhou
