Dynamic Planning and Learning under Recovering Rewards

18/07/2021


David Simchi-Levi, Zeyu Zheng, Feng Zhu

Keywords: Reinforcement Learning and Planning, Bandits


Abstract: Motivated by emerging applications such as live-streaming e-commerce, promotions, and recommendations, we introduce a general class of multi-armed bandit problems with two features: (i) in each time period, the decision maker can pull and collect rewards from at most $K$ of $N$ different arms; (ii) the expected reward of an arm drops immediately after it is pulled and then recovers non-parametrically as its idle time increases. With the objective of maximizing expected cumulative rewards over $T$ time periods, we propose and construct a class of "Purely Periodic Policies" and prove performance guarantees for them. For the offline problem, in which all model parameters are known, our proposed policy obtains an approximation ratio of order $1-\mathcal O(1/\sqrt{K})$, which is asymptotically optimal as $K$ grows to infinity. For the online problem, in which the model parameters are unknown and must be learned, we design an Upper Confidence Bound (UCB) based policy that has approximately $\widetilde{\mathcal O}(N\sqrt{T})$ regret against the offline benchmark. Our framework and policy design may be adaptable to other offline planning and online learning applications with non-stationary and recovering rewards.
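The following Python sketch illustrates the recovering-reward model described in the abstract; it is not the authors' algorithm. It only evaluates the expected cumulative reward of a given periodic schedule. Several pieces are assumptions for illustration: the hypothetical linear-then-saturating recovery curve `saturating_recovery`, the convention that every arm starts fully recovered, and the hypothetical schedule encoding in `periodic_reward` (arm i is pulled whenever `t % periods[i] == offsets[i]`). It does not construct the paper's Purely Periodic Policies or its UCB learner. It simply shows why a schedule that lets arms recover can beat repeatedly pulling the single best arm.

```python
import math
from typing import Callable, List, Sequence

Recovery = Callable[[float], float]


def saturating_recovery(mu: float, tau: int) -> Recovery:
    """Hypothetical recovery curve (an assumption, not the paper's model):
    expected reward grows linearly with idle time d and caps at mu for d >= tau."""
    def r(d: float) -> float:
        return mu * min(d, tau) / tau
    return r


def periodic_reward(recovery: Sequence[Recovery], periods: Sequence[int],
                    offsets: Sequence[int], K: int, T: int) -> float:
    """Expected cumulative reward of a fixed periodic schedule over T periods.

    Arm i is pulled at every t in [0, T) with t % periods[i] == offsets[i]
    (hypothetical encoding). A pull resets the arm's idle time to 0, and the
    next pull earns recovery[i](idle time). Arms start fully recovered.
    """
    n = len(recovery)
    last_pull: List[float] = [-math.inf] * n  # -inf means "never pulled yet"
    total = 0.0
    for t in range(T):
        pulled = [i for i in range(n) if t % periods[i] == offsets[i]]
        if len(pulled) > K:  # enforce the at-most-K-arms-per-period constraint
            raise ValueError(f"{len(pulled)} arms scheduled at t={t}, but K={K}")
        for i in pulled:
            idle = t - last_pull[i]  # equals math.inf before the first pull
            total += recovery[i](idle)
            last_pull[i] = t
    return total


arms = [saturating_recovery(1.0, 2), saturating_recovery(0.5, 1)]
# Alternate the two arms with K = 1: arm 0 at even t, arm 1 at odd t.
print(periodic_reward(arms, periods=[2, 2], offsets=[0, 1], K=1, T=4))  # 3.0
# Pull only arm 0, in every period: its idle time never exceeds 1.
print(periodic_reward(arms[:1], periods=[1], offsets=[0], K=1, T=4))    # 2.5
```

With $K = 1$ and $T = 4$, alternating the two arms earns 3.0. Pulling only the higher-reward arm in every period earns 2.5, because that arm never has enough idle time to recover past half of its cap.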


The talk and the paper were published at the ICML 2021 virtual conference.


Similar Papers

13/04/2021

Contextual blocking bandits

Soumya Basu, Orestis Papadigenopoulos, Constantine Caramanis, Sanjay Shakkottai


18/07/2021

Combinatorial Blocking Bandits with Stochastic Delays

Alexia Atsidakou, Orestis Papadigenopoulos, Soumya Basu, Constantine Caramanis, Sanjay Shakkottai

Keywords: Reinforcement Learning and Planning, Bandits


26/08/2020

Contextual Combinatorial Volatile Multi-armed Bandit with Adaptive Discretization

Andi Nika, Sepehr Elahi, Cem Tekin


04/08/2021

Adaptivity in Adaptive Submodularity

Hossein Esfandiari, Amin Karbasi, Vahab Mirrokni


04/08/2021

Minimax Regret for Stochastic Shortest Path with Adversarial Costs and Known Transition

Liyu Chen, Haipeng Luo, Chen-Yu Wei


06/12/2021

Recurrent Submodular Welfare and Matroid Blocking Semi-Bandits

Orestis Papadigenopoulos, Constantine Caramanis

Keywords: bandits


13/04/2021

Stochastic bandits with linear constraints

Aldo Pacchiano, Mohammad Ghavamzadeh, Peter Bartlett, Heinrich Jiang


26/08/2020

Stochastic Bandits with Delay-Dependent Payoffs

Leonardo Cella, Nicolò Cesa-Bianchi


06/12/2021

Cooperative Stochastic Bandits with Asynchronous Agents and Constrained Feedback

Lin Yang, Yu-Zhen Janice Chen, Stephen Pasteris, Mohammad Hajiesmaili, John C. S. Lui, Don Towsley

Keywords: bandits


06/12/2020

Stage-wise Conservative Linear Bandits

Ahmadreza Moradipari, Christos Thrampoulidis, Mahnoosh Alizadeh


18/07/2021

Finding the Stochastic Shortest Path with Low Regret: the Adversarial Cost and Unknown Transition Case

Liyu Chen, Haipeng Luo

Keywords: Theory, RL, Decisions and Control Theory


06/12/2021

Accommodating Picky Customers: Regret Bound and Exploration Complexity for Multi-Objective Reinforcement Learning

Jingfeng Wu, Vladimir Braverman, Lin Yang

Keywords: theory, reinforcement learning and planning


02/02/2021

Disposable Linear Bandits for Online Recommendations

Melda Korkut, Andrew Li


06/12/2021

Rebounding Bandits for Modeling Satiation Effects

Liu Leqi, Fatma Kilinc Karzan, Zachary Lipton, Alan Montgomery

Keywords: bandits


06/12/2021

Efficient Online Estimation of Causal Effects by Deciding What to Observe

Shantanu Gupta, Zachary Lipton, David Childers

Keywords: reinforcement learning and planning, graph learning, causality


26/08/2020

Asymptotically Efficient Off-Policy Evaluation for Tabular Reinforcement Learning

Ming Yin, Yu-Xiang Wang


06/12/2020

Finite-Time Analysis of Round-Robin Kullback-Leibler Upper Confidence Bounds for Optimal Adaptive Allocation with Multiple Plays and Markovian Rewards

Vrettos Moulos


06/12/2020

From Finite to Countable-Armed Bandits

Anand Kalvit, Assaf Zeevi

Keywords: Theory -> Control Theory


06/12/2021

Bandits with many optimal arms

Rianne de Heide, James Cheshire, Pierre Ménard, Alexandra Carpentier

Keywords: bandits


06/12/2020

Unreasonable Effectiveness of Greedy Algorithms in Multi-Armed Bandit with Many Arms

Mohsen Bayati, Nima Hamidi, Ramesh Johari, Khashayar Khosravi


02/02/2021

A Primal-Dual Online Algorithm for Online Matching Problem in Dynamic Environments

Yu-Hang Zhou, Peng Hu, Chen Liang, Huan Xu, Guangda Huzhang, Yinfu Feng, Qing Da, Xinshang Wang, An-Xiang Zeng


06/12/2021

Stochastic bandits with groups of similar arms

Fabien Pesquerel, Hassan Saber, Odalric-Ambrym Maillard

Keywords: optimization, generative model, bandits


18/07/2021

Incentivized Bandit Learning with Self-Reinforcing User Preferences

Tianchen Zhou, Jia Liu, Chaosheng Dong, Jingyuan Deng

Keywords: Reinforcement Learning and Planning, Bandits


18/07/2021

Resource Allocation in Multi-armed Bandit Exploration: Overcoming Sublinear Scaling with Adaptive Parallelism

Brijen Thananjeyan, Kirthevasan Kandasamy, Ion Stoica, Michael Jordan, Ken Goldberg, Joseph E Gonzalez

Keywords: Reinforcement Learning and Planning, Bandits


06/12/2021

Bayesian decision-making under misspecified priors with applications to meta-learning

Max Simchowitz, Christopher Tosh, Akshay Krishnamurthy, Daniel Hsu, Thodoris Lykouris, Miro Dudik, Robert E Schapire

Keywords: meta learning, bandits


06/12/2021

An Efficient Pessimistic-Optimistic Algorithm for Stochastic Linear Bandits with General Constraints

Xin Liu, Bin Li, Pengyi Shi, Lei Ying

Keywords: optimization, bandits


26/08/2020

Budget-Constrained Bandits over General Cost and Reward Distributions

Semih Cayci, Atilla Eryilmaz, R Srikant


12/07/2020

Structure Adaptive Algorithms for Stochastic Bandits

Rémy Degenne, Han Shao, Wouter Koolen

Keywords: Online Learning, Active Learning, and Bandits


06/12/2020

A Single Recipe for Online Submodular Maximization with Adversarial or Stochastic Constraints

Omid Sadeghi, Prasanna Raut, Maryam Fazel


13/04/2021

Multitask bandit learning through heterogeneous feedback aggregation

Zhi Wang, Chicheng Zhang, Manish Kumar Singh, Laurel Riek, Kamalika Chaudhuri


02/02/2021

Adaptive Algorithms for Multi-armed Bandit with Composite and Anonymous Feedback

Siwei Wang, Haoyun Wang, Longbo Huang


06/12/2021

Breaking the Sample Complexity Barrier to Regret-Optimal Model-Free Reinforcement Learning

Gen Li, Laixi Shi, Yuxin Chen, Yuantao Gu, Yuejie Chi

Keywords: theory, reinforcement learning and planning


06/12/2021

Near-Optimal Offline Reinforcement Learning via Double Variance Reduction

Ming Yin, Yu Bai, Yu-Xiang Wang

Keywords: theory, optimization, reinforcement learning and planning


06/12/2020

Adversarial Blocking Bandits

Nicholas Bishop, Hau Chan, Debmalya Mandal, Long Tran-Thanh


06/12/2021

Nearly Horizon-Free Offline Reinforcement Learning

Tongzheng Ren, Jialian Li, Bo Dai, Simon Du, Sujay Sanghavi

Keywords: theory, optimization, reinforcement learning and planning


09/07/2020

Smooth Contextual Bandits: Bridging the Parametric and Non-differentiable Regret Regimes

Yichun Hu, Nathan Kallus, Xiaojie Mao

Keywords: Bandit problems


06/12/2021

Variational Bayesian Optimistic Sampling

Brendan O'Donoghue, Tor Lattimore

Keywords: optimization, reinforcement learning and planning, generative model, bandits, online learning


06/12/2021

Off-Policy Risk Assessment in Contextual Bandits

Audrey Huang, Liu Leqi, Zachary Lipton, Kamyar Azizzadenesheli

Keywords: robustness, bandits


09/07/2020

Efficient and robust algorithms for adversarial linear contextual bandits

Gergely Neu, Julia Olkhovskaya

Keywords: Bandit problems, Online learning


18/07/2021

Joint Online Learning and Decision-making via Dual Mirror Descent

Alfonso Lobos Ruiz, Paul Grigas, Zheng Wen

Keywords: Deep Learning, Generative Models, Applications, Computer Vision; Applications, Visual Scene Analysis and Interpretation; Deep Learning, Adversarial Network, Algorithms, Online Learning Algorithms
