State-Aware Value Function Approximation with Attention Mechanism for Restless Multi-armed Bandits

19/08/2021

Shuang Wu, Jingyu Zhao, Guangjian Tian, Jun Wang

Keywords: Agent-based and Multi-agent Systems, Multi-agent Planning, Resource Allocation, Planning and Scheduling, Markov Decision Processes

Abstract: The restless multi-armed bandit (RMAB) problem is a generalization of the multi-armed bandit with non-stationary rewards. Its optimal solution is intractable due to exponentially large state and action spaces with respect to the number of arms. Existing approximation approaches, e.g., Whittle's index policy, have difficulty in capturing either temporal or spatial factors such as impacts from other arms. We propose considering both factors using the attention mechanism, which has achieved great success in deep learning. Our state-aware value function approximation solution comprises an attention-based value function approximator and a Bellman equation solver. The attention-based coordination module captures both spatial and temporal factors for arm coordination. The Bellman equation solver utilizes the decoupling structure of RMABs to acquire solutions with significantly reduced computational overhead. In particular, the time complexity of our approximation is linear in the number of arms. Finally, we illustrate the effectiveness and investigate the properties of our proposed method with numerical experiments.
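
To make the intractability claim concrete, the following is a minimal counting sketch, not the authors' method. It assumes a hypothetical setup in which every arm has the same number of states and exactly `budget` arms are activated per step; the function names and parameters are illustrative only. It compares the size of the joint state and action spaces with the size of per-arm tables that a decoupled treatment would need.

```python
from math import comb


def joint_space_sizes(num_arms: int, states_per_arm: int, budget: int) -> tuple:
    """Sizes of the joint state and action spaces when the RMAB is
    treated as a single MDP (assumed: identical arms, exactly
    `budget` arms activated per step)."""
    joint_states = states_per_arm ** num_arms   # one state per arm, combined
    joint_actions = comb(num_arms, budget)      # which arms to activate
    return joint_states, joint_actions


def decoupled_table_size(num_arms: int, states_per_arm: int) -> int:
    """Entries needed if each arm keeps its own table over
    (state, active/passive). Grows linearly in the number of arms."""
    return num_arms * states_per_arm * 2


if __name__ == "__main__":
    for n in (5, 10, 20):
        states, actions = joint_space_sizes(n, states_per_arm=2, budget=n // 2)
        print(n, states, actions, decoupled_table_size(n, 2))
```

The joint counts grow exponentially with the number of arms while the per-arm count grows linearly, which is the kind of structure a decoupled solver can exploit.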

The talk and the corresponding paper were published at the IJCAI 2021 virtual conference.

Similar Papers

12/07/2020

Structure Adaptive Algorithms for Stochastic Bandits

Rémy Degenne, Han Shao, Wouter Koolen

Keywords: Online Learning, Active Learning, and Bandits

16:05

18/07/2021

Combinatorial Blocking Bandits with Stochastic Delays

Alexia Atsidakou, Orestis Papadigenopoulos, Soumya Basu, Constantine Caramanis, Sanjay Shakkottai

Keywords: Reinforcement Learning and Planning, Bandits

5:12

06/12/2021

A Closer Look at the Worst-case Behavior of Multi-armed Bandit Algorithms

Anand Kalvit, Assaf Zeevi

Keywords: bandits

15:13

26/08/2020

A Novel Confidence-Based Algorithm for Structured Bandits

Andrea Tirinzoni, Alessandro Lazaric, Marcello Restelli

12:17

06/12/2021

Stochastic bandits with groups of similar arms.

Fabien Pesquerel, Hassan SABER, Odalric-Ambrym Maillard

Keywords: optimization, generative model, bandits

13:22

06/12/2020

An Empirical Process Approach to the Union Bound: Practical Algorithms for Combinatorial and Linear Bandits

Julian Katz-Samuels, Lalit Jain, Zohar Karnin, Kevin Jamieson

3:20

06/12/2021

A unified framework for bandit multiple testing

Ziyu Xu, Ruodu Wang, Aaditya Ramdas

Keywords: theory, reinforcement learning and planning, bandits

13:39

06/12/2021

Recurrent Submodular Welfare and Matroid Blocking Semi-Bandits

Orestis Papadigenopoulos, Constantine Caramanis

Keywords: bandits

12:28

06/12/2020

On Regret with Multiple Best Arms

Yinglun Zhu, Robert Nowak

3:22

06/12/2021

Bandits with many optimal arms

Rianne de Heide, James Cheshire, Pierre Ménard, Alexandra Carpentier

Keywords: bandits

12:23

06/12/2021

Fast Minimum-norm Adversarial Attacks through Adaptive Norm Constraints

Maura Pintor, Fabio Roli, Wieland Brendel, Battista Biggio

Keywords: optimization, machine learning, robustness, adversarial robustness and security, vision

11:35

06/12/2021

Fast Algorithms for $L_\infty$-constrained S-rectangular Robust MDPs

Bahram Behzadian, Marek Petrik, Chin Pang Ho

Keywords: optimization, reinforcement learning and planning

6:14

18/07/2021

Robust Pure Exploration in Linear Bandits with Limited Budget

Ayya Alieva, Ashok Cutkosky, Abhimanyu Das

Keywords: Algorithms, Adversarial Learning, Algorithms, Unsupervised Learning, Reinforcement Learning and Planning, Bandits

6:02

06/12/2021

Pure Exploration in Kernel and Neural Bandits

Yinglun Zhu, Dongruo Zhou, Ruoxi Jiang, Quanquan Gu, Rebecca Willett, Robert Nowak

Keywords: theory, deep learning, reinforcement learning and planning, bandits, representation learning

14:47

02/02/2021

Robust Finite-State Controllers for Uncertain POMDPs

Murat Cubuktepe, Nils Jansen, Sebastian Junges, Ahmadreza Marandi, Marnix Suilen, Ufuk Topcu

16:50

18/07/2021

Resource Allocation in Multi-armed Bandit Exploration: Overcoming Sublinear Scaling with Adaptive Parallelism

Brijen Thananjeyan, Kirthevasan Kandasamy, Ion Stoica, Michael Jordan, Ken Goldberg, Joseph E Gonzalez

Keywords: Reinforcement Learning and Planning, Bandits

20:41

06/12/2021

Dealing With Misspecification In Fixed-Confidence Linear Top-m Identification

Clémence Réda, Andrea Tirinzoni, Rémy Degenne

Keywords: theory, reinforcement learning and planning, bandits

14:14

06/12/2020

Interior Point Solving for LP-based prediction+optimisation

Jayanta Mandi, Tias Guns

3:28

02/02/2021

Disposable Linear Bandits for Online Recommendations

Melda Korkut, Andrew Li

17:20

09/07/2020

Estimating Principal Components under Adversarial Perturbations

Pranjal Awasthi, Xue Chen, Aravindan Vijayaraghavan

Keywords: Unsupervised and semi-supervised learning, Adversarial learning and robustness

15:40

02/02/2021

Robustness Guarantees for Mode Estimation with an Application to Bandits

Aldo Pacchiano, Heinrich Jiang, Michael I. Jordan

17:04

13/04/2021

Multi-fidelity high-order gaussian processes for physical simulation

Zheng Wang, Wei Xing, Robert Kirby, Shandian Zhe

3:34

02/02/2021

Federated Multi-Armed Bandits

Chengshuai Shi, Cong Shen

15:26

09/07/2020

Tight Lower Bounds for Combinatorial Multi-Armed Bandits

Nadav Merlis, Shie Mannor

Keywords: Bandit problems, Learning with algebraic or combinatorial structure

14:00

06/12/2020

Differentiable Expected Hypervolume Improvement for Parallel Multi-Objective Bayesian Optimization

Sam Daulton, Max Balandat, Eytan Bakshy

3:20

06/12/2020

Deep reconstruction of strange attractors from time series

William Gilpin

3:21

13/04/2021

Contextual blocking bandits

Soumya Basu, Orestis Papadigenopoulos, Constantine Caramanis, Sanjay Shakkottai

2:47

03/05/2021

FOCAL: Efficient Fully-Offline Meta-Reinforcement Learning via Distance Metric Learning and Behavior Regularization

Lanqing Li, Rui Yang, Dijun Luo

Keywords: distance metric learning, offline/batch reinforcement learning, meta-reinforcement learning, contrastive learning, multi-task reinforcement learning

6:21

06/12/2021

Adversarial Attack Generation Empowered by Min-Max Optimization

Jingkang Wang, Tianyun Zhang, Sijia Liu, Pin-Yu Chen, Jiacen Xu, Makan Fardad, Bo Li

Keywords: optimization, robustness, adversarial robustness and security

15:11

12/07/2020

Adversarial Nonnegative Matrix Factorization

Lei Luo, Yanfu Zhang, Heng Huang

Keywords: Applications - Computer Vision

10:37

04/08/2021

Instance-Dependent Complexity of Contextual Bandits and Reinforcement Learning: A Disagreement-Based Perspective

Dylan Foster, Alexander Rakhlin, David Simchi-Levi, Yunzong Xu

16:53

06/12/2020

POMO: Policy Optimization with Multiple Optima for Reinforcement Learning

Yeong-Dae Kwon, Jinho Choo, Byoungjip Kim, Iljoo Yoon, Youngjune Gwon, Seungjai Min

3:19

06/12/2021

Self-Adaptable Point Processes with Nonparametric Time Decays

Zhimeng Pan, Zheng Wang, Jeff M Phillips, Shandian Zhe

Keywords: deep learning, kernel methods

10:01

06/12/2021

Online Multi-Armed Bandits with Adaptive Inference

Maria Dimakopoulou, Zhimei Ren, Zhengyuan Zhou

Keywords: theory, reinforcement learning and planning, bandits, online learning, causality

17:11

06/12/2020

Unreasonable Effectiveness of Greedy Algorithms in Multi-Armed Bandit with Many Arms

Mohsen Bayati, Nima Hamidi, Ramesh Johari, Khashayar Khosravi

3:23

06/12/2020

Restless-UCB, an Efficient and Low-complexity Algorithm for Online Restless Bandits

Siwei Wang, Longbo Huang, John C. S. Lui

3:19

06/12/2020

Hamiltonian Monte Carlo using an adjoint-differentiated Laplace approximation: Bayesian inference for latent Gaussian models and beyond

Charles Margossian, Aki Vehtari, Daniel Simpson, Raj Agrawal

3:05

06/12/2021

Twice regularized MDPs and the equivalence between robustness and regularization

Esther Derman, Matthieu Geist, Shie Mannor

Keywords: optimization, reinforcement learning and planning, robustness

14:19

06/12/2020

Batched Coarse Ranking in Multi-Armed Bandits

Nikolai Karpov, Qin Zhang

3:20

06/12/2020

Finite-Time Analysis of Round-Robin Kullback-Leibler Upper Confidence Bounds for Optimal Adaptive Allocation with Multiple Plays and Markovian Rewards

Vrettos Moulos

3:10