A One-Size-Fits-All Solution to Conservative Bandit Problems

02/02/2021

A One-Size-Fits-All Solution to Conservative Bandit Problems

Yihan Du, Siwei Wang, Longbo Huang

Keywords:

Abstract Paper Similar Papers

Abstract: In this paper, we study a family of conservative bandit problems (CBPs) with sample-path reward constraints, i.e., the learner's reward performance must be at least as well as a given baseline at any time. We propose a One-Size-Fits-All solution to CBPs and present its applications to three encompassed problems, i.e. conservative multi-armed bandits (CMAB), conservative linear bandits (CLB) and conservative contextual combinatorial bandits (CCCB). Different from previous works which consider high probability constraints on the expected reward, we focus on a sample-path constraint on the actually received reward, and achieve better theoretical guarantees (T-independent additive regrets instead of T-dependent) and empirical performance. Furthermore, we extend the results and consider a novel conservative mean-variance bandit problem (MV-CBP), which measures the learning performance with both the expected reward and variability. For this extended problem, we provide a novel algorithm with O(1/T) normalized additive regrets (T-independent in the cumulative form) and validate this result through empirical evaluation.

The video of this talk cannot be embedded. You can watch it here:

https://slideslive.com/38948214

(Link will open in new window)

0

0

0

0

Share

This is an embedded video. Talk and the respective paper are published at AAAI 2021 virtual conference. If you are one of the authors of the paper and want to manage your upload, see the question "My papertalk has been externally embedded..." in the FAQ section.

Comments

Post Comment

no comments yet

Similar Papers

06/12/2020

Learning to Utilize Shaping Rewards: A New Approach of Reward Shaping

Yujing Hu, Weixun Wang, Hangtian Jia and
Yixiang Wang, Yingfeng Chen, Jianye Hao, Feng Wu, Changjie Fan

Keywords Paper

0

0

0

0

3:20

06/12/2021

Explicable Reward Design for Reinforcement Learning Agents

Rati Devidze, Goran Radanovic, Parameswaran Kamalaruban, Adish Singla

Keywords Paper

optimization, reinforcement learning and planning, interpretability

0

0

0

0

4:10

18/07/2021

On-Policy Deep Reinforcement Learning for the Average-Reward Criterion

Yiming Zhang, Keith Ross

Keywords Paper

Reinforcement Learning and Planning

0

0

0

0

5:14

26/04/2020

Observational Overfitting in Reinforcement Learning

Xingyou Song, Yiding Jiang, Stephen Tu and
Yilun Du, Behnam Neyshabur

Keywords Paper

observational, overfitting, reinforcement, learning, generalization, implicit, regularization, overparametrization

0

0

0

0

4:52

06/12/2021

Information Directed Reward Learning for Reinforcement Learning

David Lindner, Matteo Turchetta, Sebastian Tschiatschek and
Kamil Ciosek, Andreas Krause

Keywords Paper

reinforcement learning and planning, active learning

0

0

0

0

11:47

12/07/2020

Learning Fair Policies in Multi-Objective (Deep) Reinforcement Learning with Average and Discounted Rewards

Umer Siddique, Paul Weng, Matthieu Zimmer

Keywords Paper

Reinforcement Learning - Deep RL

0

0

0

0

15:17

06/12/2020

Stage-wise Conservative Linear Bandits

Ahmadreza Moradipari, Christos Thrampoulidis, Mahnoosh Alizadeh

Keywords Paper

0

0

0

0

3:18

26/04/2020

Maximum Likelihood Constraint Inference for Inverse Reinforcement Learning

Dexter R.R. Scobee, S. Shankar Sastry

Keywords Paper

learning from demonstration, inverse reinforcement learning, constraint inference

0

0

0

0

5:19

06/12/2021

Distributional Reinforcement Learning for Multi-Dimensional Reward Functions

Pushi Zhang, Xiaoyu Chen, Li Zhao and
Wei Xiong, Tao Qin, Tie-Yan Liu

Keywords Paper

reinforcement learning and planning

0

0

0

0

8:30

02/02/2021

Reinforcement Learning with Trajectory Feedback

Yonathan Efroni, Nadav Merlis, Shie Mannor

Keywords Paper

0

0

0

0

14:17

19/08/2021

Mean Field Equilibrium in Multi-Armed Bandit Game with Continuous Reward

Xiong Wang, Riheng Jia

Keywords Paper

Machine Learning, Online Learning, Algorithmic Game Theory, Multi-agent Learning

0

0

0

0

10:19

18/07/2021

Model-based Reinforcement Learning for Continuous Control with Posterior Sampling

Ying Fan, Yifei Ming

Keywords Paper

Reinforcement Learning and Planning

0

0

0

0

18:34

13/04/2021

No-regret reinforcement learning with heavy-tailed rewards

Vincent Zhuang, Yanan Sui

Keywords Paper

0

0

0

0

2:49

13/04/2021

Provably efficient safe exploration via primal-dual policy optimization

Dongsheng Ding, Xiaohan Wei, Zhuoran Yang and
Zhaoran Wang, Mihailo Jovanovic

Keywords Paper

0

0

0

0

3:07

06/12/2021

Continuous Mean-Covariance Bandits

Yihan Du, Siwei Wang, Zhixuan Fang, Longbo Huang

Keywords Paper

bandits

0

0

0

0

11:33

06/12/2020

Latent Bandits Revisited

Joey Hong, Branislav Kveton, Manzil Zaheer and
Yinlam Chow, Amr Ahmed, Craig Boutilier

Keywords Paper

0

0

0

0

3:11

12/07/2020

Reinforcement Learning for Non-Stationary Markov Decision Processes: The Blessing of (More) Optimism

Wang Chi Cheung, David Simchi-Levi, Ruihao Zhu

Keywords Paper

Online Learning, Active Learning, and Bandits

0

0

0

0

13:26

06/12/2021

Reward is enough for convex MDPs

Tom Zahavy, Brendan O'Donoghue, Guillaume Desjardins, Satinder Singh

Keywords Paper

reinforcement learning and planning

0

0

0

0

14:12

13/04/2021

Regularized policies are reward robust

Hisham Husain, Kamil Ciosek, Ryota Tomioka

Keywords Paper

0

0

0

0

2:21

12/07/2020

Minimax Weight and Q-Function Learning for Off-Policy Evaluation

Masatoshi Uehara, Jiawei Huang, Nan Jiang

Keywords Paper

Reinforcement Learning - Theory

0

0

0

0

14:20

18/07/2021

Reward Identification in Inverse Reinforcement Learning

Kuno Kim, Shivam Garg, Kiran Shiragur, Stefano Ermon

Keywords Paper

Theory, RL, Decisions and Control Theory

0

0

0

0

5:18

02/02/2021

Sequential Generative Exploration Model for Partially Observable Reinforcement Learning

Haiyan Yin, Jianda Chen, Sinno Jialin Pan, Sebastian Tschiatschek

Keywords Paper

0

0

0

0

14:40

19/04/2021

Exploring supervised and unsupervised rewards in machine translation

Julia Ive, Zixu Wang, Marina Fomicheva, Lucia Specia

Keywords Paper

0

0

0

0

10:52

06/12/2021

Adversarial Intrinsic Motivation for Reinforcement Learning

Ishan Durugkar, Mauricio Tec, Scott Niekum, Peter Stone

Keywords Paper

reinforcement learning and planning, generative model

0

0

0

0

13:11

06/12/2020

Towards Minimax Optimal Reinforcement Learning in Factored Markov Decision Processes

Yi Tian, Jian Qian, Suvrit Sra

Keywords Paper

0

0

0

0

3:14

02/02/2021

Robust Contextual Bandits via Bootstrapping

Qiao Tang, Hong Xie, Yunni Xia and
Jia Lee, Qingsheng Zhu

Keywords Paper

0

0

0

0

15:03

19/08/2021

Average-Reward Reinforcement Learning with Trust Region Methods

Xiaoteng Ma, Xiaohang Tang, Li Xia and
Jun Yang, Qianchuan Zhao

Keywords Paper

Machine Learning, Deep Reinforcement Learning, Reinforcement Learning, Markov Decision Processes

0

0

0

0

14:41

12/07/2020

What Can Learned Intrinsic Rewards Capture?

Zeyu Zheng, Junhyuk Oh, Matteo Hessel and
Zhongwen Xu, Manuel Kroiss, Hado van Hasselt, David Silver, Satinder Singh

Keywords Paper

Reinforcement Learning - Deep RL

0

0

0

0

14:47

06/12/2021

Identifiability in inverse reinforcement learning

Haoyang Cao, Samuel Cohen, Lukasz Szpruch

Keywords Paper

reinforcement learning and planning

0

0

0

0

15:07

02/02/2021

Temporal-Logic-Based Reward Shaping for Continuing Reinforcement Learning Tasks

Yuqian Jiang, Suda Bharadwaj, Bo Wu and
Rishi Shah, Ufuk Topcu, Peter Stone

Keywords Paper

0

0

0

0

15:40

18/07/2021

Provably Efficient Learning of Transferable Rewards

Alberto Maria Metelli, Giorgia Ramponi, Alessandro Concetti, Marcello Restelli

Keywords Paper

Optimization, Convex Optimization, Reinforcement Learning and Planning, Optimization, Combinatorial Optimization

0

0

0

0

5:14

02/02/2021

Bounded Risk-Sensitive Markov Games: Forward Policy Design and Inverse Reward Learning with Iterative Reasoning and Cumulative Prospect Theory

Ran Tian, Liting Sun, Masayoshi Tomizuka

Keywords Paper

0

0

0

0

16:28

02/02/2021

Combinatorial Pure Exploration with Full-Bandit or Partial Linear Feedback

Yihan Du, Yuko Kuroki, Wei Chen

Keywords Paper

0

0

0

0

17:13

18/07/2021

On Proximal Policy Optimization's Heavy-tailed Gradients

Saurabh Garg, Joshua Zhanson, Emilio Parisotto and
Adarsh Prasad, Zico Kolter, Zachary Lipton, Sivaraman Balakrishnan, Russ Salakhutdinov, Pradeep Ravikumar

Keywords Paper

Reinforcement Learning and Planning, Deep RL

0

0

0

0

5:34

02/02/2021

GaussianPath:A Bayesian Multi-Hop Reasoning Framework for Knowledge Graph Reasoning

Guojia Wan, Bo Du

Keywords Paper

0

0

0

0

13:52

06/12/2021

Inverse Reinforcement Learning in a Continuous State Space with Formal Guarantees

Gregory Dexter, Kevin Bello, Jean Honorio

Keywords Paper

theory, reinforcement learning and planning

0

0

0

0

14:49

26/08/2020

Nested-Wasserstein Self-Imitation Learning for Sequence Generation

Ruiyi Zhang, Changyou Chen, Zhe Gan and
Zheng Wen, Wenlin Wang, Lawrence Carin

Keywords Paper

0

0

0

0

11:18

06/12/2021

Bayesian decision-making under misspecified priors with applications to meta-learning

Max Simchowitz, Christopher Tosh, Akshay Krishnamurthy and
Daniel Hsu, Thodoris Lykouris, Miro Dudik, Robert E Schapire

Keywords Paper

meta learning, bandits

0

0

0

0

14:58

03/05/2021

Regularized Inverse Reinforcement Learning

Wonseok Jeon, Chen-Yang Su, Paul Barde and
Thang Doan, Derek Nowrouzezahrai, Joelle Pineau

Keywords Paper

reinforcement learning, regularized markov decision processes, reward learning, inverse reinforcement learning

0

0

0

0

9:50

06/12/2021

On the Expressivity of Markov Reward

David Abel, Will Dabney, Anna Harutyunyan and
Mark Ho, Michael Littman, Doina Precup, Satinder Singh

Keywords Paper

reinforcement learning and planning

0

0

0

0

24:59