Meta-Learning Effective Exploration Strategies for Contextual Bandits

02/02/2021

Amr Sharaf, Hal Daumé III

Abstract: In contextual bandits, an algorithm must choose actions given observed contexts, learning from a reward signal that is observed only for the action chosen. This leads to an exploration/exploitation trade-off: the algorithm must balance taking actions it already believes are good with taking new actions to potentially discover better choices. We develop a meta-learning algorithm, Mêlée, that learns an exploration policy based on simulated, synthetic contextual bandit tasks. Mêlée uses imitation learning against these simulations to train an exploration policy that can be applied to true contextual bandit tasks at test time. We evaluate Mêlée on both a natural contextual bandit problem derived from a learning to rank dataset as well as hundreds of simulated contextual bandit problems derived from classification tasks. Mêlée outperforms seven strong baselines on most of these datasets by leveraging a rich feature representation for learning an exploration strategy.
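
To make the setting in the abstract concrete, here is a minimal sketch of a contextual bandit loop with a fixed epsilon-greedy exploration rule. It is not Mêlée and does not reproduce the paper's method; it only illustrates the exploration/exploitation trade-off when the reward is observed solely for the chosen action. The function name `run_epsilon_greedy`, the toy task (the rewarded action equals the context, loosely mimicking a classification task turned into a bandit problem), and all parameter values are assumptions made for illustration.

```python
import random


def run_epsilon_greedy(num_rounds=2000, num_actions=3, epsilon=0.1, seed=0):
    """Simulate a toy contextual bandit and return the average reward.

    Assumption: contexts are integers in [0, num_actions) and the only
    rewarded action is the one equal to the context. The learner never
    sees the rewards of actions it did not choose.
    """
    rng = random.Random(seed)
    # Running reward estimates and pull counts per (context, action) pair.
    value = [[0.0] * num_actions for _ in range(num_actions)]
    count = [[0] * num_actions for _ in range(num_actions)]
    total_reward = 0.0
    for _ in range(num_rounds):
        context = rng.randrange(num_actions)
        if rng.random() < epsilon:
            # Explore: try a uniformly random action.
            action = rng.randrange(num_actions)
        else:
            # Exploit: take the action with the highest current estimate.
            action = max(range(num_actions), key=lambda a: value[context][a])
        # Bandit feedback: reward is revealed for the chosen action only.
        reward = 1.0 if action == context else 0.0
        count[context][action] += 1
        # Incremental mean update of the estimate for the chosen action.
        value[context][action] += (reward - value[context][action]) / count[context][action]
        total_reward += reward
    return total_reward / num_rounds


if __name__ == "__main__":
    print("epsilon = 0.1:", run_epsilon_greedy(epsilon=0.1))
    print("epsilon = 0.0:", run_epsilon_greedy(epsilon=0.0))
```

In this toy task a purely greedy learner (`epsilon=0.0`) never tries anything but its initial choice, so it keeps choosing a wrong action in most contexts, while a small amount of random exploration lets the learner find the rewarded action. A learned exploration policy, as described in the abstract, would replace the fixed coin flip used here.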

Talk video: https://slideslive.com/38948422

The talk and the corresponding paper were published at the AAAI 2021 virtual conference.

Similar Papers

16/11/2020

DORB: Dynamically Optimizing Multiple Rewards with Bandits

Ramakanth Pasunuru, Han Guo, Mohit Bansal

Keywords: language tasks, optimization rewards, nlg tasks, question generation

11:34

26/04/2020

Never Give Up: Learning Directed Exploration Strategies

Adrià Puigdomènech Badia, Pablo Sprechmann, Alex Vitvitskyi, Daniel Guo, Bilal Piot, Steven Kapturowski, Olivier Tieleman, Martin Arjovsky, Alexander Pritzel, Andrew Bolt, Charles Blundell

Keywords: deep reinforcement learning, exploration, intrinsic motivation

5:30

04/08/2021

Double Explore-then-Commit: Asymptotic Optimality and Beyond

Tianyuan Jin, Pan Xu, Xiaokui Xiao, Quanquan Gu

13:57

06/12/2021

Continuous Mean-Covariance Bandits

Yihan Du, Siwei Wang, Zhixuan Fang, Longbo Huang

Keywords: bandits

11:33

12/07/2020

Reward-Free Exploration for Reinforcement Learning

Chi Jin, Akshay Krishnamurthy, Max Simchowitz, Tiancheng Yu

Keywords: Reinforcement Learning - Theory

14:37

03/05/2021

Rank the Episodes: A Simple Approach for Exploration in Procedurally-Generated Environments

Daochen Zha, Wenye Ma, Lei Yuan, Xia Hu, Ji Liu

Keywords: Exploration, Reinforcement Learning, Self-Imitation, Generalization of Reinforcement Learning

5:10

18/07/2021

MetaCURE: Meta Reinforcement Learning with Empowerment-Driven Exploration

Jin Zhang, Jianhao Wang, Hao Hu, Tong Chen, Yingfeng Chen, Changjie Fan, Chongjie Zhang

Keywords: Algorithms, Multitask, Transfer, and Meta Learning

4:19

18/07/2021

Policy Gradient Bayesian Robust Optimization for Imitation Learning

Zaynah Javed, Daniel Brown, Satvik Sharma, Jerry Zhu, Ashwin Balakrishna, Marek Petrik, Anca Dragan, Ken Goldberg

Keywords: Social Aspects of Machine Learning, AI Safety

5:10

06/12/2020

Latent Bandits Revisited

Joey Hong, Branislav Kveton, Manzil Zaheer, Yinlam Chow, Amr Ahmed, Craig Boutilier

3:11

06/12/2021

Sample-Efficient Learning of Stackelberg Equilibria in General-Sum Games

Yu Bai, Chi Jin, Huan Wang, Caiming Xiong

Keywords: theory, reinforcement learning and planning, bandits

12:14

02/02/2021

Exploration by Maximizing Renyi Entropy for Reward-Free RL Framework

Chuheng Zhang, Yuanying Cai, Longbo Huang, Jian Li

16:03

06/12/2020

Differentiable Meta-Learning of Bandit Policies

Craig Boutilier, Chih-wei Hsu, Branislav Kveton, Martin Mladenov, Csaba Szepesvari, Manzil Zaheer

3:10

26/08/2020

Competing Bandits in Matching Markets

Lydia T. Liu, Horia Mania, Michael Jordan

15:55

22/09/2020

Deep bayesian bandits: Exploring in online personalized recommendations

Dalin Guo, Sofia Ira Ktena, Pranay Kumar Myana, Ferenc Huszar, Wenzhe Shi, Alykhan Tejani, Michael Kneier, Sourav Das

Keywords: Contextual bandit, Recommender Systems, Algorithmic bias

2:59

04/08/2021

Instance-Dependent Complexity of Contextual Bandits and Reinforcement Learning: A Disagreement-Based Perspective

Dylan Foster, Alexander Rakhlin, David Simchi-Levi, Yunzong Xu

16:53

12/07/2020

Learning Human Objectives by Evaluating Hypothetical Behavior

Siddharth Reddy, Anca Dragan, Sergey Levine, Shane Legg, Jan Leike

Keywords: Reinforcement Learning - Deep RL

10:21

06/12/2021

Episodic Multi-agent Reinforcement Learning with Curiosity-driven Exploration

Lulu Zheng, Jiarui Chen, Jianhao Wang, Jiamin He, Yujing Hu, Yingfeng Chen, Changjie Fan, Yang Gao, Chongjie Zhang

Keywords: reinforcement learning and planning

12:25

13/04/2021

Contextual blocking bandits

Soumya Basu, Orestis Papadigenopoulos, Constantine Caramanis, Sanjay Shakkottai

2:47

02/02/2021

Stochastic Graphical Bandits with Adversarial Corruptions

Shiyin Lu, Guanghui Wang, Lijun Zhang

17:05

06/12/2021

Offline Meta Reinforcement Learning -- Identifiability Challenges and Effective Data Collection Strategies

Ron Dorfman, Idan Shenfeld, Aviv Tamar

Keywords: reinforcement learning and planning

14:44

06/12/2021

Local policy search with Bayesian optimization

Sarah Müller, Alexander von Rohr, Sebastian Trimpe

Keywords: theory, optimization, reinforcement learning and planning, active learning

11:42

18/07/2021

Meta-Thompson Sampling

Branislav Kveton, Mikhail Konobeev, Manzil Zaheer, Chih-wei Hsu, Martin Mladenov, Craig Boutilier, Csaba Szepesvari

Keywords: Reinforcement Learning and Planning, Bandits

5:14

02/02/2021

Adaptive Algorithms for Multi-armed Bandit with Composite and Anonymous Feedback

Siwei Wang, Haoyun Wang, Longbo Huang

19:29

12/07/2020

A distributional view on multi objective policy optimization

Abbas Abdolmaleki, Sandy Huang, Leonard Hasenclever, Michael Neunert, Martina Zambelli, Murilo Martins, Francis Song, Nicolas Heess, Raia Hadsell, Martin Riedmiller

Keywords: Reinforcement Learning - Deep RL

15:04

02/02/2021

Federated Multi-Armed Bandits

Chengshuai Shi, Cong Shen

15:26

02/02/2021

Exploration via State influence Modeling

Yongxin Kang, Enmin Zhao, Kai Li, Junliang Xing

14:03

02/02/2021

Sequential Generative Exploration Model for Partially Observable Reinforcement Learning

Haiyan Yin, Jianda Chen, Sinno Jialin Pan, Sebastian Tschiatschek

14:40

03/05/2021

Monte-Carlo Planning and Learning with Language Action Value Estimates

Youngsoo Jang, Seokin Seo, Jongmin Lee, Kee-Eung Kim

Keywords: reinforcement learning, interactive fiction, Monte-Carlo tree search, natural language processing

4:57

18/07/2021

Backdoor Scanning for Deep Neural Networks through K-Arm Optimization

Guangyu Shen, Yingqi Liu, Guanhong Tao, Shengwei An, Qiuling Xu, Siyuan Cheng, Shiqing Ma, Xiangyu Zhang

Keywords: Social Aspects of Machine Learning, Privacy, Anonymity, and Security

5:12

16/11/2020

Positive-Unlabeled Reward Learning

Danfei Xu, Misha Denil

5:04

06/12/2020

Adversarial Style Mining for One-Shot Unsupervised Domain Adaptation

Yawei Luo, Ping Liu, Tao Guan, Junqing Yu, Yi Yang

3:22

16/11/2020

f-IRL: Inverse Reinforcement Learning via State Marginal Matching

Tianwei Ni, Harshit Sikchi, Yufei Wang, Tejus Gupta, Lisa Lee, Ben Eysenbach

5:07

04/08/2021

Corruption-robust exploration in episodic reinforcement learning

Thodoris Lykouris, Max Simchowitz, Alex Slivkins, Wen Sun

18:27

06/12/2020

Learning to Play Sequential Games versus Unknown Opponents

Pier Giuseppe Sessa, Ilija Bogunovic, Maryam Kamgarpour, Andreas Krause

3:04

12/07/2020

Influence Diagram Bandits

Tong Yu, Branislav Kveton, Zheng Wen, Ruiyi Zhang, Ole J. Mengshoel

Keywords: Online Learning, Active Learning, and Bandits

14:14

12/07/2020

Dual-Path Distillation: A Unified Framework to Improve Black-Box Attacks

Yonggang Zhang, Ya Li, Tongliang Liu, Xinmei Tian

Keywords: Adversarial Examples

11:33

06/12/2021

There Is No Turning Back: A Self-Supervised Approach for Reversibility-Aware Reinforcement Learning

Nathan Grinsztajn, Johan Ferret, Olivier Pietquin, Philippe Preux, Matthieu Geist

Keywords: reinforcement learning and planning

14:31

02/02/2021

Self-Attention Attribution: Interpreting Information Interactions Inside Transformer

Yaru Hao, Li Dong, Furu Wei, Ke Xu

16:26

06/12/2021

Learning Collaborative Policies to Solve NP-hard Routing Problems

Minsu Kim, Jinkyoo Park, Joungho Kim

Keywords: reinforcement learning and planning

15:03

12/07/2020

OPtions as REsponses: Grounding behavioural hierarchies in multi-agent reinforcement learning

Alexander Vezhnevets, Yuhuai Wu, Maria Eckstein, Rémi Leblond, Joel Z Leibo

Keywords: Reinforcement Learning - Deep RL

14:17