Meta-learning with Stochastic Linear Bandits

Abstract: We investigate meta-learning procedures in the setting of stochastic linear bandit tasks. The goal is to select a learning algorithm that works well on average over a class of bandit tasks sampled from a task distribution. Inspired by recent work on learning-to-learn linear regression, we consider a class of bandit algorithms that implement a regularized version of the well-known OFUL algorithm, where the regularization is the squared Euclidean distance to a bias vector. We first study the benefit of the biased OFUL algorithm in terms of regret minimization. We then propose two strategies to estimate the bias within the learning-to-learn setting. We show, both theoretically and experimentally, that when the number of tasks grows and the variance of the task distribution is small, our strategies have a significant advantage over learning the tasks in isolation.
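
To make the biased regularization concrete, here is a minimal sketch, not the authors' implementation. It assumes the estimate minimizes ||X theta - y||^2 + lam * ||theta - h||^2, which has the closed form theta = (X^T X + lam I)^{-1} (X^T y + lam h). The names biased_ridge, select_arm, lam, h and beta are hypothetical. In particular, beta stands in for the confidence radius, which the abstract does not specify, so it is left as a user-supplied constant.

```python
import numpy as np

def biased_ridge(X, y, h, lam):
    """Closed-form minimizer of ||X @ theta - y||^2 + lam * ||theta - h||^2."""
    d = X.shape[1]
    V = X.T @ X + lam * np.eye(d)               # regularized design matrix
    theta = np.linalg.solve(V, X.T @ y + lam * h)
    return theta, V

def select_arm(arms, theta, V, beta):
    """Optimistic choice: estimated reward plus beta times the confidence width.

    beta is an assumed, user-supplied radius, not the paper's formula.
    """
    V_inv = np.linalg.inv(V)
    # widths[i] = sqrt(arms[i]^T V^{-1} arms[i])
    widths = np.sqrt(np.einsum("ij,jk,ik->i", arms, V_inv, arms))
    return int(np.argmax(arms @ theta + beta * widths))
```

With no observations the estimate equals the bias h, and with h set to zero the first function reduces to ordinary ridge regression. This is one way to see why a bias close to the task parameters could help early rounds.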

18/07/2021

Leonardo Cella, Alessandro Lazaric, Massimiliano Pontil

Similar Papers

A Distribution-dependent Analysis of Meta Learning

Mikhail Konobeev, Ilja Kuzborskij, Csaba Szepesvari

Keywords: Theory, Statistical Learning Theory

Reward-Biased Maximum Likelihood Estimation for Linear Stochastic Bandits

Yu-Heng Hung, Ping-Chun Hsieh, Xi Liu, P. R. Kumar

Adversarial Intrinsic Motivation for Reinforcement Learning

Ishan Durugkar, Mauricio Tec, Scott Niekum, Peter Stone

Keywords: reinforcement learning and planning, generative model

Learning with risk-averse feedback under potentially heavy tails

Matthew Holland, El Mehdi Haress

Improved Corruption Robust Algorithms for Episodic Reinforcement Learning

Yifang Chen, Simon Du, Kevin Jamieson

Keywords: Optimization, Non-Convex Optimization, Theory, Online Learning Theory

On data efficiency of meta-learning

Maruan Al-Shedivat, Liam Li, Eric Xing, Ameet Talwalkar

Adaptive Importance Sampling for Finite-Sum Optimization and Sampling with Decreasing Step-Sizes

Ayoub El Hanchi, David Stephens

Differentiable Meta-Learning of Bandit Policies

Craig Boutilier, Chih-wei Hsu, Branislav Kveton, Martin Mladenov, Csaba Szepesvari, Manzil Zaheer

Learning from Similarity-Confidence Data

Yuzhou Cao, Lei Feng, Yitian Xu, Bo An, Gang Niu, Masashi Sugiyama

Keywords: Algorithms, Semi-Supervised Learning

Stochastic Linear Contextual Bandits with Diverse Contexts

Weiqiang Wu, Jing Yang, Cong Shen

Double Trouble in Double Descent: Bias and Variance(s) in the Lazy Regime

Stéphane d'Ascoli, Maria Refinetti, Giulio Biroli, Florent Krzakala

Keywords: Deep Learning - Theory

Large deviations for the perceptron model and consequences for active learning

Hugo Cui, Luca Saglietti, Lenka Zdeborova

Modeling and Optimization Trade-off in Meta-learning

Katelyn Gao, Ozan Sener

Online Markov Decision Processes with Aggregate Bandit Feedback

Alon Cohen, Haim Kaplan, Tomer Koren, Yishay Mansour

Learning where to learn: Gradient sparsity in meta and continual learning

Johannes von Oswald, Dominic Zhao, Seijin Kobayashi, Simon Schug, Massimo Caccia, Nicolas Zucchet, João Sacramento

Keywords: deep learning, optimization, meta learning, continual learning, few shot learning

Learning Randomly Perturbed Structured Predictors for Direct Loss Minimization

Hedda Cohen Indelman, Tamir Hazan

Keywords: Algorithms, Structured Prediction, Algorithms, Collaborative Filtering, Applications, Recommender Systems

Distributionally Robust Federated Averaging

Yuyang Deng, Mohammad Mahdi Kamani, Mehrdad Mahdavi

Supervised learning: no loss no cry

Richard Nock, Aditya Menon

Keywords: Learning Theory

Ranking Policy Gradient

Kaixiang Lin, Jiayu Zhou

Keywords: Sample-efficient reinforcement learning, off-policy learning

Budgeted and non-budgeted causal bandits

Vineet Nair, Vishakha Patil, Gaurav Sinha

Learning from eXtreme Bandit Feedback

Romain Lopez, Inderjit S. Dhillon, Michael I. Jordan

The sample complexity of meta sparse regression

Zhanyu Wang, Jean Honorio

Breaking the Moments Condition Barrier: No-Regret Algorithm for Bandits with Super Heavy-Tailed Payoffs
