13/04/2021

A parameter-free algorithm for misspecified linear contextual bandits

Kei Takemura, Shinji Ito, Daisuke Hatano, Hanna Sumita, Takuro Fukunaga, Naonori Kakimura, Ken-ichi Kawarabayashi


Abstract: We investigate the misspecified linear contextual bandit (MLCB) problem, which is a generalization of the linear contextual bandit (LCB) problem. The MLCB problem is a decision-making problem in which, in each round, a learner observes d-dimensional feature vectors, called arms, chooses one of K arms, and then obtains a reward from the chosen arm. The learner aims to maximize the sum of the rewards over T rounds. In contrast to the LCB problem, the rewards in the MLCB problem need not be represented by a linear function of the feature vectors; instead, they are approximated by a linear function up to an additive approximation parameter \varepsilon \geq 0. In this paper, we propose an algorithm that achieves \tilde{O}(\sqrt{dT\log(K)} + \varepsilon\sqrt{d}T) regret, where \tilde{O}(\cdot) ignores polylogarithmic factors in d and T. This is the first algorithm that guarantees a high-probability regret bound for the MLCB problem without knowledge of the approximation parameter \varepsilon.
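To make the problem setup concrete, the sketch below simulates the MLCB interaction protocol described in the abstract: in each round the learner sees K d-dimensional arms, picks one, and receives a noisy reward whose expectation deviates from a linear function by at most \varepsilon. The learner shown is a standard LinUCB-style baseline, not the parameter-free algorithm proposed in the paper, and all concrete choices (dimensions, noise level, the shape of the \varepsilon-bounded perturbation, the exploration parameter alpha) are assumptions made purely for illustration.

```python
import numpy as np

# Illustrative sketch only (NOT the paper's algorithm): an MLCB environment
# paired with a LinUCB-style baseline learner. All constants are assumptions.
rng = np.random.default_rng(0)
d, K, T, eps = 5, 10, 2000, 0.05          # dimension, arms, horizon, misspecification level

theta_star = rng.normal(size=d)
theta_star /= np.linalg.norm(theta_star)  # unknown parameter, unit norm

def expected_reward(x):
    # Linear term plus a bounded deviation of magnitude at most eps:
    # the "additive approximation parameter" from the abstract.
    return float(x @ theta_star) + eps * np.sin(10.0 * x.sum())

# Baseline learner: ridge-regression estimate plus an optimism (UCB) bonus.
lam, alpha = 1.0, 1.0
A = lam * np.eye(d)                       # regularized Gram matrix
b = np.zeros(d)                           # accumulated reward-weighted features

cumulative_regret = 0.0
for t in range(T):
    X = rng.normal(size=(K, d)) / np.sqrt(d)        # the K arms observed this round
    mu = np.array([expected_reward(x) for x in X])  # true expected rewards (hidden from learner)

    theta_hat = np.linalg.solve(A, b)
    A_inv = np.linalg.inv(A)
    bonus = np.sqrt(np.einsum('ki,ij,kj->k', X, A_inv, X))
    k = int(np.argmax(X @ theta_hat + alpha * bonus))   # chosen arm

    reward = mu[k] + 0.1 * rng.normal()   # noisy observed reward
    A += np.outer(X[k], X[k])
    b += reward * X[k]
    cumulative_regret += mu.max() - mu[k]

print(f"cumulative regret after T={T} rounds: {cumulative_regret:.1f}")
```

The printed quantity is the cumulative regret against the best arm per round, the quantity bounded by \tilde{O}(\sqrt{dT\log(K)} + \varepsilon\sqrt{d}T) for the algorithm proposed in the paper.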

The talk and the paper were published at the AISTATS 2021 virtual conference.
