12/07/2020

Exploration Through Bias: Revisiting Biased Maximum Likelihood Estimation in Stochastic Multi-Armed Bandits

Xi Liu, Ping-Chun Hsieh, Yu Heng Hung, Anirban Bhattacharya, P. Kumar

Keywords: Online Learning, Active Learning, and Bandits

Abstract: We propose a new family of bandit algorithms, that are formulated in a general way based on the Biased Maximum Likelihood Estimation (BMLE) method originally appearing in the adaptive control literature. We design the reward-bias term to tackle the exploration and exploitation tradeoff for stochastic bandit problems. We provide a general recipe for the BMLE algorithm and derive a simple explicit closed-form expression for the index of an arm for exponential family reward distributions. We prove that the derived BMLE indices achieve a logarithmic finite-time regret bound and hence attain order-optimality, for both exponential families and the cases beyond parametric distributions. Through extensive simulations, we demonstrate that the proposed algorithms achieve regret performance comparable to the best of several state-of-the-art baseline methods, while being computationally efficient in comparison to other best-performing methods. The generality of the proposed approach makes it possible to address more complex models, including general adaptive control of Markovian systems.

 0
 0
 0
 0
This is an embedded video. Talk and the respective paper are published at ICML 2020 virtual conference. If you are one of the authors of the paper and want to manage your upload, see the question "My papertalk has been externally embedded..." in the FAQ section.

Comments

Post Comment
no comments yet
code of conduct: tbd Characters remaining: 140

Similar Papers