12/07/2020

On Thompson Sampling with Langevin Algorithms

Eric Mazumdar, Aldo Pacchiano, Yian Ma, Michael Jordan, Peter Bartlett

Keywords: Online Learning, Active Learning, and Bandits

Abstract: Thompson sampling has been demonstrated both theoretically and empirically to enjoy favorable performance in tackling multi-armed bandit problems. Despite its successes, however, one key obstacle to its use in a much broader range of scenarios is the need for perfect samples from posterior distributions at every iteration, which is oftentimes not feasible in practice. We propose a Markov Chain Monte Carlo (MCMC) method tailored to Thompson sampling to address this issue. We construct a fast-converging Langevin algorithm to generate approximate samples with accuracy guarantees. We then leverage novel posterior concentration rates to analyze the statistical risk of the overall Thompson sampling method. Finally, we specify the necessary hyperparameters and the required computational resources for the MCMC procedure to match the optimal risk. The resulting algorithm enjoys both optimal instance-dependent frequentist regret and appealing computational complexity.
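To make the idea concrete, here is a minimal sketch of Thompson sampling where each posterior draw is replaced by a few unadjusted Langevin algorithm (ULA) steps, for a Gaussian bandit with a standard normal prior on each arm's mean. This is an illustration of the general approach only, not the paper's algorithm; the function names (`ula_sample`, `langevin_ts`), step-size rule, and all constants are assumptions for the example, and the paper's actual method prescribes specific hyperparameters and guarantees not reproduced here.

```python
import numpy as np

def ula_sample(grad_log_post, step, theta0=0.0, n_steps=100, rng=None):
    """Unadjusted Langevin: theta <- theta + step * grad + sqrt(2*step) * noise."""
    rng = np.random.default_rng() if rng is None else rng
    theta = theta0
    for _ in range(n_steps):
        theta = theta + step * grad_log_post(theta) + np.sqrt(2.0 * step) * rng.normal()
    return theta

def langevin_ts(true_means, horizon=500, sigma=1.0, seed=0):
    """Thompson sampling with approximate posterior draws from ULA.

    Each arm's mean has a N(0, 1) prior; rewards are N(mu_a, sigma^2).
    """
    rng = np.random.default_rng(seed)
    k = len(true_means)
    sums = np.zeros(k)    # cumulative reward per arm
    counts = np.zeros(k)  # number of pulls per arm
    best = max(true_means)
    regret = 0.0
    for _ in range(horizon):
        samples = np.empty(k)
        for a in range(k):
            # Gradient of the Gaussian log posterior: prior term plus likelihood term.
            grad = lambda th, a=a: -th + (sums[a] - counts[a] * th) / sigma**2
            # Scale the step with the posterior precision so ULA stays stable
            # as observations accumulate (an illustrative choice).
            precision = 1.0 + counts[a] / sigma**2
            samples[a] = ula_sample(grad, step=0.5 / precision, rng=rng)
        a = int(np.argmax(samples))        # act greedily on the sampled means
        regret += best - true_means[a]
        reward = true_means[a] + sigma * rng.normal()
        sums[a] += reward
        counts[a] += 1.0

    return regret

total = langevin_ts([0.2, 0.5, 0.9])
```

Because this posterior is conjugate Gaussian, exact sampling would be trivial here; the point of the sketch is only the plumbing — swapping the exact draw for a fixed budget of gradient-plus-noise iterations — which is what makes the approach applicable when exact posterior samples are unavailable.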

The talk and the paper were published at the ICML 2020 virtual conference.

