13/04/2021

Smooth bandit optimization: Generalization to holder space

Yusha Liu, Yining Wang, Aarti Singh

Keywords:

Abstract: We consider bandit optimization of a smooth reward function, where the goal is cumulative regret minimization. This problem has been studied for \alpha-Holder continuous (including Lipschitz) functions with 0<\alpha\leq 1. Our main result is in generalization of the reward function to Holder space with exponent \alpha>1 to bridge the gap between Lipschitz bandits and infinitely-differentiable models such as linear bandits. For Holder continuous functions, approaches based on random sampling in bins of a discretized domain suffices as optimal. In contrast, we propose a class of two-layer algorithms that deploy misspecified linear/polynomial bandit algorithms in bins. We demonstrate that the proposed algorithm can exploit higher-order smoothness of the function by deriving a regret upper bound of \tilde{O}(T^\frac{d+\alpha}{d+2\alpha}) for when \alpha>1, which matches existing lower bound. We also study adaptation to unknown function smoothness over a continuous scale of Holder spaces indexed by \alpha, with a bandit model selection approach applied with our proposed two-layer algorithms. We show that it achieves regret rate that matches the existing lower bound for adaptation within the \alpha\leq 1 subset.

 0
 0
 0
 0
This is an embedded video. Talk and the respective paper are published at AISTATS 2021 virtual conference. If you are one of the authors of the paper and want to manage your upload, see the question "My papertalk has been externally embedded..." in the FAQ section.

Comments

Post Comment
no comments yet
code of conduct: tbd

Similar Papers