04/08/2021

Tsallis-INF: Improved Robustness to Adversarial Corruptions in Stochastic Multi-armed Bandits and Beyond

Saeed Masoudian, Yevgeny Seldin

Keywords:

Abstract: We derive improved regret bounds for the Tsallis-INF algorithm of \citet{zimmert2020}. We show that in adversarial regimes with a $(\Delta,C,T)$ self-bounding constraint the algorithm achieves $\Ocal\lr{\lr{\sum_{i\neq i^*} \frac{1}{\Delta_i}}\log_+\lr{\frac{(K-1)T}{\lr{\sum_{i\neq i^*} \frac{1}{\Delta_i}}^2}}+\sqrt{C\lr{\sum_{i\neq i^*}\frac{1}{\Delta_i}}\log_+\lr{\frac{(K-1)T}{C\sum_{i\neq i^*}\frac{1}{\Delta_i}}}}}$ regret bound, where $T$ is the time horizon, $K$ is the number of arms, $\Delta_i$ are the suboptimality gaps, $i^*$ is the best arm, $C$ is the corruption magnitude, and $\log_+(x) = \max\lr{1,\log x}$. The regime includes stochastic bandits, stochastically constrained adversarial bandits, and stochastic bandits with adversarial corruptions as special cases. Additionally, we provide a general analysis, which allows to achieve the same kind of improvement for generalizations of Tsallis-INF to other settings beyond multiarmed bandits.

 0
 0
 0
 0
This is an embedded video. Talk and the respective paper are published at COLT 2021 virtual conference. If you are one of the authors of the paper and want to manage your upload, see the question "My papertalk has been externally embedded..." in the FAQ section.

Comments

Post Comment
no comments yet
code of conduct: tbd

Similar Papers