Abstract:
In this paper, we study (locally) differentially private Combinatorial Semi-Bandits (CSB). Compared with private Multi-Armed Bandits (MAB), since the server receives more information from the user, it usually leads to additional dependence over the dimension of feedback, which is a notorious problem in private learning.
Somewhat surprisingly, we show that it is possible to remove this side-effect caused by privacy protection and nearly match corresponding non-private best results. In detail, for general CSB with $B$-bounded smooth reward function in the sense of Chen et al. 2016, we propose a novel algorithm that achieves regret bound $\tilde{\mc{O}}(mB^2\log T / \epsilon)$ over $T$ rounds under $\epsilon$-local differential privacy, where $m$ is the number of base arms. However, for Linear CSB, $B$ equals $K$, where $K$ is the maximum number of feedback in each round, and above bound has an additional $K$ compared with non-private optimal result. We then propose a different algorithm with nearly optimal regret bound $\tilde{\mc{O}}(mK\log T / \epsilon)$ if one cares about $\epsilon$-differential privacy rather than $\epsilon$-local differential privacy. Besides, we also prove some lower bounds in each setting.