12/07/2020

Private Query Release Assisted by Public Data

Raef Bassily, Albert Cheu, Shay Moran, Aleksandar Nikolov, Jonathan Ullman, Steven Wu

Keywords: Privacy-preserving Statistics and Machine Learning

Abstract: We study the problem of differentially private query release assisted by public data. Given a class $H$ of queries $h:X \rightarrow \{-1, 1\}$, and data set of i.i.d. samples from an unknown distribution $D$, a query-release algorithm is required to output a data structure $G: H \rightarrow [-1, 1]$ such that for any query $h\in H,$ $G(h)$ approximates $E_{x\sim D}[h(x)]$ up to some error $\alpha$. In this problem, the input data set consists of two types of samples: private and public. The algorithm is required to satisfy differential privacy only with respect to the private samples. We study the limits of this task in terms of private and public sample complexities. First, we show that this task is achievable for any query class of finite VC-dimension using only (roughly) $d/\alpha$ public samples and $\sqrt{p}d^{3/2}/\alpha^2$ private samples, where $d$ is the VC-dimension of the class, and $p$ is the dual VC-dimension. When there are no public samples, there are known examples of classes of VC-dimension one for which this task is impossible under differential privacy (e.g., the class of threshold functions over $R$). Moreover, our upper bound on the public sample complexity is non-trivial since, without private samples, it is known that this task is equivalent to uniform convergence over $H$ which requires at least $d/\alpha^2$ public samples. Next, we give lower bounds on private and public sample complexities with tight dependence on $p$ and $\alpha$. In particular, for the class of decision stumps, we give a lower bound of $\sqrt{p}/\alpha$ on the private sample complexity whenever the number of public samples is $<1/\alpha^2$. Given our upper bound, this shows that the dependence on $\sqrt{p}$ in the private sample complexity is necessary (in the non-trivial regime where the public samples are insufficient to solve the problem on its own). We also give a tight lower bound of $1/\alpha$ on the public sample complexity for a broad family of query classes.

 0
 0
 0
 0
This is an embedded video. Talk and the respective paper are published at ICML 2020 virtual conference. If you are one of the authors of the paper and want to manage your upload, see the question "My papertalk has been externally embedded..." in the FAQ section.

Comments

Post Comment
no comments yet
code of conduct: tbd

Similar Papers