09/07/2020

Extrapolating the profile of a finite population

Yihong Wu, Yury Polyanskiy, Soham Jana

Keywords: Distribution learning/testing, High-dimensional statistics

Abstract: We study a prototypical problem in empirical Bayes. Namely,\n consider a population consisting of $k$ individuals each belonging to one of $k$ types (some types can be empty).\n Without any structural restrictions, it is impossible to learn the \n composition of the full population having observed only a small (random) subsample of size $m = o(k)$. \n Nevertheless, we show that in the sublinear regime of $m =\omega(k/\log k)$, it is possible to consistently estimate in total variation \n the \emph{profile} of the population, defined as the empirical\n distribution of the sizes of each type, which determines many symmetric properties of the population. We also prove\n that in the linear regime of $m=c k$ for any constant $c$ the optimal rate is $\Theta(1/\log k)$. \n Our estimator is based on Wolfowitz's minimum distance method, which entails solving a linear program (LP) of size $k$.\n We show that there is a single infinite-dimensional LP whose value simultaneously characterizes the risk of the minimum distance estimator and certifies its minimax optimality.\n The sharp convergence rate is obtained by evaluating this LP using complex-analytic techniques.

 0
 0
 0
 0
This is an embedded video. Talk and the respective paper are published at COLT 2020 virtual conference. If you are one of the authors of the paper and want to manage your upload, see the question "My papertalk has been externally embedded..." in the FAQ section.

Comments

Post Comment
no comments yet
code of conduct: tbd

Similar Papers