26/08/2020

Optimal Deterministic Coresets for Ridge Regression

Praneeth Kacham, David Woodruff

Keywords:

Abstract: We consider the ridge regression problem, for which we are given an nxd matrix A of examples and a corresponding nxd' matrix B of labels, as well as a ridge parameter $\lambda \geq 0$, and would like to output an $X' \in R^{d \times d'}$ for which $$\|AX'-B\|_F^2 + \lambda \|X'\|_F^2 \leq (1+\epsilon)OPT,$$ where ${OPT} = \min_{Y \in \mathbb{R}^{d \times d'}} \|AY-B\|_F^2 + \lambda \|Y\|_F^2.$ In the special case of $\lambda = 0$, this is ordinary multi-response linear regression. Our focus is on deterministically constructing coresets for this problem. Here the goal is to select and re-weight a small subset of rows of $A$ and corresponding labels of $B$, denoted by $SA$ and $SB$, so that if $X'$ is the minimizer to $\min_{X'} \|SAX'-SB\|_F^2 + \lambda \|X'\|_F^2$, then $\|AX'-B\|_F^2 + \lambda \|X'\|_F^2 \leq (1+\epsilon)OPT$. We show how to efficiently(poly(n,d,1/\epsilon) time) and deterministically select $O({sd}_{\lambda}/\epsilon)$ rows of $A$ and $B$ to achieve this property, and prove a matching lower bound, showing that it is necessary to select $\Omega({sd}_{\lambda}/\epsilon)$ rows no matter what the weights are, for any $1 < 1/\epsilon \leq sd_{\lambda}$. Here ${sd}_{\lambda}$ is the statistical dimension of the input, and we assume $d' = O({sd}_{\lambda}) \leq d$. In the case of ordinary regression, this gives a deterministic algorithm achieving $O(d/\epsilon)$ rows and a matching lower bound for any $1 \leq 1/\epsilon \leq d$; for $1/\epsilon > d$ we show $\Theta(d^2)$ rows are sufficient. Finally we show our new coresets are mergeable, giving a deterministic protocol for ridge regression with $O({sd}_{\lambda}/\epsilon)$ words of communication per server, in the important case when the rows of $A$ and $B$ have a constant number of non-zero entries and there are a constant number of servers. Prior to our work the best deterministic protocols in this setting required $\Omega(min({sd}_{\lambda}^2,{sd}_{\lambda}/\epsilon^2))$ communication.

 0
 0
 0
 0
This is an embedded video. Talk and the respective paper are published at AISTATS 2020 virtual conference. If you are one of the authors of the paper and want to manage your upload, see the question "My papertalk has been externally embedded..." in the FAQ section.

Comments

Post Comment
no comments yet
code of conduct: tbd

Similar Papers