06/12/2020

On the Optimal Weighted $\ell_2$ Regularization in Overparameterized Linear Regression

Denny Wu, Ji Xu

Keywords:

Abstract: We consider the linear model $\vy=\vX\vbeta_{\star}+\vepsilon$ with $\vX\in \mathbb{R}^{n\times p}$ in the overparameterized regime $p>n$. We estimate $\vbeta_{\star}$ via generalized (weighted) ridge regression: $\hat{\vbeta}_{\lambda}=\left(\vX^{\t}\vX+\lambda\vSigma_w\right)^{\dagger}\vX^{\t}\vy$, where $\vSigma_w$ is the weighting matrix. Assuming a random effects model with general data covariance $\vSigma_x$ and anisotropic prior on the true coefficients $\vbeta_{\star}$, i.e., $\bbE\vbeta_{\star}\vbeta_{\star}^{\t}=\vSigma_\beta$, we provide an exact characterization of the prediction risk $\mathbb{E}(y-\vx^{\t}\hat{\vbeta}_{\lambda})^2$ in the proportional asymptotic limit $p/n\rightarrow \gamma \in (1,\infty)$. Our general setup leads to a number of interesting findings. We outline precise conditions that decide the sign of the optimal setting $\lambda_{\opt}$ for the ridge parameter $\lambda$ and confirm the implicit $\ell_2$ regularization effect of overparameterization, which theoretically justifies the surprising empirical observation that $\lambda_{\opt}$ can be \textit{negative} in the overparameterized regime. We also characterize the double descent phenomenon for principal component regression (PCR) when $\vX$ and $\vbeta_{\star}$ are both anisotropic. Finally, we determine the optimal weighting matrix $\vSigma_w$ for both the ridgeless ($\lambda\to 0$) and optimally regularized ($\lambda = \lambda_{\opt}$) case, and demonstrate the advantage of the weighted objective over standard ridge regression and PCR.

 0
 0
 0
 0
This is an embedded video. Talk and the respective paper are published at NeurIPS 2020 virtual conference. If you are one of the authors of the paper and want to manage your upload, see the question "My papertalk has been externally embedded..." in the FAQ section.

Comments

Post Comment
no comments yet
code of conduct: tbd

Similar Papers