Abstract:
It is often desirable to reduce the dimensionality of a large dataset
by projecting it onto a low-dimensional subspace. Matrix sketching
has emerged as a powerful technique for performing such dimensionality
reduction very efficiently. Even though there is an extensive
literature on the worst-case performance of sketching, existing
guarantees typically differ substantially from the performance observed in
practice. We exploit recent developments in the spectral analysis of
random matrices to derive novel techniques that provide provably
accurate expressions for the expected value of random projection
matrices obtained via sketching. These expressions can be used to
characterize the performance of dimensionality reduction in a variety
of common machine learning tasks, ranging from low-rank approximation
to iterative stochastic optimization. Our results apply to several
popular sketching methods, including Gaussian and Rademacher
sketches, and they enable precise analysis of these methods in terms of
spectral properties of the data. Empirical results show that the
expressions we derive reflect the practical performance of these
sketching methods, down to lower-order effects and even constant factors.
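As a concrete illustration of the sketching operation the abstract refers to (not part of the paper's formal setup), the following is a minimal NumPy sketch of the Gaussian and Rademacher sketches: a random m x n matrix S (with m much smaller than n) compresses an n x d data matrix A to an m x d matrix SA. The function name `sketch`, its parameters, and the 1/sqrt(m) scaling are our own choices; the scaling is one common normalization ensuring E[S^T S] = I.

```python
import numpy as np

def sketch(A, m, method="gaussian", rng=None):
    """Compress the n x d data matrix A to the m x d matrix S @ A,
    where S is an m x n random sketching matrix with m << n."""
    rng = np.random.default_rng(rng)
    n = A.shape[0]
    if method == "gaussian":
        # i.i.d. N(0, 1/m) entries, so that E[S.T @ S] = I_n
        S = rng.standard_normal((m, n)) / np.sqrt(m)
    elif method == "rademacher":
        # i.i.d. entries equal to +1/sqrt(m) or -1/sqrt(m) with equal probability
        S = rng.choice([-1.0, 1.0], size=(m, n)) / np.sqrt(m)
    else:
        raise ValueError(f"unknown sketching method: {method}")
    return S @ A

# Example: compress 10,000 data points in R^50 down to a 100-row sketch.
A = np.random.default_rng(0).standard_normal((10000, 50))
A_sk = sketch(A, m=100, method="rademacher")
print(A_sk.shape)  # (100, 50)
```

Under this normalization the sketch preserves inner products in expectation; the expressions the abstract describes characterize, in terms of the spectrum of A, how the random projection induced by such a sketch behaves beyond this idealized first-order picture.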