06/12/2020

Variance Reduction via Accelerated Dual Averaging for Finite-Sum Optimization

Chaobing Song, Yong Jiang, Yi Ma

Keywords: Algorithms -> Boosting and Ensemble Methods; Applications -> Hardware and Systems; Applications -> Natural Language Processing;, Applications -> Privacy, Anonymity, and Security

Abstract: In this paper, we introduce a simplified and unified method for finite-sum convex optimization, named \emph{Variance Reduction via Accelerated Dual Averaging (VRADA)}. In the general convex and smooth setting, VRADA can attain an $O\big(\frac{1}{n}\big)$-accurate solution in $O(n\log\log n)$ number of stochastic gradient evaluations, where $n$ is the number of samples; meanwhile, VRADA matches the lower bound of this setting up to a $\log\log n$ factor. In the strongly convex and smooth setting, VRADA matches the lower bound in the regime $n \le \Theta(\kappa)$, while it improves the rate in the regime $n\gg \kappa$ to $O\big(n +\frac{n\log(1/\epsilon)}{\log(n/\kappa)}\big)$, where $\kappa$ is the condition number. Besides improving the best known complexity results, VRADA has more unified and simplified algorithmic implementation and convergence analysis for both the general convex and strongly convex settings. Through experiments on real datasets, we show the good performance of VRADA over existing methods for large-scale machine learning problems.

 0
 0
 0
 0
This is an embedded video. Talk and the respective paper are published at NeurIPS 2020 virtual conference. If you are one of the authors of the paper and want to manage your upload, see the question "My papertalk has been externally embedded..." in the FAQ section.

Comments

Post Comment
no comments yet
code of conduct: tbd Characters remaining: 140

Similar Papers