14 September 2020

Orthant Based Proximal Stochastic Gradient Method for \(\ell_1\)-Regularized Optimization

Tianyi Chen, Tianyu Ding, Bo Ji, Guanyi Wang, Yixin Shi, Jing Tian, Sheng Yi, Xiao Tu, Zhihui Zhu

Keywords: stochastic learning, sparsity, orthant prediction

Abstract: Sparsity-inducing regularization problems are ubiquitous in machine learning applications, ranging from feature selection to model compression. In this paper, we present a novel stochastic method, the Orthant Based Proximal Stochastic Gradient Method (OBProx-SG), to solve perhaps the most popular instance, the \(\ell_1\)-regularized problem. The OBProx-SG method consists of two steps: (i) a proximal stochastic gradient step to predict a support cover of the solution; and (ii) an orthant step that aggressively enhances the sparsity level via orthant face projection. Compared to state-of-the-art methods such as Prox-SG, RDA, and Prox-SVRG, OBProx-SG not only converges comparably in both convex and non-convex settings, but also promotes the sparsity of the solutions substantially. In particular, on a large collection of convex problems, OBProx-SG consistently outperforms existing methods in both sparsity exploration and objective values. Moreover, experiments on non-convex deep neural networks, e.g., MobileNetV1 and ResNet18, further demonstrate its superiority: it generates solutions of much higher sparsity without sacrificing generalization accuracy, which implies that OBProx-SG may achieve significant memory and energy savings. The source code is available at https://github.com/tianyic/obproxsg.
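
To make the two-step scheme concrete, below is a minimal NumPy sketch of the two steps described in the abstract: a proximal stochastic gradient step (soft-thresholding) and an orthant step that takes a gradient step on the orthant face containing the current iterate and projects back onto that face. The function names, the fixed step size, and the simple "switch once" schedule in the driver are illustrative assumptions for this sketch, not the interface of the released code linked above.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1 (soft-thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def prox_sg_step(x, stoch_grad, alpha, lam):
    """Step (i): proximal stochastic gradient step; its iterates are
    used to predict a support cover of the solution."""
    return soft_threshold(x - alpha * stoch_grad, alpha * lam)

def orthant_step(x, stoch_grad, alpha, lam):
    """Step (ii): gradient step on the orthant face containing x,
    followed by projection back onto that face."""
    sign = np.sign(x)                      # orthant face of the current iterate
    pseudo_grad = stoch_grad + lam * sign  # gradient of f + lam*||x||_1 on this face
    x_trial = x - alpha * pseudo_grad
    # Orthant face projection: any coordinate that leaves the face
    # (including the coordinates fixed at zero) is set exactly to zero.
    x_trial[np.sign(x_trial) != sign] = 0.0
    return x_trial

def obprox_sg(stoch_grad_fn, x0, lam, alpha=0.01, n_prox=1000, n_orth=1000):
    """Hypothetical driver: run n_prox Prox-SG steps to locate a support
    cover, then n_orth orthant steps to aggressively enhance sparsity."""
    x = np.array(x0, dtype=float)
    for k in range(n_prox + n_orth):
        g = stoch_grad_fn(x)  # stochastic (mini-batch) gradient of the smooth part f
        if k < n_prox:
            x = prox_sg_step(x, g, alpha, lam)
        else:
            x = orthant_step(x, g, alpha, lam)
    return x
```

Because the orthant step fixes the zero coordinates of the current iterate and projects sign flips back to zero, it can only shrink the predicted support, which is what drives the aggressive sparsity promotion relative to plain Prox-SG.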

The talk and paper were presented at the ECML PKDD 2020 virtual conference.
