Improving the Sample and Communication Complexity for Decentralized Non-Convex Optimization: Joint Gradient Estimation and Tracking

Abstract: Many modern large-scale machine learning problems benefit from decentralized and stochastic optimization. Recent works have shown that combining decentralized computing with local stochastic gradient estimates can outperform state-of-the-art centralized algorithms in applications involving highly non-convex problems, such as training deep neural networks. In this work, we propose a decentralized stochastic algorithm for a class of smooth non-convex problems in which the system has $m$ nodes and each node holds a large number of samples (denoted as $n$). Unlike the majority of existing decentralized learning algorithms for either stochastic or finite-sum problems, we focus on both reducing the total number of communication rounds among the nodes and accessing the minimum number of local data samples. In particular, we propose an algorithm named D-GET (decentralized gradient estimation and tracking), which jointly performs decentralized gradient estimation (estimating the local gradient using a subset of local samples) and gradient tracking (tracking the global full gradient using local estimates). We show that, to achieve an $\epsilon$-stationary solution of the deterministic finite-sum problem, the proposed algorithm achieves an $\mathcal{O}(mn^{1/2}\epsilon^{-1})$ sample complexity and an $\mathcal{O}(\epsilon^{-1})$ communication complexity. This improves the best existing sample complexity of $\mathcal{O}(mn\epsilon^{-1})$ by a factor of $n^{1/2}$ while matching the best existing communication complexity of $\mathcal{O}(\epsilon^{-1})$. Similarly, for online problems, the proposed method achieves an $\mathcal{O}(m \epsilon^{-3/2})$ sample complexity and an $\mathcal{O}(\epsilon^{-1})$ communication complexity, while the best existing bounds are $\mathcal{O}(m\epsilon^{-2})$ and $\mathcal{O}(\epsilon^{-2})$, respectively.
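To make the two ingredients named in the abstract concrete, here is a minimal sketch of how a recursive mini-batch gradient estimator can be combined with gradient tracking. It is not the authors' D-GET pseudocode, which the abstract does not give. The toy scalar least-squares loss (convex, chosen only so that a run is easy to check), the ring topology with weights of 1/3, the step size, the refresh period, the batch size, and all function names are assumptions made for illustration.

```python
import random

def batch_grad(x, samples):
    """Gradient at x of the mean of 0.5 * (a * x - b)**2 over (a, b) samples."""
    return sum(a * (a * x - b) for a, b in samples) / len(samples)

def ring_weights(m):
    """Doubly stochastic weights on a ring (assumed topology)."""
    W = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in (i - 1, i, i + 1):
            W[i][j % m] += 1.0 / 3.0
    return W

def mix(W, z):
    """One communication round: every node takes a weighted neighbour average."""
    m = len(z)
    return [sum(W[i][j] * z[j] for j in range(m)) for i in range(m)]

def estimate_and_track(data, rounds=400, step=0.05, period=8, batch=8, seed=0):
    """Hypothetical joint estimation/tracking loop; data[i] is node i's samples."""
    rng = random.Random(seed)
    m = len(data)
    W = ring_weights(m)
    x = [0.0] * m                                       # local iterates
    v = [batch_grad(x[i], data[i]) for i in range(m)]   # local gradient estimates
    y = list(v)                                         # trackers of the global gradient
    for r in range(1, rounds + 1):
        Wx = mix(W, x)
        x_new = [Wx[i] - step * y[i] for i in range(m)]
        v_new = []
        for i in range(m):
            if r % period == 0:
                # Periodic refresh: full local gradient over all n samples.
                v_new.append(batch_grad(x_new[i], data[i]))
            else:
                # Recursive estimate from a small subset of local samples.
                s = rng.sample(data[i], batch)
                v_new.append(v[i] + batch_grad(x_new[i], s) - batch_grad(x[i], s))
        Wy = mix(W, y)
        # Tracking step: mix trackers, then add the change in the local estimate.
        y = [Wy[i] + v_new[i] - v[i] for i in range(m)]
        x, v = x_new, v_new
    return x, v, y

def make_data(m=4, n=64, seed=1):
    """Synthetic samples b = 2a + noise for each of the m nodes (assumed data)."""
    rng = random.Random(seed)
    data = []
    for _ in range(m):
        node = []
        for _ in range(n):
            a = rng.gauss(0.0, 1.0)
            node.append((a, 2.0 * a + rng.gauss(0.0, 0.1)))
        data.append(node)
    return data

def global_grad(x, data):
    """Gradient of the average of all local losses at a common point x."""
    return sum(batch_grad(x, node) for node in data) / len(data)

if __name__ == "__main__":
    data = make_data()
    x, v, y = estimate_and_track(data)
    x_bar = sum(x) / len(x)
    print("gradient magnitude at start:", abs(global_grad(0.0, data)))
    print("gradient magnitude at average iterate:", abs(global_grad(x_bar, data)))
```

Because the assumed mixing weights are doubly stochastic, the sum of the trackers `y` always equals the sum of the local estimates `v`, so each node's tracker follows the network-wide average estimate while using only neighbour communication. Each round costs one exchange of `x` and `y`, and most rounds touch only `batch` local samples per node.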

18/07/2021

Deep Learning, Algorithms, Multitask and Transfer Learning; Algorithms, Online Learning, Social Aspects of Machine Learning, Privacy, Anonymity, and Security

03/05/2021

Haoran Sun, Songtao Lu, Mingyi Hong

Similar Papers

Variance Reduction via Primal-Dual Accelerated Dual Averaging for Nonsmooth Convex Finite-Sums

Chaobing Song, Stephen Wright, Jelena Diakonikolas

Optimization, Convex Optimization

Private Stochastic Convex Optimization: Optimal Rates in L1 Geometry

Hilal Asi, Vitaly Feldman, Tomer Koren, Kunal Talwar

Deep Learning, Algorithms, Multitask and Transfer Learning; Algorithms, Online Learning, Social Aspects of Machine Learning, Privacy, Anonymity, and Security

New Bounds For Distributed Mean Estimation and Variance Reduction

Peter Davies, Vijaykrishna Gurunathan, Niusha Moshrefi, Saleh Ashkboos, Dan Alistarh

distributed machine learning, variance reduction, mean estimation, lattices

An Improved Analysis of Gradient Tracking for Decentralized Machine Learning

Anastasiia Koloskova, Tao Lin, Sebastian Stich

optimization, machine learning

Learning to Guide Random Search

Ozan Sener, Vladlen Koltun

Random search, Derivative-free optimization, Learning continuous control

PAGE: A Simple and Optimal Probabilistic Gradient Estimator for Nonconvex Optimization

Zhize Li, Hongyan Bao, Xiangliang Zhang, Peter Richtarik

Optimization

Leveraging Non-uniformity in First-order Non-convex Optimization

Jincheng Mei, Yue Gao, Bo Dai, Csaba Szepesvari, Dale Schuurmans

Theory, RL, Decisions and Control Theory

SGD: The Role of Implicit Regularization, Batch-size and Multiple-epochs

Ayush Sekhari, Karthik Sridharan, Satyen Kale

theory, deep learning, optimization

Simple Stochastic and Online Gradient Descent Algorithms for Pairwise Learning

Zhenhuan Yang, Yunwen Lei, Puyu Wang, Tianbao Yang, Yiming Ying

optimization, machine learning, privacy

Sample Efficient Reinforcement Learning via Low-Rank Matrix Estimation

Devavrat Shah, Dogyoon Song, Zhi Xu, Yuzhe Yang

Communication efficient primal-dual algorithm for nonconvex nonsmooth distributed optimization

Congliang Chen, Jiawei Zhang, Li Shen, Peilin Zhao, Zhiquan Luo

Adaptive sampling for fast constrained maximization of submodular functions

Francesco Quinzan, Vanja Doskoc, Andreas Göbel, Tobias Friedrich

On Gradient Descent Ascent for Nonconvex-Concave Minimax Problems

Darren Lin, Chi Jin, Michael Jordan

Optimization - Non-convex

Besov Function Approximation and Binary Classification on Low-Dimensional Manifolds Using Convolutional Residual Networks

Hao Liu, Minshuo Chen, Tuo Zhao, Wenjing Liao

Applications, Computer Vision, Theory, Deep learning Theory

A dynamical view on optimization algorithms of overparameterized neural networks

Zhiqi Bu, Shiyun Xu, Kan Chen

Escaping Saddle-Point Faster under Interpolation-like Conditions

Abhishek Roy, Krishnakumar Balasubramanian, Saeed Ghadimi, Prasant Mohapatra

A Corrective View of Neural Networks: Representation, Memorization and Learning

Dheeraj M Nagaraj, Guy Bresler

Neural networks/deep learning, Learning with algebraic or combinatorial structure, Supervised learning

Optimal Sketching for Trace Estimation

Shuli Jiang, Hai Pham, David Woodruff, Richard Zhang

machine learning

Follow the Perturbed Leader: Optimism and Fast Parallel Algorithms for Smooth Minimax Games

Arun Suggala, Praneeth Netrapalli

Faster Non-asymptotic Convergence for Double Q-learning

Lin Zhao, Huaqing Xiong, Yingbin Liang

theory, reinforcement learning and planning

Parallel and Efficient Hierarchical k-Median Clustering

Vincent Cohen-Addad, Silvio Lattanzi, Ashkan Norouzi-Fard, Christian Sohler, Ola Svensson
