12/07/2020

Improved Communication Cost in Distributed PageRank Computation – A Theoretical Study

Siqiang Luo

Keywords: Learning Theory

Abstract: PageRank is a widely used approach for measuring the importance of a node in a graph. Computing PageRank is a fundamental task in numerous applications including web search, machine learning and recommendation systems. The importance of computing PageRanks in a distributed environment has been recognized due to the rapid growth of the graph size in real world. However, only a few previous works can provide a provable complexity and accuracy for distributed PageRank computation. Given a constant $d>0$ and a graph of $n$ nodes and under the well-known congested-clique distributed model, the state-of-the-art approach, Radar-Push, uses $O(\log\log{n}+\log{d})$ communication rounds to approximate the PageRanks within a relative error $O(\frac{1}{\log^d{n}})$. However, Radar-Push entails as large as $O(\log^{2d+3}{n})$ bits of bandwidth (e.g., the communication cost between a pair of nodes per round) in the worst case. In this paper, we provide a new algorithm that uses asymptotically the same communication rounds while significantly improves the bandwidth from $O(\log^{2d+3}{n})$ bits to $O(d\log^3{n})$ bits. To the best of our knowledge, our distributed PageRank algorithm is the first to achieve $o(d\log{n})$ communication rounds with $O(d\log^3{n})$ bits of bandwidth in approximating PageRanks with relative error $O(\frac{1}{\log^d{n}})$ under the congested-clique model.

 0
 0
 0
 0
This is an embedded video. Talk and the respective paper are published at ICML 2020 virtual conference. If you are one of the authors of the paper and want to manage your upload, see the question "My papertalk has been externally embedded..." in the FAQ section.

Comments

Post Comment
no comments yet
code of conduct: tbd Characters remaining: 140

Similar Papers