14/07/2020

Communication-efficient weighted reservoir sampling from fully distributed data streams

Lorenz Hübschle-Schneider, Peter Sanders

Keywords: mini-batch, data stream, weighted sampling, reservoir sampling, sampling, communication efficiency

Abstract: We consider weighted random sampling from distributed data streams presented as a sequence of mini-batches of items. This is a natural model for distributed streaming computation, and our goal is to showcase its usefulness. We present and analyze a fully distributed, communication-efficient algorithm for weighted reservoir sampling in this model. An experimental evaluation on up to 256 nodes (5120 processors) shows good speedups, while theoretical analysis promises further scaling to much larger machines.
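To make the underlying sampling problem concrete, here is a minimal single-process sketch. It is not the authors' communication-efficient distributed algorithm. It uses the classic exponential-key formulation of weighted reservoir sampling (equivalent to the Efraimidis–Spirakis keys u^(1/w)). Each item with weight w gets the key -ln(u)/w with u uniform in (0, 1], and the k items with the smallest keys form the sample. The reservoir consumes one mini-batch at a time. The merge step is a naive "gather all local reservoirs and select" baseline that shows why per-processor reservoirs suffice. The names `WeightedReservoir` and `merge_local_samples` are hypothetical, chosen for illustration.

```python
import heapq
import math
import random
from typing import Iterable, List, Tuple

Item = Tuple[int, float]  # (hypothetical integer item id, positive weight)


class WeightedReservoir:
    """Hypothetical helper: weighted sample of size k without replacement.

    Item with weight w gets key -ln(u)/w, u ~ Uniform(0, 1]; the k smallest
    keys are kept (same ranking as keeping the largest u**(1/w)).
    """

    def __init__(self, k: int, rng: random.Random):
        if k < 1:
            raise ValueError("k must be at least 1")
        self.k = k
        self.rng = rng
        # Max-heap via negated keys: the root is the largest key still kept.
        self._heap: List[Tuple[float, int]] = []

    def threshold(self) -> float:
        """Largest kept key; once full, larger keys cannot enter."""
        return -self._heap[0][0] if len(self._heap) == self.k else math.inf

    def insert_batch(self, batch: Iterable[Item]) -> None:
        """Process one mini-batch of (item, weight) pairs."""
        for item, w in batch:
            if w <= 0:
                raise ValueError("weights must be positive")
            u = 1.0 - self.rng.random()  # in (0, 1], so log(u) is defined
            key = -math.log(u) / w       # exponential variate with rate w
            if len(self._heap) < self.k:
                heapq.heappush(self._heap, (-key, item))
            elif key < self.threshold():
                heapq.heapreplace(self._heap, (-key, item))

    def keyed_sample(self) -> List[Tuple[float, int]]:
        """Current sample as (key, item) pairs, smallest key first."""
        return sorted((-neg_key, item) for neg_key, item in self._heap)


def merge_local_samples(local: List[WeightedReservoir], k: int) -> List[int]:
    """Naive gather-and-select baseline (not communication-efficient).

    Any item among the k smallest keys globally is also among the k smallest
    keys on its own processor, so the union of local reservoirs suffices.
    """
    keyed = [pair for r in local for pair in r.keyed_sample()]
    return [item for _, item in heapq.nsmallest(k, keyed)]


if __name__ == "__main__":
    weight_rng = random.Random(42)
    # Four simulated processors, each with its own reservoir and RNG.
    pes = [WeightedReservoir(k=3, rng=random.Random(i)) for i in range(4)]
    for step in range(5):  # five rounds, one mini-batch per processor
        for pe_id, res in enumerate(pes):
            batch = [(pe_id * 1000 + step * 10 + j, weight_rng.uniform(0.1, 5.0))
                     for j in range(10)]
            res.insert_batch(batch)
    print(merge_local_samples(pes, k=3))
```

Gathering every local reservoir costs communication proportional to the number of processors times k. Avoiding exactly this cost is the point of a communication-efficient scheme like the one the abstract describes. The sketch only fixes the sampling semantics.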

The talk and the corresponding paper were presented at the SPAA 2020 virtual conference.