18/07/2021

Hierarchical Clustering of Data Streams: Scalable Algorithms and Approximation Guarantees

Anand Rajagopalan, Fabio Vitale, Danny Vainstein, Gui Citovsky, Cecilia Procopiuc, Claudio Gentile

Keywords: Algorithms, Clustering, Theory, Spaces of Functions and Kernels, Algorithms, Density Estimation; Deep Learning, Adversarial Networks; Deep Learning, Generative Models; Theory, Frequent

Abstract: We investigate the problem of hierarchically clustering data streams containing metric data in R^d. We introduce a desirable invariance property for such algorithms, describe a general family of hyperplane-based methods enjoying this property, and analyze two scalable instances of this general family against recently popularized similarity/dissimilarity-based metrics for hierarchical clustering. We prove a number of new results related to the approximation ratios of these algorithms, improving in various ways over the literature on this subject. Finally, since our algorithms are principled but also very practical, we carry out an experimental comparison on both synthetic and real-world datasets showing competitive results against known baselines.

 0
 0
 0
 0
This is an embedded video. Talk and the respective paper are published at ICML 2021 virtual conference. If you are one of the authors of the paper and want to manage your upload, see the question "My papertalk has been externally embedded..." in the FAQ section.

Comments

Post Comment
no comments yet
code of conduct: tbd Characters remaining: 140

Similar Papers