Faster DBSCAN via subsampled similarity queries

06/12/2020

Heinrich Jiang, Jennifer Jang, Jakub Lacki

Abstract: DBSCAN is a popular density-based clustering algorithm. It computes the $\epsilon$-neighborhood graph of a dataset and uses the connected components of the high-degree nodes to decide the clusters. However, the full neighborhood graph may be too costly to compute, with a worst-case complexity of $O(n^2)$. In this paper, we propose a simple variant called SNG-DBSCAN, which clusters based on a subsampled $\epsilon$-neighborhood graph. It only requires access to similarity queries for pairs of points and, in particular, avoids complex data structures that need the embeddings of the data points themselves. The runtime of the procedure is $O(sn^2)$, where $s$ is the sampling rate. We show, under some natural theoretical assumptions, that $s \approx \log n/n$ is sufficient for statistical cluster recovery guarantees, leading to an $O(n\log n)$ complexity. We provide an extensive experimental analysis showing that on large datasets one can subsample as little as $0.1\%$ of the neighborhood graph. In the best cases this yields a speedup of over 200x and a 250x reduction in RAM consumption compared to scikit-learn's implementation of DBSCAN, while still maintaining competitive clustering performance.
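
The abstract describes SNG-DBSCAN only at a high level. The Python sketch below illustrates the idea under our own assumptions and is not the authors' implementation.

The assumptions are:

- Each unordered pair of points is queried with probability about s.
- A point counts as core when its degree in the subsampled graph reaches `min_pts`, not counting the point itself.
- Clusters are the connected components among core points.
- A non-core point joins the cluster of any sampled core neighbor, or is labeled noise (-1).

The name `sng_dbscan` and all of its parameters are hypothetical. The `dist` argument can be any pairwise function, which mirrors the abstract's point that only pair queries are needed and no embedding-based index is built.

```python
import math
import random


def sng_dbscan(points, eps, min_pts, s, dist=math.dist, seed=0):
    """Hypothetical sketch of DBSCAN on a subsampled eps-neighborhood graph.

    points:  sequence of items understood by `dist` (only pair queries are made)
    eps:     neighborhood radius
    min_pts: degree threshold in the *subsampled* graph (self not counted)
    s:       sampling rate in [0, 1]
    Returns one label per point: cluster ids 0, 1, ... or -1 for noise.
    """
    rng = random.Random(seed)
    n = len(points)
    adj = [set() for _ in range(n)]

    # Draw round(s * n^2) distinct cells of the n x n pair grid. Cells above the
    # diagonal (i < j) map one-to-one to unordered pairs, so each pair is
    # queried with probability about s: roughly s * n(n-1)/2 distance calls.
    k = round(min(max(s, 0.0), 1.0) * n * n)
    for cell in rng.sample(range(n * n), k):
        i, j = divmod(cell, n)
        if i < j and dist(points[i], points[j]) <= eps:
            adj[i].add(j)
            adj[j].add(i)

    # Core points have at least min_pts sampled eps-neighbors.
    core = [len(adj[i]) >= min_pts for i in range(n)]

    # Clusters are connected components of the subgraph induced by core points.
    labels = [-1] * n
    next_id = 0
    for start in range(n):
        if not core[start] or labels[start] != -1:
            continue
        labels[start] = next_id
        stack = [start]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if core[v] and labels[v] == -1:
                    labels[v] = next_id
                    stack.append(v)
        next_id += 1

    # A non-core point joins the cluster of a sampled core neighbor, else noise.
    for i in range(n):
        if not core[i]:
            for v in adj[i]:
                if core[v]:
                    labels[i] = labels[v]
                    break
    return labels


if __name__ == "__main__":
    blob_a = [(0, 0), (0, 1), (1, 0), (1, 1)]
    blob_b = [(10, 10), (10, 11), (11, 10), (11, 11)]
    pts = blob_a + blob_b + [(50, 50)]
    # s = 1.0 queries every pair, so this behaves like DBSCAN on the full graph.
    print(sng_dbscan(pts, eps=1.5, min_pts=3, s=1.0))  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
    # A sparser sample; min_pts is lowered because sampled degrees shrink with s.
    print(sng_dbscan(pts, eps=1.5, min_pts=1, s=0.3, seed=1))
```

A point's sampled degree is roughly s times its full $\epsilon$-neighborhood size. In this sketch, `min_pts` therefore presumably has to be scaled down with s; that choice is left to the caller.

With s = 1 every pair is queried, and the sketch reduces to ordinary DBSCAN on the full $\epsilon$-neighborhood graph. As in DBSCAN, a border point next to several clusters goes to whichever core neighbor is found first. The sampling step performs about s·n(n-1)/2 distance queries, in line with the $O(sn^2)$ runtime stated in the abstract.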

The talk and the accompanying paper were published at the NeurIPS 2020 virtual conference.

Similar Papers

18/07/2021

Consistent Nonparametric Methods for Network Assisted Covariate Estimation

Xueyu Mao, Deepayan Chakrabarti, Purnamrita Sarkar

Keywords: Algorithms, Networks and Relational Learning

03/08/2020

Coresets for Estimating Means and Mean Square Error with Limited Greedy Samples

Saeed Vahidian, Baharan Mirzasoleiman, Alexander Cloninger

14/09/2020

Model-based Clustering with HDBSCAN*

Michael Strobl, Joerg Sander, Ricardo Campello, Osmar Zaiane

Keywords: hierarchical clustering, expectation maximization, model selection

12/07/2020

Explainable k-Means and k-Medians Clustering

Michal Moshkovitz, Sanjoy Dasgupta, Cyrus Rashtchian, Nave Frost

Keywords: Learning Theory

08/07/2020

Proportionally Fair Clustering Revisited

Evi Micha, Nisarg Shah

Keywords: Fairness, Clustering, Facility location

12/07/2020

Coresets for Clustering in Graphs of Bounded Treewidth

Daniel Baker, Vladimir Braverman, Lingxiao Huang, Shaofeng H.-C. Jiang, Robert Krauthgamer, Xuan Wu

Keywords: Unsupervised and Semi-Supervised Learning

06/12/2020

Matrix Completion with Hierarchical Graph Side Information

Adel Elmahdy, Junhyung Ahn, Changho Suh, Soheil Mohajer

09/07/2020

Information Theoretic Optimal Learning of Gaussian Graphical Models

Sidhant Misra, Marc D Vuffray, Andrey Lokhov

Keywords: High-dimensional statistics, Information theory, Probabilistic graphical models

02/02/2021

Adversarial Permutation Guided Node Representations for Link Prediction

Indradyumna Roy, Abir De, Soumen Chakrabarti

18/07/2021

On the price of explainability for some clustering problems

Eduardo Laber, Lucas Murtinho

Keywords: Optimization, Combinatorial Optimization

03/05/2021

Simple Spectral Graph Convolution

Hao Zhu, Piotr Koniusz

Keywords: Graph Convolutional Network, Oversmoothing

06/12/2021

Better Algorithms for Individually Fair $k$-Clustering

Maryam Negahbani, Deeparnab Chakrabarty

Keywords: theory, self-supervised learning, clustering, fairness

18/07/2021

A Scalable Deterministic Global Optimization Algorithm for Clustering Problems

Kaixun Hua, Mingfei Shi, Yankai Cao

Keywords: Algorithms, Clustering, AutoML, Optimization, Combinatorial Optimization

06/12/2020

Exact Recovery of Mangled Clusters with Same-Cluster Queries

Marco Bressan, Nicolò Cesa-Bianchi, Silvio Lattanzi, Andrea Paudice

Keywords: Algorithms -> Image Segmentation; Applications -> Computer Vision; Applications -> Image Segmentation; Applications -> Visual S, Deep Learning -> Visualization or Exposition Techniques for Deep Networks

12/07/2020

Spectral Clustering with Graph Neural Networks for Graph Pooling

Filippo Maria Bianchi, Daniele Grattarola, Cesare Alippi

Keywords: Sequential, Network, and Time-Series Modeling

12/07/2020

Sparse Subspace Clustering with Entropy-Norm

Liang Bai, Jiye Liang

Keywords: Unsupervised and Semi-Supervised Learning

06/12/2021

Robust Online Correlation Clustering

Silvio Lattanzi, Benjamin Moseley, Sergei Vassilvitskii, Yuyan Wang, Rudy Zhou

Keywords: clustering

18/07/2021

A Hybrid Variance-Reduced Method for Decentralized Stochastic Non-Convex Optimization

Ran Xin, Usman Khan, Soummya Kar

Keywords: Optimization, Distributed and Parallel Optimization

06/12/2021

On Margin-Based Cluster Recovery with Oracle Queries

Marco Bressan, Nicolò Cesa-Bianchi, Silvio Lattanzi, Andrea Paudice

Keywords: theory, clustering, active learning

06/12/2021

Nearly-Tight and Oblivious Algorithms for Explainable Clustering

Buddhima Gamlath, Xinrui Jia, Adam Polak, Ola Svensson

Keywords: optimization, clustering, interpretability

03/08/2020

Brief announcement: Deterministic lower bound for dynamic balanced graph partitioning

Maciej Pacut, Mahmoud Parham, Stefan Schmid

Keywords: online algorithms, graph partitioning, self-adjusting networks

19/08/2021

GraphReach: Position-Aware Graph Neural Network using Reachability Estimations

Sunil Nishad, Shubhangi Agarwal, Arnab Bhattacharya, Sayan Ranu

Keywords: Data Mining, Mining Graphs, Semi Structured Data, Complex Data

02/02/2021

Estimating the Number of Induced Subgraphs from Incomplete Data and Neighborhood Queries

Dimitris Fotakis, Thanasis Pittas, Stratis Skoulakis

04/08/2021

Exact Recovery of Clusters in Finite Metric Spaces Using Oracle Queries

Marco Bressan, Nicolò Cesa-Bianchi, Silvio Lattanzi, Andrea Paudice

16/11/2020

Structure Aware Negative Sampling in Knowledge Graphs

Kian Ahrabian, Aarash Feizi, Yasmin Salehi, William L. Hamilton, Avishek Joey Bose

Keywords: inferring patterns, learning representations, contrastive estimation, contrastive approaches

06/12/2021

Reliable Causal Discovery with Improved Exact Search and Weaker Assumptions

Ignavier Ng, Yujia Zheng, Jiji Zhang, Kun Zhang

Keywords: generative model, graph learning, causality

23/08/2020

Average sensitivity of spectral clustering

Pan Peng, Yuichi Yoshida

Keywords: spectral clustering, laplacian, average sensitivity

06/12/2021

Hierarchical Clustering: $O(1)$-Approximation for Well-Clustered Graphs

Bogdan-Adrian Manghiuc, He Sun

Keywords: graph learning, clustering

13/04/2021

Consistent k-median: Simpler, better and robust

Xiangyu Guo, Janardhan Kulkarni, Shi Li, Jiayi Xian

06/12/2021

Fair Clustering Under a Bounded Cost

Seyed Esmaeili, Brian Brubach, Aravind Srinivasan, John P Dickerson

Keywords: self-supervised learning, clustering, fairness

18/07/2021

Optimal Non-Convex Exact Recovery in Stochastic Block Model via Projected Power Method

Peng Wang, Huikang Liu, Zirui Zhou, Anthony Man-Cho So

Keywords: Optimization, Non-Convex Optimization

03/08/2020

Robust $k$-means++

Amit Deshpande, Praneeth Kacham, Rameshwar Pratap

12/07/2020

Computational and Statistical Tradeoffs in Inferring Combinatorial Structures of Ising Model

Ying Jin, Zhaoran Wang, Junwei Lu

Keywords: Probabilistic Inference - Models and Probabilistic Programming

12/07/2020

p-Norm Flow Diffusion for Local Graph Clustering

Kimon Fountoulakis, Di Wang, Shenghao Yang

Keywords: Unsupervised and Semi-Supervised Learning

06/12/2021

Distributed Machine Learning with Sparse Heterogeneous Data

Dominic Richards, Sahand Negahban, Patrick Rebeschini

Keywords: machine learning, graph learning, federated learning

14/06/2020

Learning to Cluster Faces via Confidence and Connectivity Estimation

Lei Yang, Dapeng Chen, Xiaohang Zhan, Rui Zhao, Chen Change Loy, Dahua Lin

Keywords: learnable clustering, vertex confidence, edge connectivity

03/05/2021

Effective Distributed Learning with Random Features: Improved Bounds and Algorithms

Yong Liu, Jiankun Liu, Shuqiang Wang

Keywords: statistical learning theory, kernel methods, Risk bound

06/12/2020

Learning Search Space Partition for Black-box Optimization using Monte Carlo Tree Search

Linnan Wang, Rodrigo Fonseca, Yuandong Tian

06/12/2020

Graduated Assignment for Joint Multi-Graph Matching and Clustering with Application to Unsupervised Graph Matching Network Learning

Runzhong Wang, Junchi Yan, Xiaokang Yang

23/08/2020

In and out: Optimizing overall interaction in probabilistic graphs under clustering constraints

Domenico Mandaglio, Andrea Tagarelli, Francesco Gullo

Keywords: correlation clustering, interaction loss, uncertain graphs