Coresets for clustering in euclidean spaces: Importance sampling is nearly optimal

22/06/2020

Lingxiao Huang, Nisheeth K. Vishnoi

Keywords: Coresets, k-means, Importance sampling, Dimension reduction, Clustering, k-median

Abstract: Given a collection of n points in ℝ^d, the goal of the (k,z)-clustering problem is to find a subset of k “centers” that minimizes the sum of the z-th powers of the Euclidean distances from each point to its closest center. Special cases of the (k,z)-clustering problem include the k-median (z=1) and k-means (z=2) problems. Our main result is a unified two-stage importance sampling framework that constructs an ε-coreset for the (k,z)-clustering problem. Compared to the results for (k,z)-clustering in [Feldman and Langberg, STOC 2011], our framework saves an ε^2 d factor in the coreset size. Compared to the results for (k,z)-clustering in [Sohler and Woodruff, FOCS 2018], our framework saves a poly(k) factor in the coreset size and avoids the exp(k/ε) term in the construction time. Specifically, our coreset for k-median (z=1) has size Õ(ε^{-4} k), which saves a factor of k in the coreset size compared to the result in [Sohler and Woodruff, FOCS 2018]. Our algorithmic results rely on a new dimensionality reduction technique that connects two well-known shape fitting problems, subspace approximation and clustering, and may be of independent interest. We also provide a size lower bound of Ω(k · min{2^{z/20}, d}) for a 0.01-coreset for (k,z)-clustering, which has a linear dependence on k and an exponential dependence on z, matching our algorithmic results.
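To make the objective in the abstract concrete, the following is a minimal Python sketch of the (k,z)-clustering cost and of a weighted sample that estimates it. It is not the paper's two-stage framework: the sampler is a simplified one-stage importance sampler, and the names kz_cost, importance_sample, rough_centers and the 50/50 mixing with the uniform distribution are assumptions chosen for illustration only.

```python
import math
import random


def kz_cost(points, centers, z, weights=None):
    """Weighted sum of (distance to the closest center) ** z over all points."""
    if weights is None:
        weights = [1.0] * len(points)
    total = 0.0
    for p, w in zip(points, weights):
        nearest = min(math.dist(p, c) for c in centers)
        total += w * nearest ** z
    return total


def importance_sample(points, rough_centers, z, m, rng):
    """Draw m points with replacement; return (sample, weights).

    Illustrative assumption: each point is drawn with probability
    0.5 * (its share of the cost w.r.t. rough_centers) + 0.5 * (1 / n).
    The weight 1 / (m * prob) makes the weighted cost an unbiased
    estimate of the full cost for any fixed set of centers.
    """
    n = len(points)
    costs = [kz_cost([p], rough_centers, z) for p in points]
    total = sum(costs)
    if total == 0:
        probs = [1.0 / n] * n
    else:
        probs = [0.5 * c / total + 0.5 / n for c in costs]
    idx = rng.choices(range(n), weights=probs, k=m)
    sample = [points[i] for i in idx]
    weights = [1.0 / (m * probs[i]) for i in idx]
    return sample, weights


if __name__ == "__main__":
    rng = random.Random(0)
    data = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(2000)]
    centers = [(0.0, 0.0), (1.0, 1.0)]
    sample, w = importance_sample(data, [(0.0, 0.0)], z=2, m=200, rng=rng)
    print("full cost:    ", kz_cost(data, centers, 2))
    print("weighted cost:", kz_cost(sample, centers, 2, w))
```

An ε-coreset requires the weighted cost to be within a (1 ± ε) factor of the full cost for every choice of k centers simultaneously. The sketch only compares the two costs for one fixed set of centers and makes no claim about the sample size needed for that guarantee.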

The talk and the corresponding paper were published at the STOC 2020 virtual conference.

Similar Papers

04/08/2021

Approximation Algorithms for Socially Fair Clustering

Yury Makarychev, Ali Vakilian

16:31

02/02/2021

Differentially Private k-Means via Exponential Mechanism and Max Cover

Huy L. Nguyen, Anamay Chaturvedi, Eric Z Xu

17:42

06/12/2021

Coresets for Clustering with Missing Values

Vladimir Braverman, Shaofeng Jiang, Robert Krauthgamer, Xuan Wu

Keywords: clustering

10:33

08/07/2020

Proportionally Fair Clustering Revisited

Evi Micha, Nisarg Shah

Keywords: Fairness, Clustering, Facility location

24:22

13/04/2021

Consistent k-median: Simpler, better and robust

Xiangyu Guo, Janardhan Kulkarni, Shi Li, Jiayi Xian

3:15

08/07/2020

Deterministic Sparse Fourier Transform with an ℓ_∞ Guarantee

Yi Li, Vasileios Nakos

Keywords: Fourier sparse recovery, derandomization, incoherent matrices

19:52

09/07/2020

Private Mean Estimation of Heavy-Tailed Distributions

Gautam Kamath, Vikrant Singhal, Jonathan Ullman

Keywords: Privacy, fairness, Distribution learning/testing

13:24

22/06/2020

Non-adaptive adaptive sampling on turnstile streams

Sepideh Mahabadi, Ilya Razenshteyn, David P. Woodruff, Samson Zhou

Keywords: volume maximization, determinantal point processes, computational geometry, streaming algorithms

25:07

06/12/2021

Better Algorithms for Individually Fair $k$-Clustering

Maryam Negahbani, Deeparnab Chakrabarty

Keywords: theory, self-supervised learning, clustering, fairness

14:02

06/12/2021

Nearly-Tight and Oblivious Algorithms for Explainable Clustering

Buddhima Gamlath, Xinrui Jia, Adam Polak, Ola Svensson

Keywords: optimization, clustering, interpretability

12:31

08/07/2020

The Online Min-Sum Set Cover Problem

Dimitris Fotakis, Loukas Kavouras, Grigorios Koumoutsos, Stratis Skoulakis, Manolis Vardas

Keywords: Online Algorithms, Competitive Analysis, Min-Sum Set Cover

25:10

03/08/2020

Robust $k$-means++

Amit Deshpande, Praneeth Kacham, Rameshwar Pratap

9:08

18/07/2021

Meta Learning for Support Recovery in High-dimensional Precision Matrix Estimation

Qian Zhang, Yilin Zheng, Jean Honorio

Keywords: Algorithms, Meta-Learning, Algorithms, Few-Shot Learning; Algorithms, Multitask and Transfer Learning, Theory, Statistical Learning Theory

5:03

06/12/2021

Improved Coresets and Sublinear Algorithms for Power Means in Euclidean Spaces

Vincent Cohen-Addad, David Saulpic, Chris Schwiegelshohn

Keywords: clustering

16:06

06/12/2020

The Flajolet-Martin Sketch Itself Preserves Differential Privacy: Private Counting with Minimal Space

Adam Smith, Shuang Song, Abhradeep Guha Thakurta

3:17

06/12/2020

Exact Recovery of Mangled Clusters with Same-Cluster Queries

Marco Bressan, Nicolò Cesa-Bianchi, Silvio Lattanzi, Andrea Paudice

Keywords: Algorithms -> Image Segmentation; Applications -> Computer Vision; Applications -> Image Segmentation; Applications -> Visual S, Deep Learning -> Visualization or Exposition Techniques for Deep Networks

3:13

18/07/2021

Near-Optimal Algorithms for Explainable k-Medians and k-Means

Kostya Makarychev, Liren Shan

Keywords: Algorithms, Unsupervised Learning

5:12

08/07/2020

Improved Bounds for Matching in Random-Order Streams

Aaron Bernstein

Keywords: Graph Algorithms, Sublinear Algorithms, Matching, Streaming

25:05

04/08/2021

The Bethe and Sinkhorn Permanents of Low Rank Matrices and Implications for Profile Maximum Likelihood

Nima Anari, Moses Charikar, Kirankumar Shiragur, Aaron Sidford

18:20

04/08/2021

Near-Optimal Entrywise Sampling of Numerically Sparse Matrices

Vladimir Braverman, Robert Krauthgamer, Aditya R Krishnan, Shay Sapir

16:59

13/04/2021

Learning-to-rank with partitioned preference: Fast estimation for the plackett-luce model

Jiaqi Ma, Xinyang Yi, Weijing Tang, Zhe Zhao, Lichan Hong, Ed Chi, Qiaozhu Mei

3:03

18/07/2021

Streaming and Distributed Algorithms for Robust Column Subset Selection

Shuli Jiang, Dongyu Li, Irene Mengze Li, Arvind Mahankali, David Woodruff

Keywords: Algorithms, Deep Learning, Generative Models, Deep Learning, Predictive Models; Deep Learning, Recurrent Networks

7:26

06/12/2021

Dimensionality Reduction for Wasserstein Barycenter

Zachary Izzo, Sandeep Silwal, Samson Zhou

Keywords: machine learning

11:10

06/12/2020

Faster DBSCAN via subsampled similarity queries

Heinrich Jiang, Jennifer Jang, Jakub Lacki

3:13

13/04/2021

Inductive mutual information estimation: A convex maximum-entropy copula approach

Yves-Laurent Kom Samo

2:57

06/12/2020

A Continuous-Time Mirror Descent Approach to Sparse Phase Retrieval

Fan Wu, Patrick Rebeschini

3:22

14/09/2020

Model-based Clustering with HDBSCAN*

Michael Strobl, Joerg Sander, Ricardo Campello, Osmar Zaiane

Keywords: hierarchical clustering, expectation maximization, model selection

15:31

26/08/2020

Unconditional Coresets for Regularized Loss Minimization

Alireza Samadian, Kirk Pruhs, Benjamin Moseley, Sungjin Im, Ryan Curtin

15:15

04/08/2021

Exact Recovery of Clusters in Finite Metric Spaces Using Oracle Queries

Marco Bressan, Nicolò Cesa-Bianchi, Silvio Lattanzi, Andrea Paudice

16:56

06/12/2020

Truncated Linear Regression in High Dimensions

Constantinos Daskalakis, Dhruv Rohatgi, Emmanouil Zampetakis

3:17

26/08/2020

Gaussian Sketching yields a J-L Lemma in RKHS

Samory Kpotufe, Bharath Sriperumbudur

15:17

22/06/2020

Fast sampling and counting k-SAT solutions in the local lemma regime

Weiming Feng, Heng Guo, Yitong Yin, Chihao Zhang

Keywords: Theory of computation, Randomness, geometry and discrete structures, Random walks and Markov chains

21:08

18/07/2021

Dimensionality Reduction for the Sum-of-Distances Metric

Zhili Feng, Praneeth Kacham, David Woodruff

Keywords: Neuroscience and Cognitive Science, Deep Learning, Biologically Plausible Deep Networks; Neuroscience and Cognitive Science, Connectomics; Neuroscience and Cog, Algorithms, Dimensionality Reduction

17:12

03/08/2020

Exponentially faster shortest paths in the congested clique

Michal Dory, Merav Parter

Keywords: congested clique, shortest paths, near-additive emulator

23:50

18/07/2021

Approximate Group Fairness for Clustering

Bo Li, Lijun Li, Ankang Sun, Chenhao Wang, Yingfan Wang

Keywords: Social Aspects of Machine Learning, Fairness, Accountability, and Transparency

5:39

06/12/2021

Consistent Estimation for PCA and Sparse Regression with Oblivious Outliers

Tommaso d'Orsi, Chih-Hung Liu, Rajai Nasser, Gleb Novikov, David Steurer, Stefan Tiegel

Keywords: optimization

10:44

06/12/2020

Robust Sub-Gaussian Principal Component Analysis and Width-Independent Schatten Packing

Arun Jambulapati, Jerry Li, Kevin Tian

3:22

12/07/2020

Robust One-Bit Recovery via ReLU Generative Networks: Near-Optimal Statistical Rate and Global Landscape Analysis

Shuang Qiu, Xiaohan Wei, Zhuoran Yang

Keywords: Optimization - Non-convex

15:06

06/12/2020

A novel variational form of the Schatten-$p$ quasi-norm

Paris Giampouras, Rene Vidal, Athanasios Rontogiannis, Benjamin Haeffele

3:14

18/07/2021

Sharper Generalization Bounds for Clustering

Shaojie Li, Yong Liu

Keywords: Deep Learning, Algorithms, Clustering, Applications, Natural Language Processing

5:17