Abstract:
Determinantal point processes (DPPs) are a useful probabilistic model for selecting
a small diverse subset out of a large collection of items, with applications in
summarization, recommendation, stochastic optimization, experimental design and
more. Given a kernel function and a subset size k, our goal is to sample k out
of n items with probability proportional to the determinant of the kernel matrix
induced by the subset (a.k.a. k-DPP). Existing k-DPP sampling algorithms require
an expensive preprocessing step which involves multiple passes over all n items,
making it infeasible for large datasets. A naïve heuristic addressing this problem is
to uniformly subsample a fraction of the data and perform k-DPP sampling only on
those items; however, this method offers no guarantee that the produced sample will
even approximately resemble the target distribution over the original dataset. In this
paper, we develop alpha-DPP, an algorithm that adaptively builds a sufficiently large uniform
sample of data that is then used to efficiently generate a smaller set of k items,
while ensuring that this set is drawn exactly from the target distribution defined
on all n items. We show empirically that our
algorithm produces a k-DPP sample after observing only a small fraction of all
elements, yielding a speedup of several orders of magnitude over the state of the art. Our implementation of alpha-DPP is provided at https://github.com/guilgautier/DPPy/.
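To make the sampling target concrete, the following is a minimal sketch (plain NumPy, not the alpha-DPP algorithm or the DPPy API) of the k-DPP distribution described above: on a toy RBF kernel it enumerates all size-k subsets and draws one with probability proportional to the determinant of the induced kernel submatrix. The feature matrix, kernel choice, and brute-force enumeration are illustrative assumptions and are only feasible for very small n.

```python
import itertools
import numpy as np

# Brute-force illustration of the k-DPP target distribution:
# Pr(S) is proportional to det(L_S), the determinant of the kernel
# submatrix indexed by the size-k subset S. Only feasible for tiny n.

rng = np.random.default_rng(0)

n, k = 6, 3
X = rng.standard_normal((n, 2))  # toy item features (assumed for illustration)
# RBF (Gaussian) likelihood kernel L over the n items
L = np.exp(-0.5 * np.sum((X[:, None] - X[None, :]) ** 2, axis=-1))

subsets = list(itertools.combinations(range(n), k))
weights = np.array([np.linalg.det(L[np.ix_(s, s)]) for s in subsets])
probs = weights / weights.sum()  # normalize det(L_S) into probabilities

sample = subsets[rng.choice(len(subsets), p=probs)]
print("k-DPP sample:", sample)
```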