Sequential Algorithms for Testing Closeness of Distributions

06/12/2021

Aadil Oufkir, Omar Fawzi, Nicolas Flammarion, Aurélien Garivier

Keywords: theory

Abstract: What advantage do sequential procedures provide over batch algorithms for testing properties of unknown distributions? Focusing on the problem of testing whether two distributions $\mathcal{D}_1$ and $\mathcal{D}_2$ on $\{1,\dots, n\}$ are equal or $\epsilon$-far in total variation ($TV$) distance, we give several answers to this question. We show that for a small alphabet size $n$, there is a sequential algorithm that outperforms any batch algorithm by a factor of at least $4$ in terms of sample complexity. For a general alphabet size $n$, we give a sequential algorithm that uses no more samples than its batch counterpart, and possibly fewer if the actual distance $TV(\mathcal{D}_1, \mathcal{D}_2)$ between $\mathcal{D}_1$ and $\mathcal{D}_2$ is larger than $\epsilon$. As a corollary, letting $\epsilon$ go to $0$, we obtain a sequential algorithm for testing closeness with no a priori bound on the distance between $\mathcal{D}_1$ and $\mathcal{D}_2$, with sample complexity $\tilde{\mathcal{O}}(\frac{n^{2/3}}{TV(\mathcal{D}_1, \mathcal{D}_2)^{4/3}})$: this improves over the $\tilde{\mathcal{O}}(\frac{n/\log n}{TV(\mathcal{D}_1, \mathcal{D}_2)^{2}})$ tester of [Daskalakis and Kawase 2017] and is optimal up to multiplicative constants. We also establish limitations of sequential algorithms for the problem of testing closeness: they can improve the worst-case number of samples by at most a constant factor.
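To make the batch-versus-sequential contrast concrete, here is a minimal Python sketch, assuming NumPy is available. It is not the authors' algorithm. It combines a closeness statistic of the kind used by Chan, Diakonikolas, Valiant and Valiant (2014) with a simple doubling schedule over $\epsilon = 1/2, 1/4, \dots$, stopping at the first rejection, so that far-apart distributions are rejected early with few samples. The function names, the constant `c`, the rejection threshold `m * eps**2`, and `max_rounds` are all hypothetical choices made for illustration; the per-round sample size follows the known batch order $n^{2/3}/\epsilon^{4/3} + \sqrt{n}/\epsilon^2$.

```python
import math

import numpy as np


def tv_distance(p, q):
    """Total variation distance TV(p, q) = (1/2) * sum_i |p_i - q_i|."""
    return 0.5 * float(np.abs(np.asarray(p, float) - np.asarray(q, float)).sum())


def closeness_statistic(x, y):
    """CDVV-style statistic over bins with x_i + y_i > 0:
    Z = sum ((x_i - y_i)^2 - x_i - y_i) / (x_i + y_i).
    Roughly centred at 0 when the two distributions are equal."""
    x = np.asarray(x, float)
    y = np.asarray(y, float)
    s = x + y
    mask = s > 0
    return float((((x - y) ** 2 - s)[mask] / s[mask]).sum())


def batch_sample_size(n, eps, c=1.0):
    """Samples per distribution, of order n^{2/3}/eps^{4/3} + sqrt(n)/eps^2.
    The constant c is a hypothetical placeholder, not a tuned value."""
    return math.ceil(c * (n ** (2 / 3) / eps ** (4 / 3) + math.sqrt(n) / eps ** 2))


def sequential_closeness_test(p, q, max_rounds=6, c=1.0, seed=0):
    """Hypothetical doubling-trick sketch: try eps = 1/2, 1/4, ... and stop
    at the first rejection. Returns (rejected, rounds_used, samples_per_dist)."""
    rng = np.random.default_rng(seed)
    n = len(p)
    used = 0
    for k in range(1, max_rounds + 1):
        eps = 2.0 ** (-k)
        m = batch_sample_size(n, eps, c)
        x = rng.multinomial(m, p)  # fresh samples from D1
        y = rng.multinomial(m, q)  # fresh samples from D2
        used += m
        # Heuristic threshold: E[Z] is about 0 if p == q and, heuristically,
        # at least about 2*m*eps^2 if TV(p, q) >= eps; reject above the midpoint.
        if closeness_statistic(x, y) > m * eps ** 2:
            return True, k, used
    return False, max_rounds, used
```

For example, `sequential_closeness_test([1.0, 0.0], [0.0, 1.0])` rejects in the first round after 10 samples per distribution, while two identical distributions run through all `max_rounds`. A real sequential tester would also split its error budget across rounds (for instance $\delta_k = \delta/2^k$) so that the overall false-rejection probability stays bounded; this sketch omits that, and it uses fixed-size multinomial samples instead of Poissonized ones, so the statistic is only approximately centred under equality.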

The talk and the corresponding paper are published at the NeurIPS 2021 virtual conference.

Similar Papers

04/08/2021

Near-Optimal Entrywise Sampling of Numerically Sparse Matrices

Vladimir Braverman, Robert Krauthgamer, Aditya R Krishnan, Shay Sapir

06/12/2021

Improved Coresets and Sublinear Algorithms for Power Means in Euclidean Spaces

Vincent Cohen-Addad, David Saulpic, Chris Schwiegelshohn

Keywords: clustering

06/12/2021

Consistent Estimation for PCA and Sparse Regression with Oblivious Outliers

Tommaso d'Orsi, Chih-Hung Liu, Rajai Nasser, Gleb Novikov, David Steurer, Stefan Tiegel

Keywords: optimization

22/06/2020

Non-adaptive adaptive sampling on turnstile streams

Sepideh Mahabadi, Ilya Razenshteyn, David P. Woodruff, Samson Zhou

Keywords: volume maximization, determinantal point processes, computational geometry, streaming algorithms

09/07/2020

Locally Private Hypothesis Selection

Sivakanth Gopi, Gautam Kamath, Janardhan D Kulkarni, Aleksandar Nikolov, Steven Wu, Huanyu Zhang

Keywords: Privacy, fairness, Distribution learning/testing

06/12/2021

List-Decodable Mean Estimation in Nearly-PCA Time

Ilias Diakonikolas, Daniel Kane, Daniel Kongsgaard, Jerry Li, Kevin Tian

Keywords: theory, clustering

06/12/2020

An Optimal Elimination Algorithm for Learning a Best Arm

Avinatan Hassidim, Ron Kupfer, Yaron Singer

06/12/2021

Optimal Sketching for Trace Estimation

Shuli Jiang, Hai Pham, David Woodruff, Richard Zhang

Keywords: machine learning

04/08/2021

Breaking The Dimension Dependence in Sparse Distribution Estimation under Communication Constraints

Wei-Ning Chen, Peter Kairouz, Ayfer Ozgur

09/07/2020

Bessel Smoothing and Multi-Distribution Property Estimation

Yi Hao, Ping Li

Keywords: Distribution learning/testing, High-dimensional statistics, Information theory

22/06/2020

Algorithms for heavy-tailed statistics: Regression, covariance estimation, and beyond

Yeshwanth Cherapanamjeri, Samuel B. Hopkins, Tarun Kathuria, Prasad Raghavendra, Nilesh Tripuraneni

Keywords: Sum-of-squares, Algorithms, Heavy-Tailed Estimation

22/06/2020

Efficiently learning structured distributions from untrusted batches

Sitan Chen, Jerry Li, Ankur Moitra

Keywords: sum-of-squares, federated learning, VC complexity, Robust statistics

06/12/2020

Robust Gaussian Covariance Estimation in Nearly-Matrix Multiplication Time

Jerry Li, Guanghao Ye

18/07/2021

Sample Complexity of Robust Linear Classification on Separated Data

Robi Bhattacharjee, Somesh Jha, Kamalika Chaudhuri

Keywords: Theory, Statistical Learning Theory

03/05/2021

Sequential Density Ratio Estimation for Simultaneous Optimization of Speed and Accuracy

Akinori Ebihara, Taiki Miyagawa, Kazuyuki Sakurai, Hitoshi Imaoka

Keywords: Density ratio estimation, Early classification, Sequential probability ratio test

06/12/2020

Linear-Sample Learning of Low-Rank Distributions

Ayush Jain, Alon Orlitsky

09/07/2020

How Good is SGD with Random Shuffling?

Itay M Safran, Ohad Shamir

Keywords: Convex optimization

06/12/2020

The Flajolet-Martin Sketch Itself Preserves Differential Privacy: Private Counting with Minimal Space

Adam Smith, Shuang Song, Abhradeep Guha Thakurta

06/12/2020

On Adaptive Distance Estimation

Yeshwanth Cherapanamjeri, Jelani Nelson

09/07/2020

Better Algorithms for Estimating Non-Parametric Models in Crowd-Sourcing and Rank Aggregation

Allen X Liu, Ankur Moitra

Keywords: Matrix/tensor estimation, Learning with algebraic or combinatorial structure, Ranking and preference learning

06/12/2021

Towards Understanding Why Lookahead Generalizes Better Than SGD and Beyond

Pan Zhou, Hanshu Yan, Xiaotong Yuan, Jiashi Feng, Shuicheng Yan

Keywords: deep learning, optimization

04/08/2021

The Bethe and Sinkhorn Permanents of Low Rank Matrices and Implications for Profile Maximum Likelihood

Nima Anari, Moses Charikar, Kirankumar Shiragur, Aaron Sidford

06/12/2021

Tight High Probability Bounds for Linear Stochastic Approximation with Fixed Stepsize

Alain Durmus, Eric Moulines, Alexey Naumov, Sergey Samsonov, Kevin Scaman, Hoi-To Wai

Keywords: machine learning

06/12/2021

Corruption Robust Active Learning

Yifang Chen, Simon Du, Kevin Jamieson

Keywords: machine learning, robustness, active learning

18/07/2021

PAGE: A Simple and Optimal Probabilistic Gradient Estimator for Nonconvex Optimization

Zhize Li, Hongyan Bao, Xiangliang Zhang, Peter Richtarik

Keywords: Optimization

09/07/2020

Balancing Gaussian vectors in high dimension

Paxton M Turner, Raghu Meka, Philippe Rigollet

Keywords: Combinatorial optimization, Approximation algorithms, Concentration inequalities, High-dimensional statistics, Stochastic optimization

09/07/2020

Taking a hint: How to leverage loss predictors in contextual bandits?

Chen-Yu Wei, Haipeng Luo, Alekh Agarwal

Keywords: Bandit problems, Online learning

06/12/2021

Coresets for Clustering with Missing Values

Vladimir Braverman, Shaofeng Jiang, Robert Krauthgamer, Xuan Wu

Keywords: clustering

12/07/2020

Improving the Sample and Communication Complexity for Decentralized Non-Convex Optimization: Joint Gradient Estimation and Tracking

Haoran Sun, Songtao Lu, Mingyi Hong

Keywords: Optimization - Non-convex

18/07/2021

Meta Learning for Support Recovery in High-dimensional Precision Matrix Estimation

Qian Zhang, Yilin Zheng, Jean Honorio

Keywords: Algorithms, Meta-Learning, Algorithms, Few-Shot Learning; Algorithms, Multitask and Transfer Learning, Theory, Statistical Learning Theory

06/12/2021

Breaking the Sample Complexity Barrier to Regret-Optimal Model-Free Reinforcement Learning

Gen Li, Laixi Shi, Yuxin Chen, Yuantao Gu, Yuejie Chi

Keywords: theory, reinforcement learning and planning

03/05/2021

Faster Binary Embeddings for Preserving Euclidean Distances

Jinjie Zhang, Rayan Saab

Keywords: Binary Embeddings, Sigma Delta Quantization, Johnson-Lindenstrauss Transforms

06/12/2020

Robust Sub-Gaussian Principal Component Analysis and Width-Independent Schatten Packing

Arun Jambulapati, Jerry Li, Kevin Tian

18/07/2021

Consistent regression when oblivious outliers overwhelm

Tommaso d'Orsi, Gleb Novikov, David Steurer

Keywords: Theory, Game Theory and Computational Economics, Theory, Theory, Computational Complexity

12/07/2020

Near-optimal sample complexity bounds for learning Latent $k$-polytopes and applications to Ad-Mixtures

Chiranjib Bhattacharyya, Ravindran Kannan

Keywords: Learning Theory

06/12/2021

Instance-Dependent Bounds for Zeroth-order Lipschitz Optimization with Error Certificates

Francois Bachoc, Tom Cesari, Sébastien Gerchinovitz

Keywords: theory, optimization

12/07/2020

Adaptive Sampling for Estimating Probability Distributions

Shubhanshu Shekhar, Tara Javidi, Mohammad Ghavamzadeh

Keywords: Online Learning, Active Learning, and Bandits

09/07/2020

Second-Order Information in Non-Convex Stochastic Optimization: Power and Limitations

Yossi Arjevani, Yair Carmon, John Duchi, Dylan Foster, Ayush Sekhari, Karthik Sridharan

Keywords: Non-convex optimization, Stochastic optimization

06/12/2021

Better Algorithms for Individually Fair $k$-Clustering

Maryam Negahbani, Deeparnab Chakrabarty

Keywords: theory, self-supervised learning, clustering, fairness

22/06/2020

Top-𝑘-convolution and the quest for near-linear output-sensitive subset sum

Karl Bringmann, Vasileios Nakos

Keywords: Subset Sum, pseudopolynomial, output-sensitive, convolution, restricted sumset
