Sample Amplification: Increasing Dataset Size even when Learning is Impossible

12/07/2020

Brian Axelrod, Shivam Garg, Vatsal Sharan, Gregory Valiant

Keywords: Learning Theory

Abstract: Given data drawn from an unknown distribution, D, to what extent is it possible to "amplify" this dataset and faithfully output an even larger set of samples that appear to have been drawn from D? We formalize this question as follows: an (n,m) amplification procedure takes as input n independent draws from an unknown distribution D, and outputs a set of m > n "samples" which must be indistinguishable from m samples drawn iid from D. We consider this sample amplification problem in two fundamental settings: the case where D is an arbitrary discrete distribution supported on k elements, and the case where D is a d-dimensional Gaussian with unknown mean, and fixed covariance matrix. Perhaps surprisingly, we show a valid amplification procedure exists for both of these settings, even in the regime where the size of the input dataset, n, is significantly less than what would be necessary to learn distribution D to non-trivial accuracy. We also show that our procedures are optimal up to constant factors. Beyond these results, we describe potential applications of such data amplification, and formalize a number of curious directions for future research along this vein.
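
To make the (n, m) definition concrete, the following is a minimal Python sketch of the interface in the discrete setting. It is a naive baseline and not the procedure from the paper: it appends m - n draws from the empirical distribution of the input and shuffles the result. The function name `amplify_discrete` and its parameters are hypothetical, and the sketch makes no claim about when its output is indistinguishable from m iid draws from D.

```python
import random


def amplify_discrete(samples, m, rng=None):
    """Naive (n, m) amplifier for discrete data (illustrative baseline only).

    samples: list of n >= 1 hashable draws from an unknown distribution D.
    m:       desired output size, must satisfy m > n.
    rng:     optional random.Random instance for reproducibility.
    """
    n = len(samples)
    if n == 0:
        raise ValueError("need at least one input sample")
    if m <= n:
        raise ValueError("an (n, m) amplifier requires m > n")
    rng = rng or random.Random()
    # Draw the m - n extra points from the empirical distribution of the input.
    extra = rng.choices(samples, k=m - n)
    # Mix originals and extras so their positions carry no information.
    output = list(samples) + extra
    rng.shuffle(output)
    return output


if __name__ == "__main__":
    rng = random.Random(0)
    data = [rng.randrange(5) for _ in range(20)]  # n = 20 draws, support size k = 5
    amplified = amplify_discrete(data, 24, rng)   # m = 24
    print(len(data), len(amplified))
```

Whether such an output passes as m genuine draws depends on how n, m, and the support size k relate, which is the question the abstract formalizes.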

The talk and the respective paper are published at the ICML 2020 virtual conference.

Similar Papers

06/12/2020

Truncated Linear Regression in High Dimensions

Constantinos Daskalakis, Dhruv Rohatgi, Emmanouil Zampetakis

3:17

06/12/2021

Consistent Estimation for PCA and Sparse Regression with Oblivious Outliers

Tommaso d'Orsi, Chih-Hung Liu, Rajai Nasser, Gleb Novikov, David Steurer, Stefan Tiegel

Keywords: optimization

10:44

18/07/2021

High-dimensional Experimental Design and Kernel Bandits

Romain Camilleri, Kevin Jamieson, Julian Katz-Samuels

Keywords: Reinforcement Learning and Planning, Bandits

16:38

13/04/2021

Robust mean estimation on highly incomplete data with arbitrary outliers

Lunjia Hu, Omer Reingold

2:56

23/08/2020

Imputing various incomplete attributes via distance likelihood maximization

Shaoxu Song, Yu Sun

Keywords: distance likelihood, incomplete data, data imputation

11:45

26/08/2020

A Unified Statistically Efficient Estimation Framework for Unnormalized Models

Masatoshi Uehara, Takafumi Kanamori, Takashi Takenouchi, Takeru Matsuda

13:58

13/04/2021

Efficient statistics for sparse graphical models from truncated samples

Arnab Bhattacharyya, Rathin Desai, Sai Ganesh Nagarajan, Ioannis Panageas

2:56

06/12/2021

Loss function based second-order Jensen inequality and its application to particle variational inference

Futoshi Futami, Tomoharu Iwata, Naonori Ueda, Issei Sato, Masashi Sugiyama

Keywords: optimization, generative model

14:09

06/12/2020

Constraining Variational Inference with Geometric Jensen-Shannon Divergence

Jacob Deasy, Nikola Simidjievski, Pietro Lió

Keywords: Theory -> Control Theory, Algorithms -> Online Learning

3:11

18/07/2021

Exact Gap between Generalization Error and Uniform Convergence in Random Feature Models

Zitong Yang, Yu Bai, Song Mei

Keywords: Theory, Deep learning Theory

5:40

06/12/2020

The Generalized Lasso with Nonlinear Observations and Generative Priors

Zhaoqiang Liu, Jonathan Scarlett

3:13

26/08/2020

Asymptotic Analysis of Sampling Estimators for Randomized Numerical Linear Algebra Algorithms

Ping Ma, Xinlian Zhang, Xin Xing, Jingyi Ma, Michael Mahoney

16:03

22/06/2020

Extractors for adversarial sources via extremal hypergraphs

Eshan Chattopadhyay, Jesse Goodman, Vipul Goyal, Xin Li

Keywords: randomness extractors, non-malleable extractors, extremal hypergraphs, explicit constructions, cap sets, Ramsey graphs

28:16

19/08/2021

Improved Guarantees and a Multiple-descent Curve for Column Subset Selection and the Nystrom Method (Extended Abstract)

Michał Dereziński, Rajiv Khanna, Michael W. Mahoney

Keywords: Machine Learning, Dimensionality Reduction, Explainable/Interpretable Machine Learning, Kernel Methods, Unsupervised Learning

13:48

12/07/2020

Curse of Dimensionality on Randomized Smoothing for Certifiable Robustness

Aounon Kumar, Alexander Levine, Tom Goldstein, Soheil Feizi

Keywords: Adversarial Examples

14:48

06/12/2020

Estimation and Imputation in Probabilistic Principal Component Analysis with Missing Not At Random Data

Aude Sportisse, Claire Boyer, Julie Josse

Keywords: Algorithms -> Online Learning

3:20

06/12/2020

Quantile Propagation for Wasserstein-Approximate Gaussian Processes

Rui Zhang, Christian Walder, Edwin Bonilla, Marian-Andrei Rizoiu, Lexing Xie

3:17

06/12/2021

Tighter Expected Generalization Error Bounds via Wasserstein Distance

Borja Rodríguez Gálvez, German Bassi, Ragnar Thobaben, Mikael Skoglund

14:11

06/12/2021

Towards Sample-Optimal Compressive Phase Retrieval with Sparse and Generative Priors

Zhaoqiang Liu, Subhroshekhar Ghosh, Jonathan Scarlett

Keywords: theory, optimization, generative model

10:41

06/12/2020

Faster Wasserstein Distance Estimation with the Sinkhorn Divergence

Lénaïc Chizat, Pierre Roussillon, Flavien Léger, François-Xavier Vialard, Gabriel Peyré

3:21

06/12/2021

Determinantal point processes based on orthogonal polynomials for sampling minibatches in SGD

Rémi Bardenet, Subhroshekhar Ghosh, Meixia Lin

Keywords: optimization, machine learning

14:51

13/04/2021

Hadamard Wirtinger flow for sparse phase retrieval

Fan Wu, Patrick Rebeschini

3:01

06/12/2021

Near-optimal Offline and Streaming Algorithms for Learning Non-Linear Dynamical Systems

Suhas Kowshik, Dheeraj Nagaraj, Prateek Jain, Praneeth Netrapalli

Keywords: theory

14:43

06/12/2021

Nonparametric estimation of continuous DPPs with kernel methods

Michaël Fanuel, Rémi Bardenet

Keywords: optimization, machine learning, kernel methods, interpretability

13:48

06/12/2020

Sampling from a k-DPP without looking at all items

Daniele Calandriello, Michal Derezinski, Michal Valko

3:23

06/12/2020

Improved guarantees and a multiple-descent curve for Column Subset Selection and the Nystrom method

Michal Derezinski, Rajiv Khanna, Michael W Mahoney

3:30

13/04/2021

Convergence of Gaussian-smoothed optimal transport distance with sub-gamma distributions and dependent samples

Yixing Zhang, Xiuyuan Cheng, Galen Reeves

3:06

13/04/2021

Inductive mutual information estimation: A convex maximum-entropy copula approach

Yves-Laurent Kom Samo

2:57

14/09/2020

Weak approximation of transformed stochastic gradient MCMC

Soma Yokoi, Takuma Otsuka, Issei Sato

13:39

03/05/2021

Faster Binary Embeddings for Preserving Euclidean Distances

Jinjie Zhang, Rayan Saab

Keywords: Binary Embeddings, Sigma Delta Quantization, Johnson-Lindenstrauss Transforms

4:32

06/12/2020

Probabilistic Circuits for Variational Inference in Discrete Graphical Models

Andy Shih, Stefano Ermon

3:18

06/12/2021

Covariance-Aware Private Mean Estimation Without Private Covariance Estimation

Gavin Brown, Marco Gaboardi, Adam Smith, Jonathan Ullman, Lydia Zakynthinou

Keywords: theory, privacy

14:33

09/07/2020

On the Multiple Descent of Minimum-Norm Interpolants and Restricted Lower Isometry of Kernels

Tengyuan Liang, Alexander Rakhlin, Xiyu Zhai

Keywords: Supervised learning, Excess risk bounds and generalization error bounds, High-dimensional statistics, Kernel methods, Regression

14:56

18/07/2021

Consistent regression when oblivious outliers overwhelm

Tommaso d'Orsi, Gleb Novikov, David Steurer

Keywords: Theory, Game Theory and Computational Economics, Computational Complexity

4:42

12/07/2020

Optimal Statistical Guarantees for Adversarially Robust Gaussian Classification

Chen Dan, Yuting Wei, Pradeep Ravikumar

Keywords: Learning Theory

14:36

06/12/2021

MCMC Variational Inference via Uncorrected Hamiltonian Annealing

Tomas Geffner, Justin Domke

Keywords: generative model

4:22

06/12/2020

Efficient Distance Approximation for Structured High-Dimensional Distributions via Learning

Arnab Bhattacharyya, Sutanu Gayen, Kuldeep S Meel, N. V. Vinodchandran

2:56

13/04/2021

Completing the picture: Randomized smoothing suffers from the curse of dimensionality for a large family of distributions

Yihan Wu, Aleksandar Bojchevski, Aleksei Kuvshinov, Stephan Günnemann

3:04

06/12/2021

Analytic Study of Families of Spurious Minima in Two-Layer ReLU Neural Networks: A Tale of Symmetry II

Yossi Arjevani, Michael Field

Keywords: theory, deep learning, optimization

8:40

06/12/2021

Improved Coresets and Sublinear Algorithms for Power Means in Euclidean Spaces

Vincent Cohen-Addad, David Saulpic, Chris Schwiegelshohn

Keywords: clustering

16:06