Statistical Bias in Dataset Replication

12/07/2020

Logan Engstrom, Andrew Ilyas, Shibani Santurkar, Dimitris Tsipras, Jacob Steinhardt, Aleksander Madry

Keywords: Trustworthy Machine Learning

Abstract: Dataset replication is a useful tool for assessing whether models have overfit to a specific validation set or the exact circumstances under which it was generated. In this paper, we highlight the importance of statistical modeling in dataset replication: we present unintuitive yet pervasive ways in which statistical bias, when left unmitigated, can skew results. Specifically, we examine ImageNet-v2, a replication of the ImageNet dataset that induces a significant drop in model accuracy, presumed to be caused by a benign distribution shift between the datasets. We show, however, that by identifying and accounting for the aforementioned bias, we can explain the vast majority of this accuracy drop. We conclude with concrete recommendations for recognizing and avoiding bias in dataset replication.
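To make the idea of statistical bias in a replication pipeline concrete, here is a minimal toy simulation. It is an illustrative sketch only and is not taken from the paper: the latent "quality" score, the Beta(2, 2) prior, the ten annotators, and the 0.7 threshold are all hypothetical assumptions. It shows one generic mechanism, namely that filtering candidates on a noisy estimate makes the kept items look better on that estimate than they truly are.

```python
import random


def simulate_noisy_filtering(num_images=20000, num_annotators=10,
                             threshold=0.7, seed=0):
    """Toy model: keep images whose *estimated* quality clears a threshold.

    Returns (mean estimated quality, mean true quality) of the kept images.
    All quantities here are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    observed, true = [], []
    for _ in range(num_images):
        # Hypothetical latent quality of a candidate image, in [0, 1].
        quality = rng.betavariate(2, 2)
        # Noisy estimate: fraction of annotators who approve the image.
        votes = sum(rng.random() < quality for _ in range(num_annotators))
        estimate = votes / num_annotators
        # Selection is based on the noisy estimate, not the true quality.
        if estimate >= threshold:
            observed.append(estimate)
            true.append(quality)
    return sum(observed) / len(observed), sum(true) / len(true)


observed_mean, true_mean = simulate_noisy_filtering()
print(f"mean estimated quality of kept images: {observed_mean:.3f}")
print(f"mean true quality of kept images:      {true_mean:.3f}")
```

Under these assumptions the kept images have a lower true quality than their estimates suggest, so two datasets matched on a noisy statistic need not be matched on the underlying quantity.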

The talk and the corresponding paper were published at the ICML 2020 virtual conference.

Similar Papers

19/10/2020

A framework for analyzing the impact of missing data in predictive models

Fabiola Santore, Eduardo C. Almeida, Wagner H. Bonat, Eduardo H. M. Pena, Luiz Eduardo S. Oliveira

Keywords: predictive model, missing data, data simulation

6:59

03/05/2021

Deciphering and Optimizing Multi-Task Learning: a Random Matrix Approach

Malik Tiomoko, Hafiz Tiomoko Ali, Romain Couillet

Keywords: Transfer Learning, Random Matrix Theory, Multi Task Learning

11:15

03/05/2021

Combining Ensembles and Data Augmentation Can Harm Your Calibration

Yeming Wen, Ghassen Jerfel, Rafael Müller, Michael W Dusenberry, Jasper Snoek, Balaji Lakshminarayanan, Dustin Tran

Keywords: Uncertainty estimates, Ensembles, Calibration

6:10

03/05/2021

Understanding the failure modes of out-of-distribution generalization

Vaishnavh Nagarajan, Anders J Andreassen, Behnam Neyshabur

Keywords: theoretical study, spurious correlations, out-of-distribution generalization, empirical risk minimization

5:12

26/08/2020

Mitigating Overfitting in Supervised Classification from Two Unlabeled Datasets: A Consistent Risk Correction Approach

Nan Lu, Tianyi Zhang, Gang Niu, Masashi Sugiyama

10:16

06/12/2021

Understanding the Limits of Unsupervised Domain Adaptation via Data Poisoning

Akshay Mehra, Bhavya Kailkhura, Pin-Yu Chen, Jihun Hamm

Keywords: robustness, domain adaptation

13:34

06/12/2021

Evaluating model performance under worst-case subpopulations

Mike Li, Hongseok Namkoong, Shangzhou Xia

Keywords: robustness, fairness

5:45

06/12/2021

Realistic evaluation of transductive few-shot learning

Olivier Veilleux, Malik Boudiaf, Pablo Piantanida, Ismail Ben Ayed

Keywords: optimization, machine learning, few shot learning

10:21

26/08/2020

Semi-Modular Inference: enhanced learning in multi-modular models by tempering the influence of components

Christian Carmona, Geoff Nicholls

14:59

06/12/2020

Simplify and Robustify Negative Sampling for Implicit Collaborative Filtering

Jingtao Ding, Yuhan Quan, Quanming Yao, Yong Li, Depeng Jin

3:20

12/07/2020

Missing Data Imputation using Optimal Transport

Boris Muzellec, Julie Josse, Claire Boyer, Marco Cuturi

Keywords: Unsupervised and Semi-Supervised Learning

13:22

13/04/2021

Comparing the value of labeled and unlabeled data in method-of-moments latent variable estimation

Mayee Chen, Benjamin Cohen-Wang, Stephen Mussmann, Frederic Sala, Christopher Re

3:04

26/08/2020

A Theoretical and Practical Framework for Regression and Classification from Truncated Samples

Andrew Ilyas, Emmanouil Zampetakis, Constantinos Daskalakis

15:28

06/12/2020

Functional Regularization for Representation Learning: A Unified Theoretical Perspective

Siddhant Garg, Yingyu Liang

3:19

06/12/2020

Randomized tests for high-dimensional regression: A more efficient and powerful solution

Yue Li, Ilmun Kim, Yuting Wei

3:24

19/08/2021

On Sampled Metrics for Item Recommendation (Extended Abstract)

Walid Krichene, Steffen Rendle

Keywords: Machine Learning, Recommender Systems

15:39

03/05/2021

Contemplating Real-World Object Classification

Ali Borji

Keywords: Robustness, object recognition, deep learning, ObjectNet

5:12

03/08/2020

Flexible Approximate Inference via Stratified Normalizing Flows

Chris Cundy, Stefano Ermon

7:33

26/08/2020

A Robust Univariate Mean Estimator is All You Need

Adarsh Prasad, Sivaraman Balakrishnan, Pradeep Ravikumar

13:59

26/08/2020

Robust Learning from Discriminative Feature Feedback

Sanjoy Dasgupta, Sivan Sabato

14:37

03/05/2021

Tomographic Auto-Encoder: Unsupervised Bayesian Recovery of Corrupted Data

Francesco Tonolini, Pablo Garcia Moreno, Andreas Damianou, Roderick Murray-Smith

Keywords: Missing value imputation, variational auto-encoders, variational inference

5:09

18/07/2021

Examining and Combating Spurious Features under Distribution Shift

Chunting Zhou, Xuezhe Ma, Paul Michel, Graham Neubig

Keywords: Deep Learning, Embedding and Representation learning

5:53

14/06/2020

Deep Generative Model for Robust Imbalance Classification

Xinyue Wang, Yilin Lyu, Liping Jing

Keywords: imbalance classification, deep generative classifier, generative model, robust classification

1:01

06/12/2021

Overcoming the curse of dimensionality with Laplacian regularization in semi-supervised learning

Vivien Cabannes, Loucas Pillaud-Vivien, Francis Bach, Alessandro Rudi

Keywords: machine learning, kernel methods, semi-supervised learning

14:24

23/08/2020

On sampled metrics for item recommendation

Walid Krichene, Steffen Rendle

Keywords: item recommendation, sampled metric, evaluation, metrics

16:46

12/07/2020

Learning with Multiple Complementary Labels

Lei Feng, Takuo Kaneko, Bo Han, Gang Niu, Bo An, Masashi Sugiyama

Keywords: Unsupervised and Semi-Supervised Learning

10:19

06/12/2021

Consistency Regularization for Variational Auto-Encoders

Samarth Sinha, Adji Bousso Dieng

Keywords: deep learning, machine learning, self-supervised learning, generative model, contrastive learning, representation learning

10:52

06/12/2020

Variational Bayesian Unlearning

Quoc Phong Nguyen, Bryan Kian Hsiang Low, Patrick Jaillet

3:11

14/06/2020

ViewAL: Active Learning With Viewpoint Entropy for Semantic Segmentation

Yawar Siddiqui, Julien Valentin, Matthias Nießner

Keywords: active learning, semantic segmentation, deep learning, view consistency

1:01

06/12/2021

Learning latent causal graphs via mixture oracles

Bohdan Kivva, Goutham Rajendran, Pradeep Ravikumar, Bryon Aragam

Keywords: graph learning

12:33

13/04/2021

Principal component regression with semirandom observations via matrix completion

Aditya Bhaskara, Aravinda Kanchana Ruwanpathirana, Maheshakya Wijewardena

2:48

06/12/2021

Center Smoothing: Certified Robustness for Networks with Structured Outputs

Aounon Kumar, Tom Goldstein

Keywords: machine learning, robustness, adversarial robustness and security, vision, generative model

8:54

02/02/2021

Semi-Supervised Learning with Variational Bayesian Inference and Maximum Uncertainty Regularization

Kien Do, Truyen Tran, Svetha Venkatesh

16:56

06/12/2021

Unbiased Classification through Bias-Contrastive and Bias-Balanced Learning

Youngkyu Hong, Eunho Yang

Keywords: machine learning, contrastive learning, fairness

11:29

07/09/2020

Non-Probabilistic Cosine Similarity Loss for Few-Shot Image Classification

Joonhyuk Kim, Inug Yoon, Gyeong-Moon Park, Jong-Hwan Kim

Keywords: few-shot learning, image classification, NPC loss

4:59

14/06/2020

Probabilistic Pixel-Adaptive Refinement Networks

Anne S. Wannenwetsch, Stefan Roth

Keywords: dense prediction, refinement, probability, confidences, adaptive convolution, optical flow, semantic segmentation

1:00

06/12/2021

Bayesian Adaptation for Covariate Shift

Aurick Zhou, Sergey Levine

Keywords: deep learning, machine learning, robustness, vision, domain adaptation

8:21

06/12/2021

Towards Calibrated Model for Long-Tailed Visual Recognition from Prior Perspective

Zhengzhuo Xu, Zenghao Chai, Chun Yuan

Keywords: theory, machine learning

4:23

12/07/2020

Adaptive Sampling for Estimating Probability Distributions

Shubhanshu Shekhar, Tara Javidi, Mohammad Ghavamzadeh

Keywords: Online Learning, Active Learning, and Bandits

15:11

26/08/2020

Feature relevance quantification in explainable AI: A causal problem

Dominik Janzing, Lenon Minorics, Patrick Bloebaum

14:50