Discrepancy Ratio: Evaluating Model Performance When Even Experts Disagree on the Truth

Abstract: In most machine learning tasks, unambiguous ground-truth labels can be acquired easily. This luxury is often unavailable in high-stakes, real-world scenarios such as medical image interpretation, where even expert human annotators typically disagree with one another to a very high degree. While prior work has focused on overcoming noisy labels during training, the question of how to evaluate models when annotators disagree about the ground truth has remained largely unexplored. To address this, we propose the discrepancy ratio: a novel, task-independent, and principled framework for validating machine learning models in the presence of high label noise. Conceptually, our approach evaluates a model by comparing its predictions to those of human annotators, taking into account the degree to which annotators disagree with one another. While our approach is entirely general, we show that in the special case of binary classification the proposed metric can be evaluated with simple, closed-form expressions that depend only on aggregate statistics of the labels and not on any individual label. Finally, we demonstrate how this framework can be used to validate machine learning models on two real-world tasks from medical imaging. The discrepancy ratio reveals what conventional metrics do not: our models not only vastly exceed average human performance but even exceed the performance of the best human experts in our datasets.

Igor Lovchinsky, Alon Daks, Israel Malkin, Pouya Samangouei, Ardavan Saeedi, Yang Liu, Swami Sankaranarayanan, Tomer Gafner, Ben Sternlieb, Patrick Maher, Nathan Silberman
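
The abstract describes the metric only conceptually, so the following is a minimal sketch of one plausible reading, not the authors' definition: the mean model-to-annotator discrepancy divided by the mean annotator-to-annotator discrepancy over the same items. The function name `discrepancy_ratio`, the absolute-difference discrepancy, and the toy labels are all assumptions made for illustration.

```python
from itertools import combinations
from typing import Callable, Sequence


def abs_diff(a: float, b: float) -> float:
    """Assumed per-item discrepancy; any task-specific measure could be swapped in."""
    return abs(a - b)


def discrepancy_ratio(
    model_preds: Sequence[float],
    annotator_labels: Sequence[Sequence[float]],
    discrepancy: Callable[[float, float], float] = abs_diff,
) -> float:
    """Hypothetical ratio of model-vs-annotator to annotator-vs-annotator discrepancy.

    model_preds[i] is the model output for item i; annotator_labels[i] holds the
    labels that several annotators gave to the same item.
    """
    if len(model_preds) != len(annotator_labels):
        raise ValueError("need one list of annotator labels per prediction")

    model_total, model_count = 0.0, 0
    human_total, human_count = 0.0, 0
    for pred, labels in zip(model_preds, annotator_labels):
        # Compare the model with every annotator of this item.
        for label in labels:
            model_total += discrepancy(pred, label)
            model_count += 1
        # Compare every pair of annotators of this item with each other.
        for a, b in combinations(labels, 2):
            human_total += discrepancy(a, b)
            human_count += 1

    if model_count == 0 or human_count == 0 or human_total == 0:
        raise ValueError("ratio is undefined without annotator disagreement")
    return (model_total / model_count) / (human_total / human_count)


# Toy binary example: two items, three annotators each.
ratio = discrepancy_ratio([1, 0], [[1, 1, 0], [0, 0, 1]])
print(ratio)  # 0.5
```

Under this reading, a value below 1 would mean the model disagrees with the annotators less than the annotators disagree with one another; how the paper normalizes or aggregates the two terms may differ.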

Similar Papers

Learning Precise Temporal Point Event Detection with Misaligned Labels

Julien Schroeter, Kirill Sidorov, David Marshall

Disentangling Human Error from Ground Truth in Segmentation of Medical Images

Le Zhang, Ryu Tanno, Moucheng Xu, Chen Jin, Joseph Jacob, Olga Cicarrelli, Frederik Barkhof, Daniel Alexander

Stochastic Segmentation Networks: Modelling Spatially Correlated Aleatoric Uncertainty

Miguel Monteiro, Loic Le Folgoc, Daniel Coelho de Castro, Nick Pawlowski, Bernardo Marques, Konstantinos Kamnitsas, Mark van der Wilk, Ben Glocker

Consistent Right-Invariant Fixed-Lag Smoother with Application to Visual Inertial SLAM

Jianzhu Huai, Yukai Lin, Yuan Zhuang, Min Shi

Representation learning for improved interpretability and classification accuracy of clinical factors from EEG

Garrett Honke, Irina Higgins, Nina Thigpen, Vladimir Miskovic, Katie Link, Sunny Duan, Pramod Gupta, Julia Klawohn, Greg Hajcak

Keywords: representation learning, beta-VAE, depression, electroencephalography, ERP, EEG, disentanglement

Looking at the whole picture: constrained unsupervised anomaly segmentation

Julio Silva-Rodríguez, Valery Naranjo, Jose Dolz

Keywords: unsupervised anomaly localization, brain lesion segmentation, constrained segmentation, size-constrained loss, class-activation maps, CAMs, log-barrier extension, BRATS19

Robust Meta-learning for Mixed Linear Regression with Small Batches

Weihao Kong, Raghav Somani, Sham Kakade, Sewoong Oh

Learning with Instance-Dependent Label Noise: A Sample Sieve Approach

Hao Cheng, Zhaowei Zhu, Xingyu Li, Yifei Gong, Xing Sun, Yang Liu

Keywords: deep neural networks, instance-based label noise, learning with noisy labels

Ensembling Low Precision Models for Binary Biomedical Image Segmentation

Tianyu Ma, Hang Zhang, Hanley Ong, Amar Vora, Thanh D. Nguyen, Ajay Gupta, Yi Wang, Mert R. Sabuncu

Uncertainty-Aware Training of Neural Networks for Selective Medical Image Segmentation

Yukun Ding, Jinglan Liu, Xiaowei Xu, Meiping Huang, Jian Zhuang, Jinjun Xiong, Yiyu Shi

Overinterpretation reveals image classification model pathologies

Brandon Carter, Siddhartha Jain, Jonas Mueller, David Gifford

Keywords: deep learning, machine learning, robustness, adversarial robustness and security, vision, interpretability

Single-Step Adversarial Training With Dropout Scheduling

Vivek B.S., R. Venkatesh Babu

Keywords: adversarial training, robustness, efficient training, representation learning, generalization, supervised learning, recognition, classification, neural networks, deep learning

SAM: The Sensitivity of Attribution Methods to Hyperparameters

Naman Bansal, Chirag Agarwal, Anh Nguyen

Keywords: xai, explainable, attribution, sensitivity, robustness, explanation, hyperparameters

NORESQA: A Framework for Speech Quality Assessment using Non-Matching References

Pranay Manocha, Buye Xu, Anurag Kumar

Keywords: deep learning, robustness, self-supervised learning

Generative Models for Effective ML on Private, Decentralized Datasets

Sean Augenstein, H. Brendan McMahan, Daniel Ramage, Swaroop Ramaswamy, Peter Kairouz, Mingqing Chen, Rajiv Mathews, Blaise Aguera y Arcas

Keywords: generative models, federated learning, decentralized learning, differential privacy, privacy, security, GAN

Addressing The False Negative Problem of Deep Learning MRI Reconstruction Models by Adversarial Attacks and Robust Training

Kaiyang Cheng, Francesco Calivá, Rutwik Shah, Misung Han, Sharmila Majumdar, Valentina Pedoia

Estimation and Imputation in Probabilistic Principal Component Analysis with Missing Not At Random Data

Aude Sportisse, Claire Boyer, Julie Josse

Keywords: Algorithms -> Online Learning

Representation Learning With Statistical Independence to Mitigate Bias

Ehsan Adeli, Qingyu Zhao, Adolf Pfefferbaum, Edith V. Sullivan, Li Fei-Fei, Juan Carlos Niebles, Kilian M. Pohl

Approximate Cross-Validation with Low-Rank Data in High Dimensions

Will Stephenson, Madeleine Udell, Tamara Broderick

Noise Estimation Using Density Estimation for Self-Supervised Multimodal Learning

Elad Amrani, Rami Ben-Ari, Daniel Rotman, Alex Bronstein

Risk Bounds for Over-parameterized Maximum Margin Classification on Sub-Gaussian Mixtures

Yuan Cao, Quanquan Gu, Mikhail Belkin

Keywords: deep learning, machine learning

ESCAPED: Efficient Secure and Private Dot Product Framework for Kernel-based Machine Learning Algorithms with Applications in Healthcare

Ali Burak Ünal, Mete Akgün, Nico Pfeifer
