Not All Claims are Created Equal: Choosing the Right Statistical Approach to Assess Hypotheses

04/07/2020

Erfan Sadeqi Azer, Daniel Khashabi, Ashish Sabharwal, Dan Roth

Keywords: Natural Language Processing, NLP, NLP research, Bayesian hypotheses

Abstract: Empirical research in Natural Language Processing (NLP) has adopted a narrow set of principles for assessing hypotheses, relying mainly on p-value computation, which suffers from several known issues. While alternative proposals have been well-debated and adopted in other fields, they remain rarely discussed or used within the NLP community. We address this gap by contrasting various hypothesis assessment techniques, especially those not commonly used in the field (such as evaluations based on Bayesian inference). Since these statistical techniques differ in the hypotheses they can support, we argue that practitioners should first decide their target hypothesis before choosing an assessment method. This is crucial because common fallacies, misconceptions, and misinterpretation surrounding hypothesis assessment methods often stem from a discrepancy between what one would like to claim versus what the method used actually assesses. Our survey reveals that these issues are omnipresent in the NLP research community. As a step forward, we provide best practices and guidelines tailored to NLP research, as well as an easy-to-use package for Bayesian assessment of hypotheses, complementing existing tools.
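The contrast the abstract draws between p-value computation and Bayesian assessment can be illustrated with a minimal sketch. It is not the authors' package or method. It assumes a hypothetical setup in which two systems, A and B, are run on the same test instances and only the instances where exactly one of them is correct are counted (`a_wins`, `b_wins`). The function names, the choice of an exact sign test, and the uniform Beta(1, 1) prior are all assumptions made for illustration.

```python
from math import comb


def sign_test_p_value(a_wins: int, b_wins: int) -> float:
    """Two-sided exact sign test.

    Returns the probability, under the null hypothesis that A and B are
    equally likely to win a disagreement, of a split at least as lopsided
    as the observed one. It says nothing directly about P(A is better).
    """
    n = a_wins + b_wins
    if n == 0:
        return 1.0  # no disagreements, no evidence against the null
    k = max(a_wins, b_wins)
    upper_tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2.0 * upper_tail)


def posterior_prob_a_better(a_wins: int, b_wins: int) -> float:
    """P(theta > 0.5 | data), where theta = P(A wins a disagreement).

    Assumes a uniform Beta(1, 1) prior, so the posterior is
    Beta(1 + a_wins, 1 + b_wins). For integer parameters the Beta CDF
    equals a binomial tail sum, which keeps this stdlib-only.
    """
    n = a_wins + b_wins + 1
    return sum(comb(n, k) for k in range(a_wins + 1)) / 2 ** n


if __name__ == "__main__":
    # Hypothetical counts, chosen only to exercise the two functions.
    a_wins, b_wins = 30, 18
    print("sign-test p-value:    ", sign_test_p_value(a_wins, b_wins))
    print("P(A better than B | D):", posterior_prob_a_better(a_wins, b_wins))
```

The two numbers answer different questions: the first is a statement about the data under a null hypothesis, the second a statement about the hypothesis given the data and a prior. This is the kind of discrepancy between the intended claim and what the method assesses that the abstract warns about.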

The talk and the corresponding paper were published at the ACL 2020 virtual conference.

Similar Papers

16/11/2020

Interpretable Multi-dataset Evaluation for Named Entity Recognition

Jinlan Fu, Pengfei Liu, Graham Neubig

Keywords: natural tasks, interpretable evaluation, named task, analysis tool

11:11

02/02/2021

LIREx: Augmenting Language Inference with Relevant Explanations

Xinyan Zhao, V.G. Vinod Vydiswaran

18:56

06/12/2021

Debiased Visual Question Answering from Feature and Sample Perspectives

Zhiquan Wen, Guanghui Xu, Mingkui Tan, Qingyao Wu, Qi Wu

Keywords: vision

11:20

04/07/2020

Predictive Biases in Natural Language Processing Models: A Conceptual Framework and Overview

Deven Santosh Shah, H. Andrew Schwartz, Dirk Hovy

Keywords: NLP, Natural Models, Conceptual Framework, mitigation techniques

11:52

08/12/2020

Best Practices for Data-Efficient Modeling in NLG: How to Train Production-Ready Neural Models with Less Data

Ankit Arun, Soumya Batra, Vikas Bhardwaj, Ashwini Challa, Pinar Donmez, Peyman Heidari, Hakan Inan, Shashank Jain, Anuj Kumar, Shawn Mei, Karthik Mohan, Michael White

15:01

16/11/2020

Pareto Probing: Trading Off Accuracy for Complexity

Tiago Pimentel, Naomi Saphra, Adina Williams, Ryan Cotterell

Keywords: simplistic tasks, pos labeling, dependency labeling, full parsing

13:03

04/07/2020

Evaluating Explanation Methods for Neural Machine Translation

Jierui Li, Lemao Liu, Huayang Li, Guanlin Li, Guoping Huang, Shuming Shi

Keywords: Neural Translation, translation tasks, Explanation Methods, black-box models

10:55

18/07/2021

Towards Rigorous Interpretations: a Formalisation of Feature Attribution

Darius Afchar, Vincent Guigue, Romain Hennequin

Keywords: Social Aspects of Machine Learning, Fairness, Accountability, and Transparency

5:20

08/12/2020

Intrinsic Quality Assessment of Arguments

Henning Wachsmuth, Till Werner

9:13

04/07/2020

On Importance Sampling-Based Evaluation of Latent Language Models

Robert L Logan IV, Matt Gardner, Sameer Singh

Keywords: Importance Models, likelihood-based evaluation, Language models, importance sampling

7:20

08/12/2020

Situated Data, Situated Systems: A Methodology to Engage with Power Relations in Natural Language Processing Research

Lucy Havens, Melissa Terras, Benjamin Bach, Beatrice Alex

13:37

02/02/2021

How Linguistically Fair Are Multilingual Pre-Trained Language Models?

Monojit Choudhury, Amit Deshpande

17:57

08/12/2020

Is MAP Decoding All You Need? The Inadequacy of the Mode in Neural Machine Translation

Bryan Eikema, Wilker Aziz

12:03

04/07/2020

Towards Robustifying NLI Models Against Lexical Dataset Biases

Xiang Zhou, Mohit Bansal

Keywords: Natural Inference, data augmentation, Robustifying Models, deep models

11:34

02/02/2021

REM-Net: Recursive Erasure Memory Network for Commonsense Evidence Refinement

Yinya Huang, Meng Fang, Xunlin Zhan, Qingxing Cao, Xiaodan Liang

14:15

02/02/2021

A Case Study of the Shortcut Effects in Visual Commonsense Reasoning

Keren Ye, Adriana Kovashka

14:26

14/06/2020

On the General Value of Evidence, and Bilingual Scene-Text Visual Question Answering

Xinyu Wang, Yuliang Liu, Chunhua Shen, Chun Chet Ng, Canjie Luo, Lianwen Jin, Chee Seng Chan, Anton van den Hengel, Liangwei Wang

Keywords: visual question answering, scene text, ocr

1:01

19/08/2021

Cardinality Queries over DL-Lite Ontologies

Meghyn Bienvenu, Quentin Manière, Michaël Thomazo

Keywords: Knowledge Representation and Reasoning, Computational Complexity of Reasoning, Description Logics and Ontologies

15:02

04/07/2020

Mind the Trade-off: Debiasing NLU Models without Degrading the In-distribution Performance

Prasetya Ajie Utama, Nafise Sadat Moosavi, Iryna Gurevych

Keywords: Debiasing Models, natural tasks, NLU tasks, debiasing methods

11:09

04/07/2020

Predicting Performance for Natural Language Processing Tasks

Mengzhou Xia, Antonios Anastasopoulos, Ruochen Xu, Yiming Yang, Graham Neubig

Keywords: Natural Tasks, natural research, NLP research, NLP tasks

11:48

26/04/2020

Neural Module Networks for Reasoning over Text

Nitish Gupta, Kevin Lin, Dan Roth, Sameer Singh, Matt Gardner

Keywords: question answering, compositionality, neural module networks, multi-step reasoning, reading comprehension

4:36

04/07/2020

To Test Machine Comprehension, Start by Defining Comprehension

Jesse Dunietz, Greg Burnham, Akash Bharadwaj, Owen Rambow, Jennifer Chu-Carroll, Dave Ferrucci

Keywords: Machine Comprehension, Defining Comprehension, MRC, narrative understanding

11:50

14/06/2020

SAM: The Sensitivity of Attribution Methods to Hyperparameters

Naman Bansal, Chirag Agarwal, Anh Nguyen

Keywords: xai, explainable, attribution, sensitivity, robustness, explanation, hyperparameters

8:50

08/12/2020

Classifier Probes May Just Learn from Linear Context Features

Jenny Kunz, Marco Kuhlmann

14:33

23/08/2020

GRACE: Generating concise and informative contrastive sample to explain neural network model’s prediction

Thai Le, Suhang Wang, Dongwon Lee

Keywords: contrastive samples, counterfactual samples, neural networks, data generation, interpretability, deep learning, explainability

19:07

04/07/2020

Don’t Say That! Making Inconsistent Dialogue Unlikely with Unlikelihood Training

Margaret Li, Stephen Roller, Ilia Kulikov, Sean Welleck, Y-Lan Boureau, Kyunghyun Cho, Jason Weston

Keywords: dialogue tasks, Unlikelihood Training, Generative models, maximum training

11:26

22/06/2020

Ranking vs. Classifying: Measuring Knowledge Base Completion Quality

Marina Speranskaya, Martin Schmitt, Benjamin Roth

Keywords: knowledge base completion, knowledge graph embedding, classification, ranking

4:37

06/12/2021

Is Automated Topic Model Evaluation Broken? The Incoherence of Coherence

Alexander Hoyle, Pranav Goel, Andrew Hian-Cheong, Denis Peskov, Jordan Boyd-Graber, Philip Resnik

15:00

04/07/2020

Towards Transparent and Explainable Attention Models

Akash Kumar Mohankumar, Preksha Nema, Sharan Narasimhan, Mitesh M. Khapra, Balaji Vasan Srinivasan, Balaraman Ravindran

Keywords: interpretability distributions, attention mechanisms, Human evaluations, Transparent Models

11:58

04/07/2020

ERASER: A Benchmark to Evaluate Rationalized NLP Models

Jay DeYoung, Sarthak Jain, Nazneen Fatema Rajani, Eric Lehman, Caiming Xiong, Richard Socher, Byron C. Wallace

Keywords: NLP, Evaluating Reasoning, ERASER, Rationalized Models

9:04

22/06/2020

CommonGen: A Constrained Text Generation Challenge for Generative Commonsense Reasoning

Bill Yuchen Lin, Ming Shen, Wangchunshu Zhou, Pei Zhou, Chandra Bhagavatula, Yejin Choi, Xiang Ren

4:29

02/02/2021

Adaptive Prior-Dependent Correction Enhanced Reinforcement Learning for Natural Language Generation

Wei Cheng, Ziyan Luo, Qiyue Yin

13:53

19/08/2021

AdaVQA: Overcoming Language Priors with Adapted Margin Cosine Loss

Yangyang Guo, Liqiang Nie, Zhiyong Cheng, Feng Ji, Ji Zhang, Alberto Del Bimbo

Keywords: Computer Vision, Language and Vision, Deep Learning

12:36

16/11/2020

A Rigorous Study on Named Entity Recognition: Can Fine-tuning Pretrained Model Lead to the Promised Land?

Hongyu Lin, Yaojie Lu, Jialong Tang, Xianpei Han, Le Sun, Zhicheng Wei, Nicholas Jing Yuan

Keywords: randomization test, fine-tuning model, ner, creditable approaches

10:12

02/02/2021

Learning to Rationalize for Nonmonotonic Reasoning with Distant Supervision

Faeze Brahman, Vered Shwartz, Rachel Rudinger, Yejin Choi

18:33

16/11/2020

An Analysis of Natural Language Inference Benchmarks through the Lens of Negation

Md Mosharaf Hossain, Venelin Kovatchev, Pranoy Dutta, Tiffany Kao, Elizabeth Wei, Eduardo Blanco

Keywords: natural inference, inference judgments, transformers, negation

12:01

04/07/2020

WinoWhy: A Deep Diagnosis of Essential Commonsense Knowledge for Answering Winograd Schema Challenge

Hongming Zhang, Xinran Zhao, Yangqiu Song

Keywords: Deep Knowledge, Answering Challenge, WinoWhy, commonsense reasoning

11:58

16/11/2020

Universal Natural Language Processing with Limited Annotations: Try Few-shot Textual Entailment as a Start

Wenpeng Yin, Nazneen Fatema Rajani, Dragomir Radev, Richard Socher, Caiming Xiong

Keywords: nlp problems, textual entailment, nlp task, downstream tasks

12:08

02/02/2021

What's the Best Place for an AI Conference, Vancouver or _______: Why Completing Comparative Questions is Difficult

Avishai Zagoury, Einat Minkov, Idan Szpektor, William W. Cohen

15:15

06/12/2021

Learning to Generate Visual Questions with Noisy Supervision

Shen Kai, Lingfei Wu, Siliang Tang, Yueting Zhuang, Zhen He, Zhuoye Ding, Yun Xiao, Bo Long

Keywords: generative model

14:54