Counterfactual Samples Synthesizing for Robust Visual Question Answering

Abstract: Although Visual Question Answering (VQA) has made impressive progress over the last few years, today's VQA models tend to capture superficial linguistic correlations in the training set and fail to generalize to test sets with different QA distributions. To reduce these language biases, several recent works introduce an auxiliary question-only model to regularize the training of the targeted VQA model, and achieve dominant performance on VQA-CP. However, due to the complexity of their design, current methods are unable to equip ensemble-based models with two indispensable characteristics of an ideal VQA model: 1) visual-explainable: the model should rely on the right visual regions when making decisions; 2) question-sensitive: the model should be sensitive to linguistic variations in the question. To this end, we propose a model-agnostic Counterfactual Samples Synthesizing (CSS) training scheme. CSS generates numerous counterfactual training samples by masking critical objects in images or words in questions, and assigning them different ground-truth answers. After training with the complementary samples (i.e., the original and generated samples), the VQA models are forced to focus on all critical objects and words, which significantly improves both their visual-explainable and question-sensitive abilities. In turn, the performance of these models is further boosted. Extensive ablations show the effectiveness of CSS. In particular, by building on top of the LMH model, we achieve a record-breaking performance of 58.95% on VQA-CP v2, a gain of 6.5%.
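
The masking step described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes per-object and per-word importance scores are already available (the abstract does not say how critical objects or words are selected), and the function names, the `[MASK]` token, and the answer-assignment rule (simply removing the original answer scores) are hypothetical simplifications chosen for illustration.

```python
from typing import List, Sequence, Tuple

MASK_TOKEN = "[MASK]"  # hypothetical placeholder for a masked question word


def top_k_indices(scores: Sequence[float], k: int) -> List[int]:
    """Indices of the k highest scores (ties broken by lower index), in ascending order."""
    order = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    return sorted(order[:k])


def mask_objects(
    features: List[List[float]], scores: Sequence[float], k: int
) -> Tuple[List[List[float]], List[int]]:
    """Image-side sketch: zero out the feature vectors of the k most critical objects.

    Assumes `scores` already measures each object's contribution to the answer.
    """
    critical = top_k_indices(scores, k)
    critical_set = set(critical)
    masked = [
        [0.0] * len(vec) if i in critical_set else list(vec)
        for i, vec in enumerate(features)
    ]
    return masked, critical


def mask_words(
    tokens: List[str], scores: Sequence[float], k: int
) -> Tuple[List[str], List[int]]:
    """Question-side sketch: replace the k most critical words with a mask token."""
    critical = top_k_indices(scores, k)
    critical_set = set(critical)
    masked = [MASK_TOKEN if i in critical_set else tok for i, tok in enumerate(tokens)]
    return masked, critical


def counterfactual_target(target: Sequence[float]) -> List[float]:
    """Simplifying assumption: once the critical evidence is masked, none of the
    original ground-truth answers should be predicted, so every soft score is zeroed."""
    return [0.0 for _ in target]


if __name__ == "__main__":
    feats = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    print(mask_objects(feats, [0.1, 0.7, 0.2], k=1))
    tokens = ["what", "color", "is", "the", "banana"]
    print(mask_words(tokens, [0.05, 0.6, 0.01, 0.02, 0.3], k=2))
```

Each original sample and its masked counterpart would then be trained on together, which is the "complementary samples" idea in the abstract; how the counterfactual answers are actually assigned in CSS is not specified here, so the all-zero target should be read only as a placeholder.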

14/06/2020

Long Chen, Xin Yan, Jun Xiao, Hanwang Zhang, Shiliang Pu, Yueting Zhuang

Similar Papers

ProAlignNet: Unsupervised Learning for Progressively Aligning Noisy Contours

VSR Veeravasarapu, Abhishek Goel, Deepak Mittal, Maneesh Singh

shape alignment, label refinement, chamfer loss, unsupervised alignment, convnets

A Case Study of the Shortcut Effects in Visual Commonsense Reasoning

Keren Ye, Adriana Kovashka

MASKER: Masked Keyword Regularization for Reliable Text Classification

Seung Jun Moon, Sangwoo Mo, Kimin Lee, Jaeho Lee, Jinwoo Shin

New Protocols and Negative Results for Textual Entailment Data Collection

Samuel R. Bowman, Jennimaria Palomaki, Livio Baldini Soares, Emily Pitler

benchmarking, language understanding, transfer applications, crowdsourcing protocol

Improving Dialog Evaluation with a Multi-reference Adversarial Dataset and Large Scale Pretraining

Ananya Sai, Akash Mohan Kumar, Siddhartha Arora, Mitesh Khapra

large pretraining, embedding metrics, n-gram metrics, deb

How Should Pre-Trained Language Models Be Fine-Tuned Towards Adversarial Robustness?

Xinshuai Dong, Anh Tuan Luu, Min Lin, Shuicheng Yan, Hanwang Zhang

robustness, adversarial robustness and security, language

From Saturation to Zero-Shot Visual Relationship Detection Using Local Context

Nikolaos Gkanatsios, Vassilis Pitsikalis, Petros Maragos

Visual Relationship Detection, Scene Graph Generation, Zero-shot Classification, Local Context, Language Bias

Counterfactual Vision and Language Learning

Ehsan Abbasnejad, Damien Teney, Amin Parvaneh, Javen Shi, Anton van den Hengel

counterfactual reasoning, vision and language tasks, VQA

Multimodal Prototypical Networks for Few-Shot Learning

Frederik Pahde, Mihai Puscas, Tassilo Klein, Moin Nabi

Fairness via Representation Neutralization

Mengnan Du, Subhabrata Mukherjee, Guanchu Wang, Ruixiang Tang, Ahmed Awadallah, Xia Hu

machine learning, fairness, interpretability

The World is Not Binary: Learning to Rank with Grayscale Data for Dialogue Response Selection

Zibo Lin, Deng Cai, Yan Wang, Xiaojiang Liu, Haitao Zheng, Shuming Shi

response selection, retrieval-based systems, learning-to-rank problem, learning-to-rank

Robust Local Features for Improving the Generalization of Adversarial Training

Chuanbiao Song, Kun He, Jiadong Lin, Liwei Wang, John E. Hopcroft

adversarial robustness, adversarial training, adversarial example, deep learning

Revisiting Mahalanobis Distance for Transformer-Based Out-of-Domain Detection

Alexander Podolskiy, Dmitry Lipin, Andrey Bout, Ekaterina Artemova, Irina Piontkovskaya

NORESQA: A Framework for Speech Quality Assessment using Non-Matching References

Pranay Manocha, Buye Xu, Anurag Kumar

deep learning, robustness, self-supervised learning

Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision

Chao Jia, Yinfei Yang, Ye Xia, Yi-Ting Chen, Zarana Parekh, Hieu Pham, Quoc Le, Yun-Hsuan Sung, Zhen Li, Tom Duerig

Deep Learning, Embedding and Representation learning

You Only Need Adversarial Supervision for Semantic Image Synthesis

Edgar Schoenfeld, Vadim Sushko, Dan Zhang, Juergen Gall, Bernt Schiele, Anna Khoreva

GANs, Semantic Image Synthesis, Image Generation, Deep Learning

EvidentialMix: Learning With Combined Open-Set and Closed-Set Noisy Labels

Ragav Sachdeva, Filipe R. Cordeiro, Vasileios Belagiannis, Ian Reid, Gustavo Carneiro

A pairwise probe for understanding BERT fine-tuning on machine reading comprehension

Jie Cai, Zhengzhou Zhu, Ping Nie, Qian Liu

machine reading comprehension, pairwise, fine-tune, BERT

Few-Shot Segmentation via Cycle-Consistent Transformer

Gengwei Zhang, Guoliang Kang, Yi Yang, Yunchao Wei

transformers, vision, few shot learning

Towards Robustifying NLI Models Against Lexical Dataset Biases

Xiang Zhou, Mohit Bansal

Natural Inference, data augmentation, Robustifying Models, deep models

Learning to Generate Visual Questions with Noisy Supervision

Shen Kai, Lingfei Wu, Siliang Tang, Yueting Zhuang, Zhen He, Zhuoye Ding, Yun Xiao, Bo Long