SSMBA: Self-Supervised Manifold Based Data Augmentation for Improving Out-of-Domain Robustness

16/11/2020

Nathan Ng, Kyunghyun Cho, Marzyeh Ghassemi

Keywords: data augmentation, OOD generalization, robustness benchmarks, SSMBA

Abstract: Models that perform well on a training domain often fail to generalize to out-of-domain (OOD) examples. Data augmentation is a common method used to prevent overfitting and improve OOD generalization. However, in natural language, it is difficult to generate new examples that stay on the underlying data manifold. We introduce SSMBA, a data augmentation method for generating synthetic training examples by using a pair of corruption and reconstruction functions to move randomly on a data manifold. We investigate the use of SSMBA in the natural language domain, leveraging the manifold assumption to reconstruct corrupted text with masked language models. In experiments on robustness benchmarks across 3 tasks and 9 datasets, SSMBA consistently outperforms existing data augmentation methods and baseline models on both in-domain and OOD data, achieving gains of 0.8% on OOD Amazon reviews, 1.8% accuracy on OOD MNLI, and 1.4 BLEU on in-domain IWSLT14 German-English.
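The corrupt-then-reconstruct idea in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names (`corrupt`, `build_toy_reconstructor`, `ssmba_like_augment`), the `[MASK]` token, and the default parameters are all assumptions made for this example, and the reconstruction step uses a tiny bigram lookup table as a stand-in for the masked language model described in the abstract.

```python
import random

MASK = "[MASK]"  # hypothetical mask symbol for this sketch


def corrupt(tokens, mask_prob, rng):
    """Corruption function: mask each token independently with probability mask_prob."""
    return [MASK if rng.random() < mask_prob else tok for tok in tokens]


def build_toy_reconstructor(corpus):
    """Build a stand-in reconstruction function from a small corpus.

    A real setup would sample replacements from a masked language model;
    here a bigram table (previous token -> observed next tokens) is used instead.
    """
    followers = {}
    vocab = []
    for sentence in corpus:
        toks = sentence.split()
        vocab.extend(toks)
        for prev, nxt in zip(["<s>"] + toks, toks):
            followers.setdefault(prev, []).append(nxt)

    def reconstruct(tokens, rng):
        out = []
        for tok in tokens:
            if tok == MASK:
                prev = out[-1] if out else "<s>"
                # Fall back to the whole vocabulary if prev has no known follower.
                tok = rng.choice(followers.get(prev, vocab))
            out.append(tok)
        return out

    return reconstruct


def ssmba_like_augment(sentence, reconstruct, n_samples=3, mask_prob=0.3, seed=0):
    """Generate n_samples synthetic variants by corrupting, then reconstructing."""
    rng = random.Random(seed)
    tokens = sentence.split()
    return [
        " ".join(reconstruct(corrupt(tokens, mask_prob, rng), rng))
        for _ in range(n_samples)
    ]


if __name__ == "__main__":
    corpus = ["the movie was great", "the film was boring", "a great film overall"]
    reconstruct = build_toy_reconstructor(corpus)
    for variant in ssmba_like_augment("the movie was great", reconstruct):
        print(variant)
```

Each variant keeps the length of the input and differs only at the masked positions, so the amount of corruption (`mask_prob` here) controls how far a sample moves from the original example.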

The talk and the accompanying paper were published at the EMNLP 2020 virtual conference.

Similar Papers

06/12/2020

Improving Generalization in Reinforcement Learning with Mixture Regularization

Kaixin Wang, Bingyi Kang, Jie Shao, Jiashi Feng

3:14

02/02/2021

DecAug: Out-of-Distribution Generalization via Decomposed Feature Representation and Semantic Augmentation

Haoyue Bai, Rui Sun, Lanqing Hong, Fengwei Zhou, Nanyang Ye, Han-Jia Ye, S.-H. Gary Chan, Zhenguo Li

15:59

26/04/2020

Measuring Compositional Generalization: A Comprehensive Method on Realistic Data

Daniel Keysers, Nathanael Schärli, Nathan Scales, Hylke Buisman, Daniel Furrer, Sergii Kashubin, Nikola Momchev, Danila Sinopalnikov, Lukasz Stafiniak, Tibor Tihon, Dmitry Tsarkov, Xiao Wang, Marc van Zee, Olivier Bousquet

Keywords: compositionality, generalization, natural language understanding, benchmark, compositional generalization, compositional modeling, semantic parsing, generalization measurement

4:57

14/09/2020

Using Error Decay Prediction to Overcome Practical Issues of Deep Active Learning for Named Entity Recognition

Haw-Shiuan Chang, Shankar Vembu, Sunil Mohan, Rheeya Uppaal, Andrew McCallum

15:03

04/07/2020

Mind the Trade-off: Debiasing NLU Models without Degrading the In-distribution Performance

Prasetya Ajie Utama, Nafise Sadat Moosavi, Iryna Gurevych

Keywords: Debiasing Models, natural tasks, NLU tasks, debiasing methods

11:09

06/12/2021

Automatic Data Augmentation for Generalization in Reinforcement Learning

Roberta Raileanu, Maxwell Goldstein, Denis Yarats, Ilya Kostrikov, Rob Fergus

Keywords: reinforcement learning and planning, machine learning

14:26

14/09/2020

Inductive Generalized Zero-shot Learning with Adversarial Relation Network

Guanyu Yang, Kaizhu Huang, Rui Zhang, John Goulermas, Amir Hussain

Keywords: zero-shot learning, adversarial examples, gradient penalty

13:03

19/08/2021

A Sequence-to-Set Network for Nested Named Entity Recognition

Zeqi Tan, Yongliang Shen, Shuai Zhang, Weiming Lu, Yueting Zhuang

Keywords: Natural Language Processing, Information Extraction, Named Entities

10:38

16/11/2020

An Empirical Study on Large-Scale Multi-Label Text Classification Including Few and Zero-Shot Labels

Ilias Chalkidis, Manos Fergadiotis, Sotiris Kotitsas, Prodromos Malakasiotis, Nikolaos Aletras, Ion Androutsopoulos

Keywords: flat classification, hierarchical approaches, zero-shot learning, few learning

12:21

06/12/2021

Data-Efficient Instance Generation from Instance Discrimination

Ceyuan Yang, Yujun Shen, Yinghao Xu, Bolei Zhou

Keywords: machine learning, generative model

6:53

08/12/2020

Emergent Communication Pretraining for Few-Shot Machine Translation

Yaoyiran Li, Edoardo Maria Ponti, Ivan Vulić, Anna Korhonen

14:42

26/04/2020

Network Randomization: A Simple Technique for Generalization in Deep Reinforcement Learning

Kimin Lee, Kibok Lee, Jinwoo Shin, Honglak Lee

Keywords: Deep reinforcement learning, Generalization in visual domains

5:03

30/11/2020

Few-Shot Zero-Shot Learning: Knowledge Transfer with Less Supervision

Nanyi Fei, Jiechao Guan, Zhiwu Lu, Yizhao Gao

7:37

14/06/2020

Towards Causal VQA: Revealing and Reducing Spurious Correlations by Invariant and Covariant Semantic Editing

Vedika Agarwal, Rakshith Shetty, Mario Fritz

Keywords: robustness, vqa, causality, gan, dataset, evaluation, automated semantic scene editing, data augmentation, invariance, covariance

1:00

03/05/2021

Large Scale Image Completion via Co-Modulated Generative Adversarial Networks

Shengyu Zhao, Jonathan Cui, Yilun Sheng, Yue Dong, Xiao Liang, Eric Chang, Yan Xu

Keywords: co-modulation, image completion, generative adversarial networks

10:10

06/12/2020

Differentiable Augmentation for Data-Efficient GAN Training

Shengyu Zhao, Zhijian Liu, Ji Lin, Jun-Yan Zhu, Song Han

3:22

12/07/2020

Aligned Cross Entropy for Non-Autoregressive Machine Translation

Marjan Ghazvininejad, Vladimir Karpukhin, Luke Zettlemoyer, Omer Levy

Keywords: Applications - Language, Speech and Dialog

14:43

16/11/2020

New Protocols and Negative Results for Textual Entailment Data Collection

Samuel R. Bowman, Jennimaria Palomaki, Livio Baldini Soares, Emily Pitler

Keywords: benchmarking, language understanding, transfer applications, crowdsourcing protocol

12:27

26/04/2020

Towards Verified Robustness under Text Deletion Interventions

Johannes Welbl, Po-Sen Huang, Robert Stanforth, Sven Gowal, Krishnamurthy (Dj) Dvijotham, Martin Szummer, Pushmeet Kohli

Keywords: natural language processing, specification, verification, model undersensitivity, adversarial, interval bound propagation

5:01

02/02/2021

Self-Domain Adaptation for Face Anti-Spoofing

Jingjing Wang, Jingyi Zhang, Ying Bian, Youyi Cai, Chunmao Wang, Shiliang Pu

14:02

14/06/2020

Learning Meta Face Recognition in Unseen Domains

Jianzhu Guo, Xiangyu Zhu, Chenxu Zhao, Dong Cao, Zhen Lei, Stan Z. Li

Keywords: face recognition, meta learning, domain generalization, metric learning

5:01

04/07/2020

Towards Robustifying NLI Models Against Lexical Dataset Biases

Xiang Zhou, Mohit Bansal

Keywords: Natural Inference, data augmentation, Robustifying Models, deep models

11:34

06/12/2021

ReSSL: Relational Self-Supervised Learning with Weak Augmentation

Mingkai Zheng, Shan You, Fei Wang, Chen Qian, Changshui Zhang, Xiaogang Wang, Chang Xu

Keywords: self-supervised learning, contrastive learning

6:35

14/06/2020

MSeg: A Composite Dataset for Multi-Domain Semantic Segmentation

John Lambert, Zhuang Liu, Ozan Sener, James Hays, Vladlen Koltun

Keywords: segmentation, domain generalization, semantic segmentation, dataset, cross-dataset generalization

1:01

19/04/2021

Neural data-to-text generation with LM-based text augmentation

Ernie Chang, Xiaoyu Shen, Dawei Zhu, Vera Demberg, Hui Su

7:32

30/11/2020

dpVAEs: Fixing Sample Generation for Regularized VAEs

Riddhish Bhalodia, Iain Lee, Shireen Elhabian

7:54

14/06/2020

Suppressing Uncertainties for Large-Scale Facial Expression Recognition

Kai Wang, Xiaojiang Peng, Jianfei Yang, Shijian Lu, Yu Qiao

Keywords: emotion recognition, self-cure network, uncertainties

1:01

01/07/2020

Zero-Resource Cross-Domain Named Entity Recognition

Zihan Liu, Genta Indra Winata, Pascale Fung

5:15

04/07/2020

End-to-End Bias Mitigation by Modelling Biases in Corpora

Rabeeh Karimi Mahabadi, Yonatan Belinkov, James Henderson

Keywords: End-to-End Mitigation, real-world scenarios, training, large-scale benchmarks

10:57

12/07/2020

Domain Aggregation Networks for Multi-Source Domain Adaptation

Junfeng Wen, Russell Greiner, Dale Schuurmans

Keywords: Transfer, Multitask and Meta-learning

14:22

16/11/2020

DAGA: Data Augmentation with a Generation Approach for Low-resource Tagging Tasks

Bosheng Ding, Linlin Liu, Lidong Bing, Canasai Kruengkrai, Thien Hai Nguyen, Shafiq Joty, Luo Si, Chunyan Miao

Keywords: machine learning, generalization, low-resource tasks, named recognition

11:09

17/08/2020

Learning temporal coherence via self-supervision for GAN-based video generation

Mengyu Chu, You Xie, Jonas Mayer, Laura Leal-Taixé, Nils Thuerey

Keywords: self-supervision, temporal cycle-consistency, video super-resolution, generative adversarial network, unpaired video translation

16:59

03/05/2021

Learning to Recombine and Resample Data For Compositional Generalization

Ekin Akyürek, Afra Feyza Akyürek, Jacob Andreas

Keywords: sequence models, language processing, compositional generalization, data augmentation, generative modeling

6:14

15/06/2020

Blended, precise semantic program embeddings

Ke Wang, Zhendong Su

Keywords: Static and Dynamic Program Features, Attention Network, Semantic Program Embedding

15:39

16/11/2020

Evaluating the Factual Consistency of Abstractive Text Summarization

Wojciech Kryscinski, Bryan McCann, Caiming Xiong, Richard Socher

Keywords: assessing algorithms, natural inference, fact checking, auxiliary tasks

12:05

14/06/2020

Domain Decluttering: Simplifying Images to Mitigate Synthetic-Real Domain Shift and Improve Depth Estimation

Yunhan Zhao, Shu Kong, Daeyun Shin, Charless Fowlkes

Keywords: monocular depth prediction, real-synthetic domain shift, synthetic training data, domain adaptation, image inpainting, high-level domain gaps

1:01

16/11/2020

Uncertainty-Aware Semantic Augmentation for Neural Machine Translation

Xiangpeng Wei, Heng Yu, Yue Hu, Rongxiang Weng, Luxi Xing, Weihua Luo

Keywords: sequence-to-sequence task, nmt, inference, translation tasks

11:11

05/01/2021

AdarGCN: Adaptive Aggregation GCN for Few-Shot Learning

Jianhong Zhang, Manli Zhang, Zhiwu Lu, Tao Xiang

4:45

04/07/2020

Cross-Lingual Unsupervised Sentiment Classification with Multi-View Transfer Learning

Hongliang Fei, Ping Li

Keywords: Cross-Lingual Classification, sentiment classification, unsupervised system, classification

12:23

26/04/2020

Adversarial AutoAugment

Xinyu Zhang, Qiang Wang, Jian Zhang, Zhao Zhong

Keywords: Automatic Data Augmentation, Adversarial Learning, Reinforcement Learning

4:30