Language Models are Few-Shot Learners

06/12/2020

Tom B Brown, Ben Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen M Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel Ziegler, Jeffrey Wu, Clemens Winter, Chris Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei

Abstract: We demonstrate that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even becoming competitive with prior state-of-the-art fine-tuning approaches. Specifically, we train GPT-3, an autoregressive language model with 175 billion parameters, 10x more than any previous non-sparse language model, and test its performance in the few-shot setting. For all tasks, GPT-3 is applied without any gradient updates or fine-tuning, with tasks and few-shot demonstrations specified purely via text interaction with the model. GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks. We also identify some datasets where GPT-3's few-shot learning still struggles, as well as some datasets where GPT-3 faces methodological issues related to training on large web corpora.
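The abstract states that tasks and few-shot demonstrations are specified purely via text, with no gradient updates. A minimal sketch of what assembling such a text prompt could look like follows. The function name `build_few_shot_prompt`, the `Input:`/`Output:` labels, and the example pairs are illustrative assumptions, not the authors' exact format, and no model is called here.

```python
from typing import List, Tuple


def build_few_shot_prompt(
    task_description: str,
    demonstrations: List[Tuple[str, str]],
    query: str,
) -> str:
    """Concatenate a task description, K solved demonstrations, and one unsolved query.

    The "Input:"/"Output:" labels are a hypothetical layout chosen for illustration.
    """
    lines = [task_description, ""]
    for source, target in demonstrations:
        # Each demonstration is a fully worked example placed in the context.
        lines.append(f"Input: {source}")
        lines.append(f"Output: {target}")
        lines.append("")
    # The final query is left open for the language model to complete.
    lines.append(f"Input: {query}")
    lines.append("Output:")
    return "\n".join(lines)


prompt = build_few_shot_prompt(
    "Translate English to French.",
    [("cheese", "fromage"), ("sea otter", "loutre de mer")],
    "peppermint",
)
print(prompt)
```

In this sketch, the number of demonstrations is simply the length of the list, so passing an empty list gives a zero-shot prompt and a single pair gives a one-shot prompt.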

The talk and the respective paper were published at the NeurIPS 2020 virtual conference.

Similar Papers

16/11/2020

KGPT: Knowledge-Grounded Pre-Training for Data-to-Text Generation

Wenhu Chen, Yu Su, Xifeng Yan, William Yang Wang

Keywords: data-to-text generation, data-to-text tasks, fully-supervised setting, pre-training learning

11:10

02/02/2021

SARG: A Novel Semi Autoregressive Generator for Multi-turn Incomplete Utterance Restoration

Mengzuo Huang, Feng Li, Wuhe Zou, Weidong Zhang

14:50

04/07/2020

Leveraging Monolingual Data with Self-Supervision for Multilingual Neural Machine Translation

Aditya Siddhant, Ankur Bapna, Yuan Cao, Orhan Firat, Mia Chen, Sneha Kudugunta, Naveen Arivazhagan, Yonghui Wu

Keywords: Multilingual Translation, Multilingual, low-resource translation, low-resource NMT

6:51

04/07/2020

Learning Spoken Language Representations with Neural Lattice Language Modeling

Chao-Wei Huang, Yun-Nung Chen

Keywords: NLP tasks, spoken tasks, intent detection, Spoken Representations

6:39

19/04/2021

First align, then predict: Understanding the cross-lingual ability of multilingual BERT

Benjamin Muller, Yanai Elazar, Benoît Sagot, Djamé Seddah

7:18

06/12/2020

A Simple Language Model for Task-Oriented Dialogue

Ehsan Hosseini-Asl, Bryan McCann, Chien-Sheng Wu, Semih Yavuz, Richard Socher

3:21

26/04/2020

Cross-Lingual Ability of Multilingual BERT: An Empirical Study

Karthikeyan K, Zihan Wang, Stephen Mayhew, Dan Roth

Keywords: Cross-Lingual Learning, Multilingual BERT

4:31

19/04/2021

PPT: Parsimonious parser transfer for unsupervised cross-lingual adaptation

Kemal Kurniawan, Lea Frermann, Philip Schulz, Trevor Cohn

11:52

06/12/2020

Bayesian Multi-type Mean Field Multi-agent Imitation Learning

Fan Yang, Alina Vereshchaka, Changyou Chen, Wen Dong

3:23

06/12/2020

Unsupervised Data Augmentation for Consistency Training

Qizhe Xie, Zihang Dai, Eduard Hovy, Thang Luong, Quoc V Le

3:29

16/11/2020

Multilingual Denoising Pre-training for Neural Machine Translation

Jiatao Gu, Yinhan Liu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer

Keywords: machine tasks, pre-training, multilingual pre-training, mbart

10:32

05/12/2020

Vocabulary matters: A simple yet effective approach to paragraph-level question generation

Vishwajeet Kumar, Manish Joshi, Ganesh Ramakrishnan, Yuan-Fang Li

8:36

08/12/2020

TableGPT: Few-shot Table-to-Text Generation with Table Structure Reconstruction and Content Matching

Heng Gong, Yawei Sun, Xiaocheng Feng, Bing Qin, Wei Bi, Xiaojiang Liu, Ting Liu

8:45

04/07/2020

Multi-source Meta Transfer for Low Resource Multiple-Choice Question Answering

Ming Yan, Hao Zhang, Di Jin, Joey Tianyi Zhou

Keywords: Multi-source Transfer, Low Answering, Multiple-choice answering, machine comprehension

7:40

14/06/2020

OrigamiNet: Weakly-Supervised, Segmentation-Free, One-Step, Full Page Text Recognition by learning to unfold

Mohamed Yousef, Tom E. Bishop

Keywords: text recognition, weakly supervised, handwriting recognition, convolutional neural network fully convolutional, ctc

1:00

02/02/2021

Revisiting Mahalanobis Distance for Transformer-Based Out-of-Domain Detection

Alexander Podolskiy, Dmitry Lipin, Andrey Bout, Ekaterina Artemova, Irina Piontkovskaya

16:08

06/12/2020

Incorporating BERT into Parallel Sequence Decoding with Adapters

Junliang Guo, Zhirui Zhang, Linli Xu, Hao-Ran Wei, Boxing Chen, Enhong Chen

3:17

15/06/2020

Faster general parsing through context-free memoization

Grzegorz Herman

Keywords: Earley, context-free, generalized parsing, GLL, GLR, memoization

12:21

22/11/2021

One-Shot Deep Model for End-to-End Multi-Person Activity Recognition

Shuhei Tarashima

Keywords: Group Activity Recognition, Action Recognition, Multi-Object Tracking, Multi-task Learning

2:50

06/12/2021

Align before Fuse: Vision and Language Representation Learning with Momentum Distillation

Junnan Li, Ramprasaath Selvaraju, Akhilesh Gotmare, Shafiq Joty, Caiming Xiong, Steven Chu Hong Hoi

Keywords: transformers, vision, representation learning

9:40

18/07/2021

Parameterless Transductive Feature Re-representation for Few-Shot Learning

Wentao Cui, Yuhong Guo

Keywords: Algorithms, Multitask, Transfer, and Meta Learning

5:10

07/06/2020

Learning Cross-Lingual Word Embeddings from Twitter via Distant Supervision

Jose Camacho-Collados, Yerai Doval Mosquera, Eugenio Martínez-Cámara, Luis Espinosa-Anke, Francesco Barbieri, Steven Schockaert

Keywords: embedding spaces, embeddings, languages, learning, performance, representations, shared, spaces, texts, twitter, word embeddings, words

10:39

05/12/2020

Mixed-lingual pre-training for cross-lingual summarization

Ruochen Xu, Chenguang Zhu, Yu Shi, Michael Zeng, Xuedong Huang

11:49

08/12/2020

Have Your Text and Use It Too! End-to-End Neural Data-to-Text Generation with Semantic Fidelity

Hamza Harkous, Isabel Groves, Amir Saffari

14:37

26/04/2020

Pre-training Tasks for Embedding-based Large-scale Retrieval

Wei-Cheng Chang, Felix X. Yu, Yin-Wen Chang, Yiming Yang, Sanjiv Kumar

Keywords: natural language processing, large-scale retrieval, unsupervised representation learning, paragraph-level pre-training, two-tower Transformer models

4:39

06/12/2020

Modular Meta-Learning with Shrinkage

Yutian Chen, Abe Friesen, Feryal Behbahani, Arnaud Doucet, David Budden, Matthew Hoffman, Nando de Freitas

3:21

08/12/2020

SentiX: A Sentiment-Aware Pre-Trained Model for Cross-Domain Sentiment Analysis

Jie Zhou, Junfeng Tian, Rui Wang, Yuanbin Wu, Wenming Xiao, Liang He

12:42

12/07/2020

Countering Language Drift with Seeded Iterated Learning

Yuchen Lu, Soumye Singhal, Florian Strub, Aaron Courville, Olivier Pietquin

Keywords: Deep Learning - Algorithms

14:25

06/12/2020

Compositional Generalization via Neural-Symbolic Stack Machines

Xinyun Chen, Chen Liang, Adams Wei Yu, Dawn Song, Denny Zhou

Keywords: Applications -> Computer Vision; Applications -> Visual Scene Analysis and Interpretation; Deep Learning -> Adversarial Network; Deep Learning -> Generative Models

3:26

04/07/2020

Injecting Numerical Reasoning Skills into Language Models

Mor Geva, Ankit Gupta, Jonathan Berant

Keywords: numerical reasoning, automatic generation, RC tasks, automatic augmentation

11:21

16/11/2020

DAGA: Data Augmentation with a Generation Approach for Low-resource Tagging Tasks

Bosheng Ding, Linlin Liu, Lidong Bing, Canasai Kruengkrai, Thien Hai Nguyen, Shafiq Joty, Luo Si, Chunyan Miao

Keywords: machine learning, generalization, low-resource tasks, named recognition

11:09

26/04/2020

LAMOL: LAnguage MOdeling for Lifelong Language Learning

Fan-Keng Sun, Cheng-Hao Ho, Hung-Yi Lee

Keywords: NLP, Deep Learning, Lifelong Learning

4:44

16/11/2020

Event Extraction as Machine Reading Comprehension

Jian Liu, Yubo Chen, Kang Liu, Wei Bi, Xiaojiang Liu

Keywords: event extraction, ee, information task, classification task

11:15

08/12/2020

FASTMATCH: Accelerating the Inference of BERT-based Text Matching

Shuai Pang, Jianqiang Ma, Zeyu Yan, Yang Zhang, Jianping Shen

15:01

18/07/2021

Straight to the Gradient: Learning to Use Novel Tokens for Neural Text Generation

Xiang Lin, Simeng Han, Shafiq Joty

Keywords: Applications, Natural Language Processing

16:00

19/08/2021

Improving Context-Aware Neural Machine Translation with Source-side Monolingual Documents

Linqing Chen, Junhui Li, Zhengxian Gong, Xiangyu Duan, Boxing Chen, Weihua Luo, Min Zhang, Guodong Zhou

Keywords: Natural Language Processing, Machine Translation

12:48

04/07/2020

DoQA - Accessing Domain-Specific FAQs via Conversational QA

Jon Ander Campos, Arantxa Otegi, Aitor Soroa, Jan Deriu, Mark Cieliebak, Eneko Agirre

Keywords: DoQA FAQs, conversational interfaces, information scenario, IR scenario

12:35

04/07/2020

BLEURT: Learning Robust Metrics for Text Generation

Thibault Sellam, Dipanjan Das, Ankur Parikh

Keywords: Learning Metrics, Text Generation, WMT task, pre-training scheme

11:46

16/11/2020

Plug and Play Autoencoders for Conditional Text Generation

Florian Mai, Nikolaos Pappas, Ivan Montero, Noah A. Smith, James Henderson

Keywords: conditional tasks, style transfer, style tasks, text autoencoders

9:23

26/04/2020

Reducing Transformer Depth on Demand with Structured Dropout

Angela Fan, Edouard Grave, Armand Joulin

Keywords: reduction, regularization, pruning, dropout, transformer

5:01