A Mathematical Exploration of Why Language Models Help Solve Downstream Tasks

Abstract: Autoregressive language models, pretrained using large text corpora to do well on next word prediction, have been successful at solving many downstream tasks, even with zero-shot usage. However, there is little theoretical understanding of this success. This paper initiates a mathematical study of this phenomenon for the downstream task of text classification by considering the following questions: (1) What is the intuitive connection between the pretraining task of next word prediction and text classification? (2) How can we mathematically formalize this connection and quantify the benefit of language modeling? For (1), we hypothesize, and verify empirically, that classification tasks of interest can be reformulated as sentence completion tasks, thus making language modeling a meaningful pretraining task. With a mathematical formalization of this hypothesis, we make progress towards (2) and show that language models that are $\epsilon$-optimal in cross-entropy (log-perplexity) learn features that can linearly solve such classification tasks with $\mathcal{O}(\sqrt{\epsilon})$ error, thus demonstrating that doing well on language modeling can be beneficial for downstream tasks. We experimentally verify various assumptions and theoretical findings, and also use insights from the analysis to design a new objective function that performs well on some classification tasks.

12/07/2020

A Mathematical Exploration of Why Language Models Help Solve Downstream Tasks

Nikunj Saunshi, Sadhika Malladi, Sanjeev Arora

Comments

Similar Papers

Educating Text Autoencoders: Latent Representation Guidance via Denoising

Tianxiao Shen, Jonas Mueller, Regina Barzilay, Tommi Jaakkola

Keywords Abstract Paper

Deep Learning - Generative Models and Autoencoders

Refining Language Models with Compositional Explanations

Huihan Yao, Ying Chen, Qinyuan Ye and Xisen Jin, Xiang Ren

Keywords Abstract Paper

machine learning, fairness, language

Low-Resource Generation of Multi-hop Reasoning Questions

Jianxing Yu, Wei Liu, Shuang Qiu and Qinliang Su, Kai Wang, Xiaojun Quan, Jian Yin

Keywords Abstract Paper

Low-Resource Questions, generating questions, machine comprehension, multi-hop model

Self-Supervised Meta-Learning for Few-Shot Natural Language Classification Tasks

Trapit Bansal, Rishikesh Jha, Tsendsuren Munkhdalai, Andrew McCallum

Keywords Abstract Paper

nlp applications, fine-tuning, meta-learning problem, supervised tasks

Jointly Learning to Align and Summarize for Neural Cross-Lingual Summarization

Yue Cao, Hui Liu, Xiaojun Wan

Keywords Abstract Paper

Neural Summarization, Cross-lingual summarization, cross-lingual training, pipeline methods

Do Response Selection Models Really Know What’s Next? Utterance Manipulation Strategies for Multi-turn Response Selection

Taesun Whang, Dongyub Lee, Dongsuk Oh and Chanhee Lee, Kijong Han, Dong-hun Lee, Saebyeok Lee

Keywords Abstract Paper

Local Explanation of Dialogue Response Generation

Yi-Lin Tuan, Connor Pryor, Wenhu Chen and Lise Getoor, William Yang Wang

Keywords Abstract Paper

machine learning

Keep learning: Self-supervised meta-learning for learning from inference

Akhil Kedia, Sai Chetan Chinthakindi

Keywords Abstract Paper

Residual Energy-Based Models for Text Generation

Yuntian Deng, Anton Bakhtin, Myle Ott and Arthur Szlam, Marc'Aurelio Ranzato

Keywords Abstract Paper

energy-based models, text generation

Learning Sparse Prototypes for Text Generation

Junxian He, Taylor Berg-Kirkpatrick, Graham Neubig

Keywords Abstract Paper

Mathematical Reasoning via Self-supervised Skip-tree Training

Markus Rabe, Dennis Lee, Kshitij Bansal, Christian Szegedy

Keywords Abstract Paper

theorem proving, reasoning, self-supervised learning, mathematics, language modeling

X-FACTR: Multilingual Factual Knowledge Retrieval from Pretrained Language Models

Zhengbao Jiang, Antonios Anastasopoulos, Jun Araki and Haibo Ding, Graham Neubig

Keywords Abstract Paper

factual retrieval, language models, lms, probing methods

Syntactic Structure Distillation Pretraining for Bidirectional Encoders

Adhiguna Kuncoro, Lingpeng Kong, Daniel Fried and Dani Yogatama, Laura Rimell, Chris Dyer, Phil Blunsom

Keywords Abstract Paper

bert pretraining, structured tasks, natural understanding, textual learners

Enabling Language Models to Fill in the Blanks

Chris Donahue, Mina Lee, Percy Liang

Keywords Abstract Paper

text infilling, predicting text, writing tools, language modeling

Using Context in Neural Machine Translation Training Objectives

Danielle Saunders, Felix Stahlberg, Bill Byrne

Keywords Abstract Paper

Neural training, NMT training, document-level training, NMT objective

FastCorrect: Fast Error Correction with Edit Alignment for Automatic Speech Recognition

Yichong Leng, Xu Tan, Linchen Zhu and Jin Xu, Renqian Luo, Linquan Liu, Tao Qin, Xiangyang Li, Edward Lin, Tie-Yan Liu

Keywords Abstract Paper

Logical Natural Language Generation from Open-Domain Tables

Wenhu Chen, Jianshu Chen, Yu Su and Zhiyu Chen, William Yang Wang

Keywords Abstract Paper

Logical Generation, neural NLG, surface-level realizations, logical inference

An exploratory study on multilingual quality estimation

Shuo Sun, Marina Fomicheva, Frédéric Blain and Vishrav Chaudhary, Ahmed El-Kishky, Adithya Renduchintala, Francisco Guzmán, Lucia Specia

Keywords Abstract Paper

A pairwise probe for understanding BERT fine-tuning on machine reading comprehension

Jie Cai, Zhengzhou Zhu, Ping Nie, Qian Liu

Keywords Abstract Paper

machine reading comprehension, pairwise, fine-tune, BERT

Injecting Numerical Reasoning Skills into Language Models

Mor Geva, Ankit Gupta, Jonathan Berant

Keywords Abstract Paper

numerical reasoning, automatic generation, RC tasks, automatic augmentation

Improving Segmentation for Technical Support Problems

Keywords Paper

Huihan Yao, Ying Chen, Qinyuan Ye and
Xisen Jin, Xiang Ren

Keywords Paper

Jianxing Yu, Wei Liu, Shuang Qiu and
Qinliang Su, Kai Wang, Xiaojun Quan, Jian Yin

Keywords Paper

Keywords Paper

Keywords Paper

Taesun Whang, Dongyub Lee, Dongsuk Oh and
Chanhee Lee, Kijong Han, Dong-hun Lee, Saebyeok Lee

Keywords Paper

Yi-Lin Tuan, Connor Pryor, Wenhu Chen and
Lise Getoor, William Yang Wang

Keywords Paper

Keywords Paper

Yuntian Deng, Anton Bakhtin, Myle Ott and
Arthur Szlam, Marc'Aurelio Ranzato

Keywords Paper

Keywords Paper

Keywords Paper

Zhengbao Jiang, Antonios Anastasopoulos, Jun Araki and
Haibo Ding, Graham Neubig

Keywords Paper

Adhiguna Kuncoro, Lingpeng Kong, Daniel Fried and
Dani Yogatama, Laura Rimell, Chris Dyer, Phil Blunsom

Keywords Paper

Keywords Paper

Keywords Paper

Yichong Leng, Xu Tan, Linchen Zhu and
Jin Xu, Renqian Luo, Linquan Liu, Tao Qin, Xiangyang Li, Edward Lin, Tie-Yan Liu

Keywords Paper

Wenhu Chen, Jianshu Chen, Yu Su and
Zhiyu Chen, William Yang Wang

Keywords Paper

Shuo Sun, Marina Fomicheva, Frédéric Blain and
Vishrav Chaudhary, Ahmed El-Kishky, Adithya Renduchintala, Francisco Guzmán, Lucia Specia

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Alexander K. Lew, Marco Cusumano-Towner, Benjamin Sherman and
Michael Carbin, Vikash Mansinghka

Keywords Paper

Keywords Paper

Edoardo Barba, Luigi Procopio, Caterina Lacerra and
Tommaso Pasini, Roberto Navigli

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Xiyue Zhang, Xiaoning Du, Xiaofei Xie and
Lei Ma, Yang Liu, Meng Sun

Keywords Paper

Keywords Paper

Keywords Paper