Multi-timescale Representation Learning in LSTM Language Models

Abstract: Language models must capture statistical dependencies between words at timescales ranging from very short to very long. Earlier work has demonstrated that dependencies in natural language tend to decay with distance between words according to a power law. However, it is unclear how this knowledge can be used for analyzing or designing neural network language models. In this work, we derived a theory for how the memory gating mechanism in long short-term memory (LSTM) language models can capture power law decay. We found that unit timescales within an LSTM, which are determined by the forget gate bias, should follow an Inverse Gamma distribution. Experiments then showed that LSTM language models trained on natural English text learn to approximate this theoretical distribution. Further, we found that explicitly imposing the theoretical distribution upon the model during training yielded better language model perplexity overall, with particular improvements for predicting low-frequency (rare) words. Moreover, the explicit multi-timescale model selectively routes information about different types of words through units with different timescales, potentially improving model interpretability. These results demonstrate the importance of careful, theoretically-motivated analysis of memory and timescale in language models.

03/05/2021

supervised contrastive learning, pre-trained language model fine-tuning, natural language understanding, generalization, few-shot learning, robustness

4:44

06/12/2021

Multi-timescale Representation Learning in LSTM Language Models

Shivangi Mahto, Vy Vo, Javier Turek, Alexander Huth

Comments

Similar Papers

Supervised Contrastive Learning for Pre-trained Language Model Fine-tuning

Beliz Gunel, Jingfei Du, Alexis Conneau, Veselin Stoyanov

Keywords Abstract Paper

supervised contrastive learning, pre-trained language model fine-tuning, natural language understanding, generalization, few-shot learning, robustness

Why Do Pretrained Language Models Help in Downstream Tasks? An Analysis of Head and Prompt Tuning

Colin Wei, Sang Michael Xie, Tengyu Ma

Keywords Abstract Paper

theory, machine learning, self-supervised learning, generative model, representation learning, language

Implicit unlikelihood training: Improving neural text generation with reinforcement learning

Evgeny Lagutin, Daniil Gavrilov, Pavel Kalaidin

Keywords Abstract Paper

Linguistic Features for Readability Assessment

Tovly Deutsch, Masoud Jasbi, Stuart Shieber

Keywords Abstract Paper

Refining Language Models with Compositional Explanations

Huihan Yao, Ying Chen, Qinyuan Ye and Xisen Jin, Xiang Ren

Keywords Abstract Paper

machine learning, fairness, language

Keep learning: Self-supervised meta-learning for learning from inference

Akhil Kedia, Sai Chetan Chinthakindi

Keywords Abstract Paper

Improving Neural Language Generation with Spectrum Control

Lingxiao Wang, Jing Huang, Kevin Huang and Ziniu Hu, Guangtao Wang, Quanquan Gu

Keywords Abstract Paper

Do Response Selection Models Really Know What’s Next? Utterance Manipulation Strategies for Multi-turn Response Selection

Taesun Whang, Dongyub Lee, Dongsuk Oh and Chanhee Lee, Kijong Han, Dong-hun Lee, Saebyeok Lee

Keywords Abstract Paper

Residual Energy-Based Models for Text Generation

Yuntian Deng, Anton Bakhtin, Myle Ott and Arthur Szlam, Marc'Aurelio Ranzato

Keywords Abstract Paper

energy-based models, text generation

Pre-trained Language Model Based Active Learning for Sentence Matching

Guirong Bai, Shizhu He, Kang Liu and Jun Zhao, Zaiqing Nie

Keywords Abstract Paper

Towards Robustifying NLI Models Against Lexical Dataset Biases

Xiang Zhou, Mohit Bansal

Keywords Abstract Paper

Natural Inference, data augmentation, Robustifying Models, deep models

Cross-Lingual Unsupervised Sentiment Classification with Multi-View Transfer Learning

Hongliang Fei, Ping Li

Keywords Abstract Paper

Cross-Lingual Classification, sentiment classification, unsupervised system, classification

Learning Which Features Matter: RoBERTa Acquires a Preference for Linguistic Generalizations (Eventually)

Alex Warstadt, Yian Zhang, Xiaocheng Li and Haokun Liu, Samuel R. Bowman

Keywords Abstract Paper

self-supervised tasks, language understanding, ambiguous tasks, finetuning

Educating Text Autoencoders: Latent Representation Guidance via Denoising

Tianxiao Shen, Jonas Mueller, Regina Barzilay, Tommi Jaakkola

Keywords Abstract Paper

Deep Learning - Generative Models and Autoencoders

Syntactic Structure Distillation Pretraining for Bidirectional Encoders

Adhiguna Kuncoro, Lingpeng Kong, Daniel Fried and Dani Yogatama, Laura Rimell, Chris Dyer, Phil Blunsom

Keywords Abstract Paper

bert pretraining, structured tasks, natural understanding, textual learners

Contrastive Learning with Adversarial Perturbations for Conditional Text Generation

Seanie Lee, Dong Bok Lee, Sung Ju Hwang

Keywords Abstract Paper

contrastive learning, conditional text generation

An Information Bottleneck Approach for Controlling Conciseness in Rationale Extraction

Bhargavi Paranjape, Mandar Joshi, John Thickstun and Hannaneh Hajishirzi, Luke Zettlemoyer

Keywords Abstract Paper

language understanding, semi-supervised setting, complex models, explainer

Generating Dialogue Responses from a Semantic Latent Space

Wei-Jen Ko, Avik Ray, Yilin Shen, Hongxia Jin

Keywords Abstract Paper

generation responses, regression task, open-domain models, end-to-end classification

Show Me How To Revise: Improving Lexically Constrained Sentence Generation with XLNet

Xingwei He, Victor O.K. Li

Keywords Abstract Paper

FairFil: Contrastive Neural Debiasing Method for Pretrained Text Encoders

Pengyu Cheng, Weituo Hao, Siyang Yuan and Shijing Si, Lawrence Carin

Keywords Abstract Paper

Mutual Information, Pretrained Text Encoders, Contrastive Learning, Fairness

Domain Knowledge Empowered Structured Neural Net for End-to-End Event Temporal Relation Extraction

Rujun Han, Yichao Zhou, Nanyun Peng

Keywords Abstract Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Huihan Yao, Ying Chen, Qinyuan Ye and
Xisen Jin, Xiang Ren

Keywords Paper

Keywords Paper

Lingxiao Wang, Jing Huang, Kevin Huang and
Ziniu Hu, Guangtao Wang, Quanquan Gu

Keywords Paper

Taesun Whang, Dongyub Lee, Dongsuk Oh and
Chanhee Lee, Kijong Han, Dong-hun Lee, Saebyeok Lee

Keywords Paper

Yuntian Deng, Anton Bakhtin, Myle Ott and
Arthur Szlam, Marc'Aurelio Ranzato

Keywords Paper

Guirong Bai, Shizhu He, Kang Liu and
Jun Zhao, Zaiqing Nie

Keywords Paper

Keywords Paper

Keywords Paper

Alex Warstadt, Yian Zhang, Xiaocheng Li and
Haokun Liu, Samuel R. Bowman

Keywords Paper

Keywords Paper

Adhiguna Kuncoro, Lingpeng Kong, Daniel Fried and
Dani Yogatama, Laura Rimell, Chris Dyer, Phil Blunsom

Keywords Paper

Keywords Paper

Bhargavi Paranjape, Mandar Joshi, John Thickstun and
Hannaneh Hajishirzi, Luke Zettlemoyer

Keywords Paper

Keywords Paper

Keywords Paper

Pengyu Cheng, Weituo Hao, Siyang Yuan and
Shijing Si, Lawrence Carin

Keywords Paper

Keywords Paper

Qiyu Wu, Chen Xing, Yatao Li and
Guolin Ke, Di He, Tie-Yan Liu

Keywords Paper

Keywords Paper

Yuchen Lu, Soumye Singhal, Florian Strub and
Olivier Pietquin, Aaron Courville

Keywords Paper

Jianxing Yu, Wei Liu, Shuang Qiu and
Qinliang Su, Kai Wang, Xiaojun Quan, Jian Yin

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper