Mogrifier LSTM

Abstract: Many advances in Natural Language Processing have been based upon more expressive models for how inputs interact with the context in which they occur. Recurrent networks, which have enjoyed a modicum of success, still lack the generalization and systematicity ultimately required for modelling language. In this work, we propose an extension to the venerable Long Short-Term Memory in the form of mutual gating of the current input and the previous output. This mechanism affords the modelling of a richer space of interactions between inputs and their context. Equivalently, our model can be viewed as making the transition function given by the LSTM context-dependent. Experiments demonstrate markedly improved generalization on language modelling in the range of 3–4 perplexity points on Penn Treebank and Wikitext-2, and 0.01–0.05 bpc on four character-based datasets. We establish a new state of the art on all datasets with the exception of Enwik8, where we close a large gap between the LSTM and Transformer models.
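The abstract describes the mechanism only as "mutual gating of the current input and the previous output." Below is a minimal Python sketch of one such scheme. It assumes the alternating formulation from the full paper, which this page does not state. In odd rounds the input x is rescaled elementwise by 2σ(Qh). In even rounds the previous output h is rescaled by 2σ(Rx). The gated vectors would then feed an ordinary LSTM cell, omitted here. The names `mogrify`, `q_mats` and `r_mats` are hypothetical, and plain lists stand in for a tensor library.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def matvec(m, v):
    # Dense matrix-vector product over nested lists (rows of m dotted with v).
    return [sum(w * a for w, a in zip(row, v)) for row in m]


def mogrify(x, h, q_mats, r_mats):
    """Alternately gate x with h and h with x (hypothetical sketch).

    Odd rounds use the next matrix in q_mats (shape len(x) x len(h)) to
    rescale x from h; even rounds use the next matrix in r_mats
    (shape len(h) x len(x)) to rescale h from x. Each gate is 2*sigmoid(.),
    so its values lie in (0, 2) and a zero pre-activation leaves the
    gated vector unchanged.
    """
    if len(q_mats) - len(r_mats) not in (0, 1):
        raise ValueError("rounds must alternate Q, R, Q, ... starting with Q")
    qi = ri = 0
    for i in range(1, len(q_mats) + len(r_mats) + 1):
        if i % 2 == 1:
            # Odd round: the previous output h gates the current input x.
            gate = [2.0 * sigmoid(a) for a in matvec(q_mats[qi], h)]
            x = [g * xv for g, xv in zip(gate, x)]
            qi += 1
        else:
            # Even round: the (already gated) input x gates h.
            gate = [2.0 * sigmoid(a) for a in matvec(r_mats[ri], x)]
            h = [g * hv for g, hv in zip(gate, h)]
            ri += 1
    return x, h
```

For example, with all-zero matrices every gate equals exactly 1, so `mogrify([1.0, -2.0, 0.5], [0.3, 0.7], [zq, zq], [zr])` with a 3x2 zero `zq` and a 2x3 zero `zr` returns both vectors unchanged. The factor of 2 is what makes this "identity at zero" behaviour possible, so the gating can start out as a no-op on top of a plain LSTM.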

08/12/2020

Gábor Melis, Tomáš Kočiský, Phil Blunsom

Similar Papers

Have Your Text and Use It Too! End-to-End Neural Data-to-Text Generation with Semantic Fidelity

Hamza Harkous, Isabel Groves, Amir Saffari

How Self-Attention Improves Rare Class Performance in a Question-Answering Dialogue Agent

Adam Stiff, Qi Song, Eric Fosler-Lussier

Knowledge-driven Data Construction for Zero-shot Evaluation in Commonsense Question Answering

Kaixin Ma, Filip Ilievski, Jonathan Francis, Yonatan Bisk, Eric Nyberg, Alessandro Oltramari

Task-oriented Domain-specific Meta-Embedding for Text Classification

Xin Wu, Yi Cai, Yang Kai, Tao Wang, Qing Li

Keywords: natural tasks, downstream tasks, meta-embedding learning, meta-embedding methods

Emergent Communication Pretraining for Few-Shot Machine Translation

Yaoyiran Li, Edoardo Maria Ponti, Ivan Vulić, Anna Korhonen

Improving Neural Language Generation with Spectrum Control

Lingxiao Wang, Jing Huang, Kevin Huang, Ziniu Hu, Guangtao Wang, Quanquan Gu

Learning to generate reformulation actions for scalable conversational query understanding

Zihan Xu, Jiangang Zhu, Ling Geng, Yang Yang, Bojia Lin, Daxin Jiang

Keywords: contextual query reformulation, question answering, conversational query understanding

Contextualized Sparse Representations for Real-Time Open-Domain Question Answering

Jinhyuk Lee, Minjoon Seo, Hannaneh Hajishirzi, Jaewoo Kang

Keywords: Real-Time Answering, Open-domain answering, phrase problem, Contextualized Representations

Do Explicit Alignments Robustly Improve Multilingual Encoders?

Shijie Wu, Mark Dredze

Keywords: multilingual, unsupervised encoders, cross-lingual representation, contrastive objective

Align before Fuse: Vision and Language Representation Learning with Momentum Distillation

Junnan Li, Ramprasaath Selvaraju, Akhilesh Gotmare, Shafiq Joty, Caiming Xiong, Steven Chu Hong Hoi

Keywords: transformers, vision, representation learning

KERMIT: Complementing Transformer Architectures with Encoders of Explicit Syntactic Interpretations

Fabio Massimo Zanzotto, Andrea Santilli, Leonardo Ranaldi, Dario Onorati, Pierfrancesco Tommasino, Francesca Fallucchi

Keywords: natural understanding, inference, syntactic parsers, large-scale learners

Pretrained Encyclopedia: Weakly Supervised Knowledge-Pretrained Language Model

Wenhan Xiong, Jingfei Du, William Yang Wang, Veselin Stoyanov

Text Graph Transformer for Document Classification

Haopeng Zhang, Jiawei Zhang

Keywords: text classification, natural processing, text task, graph techniques

SARG: A Novel Semi Autoregressive Generator for Multi-turn Incomplete Utterance Restoration

Mengzuo Huang, Feng Li, Wuhe Zou, Weidong Zhang

Lexical normalization for code-switched data and its effect on POS tagging

Rob van der Goot, Özlem Çetinoğlu

CoDA: Contrast-enhanced and Diversity-promoting Data Augmentation for Natural Language Understanding

Yanru Qu, Dinghan Shen, Yelong Shen, Sandra Sajeev, Weizhu Chen, Jiawei Han

Keywords: consistency training, contrastive learning, data augmentation, natural language understanding

Reducing Transformer Depth on Demand with Structured Dropout

Angela Fan, Edouard Grave, Armand Joulin

Keywords: reduction, regularization, pruning, dropout, transformer

G2T: Generating fluent descriptions for knowledge graph

Yunzhou Shi, Zhiling Luo, Pengcheng Zhu, Feng Ji, Wei Zhou, Haiqing Chen, Yujiu Yang

Keywords: natural language generation, knowledge representation, knowledge graph

Heads-up! Unsupervised constituency parsing via self-attention heads

Bowen Li, Taeuk Kim, Reinald Kim Amplayo, Frank Keller

Sequence-Level Mixed Sample Data Augmentation

Demi Guo, Yoon Kim, Alexander Rush

Keywords: sequence-to-sequence problems, scan, semantic parsing, neural networks

Contrastive Model Inversion for Data-Free Knowledge Distillation

Gongfan Fang, Jie Song, Xinchao Wang, Chengchao Shen, Xingen Wang, Mingli Song

Keywords: Machine Learning, Deep Learning, Explainable/Interpretable Machine Learning, Transfer, Adaptation, Multi-task Learning

Learning a Simple and Effective Model for Multi-turn Response Generation with Auxiliary Tasks
