Less is Better: A cognitively inspired unsupervised model for language segmentation

08/12/2020

Less is Better: A cognitively inspired unsupervised model for language segmentation

Jinbiao Yang, Stefan L. Frank, Antal van den Bosch

Keywords:

Abstract Paper Similar Papers

Abstract: Language users process utterances by segmenting them into many cognitive units, which vary in their sizes and linguistic levels. Although we can do such unitization/segmentation easily, its cognitive mechanism is still not clear. This paper proposes an unsupervised model, Less-is-Better (LiB), to simulate the human cognitive process with respect to language unitization/segmentation. LiB follows the principle of least effort and aims to build a lexicon which minimizes the number of unit tokens (alleviating the effort of analysis) and number of unit types (alleviating the effort of storage) at the same time on any given corpus. LiB’s workflow is inspired by empirical cognitive phenomena. The design makes the mechanism of LiB cognitively plausible and the computational requirement light-weight. The lexicon generated by LiB performs the best among different types of lexicons (e.g. ground-truth words) both from an information-theoretical view and a cognitive view, which suggests that the LiB lexicon may be a plausible proxy of the mental lexicon.

The video of this talk cannot be embedded. You can watch it here:

https://underline.io/lecture/6475-less-is-better-a-cognitively-inspired-unsupervised-model-for-language-segmentation

(Link will open in new window)

0

0

0

0

Share

This is an embedded video. Talk and the respective paper are published at COLING Workshops 2020 virtual conference. If you are one of the authors of the paper and want to manage your upload, see the question "My papertalk has been externally embedded..." in the FAQ section.

Comments

Post Comment

no comments yet

Similar Papers

25/07/2020

Combining contextualized and non-contextualized query translations to improve CLIR

Suraj Nair, Petra Galuscakova, Douglas W. Oard

Keywords Paper

CLIR, machine translation

0

0

0

0

8:39

04/07/2020

Low-Resource Generation of Multi-hop Reasoning Questions

Jianxing Yu, Wei Liu, Shuang Qiu and
Qinliang Su, Kai Wang, Xiaojun Quan, Jian Yin

Keywords Paper

Low-Resource Questions, generating questions, machine comprehension, multi-hop model

0

0

0

0

11:54

16/11/2020

Word Frequency Does Not Predict Grammatical Knowledge in Language Models

Charles Yu, Ryan Sie, Nicolas Tedeschi, Leon Bergen

Keywords Paper

reflexive anaphora, grammatical tasks, neural models, language models

0

0

0

0

11:12

08/12/2020

Understanding the effects of word-level linguistic annotations in under-resourced neural machine translation

Víctor M. Sánchez-Cartagena, Juan Antonio Pérez-Ortiz, Felipe Sánchez-Martínez

Keywords Paper

0

0

0

0

14:59

04/07/2020

On the Linguistic Representational Power of Neural Machine Translation Models

Yonatan Belinkov, Nadir Durrani, Fahim Dalvi and
Hassan Sajjad, James Glass

Keywords Paper

Linguistic Models, natural processing, artificial intelligence, translating languages

0

0

0

0

19:17

05/12/2020

An exploratory study on multilingual quality estimation

Shuo Sun, Marina Fomicheva, Frédéric Blain and
Vishrav Chaudhary, Ahmed El-Kishky, Adithya Renduchintala, Francisco Guzmán, Lucia Specia

Keywords Paper

0

0

0

0

14:31

06/12/2020

Modular Meta-Learning with Shrinkage

Yutian Chen, Abe Friesen, Feryal Behbahani and
Arnaud Doucet, David Budden, Matthew Hoffman, Nando de Freitas

Keywords Paper

0

0

0

0

3:21

08/12/2020

TableGPT: Few-shot Table-to-Text Generation with Table Structure Reconstruction and Content Matching

Heng Gong, Yawei Sun, Xiaocheng Feng and
Bing Qin, Wei Bi, Xiaojiang Liu, Ting Liu

Keywords Paper

0

0

0

0

8:45

08/12/2020

Specializing Unsupervised Pretraining Models for Word-Level Semantic Similarity

Anne Lauscher, Ivan Vulić, Edoardo Maria Ponti and
Anna Korhonen, Goran Glavaš

Keywords Paper

0

0

0

0

13:01

06/12/2021

Refining Language Models with Compositional Explanations

Huihan Yao, Ying Chen, Qinyuan Ye and
Xisen Jin, Xiang Ren

Keywords Paper

machine learning, fairness, language

0

0

0

0

13:17

02/02/2021

Do Response Selection Models Really Know What’s Next? Utterance Manipulation Strategies for Multi-turn Response Selection

Taesun Whang, Dongyub Lee, Dongsuk Oh and
Chanhee Lee, Kijong Han, Dong-hun Lee, Saebyeok Lee

Keywords Paper

0

0

0

0

17:37

26/04/2020

Residual Energy-Based Models for Text Generation

Yuntian Deng, Anton Bakhtin, Myle Ott and
Arthur Szlam, Marc'Aurelio Ranzato

Keywords Paper

energy-based models, text generation

0

0

0

0

4:59

01/07/2020

Linguistic Features for Readability Assessment

Tovly Deutsch, Masoud Jasbi, Stuart Shieber

Keywords Paper

0

0

0

0

12:06

16/11/2020

Bridging Linguistic Typology and Multilingual Machine Translation with Multi-View Language Representations

Arturo Oncevay, Barry Haddow, Alexandra Birch

Keywords Paper

multilingual translation, language clustering, multilingual transfer, singular analysis

0

0

0

0

11:28

04/07/2020

Neural Data-to-Text Generation via Jointly Learning the Segmentation and Correspondence

Xiaoyu Shen, Ernie Chang, Hui Su and
Cheng Niu, Dietrich Klakow

Keywords Paper

Neural Generation, Segmentation, data-to-text tasks, neural model

0

0

0

0

9:09

16/11/2020

Zero-Shot Crosslingual Sentence Simplification

Jonathan Mallinson, Rico Sennrich, Mirella Lapata

Keywords Paper

sentence simplification, translation, simplification, encoder-decoder models

0

0

0

0

10:34

16/11/2020

Syntactic Structure Distillation Pretraining for Bidirectional Encoders

Adhiguna Kuncoro, Lingpeng Kong, Daniel Fried and
Dani Yogatama, Laura Rimell, Chris Dyer, Phil Blunsom

Keywords Paper

bert pretraining, structured tasks, natural understanding, textual learners

0

0

0

0

12:23

06/12/2020

Dialog without Dialog Data: Learning Visual Dialog Agents from VQA Data

Michael Cogswell, Jiasen Lu, Rishabh Jain and
Stefan Lee, Devi Parikh, Dhruv Batra

Keywords Paper

1

0

0

0

3:29

02/02/2021

Effective Slot Filling via Weakly-Supervised Dual-Model Learning

Jue Wang, Ke Chen, Lidan Shou and
Sai Wu, Gang Chen

Keywords Paper

0

0

0

0

18:02

02/02/2021

Have We Solved The Hard Problem? It’s Not Easy! Contextual Lexical Contrast as a Means to Probe Neural Coherence

Wenqiang Lei, Yisong Miao, Runpeng Xie and
Bonnie Webber, Meichun Liu, Tat-Seng Chua, Nancy F. Chen

Keywords Paper

0

0

0

0

18:55

02/02/2021

Commonsense Knowledge Augmentation for Low-Resource Languages via Adversarial Learning

Bosung Kim, Juae Kim, Youngjoong Ko, Jungyun Seo

Keywords Paper

0

0

0

0

19:38

01/07/2020

Neural Multi-task Text Normalization and Sanitization with Pointer-Generator

Hoang Nguyen, Sandro Cavallari

Keywords Paper

0

0

0

0

9:16

16/11/2020

X-FACTR: Multilingual Factual Knowledge Retrieval from Pretrained Language Models

Zhengbao Jiang, Antonios Anastasopoulos, Jun Araki and
Haibo Ding, Graham Neubig

Keywords Paper

factual retrieval, language models, lms, probing methods

0

0

0

0

9:45

01/07/2020

Syntactic Parsing in Humans and Machines

Paola Merlo

Keywords Paper

0

0

0

0

44:12

04/07/2020

How Can We Accelerate Progress Towards Human-like Linguistic Generalization?

Tal Linzen

Keywords Paper

natural understanding, classification task, Pretraining-Agnostic paradigm, pre-training

0

0

0

0

7:04

16/11/2020

Towards Better Context-aware Lexical Semantics:Adjusting Contextualized Representations through Static Anchors

Qianchu Liu, Diana McCarthy, Anna Korhonen

Keywords Paper

transformation, contextualized models, dynamic embeddings, post-processing technique

0

0

0

0

6:53

04/07/2020

Towards Robustifying NLI Models Against Lexical Dataset Biases

Xiang Zhou, Mohit Bansal

Keywords Paper

Natural Inference, data augmentation, Robustifying Models, deep models

0

0

0

0

11:34

16/11/2020

An Information Bottleneck Approach for Controlling Conciseness in Rationale Extraction

Bhargavi Paranjape, Mandar Joshi, John Thickstun and
Hannaneh Hajishirzi, Luke Zettlemoyer

Keywords Paper

language understanding, semi-supervised setting, complex models, explainer

0

0

0

0

11:44

02/02/2021

GENSYNTH: Synthesizing Datalog Programs without Language Bias

Jonathan Mendelson, Aaditya Naik, Mukund Raghothaman, Mayur Naik

Keywords Paper

0

0

0

0

20:03

08/12/2020

Best Practices for Data-Efficient Modeling in NLG:How to Train Production-Ready Neural Models with Less Data

Ankit Arun, Soumya Batra, Vikas Bhardwaj and
Ashwini Challa, Pinar Donmez, Peyman Heidari, Hakan Inan, Shashank Jain, Anuj Kumar, Shawn Mei, Karthik Mohan, Michael White

Keywords Paper

0

0

0

0

15:01

26/04/2020

A Latent Morphology Model for Open-Vocabulary Neural Machine Translation

Duygu Ataman, Wilker Aziz, Alexandra Birch

Keywords Paper

neural machine translation, low-resource languages, latent-variable models

0

0

0

0

5:10

08/12/2020

Emergent Communication Pretraining for Few-Shot Machine Translation

Yaoyiran Li, Edoardo Maria Ponti, Ivan Vulić, Anna Korhonen

Keywords Paper

0

0

0

0

14:42

16/11/2020

Centering-based Neural Coherence Modeling with Hierarchical Discourse Segments

Sungho Jeon, Michael Strube

Keywords Paper

automated scoring, neural models, coherence model, linguistic coherence

0

0

0

0

10:53

16/11/2020

Generationary or “How We Went beyond Word Sense Inventories and Learned to Gloss”

Michele Bevilacqua, Marco Maru, Roberto Navigli

Keywords Paper

generative modeling, definition modeling, discriminative tasks, word disambiguation

0

0

0

0

11:49

16/11/2020

Do sequence-to-sequence VAEs learn global features of sentences?

Tom Bosc, Pascal Vincent

Keywords Paper

generation, memorization, autoregressive models, variational autoencoder

0

0

0

0

12:00

01/07/2020

How Self-Attention Improves Rare Class Performance in a Question-Answering Dialogue Agent

Adam Stiff, Qi Song, Eric Fosler-Lussier

Keywords Paper

0

0

0

0

7:55

26/04/2020

Plug and Play Language Models: A Simple Approach to Controlled Text Generation

Sumanth Dathathri, Andrea Madotto, Janice Lan and
Jane Hung, Eric Frank, Piero Molino, Jason Yosinski, Rosanne Liu

Keywords Paper

controlled text generation, generative models, conditional generative models, language modeling, transformer

0

0

1

1

4:58

04/07/2020

Language-aware Interlingua for Multilingual Neural Machine Translation

Changfeng Zhu, Heng Yu, Shanbo Cheng, Weihua Luo

Keywords Paper

Multilingual Translation, low-resource scenarios, Language-aware Interlingua, NMT

0

0

0

0

6:09

18/07/2021

Straight to the Gradient: Learning to Use Novel Tokens for Neural Text Generation

Xiang Lin, Simeng Han, Shafiq Joty

Keywords Paper

Applications, Natural Language Processing

0

0

0

0

16:00

04/07/2020

Few-Shot NLG with Pre-Trained Language Model

Zhiyu Chen, Harini Eavani, Wenhu Chen and
Yinyin Liu, William Yang Wang

Keywords Paper

natural generation, NLG, real-world applications, content selection

0

0

0

0

5:59