Short Text Topic Modeling with Topic Distribution Quantization and Negative Sampling Decoder

16/11/2020

Xiaobao Wu, Chunping Li, Yan Zhu, Yishu Miao

Keywords: decoding, short modeling, topic models, neural model

Abstract: Topic models have long been the prevailing approach to discovering latent semantics in long documents. On short texts, however, they generally suffer from data sparsity because word co-occurrences are extremely limited, and thus tend to yield repetitive or trivial, low-quality topics. To address this issue, we propose a novel neural topic model in the autoencoding framework with a new topic distribution quantization approach, which generates peakier distributions that are more appropriate for modeling short texts. Beyond the encoding side, we tackle the issue in decoding as well: we propose a novel negative sampling decoder that learns from negative samples to avoid yielding repetitive topics. We observe that our model substantially improves short text topic modeling performance. Through extensive experiments on real-world datasets, we demonstrate that our model outperforms both strong traditional and neural baselines under extreme data sparsity, producing high-quality topics.
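
The quantization idea in the abstract can be illustrated with a minimal sketch, assuming a nearest-neighbor lookup into a small codebook of peaky vectors. The codebook values, the use of Euclidean distance, and all function names below are illustrative assumptions, not details taken from the paper; the sketch only shows how snapping a flat topic distribution to a peaky codebook entry lowers its entropy.

```python
import numpy as np


def quantize_topic_distribution(theta, codebook):
    """Return the codebook row closest to theta (Euclidean distance)."""
    distances = np.linalg.norm(codebook - theta, axis=1)
    return codebook[np.argmin(distances)]


def entropy(p):
    """Shannon entropy in nats; zero entries are skipped."""
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())


num_topics = 4

# Hypothetical codebook: each row puts 0.95 of its mass on a single topic
# and spreads the remaining 0.05 evenly over the others.
codebook = np.full((num_topics, num_topics), 0.05 / (num_topics - 1))
np.fill_diagonal(codebook, 0.95)

# A fairly flat topic distribution, as an encoder might produce for a short text.
theta = np.array([0.40, 0.30, 0.20, 0.10])
theta_q = quantize_topic_distribution(theta, codebook)

print("original :", theta, "entropy:", entropy(theta))
print("quantized:", theta_q, "entropy:", entropy(theta_q))
```

In this toy setting the quantized vector keeps the dominant topic of the input while concentrating almost all mass on it, which is one way to read "peakier" in the abstract.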

The talk and the corresponding paper were published at the EMNLP 2020 virtual conference.

Similar Papers

25/07/2020

Copula guided neural topic modelling for short texts

Lihui Lin, Hongyu Jiang, Yanghui Rao

Keywords: short text modelling, Archimedean copulas, neural topic modelling, auto-encoding variational Bayes

8:46

03/05/2021

Neural Topic Model via Optimal Transport

He Zhao, Dinh Phung, Viet Huynh, Trung Le, Wray Buntine

Keywords: optimal transport, document analysis, topic modelling

9:29

26/08/2020

Variational Autoencoders for Sparse and Overdispersed Discrete Data

He Zhao, Piyush Rai, Lan Du, Wray Buntine, Dinh Phung, Mingyuan Zhou

14:28

08/12/2020

Enhancing Extractive Text Summarization with Topic-Aware Graph Neural Networks

Peng Cui, Le Hu, Yuanchao Liu

6:48

03/05/2021

Mirostat: A Neural Text Decoding Algorithm That Directly Controls Perplexity

Sourya Basu, Govardana Sachithanandam Ramachandran, Nitish Shirish Keskar, Lav R Varshney

Keywords: cross-entropy, incoherence, repetitions, sampling algorithms, Neural text decoding

5:07

06/12/2020

OTLDA: A Geometry-aware Optimal Transport Approach for Topic Modeling

Viet Huynh, He Zhao, Dinh Phung

3:04

26/04/2020

The Curious Case of Neural Text Degeneration

Ari Holtzman, Jan Buys, Li Du, Maxwell Forbes, Yejin Choi

Keywords: generation, text, NLG, NLP, natural language, natural language generation, language model, neural, neural language model

4:57

02/02/2021

Meta-Transfer Learning for Low-Resource Abstractive Summarization

Yi-Syuan Chen, Hong-Han Shuai

19:10

19/10/2020

Distant supervision in BERT-based adhoc document retrieval

Koustav Rudra, Avishek Anand

Keywords: distant supervision, adhoc retrieval, document ranking

6:49

26/08/2020

Prior-aware Composition Inference for Spectral Topic Models

Moontae Lee, David Bindel, David Mimno

14:46

19/04/2021

Randomized deep structured prediction for discourse-level processing

Manuel Widmoser, Maria Leonor Pacheco, Jean Honorio, Dan Goldwasser

9:44

02/02/2021

SARG: A Novel Semi Autoregressive Generator for Multi-turn Incomplete Utterance Restoration

Mengzuo Huang, Feng Li, Wuhe Zou, Weidong Zhang

14:50

08/12/2020

Model-agnostic Methods for Text Classification with Inherent Noise

Kshitij Tayal, Rahul Ghosh, Vipin Kumar

8:46

08/12/2020

Best Practices for Data-Efficient Modeling in NLG: How to Train Production-Ready Neural Models with Less Data

Ankit Arun, Soumya Batra, Vikas Bhardwaj, Ashwini Challa, Pinar Donmez, Peyman Heidari, Hakan Inan, Shashank Jain, Anuj Kumar, Shawn Mei, Karthik Mohan, Michael White

15:01

16/11/2020

Cold-Start and Interpretability: Turning Regular Expressions into Trainable Recurrent Neural Networks

Chengyue Jiang, Yinggong Zhao, Shanbo Chu, Libin Shen, Kewei Tu

Keywords: natural applications, training, text classification, neural networks

11:32

19/04/2021

Zero-shot neural passage retrieval via domain-targeted synthetic question generation

Ji Ma, Ivan Korotkov, Yinfei Yang, Keith Hall, Ryan McDonald

12:47

26/04/2020

Learning from Explanations with Neural Execution Tree

Ziqi Wang, Yujia Qin, Wenxuan Zhou, Jun Yan, Qinyuan Ye, Leonardo Neves, Zhiyuan Liu, Xiang Ren

4:58

04/07/2020

Location Attention for Extrapolation to Longer Sequences

Yann Dubois, Gautier Dagan, Dieuwke Hupkes, Elia Bruni

Keywords: Extrapolation, natural processing, generalization, Lookup task

11:02

07/09/2020

Learning Effectively from Noisy Supervision for Weakly Supervised Semantic Segmentation

Wenbin Xie, Qiaoqiao Wei, Zheng Li, Hui Zhang

Keywords: Semantic Segmentation, Weakly Supervised Semantic Segmentation, Self Attention

3:46

04/07/2020

Neural Data-to-Text Generation via Jointly Learning the Segmentation and Correspondence

Xiaoyu Shen, Ernie Chang, Hui Su, Cheng Niu, Dietrich Klakow

Keywords: Neural Generation, Segmentation, data-to-text tasks, neural model

9:09

14/06/2020

On Vocabulary Reliance in Scene Text Recognition

Zhaoyi Wan, Jielei Zhang, Liang Zhang, Jiebo Luo, Cong Yao

Keywords: scene text recognition, text spotting, document analysis, ocr, scene text detection, sequence recognition, language and vision

1:00

14/06/2020

Adversarial Feature Hallucination Networks for Few-Shot Learning

Kai Li, Yulun Zhang, Kunpeng Li, Yun Fu

Keywords: few-shot learning, data augmentation, feature hallucination, generative adversarial networks

1:01

19/08/2021

Guided Attention Network for Concept Extraction

Songtao Fang, Zhenya Huang, Ming He, Shiwei Tong, Xiaoqing Huang, Ye Liu, Jie Huang, Qi Liu

Keywords: Data Mining, Information Retrieval, Mining Text, Web, Social Media

14:26

03/05/2021

Variational Information Bottleneck for Effective Low-Resource Fine-Tuning

Rabeeh Karimi Mahabadi, Yonatan Belinkov, James Henderson

Keywords: variational information bottleneck, biases, robust, over-fitting, large-scale pre-trained language models, NLP, Transfer learning

5:07

04/07/2020

Diversifying Dialogue Generation with Non-Conversational Text

Hui Su, Xiaoyu Shen, Sanqiang Zhao, Zhou Xiao, Pengwei Hu, Randy Zhong, Cheng Niu, Jie Zhou

Keywords: Diversifying Generation, low-diversity problem, open-domain generation, dialogue generation

10:53

26/08/2020

Improving Maximum Likelihood Training for Text Generation with Density Ratio Estimation

Yuxuan Song, Ning Miao, Hao Zhou, Lantao Yu, Mingxuan Wang, Lei Li

12:32

19/04/2021

Generative text modeling through short run inference

Bo Pang, Erik Nijkamp, Tian Han, Ying Nian Wu

7:55

03/05/2021

Neural Pruning via Growing Regularization

Huan Wang, Can Qin, Yulun Zhang, Yun Fu

Keywords: deep neural network pruning, regularization, Hessian matrix, model compression

6:15

02/02/2021

DIBS: Diversity Inducing Information Bottleneck in Model Ensembles

Samarth Sinha, Homanga Bharadhwaj, Anirudh Goyal, Hugo Larochelle, Animesh Garg, Florian Shkurti

16:26

05/12/2020

Knowledge-enhanced named entity disambiguation for short text

Zhifan Feng, Qi Wang, Wenbin Jiang, Yajuan Lyu, Yong Zhu

14:40

19/08/2021

Learning Deeper Non-Monotonic Networks by Softly Transferring Solution Space

Zheng-Fan Wu, Hui Xue, Weimin Bai

Keywords: Machine Learning, Kernel Methods, Deep Learning, Classification

12:50

06/12/2021

Shapeshifter: a Parameter-efficient Transformer using Factorized Reshaped Matrices

Aliakbar Panahi, Seyran Saeedi, Tom Arodz

Keywords: transformers

13:06

03/05/2021

NAS-Bench-ASR: Reproducible Neural Architecture Search for Speech Recognition

Abhinav Mehrotra, Alberto Gil Couto Pimentel Ramos, Sourav Bhattacharya, Łukasz Dudziak, Ravichander Vipperla, Thomas C Chau, Mohamed Abdelfattah, Samin Ishtiaq, Nic Lane

Keywords: Benchmark, NAS, ASR

4:50

04/07/2020

Discrete Latent Variable Representations for Low-Resource Text Classification

Shuning Jin, Sam Wiseman, Karl Stratos, Karen Livescu

Keywords: Low-Resource Classification, Discrete Representations, discrete models, continuous representations

11:17

16/11/2020

Sparse Text Generation

Pedro Henrique Martins, Zita Marinho, André F. T. Martins

Keywords: story completion, dialogue generation, text generators, language models

11:27

26/04/2020

Neural Module Networks for Reasoning over Text

Nitish Gupta, Kevin Lin, Dan Roth, Sameer Singh, Matt Gardner

Keywords: question answering, compositionality, neural module networks, multi-step reasoning, reading comprehension

4:36

12/07/2020

SIGUA: Forgetting May Make Learning with Noisy Labels More Robust

Bo Han, Gang Niu, Xingrui Yu, Quanming Yao, Miao Xu, Ivor Tsang, Masashi Sugiyama

Keywords: Deep Learning - Algorithms

7:00

04/07/2020

Word-level Textual Adversarial Attacking as Combinatorial Optimization

Yuan Zang, Fanchao Qi, Chenghao Yang, Zhiyuan Liu, Meng Zhang, Qun Liu, Maosong Sun

Keywords: Textual attacking, Word-level attacking, combinatorial problem

9:34

14/06/2020

Towards Causal VQA: Revealing and Reducing Spurious Correlations by Invariant and Covariant Semantic Editing

Vedika Agarwal, Rakshith Shetty, Mario Fritz

Keywords: robustness, vqa, causality, gan, dataset, evaluation, automated semantic scene editing, data augmentation, invariance, covariance

1:00

03/05/2021

Auxiliary Task Update Decomposition: The Good, the Bad and the Neutral

Lucio Dery, Yann Dauphin, David Grangier

Keywords: multitask learning, deeplearning, pre-training, gradient decomposition

5:22