Exploiting class labels to boost performance on embedding-based text classification

19/10/2020

Exploiting class labels to boost performance on embedding-based text classification

Arkaitz Zubiaga

Keywords: embeddings, text classification, weighting schemes

Abstract Paper Similar Papers

Abstract: Text classification is one of the most frequent tasks for processing textual data, facilitating among others research from large-scale datasets. Embeddings of different kinds have recently become the de facto standard as features used for text classification. These embeddings have the capacity to capture meanings of words inferred from occurrences in large external collections. While they are built out of external collections, they are unaware of the distributional characteristics of words in the classification dataset at hand, including most importantly the distribution of words across classes in training data. To make the most of these embeddings as features and to boost the performance of classifiers using them, we introduce a weighting scheme, Term Frequency-Category Ratio (TF-CR), which can weight high-frequency, category-exclusive words higher when computing word embeddings. Our experiments on eight datasets show the effectiveness of TF-CR, leading to improved performance scores over the well-known weighting schemes TF-IDF and KLD as well as over the absence of a weighting scheme in most cases.

The video of this talk cannot be embedded. You can watch it here:

https://dl.acm.org/doi/10.1145/3340531.3417444#sec-supp

(Link will open in new window)

0

0

0

0

Share

This is an embedded video. Talk and the respective paper are published at CIKM 2020 virtual conference. If you are one of the authors of the paper and want to manage your upload, see the question "My papertalk has been externally embedded..." in the FAQ section.

Comments

Post Comment

no comments yet

Similar Papers

19/08/2021

MultiMirror: Neural Cross-lingual Word Alignment for Multilingual Word Sense Disambiguation

Luigi Procopio, Edoardo Barba, Federico Martelli, Roberto Navigli

Keywords Paper

Natural Language Processing, Natural Language Semantics, Resources and Evaluation

0

0

0

0

12:25

02/02/2021

Revisiting Mahalanobis Distance for Transformer-Based Out-of-Domain Detection

Alexander Podolskiy, Dmitry Lipin, Andrey Bout and
Ekaterina Artemova, Irina Piontkovskaya

Keywords Paper

0

0

0

0

16:08

04/07/2020

Cross-Lingual Unsupervised Sentiment Classification with Multi-View Transfer Learning

Hongliang Fei, Ping Li

Keywords Paper

Cross-Lingual Classification, sentiment classification, unsupervised system, classification

0

0

0

0

12:23

16/11/2020

What Have We Achieved on Text Summarization?

Dandan Huang, Leyang Cui, Sen Yang and
Guangsheng Bao, Kun Wang, Jun Xie, Yue Zhang

Keywords Paper

text summarization, deep learning, automatic summarizers, summarization systems

0

0

0

0

11:20

19/08/2021

Exemplification Modeling: Can You Give Me an Example, Please?

Edoardo Barba, Luigi Procopio, Caterina Lacerra and
Tommaso Pasini, Roberto Navigli

Keywords Paper

Natural Language Processing, Natural Language Semantics, Resources and Evaluation

0

0

0

0

14:47

03/05/2021

Variational Information Bottleneck for Effective Low-Resource Fine-Tuning

Rabeeh Karimi Mahabadi, Yonatan Belinkov, James Henderson

Keywords Paper

variational information bottleneck, biases, robust, over-fitting, large-scale pre-trained language models, NLP, Transfer learning

0

0

0

0

5:07

08/12/2020

Federated Learning for Spoken Language Understanding

Zhiqi Huang, Fenglin Liu, Yuexian Zou

Keywords Paper

0

0

0

0

14:05

08/12/2020

Fine-grained Information Status Classification Using Discourse Context-Aware BERT

Yufang Hou

Keywords Paper

0

0

0

0

13:13

16/11/2020

Improving Multilingual Models with Language-Clustered Vocabularies

Hyung Won Chung, Dan Garrette, Kiat Chuan Tan, Jason Riesa

Keywords Paper

massively applications, multilingual generation, cross-lingual sharing, multilingual models

0

0

0

0

6:59

02/02/2021

LRC-BERT: Latent-representation Contrastive Knowledge Distillation for Natural Language Understanding

Hao Fu, Shaojun Zhou, Qihong Yang and
Junjie Tang, Guiquan Liu, Kaikui Liu, Xiaolong Li

Keywords Paper

0

0

0

0

15:25

04/07/2020

CluBERT: A Cluster-Based Approach for Learning Sense Distributions in Multiple Languages

Tommaso Pasini, Federico Scozzafava, Bianca Scarlini

Keywords Paper

English tasks, disambiguation, multilingual tasks, CluBERT

0

0

0

0

12:17

16/11/2020

Visually Grounded Compound PCFGs

Yanpeng Zhao, Ivan Titov

Keywords Paper

exploiting groundings, language understanding, gradient estimates, fully-differentiable learning

0

0

0

0

12:24

08/12/2020

SentiX: A Sentiment-Aware Pre-Trained Model for Cross-Domain Sentiment Analysis

Jie Zhou, Junfeng Tian, Rui Wang and
Yuanbin Wu, Wenming Xiao, Liang He

Keywords Paper

0

0

0

0

12:42

16/11/2020

Structured Pruning of Large Language Models

Ziheng Wang, Jeremy Wohlwend, Tao Lei

Keywords Paper

natural tasks, model compression, language tasks, pruning embeddings

0

0

0

0

11:04

06/12/2021

Grad2Task: Improved Few-shot Text Classification Using Gradients for Task Representation

Jixuan Wang, Kuan-Chieh Wang, Frank Rudzicz, Michael Brudno

Keywords Paper

machine learning, transformers, meta learning, language, transfer learning

0

0

0

0

14:45

02/02/2021

C2C-GenDA: Cluster-to-Cluster Generation for Data Augmentation of Slot Filling

Yutai Hou, Sanyuan Chen, Wanxiang Che and
Cheng Chen, Ting Liu

Keywords Paper

0

0

0

0

15:01

16/11/2020

Partially-Aligned Data-to-Text Generation with Distant Supervision

Zihao Fu, Bei Shi, Wai Lam and
Lidong Bing, Zhiyuan Liu

Keywords Paper

data-to-text task, generation task, dataset problem, over-generation problem

0

0

0

0

11:58

19/04/2021

Deep subjecthood: Higher-order grammatical features in multilingual BERT

Isabel Papadimitriou, Ethan A. Chi, Richard Futrell, Kyle Mahowald

Keywords Paper

0

0

0

0

11:56

06/12/2021

Align before Fuse: Vision and Language Representation Learning with Momentum Distillation

Junnan Li, Ramprasaath Selvaraju, Akhilesh Gotmare and
Shafiq Joty, Caiming Xiong, Steven Chu Hong Hoi

Keywords Paper

transformers, vision, representation learning

0

0

0

0

9:40

12/07/2020

Word-Level Speech Recognition With a Letter to Word Encoder

Ronan Collobert, Awni Hannun, Gabriel Synnaeve

Keywords Paper

Applications - Language, Speech and Dialog

0

0

0

0

14:53

26/04/2020

Residual Energy-Based Models for Text Generation

Yuntian Deng, Anton Bakhtin, Myle Ott and
Arthur Szlam, Marc'Aurelio Ranzato

Keywords Paper

energy-based models, text generation

0

0

0

0

4:59

01/07/2020

Supertagging with CCG primitives

Aditya Bhargava, Gerald Penn

Keywords Paper

0

0

0

0

5:00

06/12/2021

TriBERT: Human-centric Audio-visual Representation Learning

Tanzila Rahman, Mengyu Yang, Leonid Sigal

Keywords Paper

transformers, representation learning

0

0

0

0

13:54

04/07/2020

Evaluating and Enhancing the Robustness of Neural Network-based Dependency Parsing Models with Adversarial Examples

Xiaoqing Zheng, Jiehang Zeng, Yi Zhou and
Cho-Jui Hsieh, Minhao Cheng, Xuanjing Huang

Keywords Paper

semantic tasks, sentiment analysis, question answering, reading comprehension

0

0

0

0

11:57

18/07/2021

Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision

Chao Jia, Yinfei Yang, Ye Xia and
Yi-Ting Chen, Zarana Parekh, Hieu Pham, Quoc Le, Yun-Hsuan Sung, Zhen Li, Tom Duerig

Keywords Paper

Deep Learning, Embedding and Representation learning

0

0

0

0

21:03

04/07/2020

Unsupervised Cross-lingual Representation Learning at Scale

Alexis Conneau, Kartikay Khandelwal, Naman Goyal and
Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer, Veselin Stoyanov

Keywords Paper

cross-lingual tasks, XNLI, MLQA, NER

0

0

0

0

12:15

06/12/2020

A causal view of compositional zero-shot recognition

Yuval Atzmon, Felix Kreuk, Uri Shalit, Gal Chechik

Keywords Paper

0

0

0

0

3:22

22/11/2021

From Seq2Seq Recognition to Handwritten Word Embeddings

George Retsinas, Giorgos Sfikas, Christophoros Nikou, Petros Maragos

Keywords Paper

keyword spotting, handwritten text recognition, sequence-to-sequence

0

0

0

0

2:59

04/07/2020

SpanBERT: Improving Pre-training by Representing and Predicting Spans

Mandar Joshi, Danqi Chen, Yinhan Liu and
Daniel S. Weld, Luke Zettlemoyer, Omer Levy

Keywords Paper

span tasks, question answering, coreference resolution, OntoNotes task

0

0

0

0

14:14

16/11/2020

On Negative Interference in Multilingual Models: Findings and A Meta-Learning Treatment

Zirui Wang, Zachary C. Lipton, Yulia Tsvetkov

Keywords Paper

multilingual models, meta-learning algorithm, multilingual representations, negative interference

0

0

0

0

12:03

02/02/2021

Audio-Oriented Multimodal Machine Comprehension via Dynamic Inter- and Intra-modality Attention

Zhiqi Huang, Fenglin Liu, Xian Wu and
Shen Ge, Helin Wang, Wei Fan, Yuexian Zou

Keywords Paper

0

0

0

0

14:47

16/11/2020

Detecting Fine-Grained Cross-Lingual Semantic Divergences without Supervision by Learning to Rank

Eleftheria Briakou, Marine Carpuat

Keywords Paper

detecting content, cross-lingual nlp, machine problem, annotation

0

0

0

0

11:06

16/11/2020

The World is Not Binary: Learning to Rank with Grayscale Data for Dialogue Response Selection

Zibo Lin, Deng Cai, Yan Wang and
Xiaojiang Liu, Haitao Zheng, Shuming Shi

Keywords Paper

response selection, retrieval-based systems, learning-to-rank problem, learning-to-rank

0

0

0

0

12:03

16/11/2020

MEGATRON-CNTRL: Controllable Story Generation with External Knowledge Using Large-Scale Language Models

Peng Xu, Mostofa Patwary, Mohammad Shoeybi and
Raul Puri, Pascale Fung, Anima Anandkumar, Bryan Catanzaro

Keywords Paper

text generation, pre-trained models, megatron-cntrl, large-scale models

0

0

0

0

11:59

07/09/2020

BCaR: Beginner Classifier as Regularization Towards Generalizable Re-ID

Masato Tamura, Tomoaki Yoshinaga

Keywords Paper

person re-identification, generalizable, soft label, knowledge distillation, Re-ID, domain generalization

0

0

0

0

6:53

14/06/2020

Counterfactual Samples Synthesizing for Robust Visual Question Answering

Long Chen, Xin Yan, Jun Xiao and
Hanwang Zhang, Shiliang Pu, Yueting Zhuang

Keywords Paper

visual question answering, counterfactual, debias, language bias, data augmentation, visual-and-language

0

0

0

0

1:01

04/07/2020

Pretraining with Contrastive Sentence Objectives Improves Discourse Performance of Language Models

Dan Iter, Kelvin Guu, Larry Lansing, Dan Jurafsky

Keywords Paper

Discourse, unsupervised text, contextual representations, discourse-level representations

0

0

0

0

9:14

04/07/2020

Sources of Transfer in Multilingual Named Entity Recognition

David Mueller, Nicholas Andrews, Mark Dredze

Keywords Paper

Multilingual Recognition, polyglot recognition, multilingual transfer, naive models

0

0

0

0

12:23

16/11/2020

MODE-LSTM: A Parameter-efficient Recurrent Network with Multi-Scale for Sentence Classification

Qianli Ma, Zhenxi Lin, Jiangyue Yan and
Zipeng Chen, Liuhong Yu

Keywords Paper

sentence classification, extracting features, generalization, cnn models

0

0

0

0

10:35

16/11/2020

Small but Mighty: New Benchmarks for Split and Rephrase

Li Zhang, Huaiyu Zhu, Siddhartha Brahma, Yunyao Li

Keywords Paper

text task, fine-grained evaluation, automatic process, rule-based model

0

0

0

0

6:58