08/12/2020

Learning Domain Terms - Empirical Methods to Enhance Enterprise Text Analytics Performance

Gargi Roy, Lipika Dey, Mohammad Shakir, Tirthankar Dasgupta

Keywords:

Abstract: The performance of standard text analytics algorithms is known to degrade substantially on consumer-generated data, which is often very noisy. These algorithms also do not work well on enterprise data, whose nature differs markedly from news repositories, storybooks, or Wikipedia text. Text cleaning is therefore a mandatory step aimed at noise removal and correction to improve performance. However, enterprise data needs special cleaning methods, since it contains many domain terms that appear to be noise against a standard dictionary but in reality are not. In this work we present a detailed analysis of the characteristics of enterprise data and propose unsupervised methods for cleaning such repositories after domain terms have been automatically segregated from true noise terms. Noise terms are then corrected in a contextual fashion. The effectiveness of the method is established through careful manual evaluation of error corrections over several standard data sets, including those available for hate speech detection, where text is deliberately distorted to avoid detection. We also share results showing an improvement in classification accuracy after noise correction.
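The abstract only outlines the approach and gives no implementation details. Below is a minimal, hypothetical sketch of the general idea it describes, not the authors' actual method: out-of-dictionary tokens that recur consistently across an enterprise corpus are kept as candidate domain terms, while rare out-of-dictionary tokens are treated as noise and mapped to the nearest dictionary word. The tiny dictionary, the min_count and max_distance thresholds, and the sample documents are all illustrative assumptions; the paper's actual correction is contextual, whereas this sketch uses only edit distance.

```python
# Hypothetical sketch (not the paper's method): separate out-of-dictionary
# tokens into domain terms vs. true noise using corpus frequency, then
# correct noise tokens by mapping them to the closest dictionary word.
import re
from collections import Counter

# Stand-in for a full standard English word list (assumption for the demo).
STANDARD_DICTIONARY = {
    "the", "on", "report", "server", "failed", "restart", "after", "update",
    "customer", "ticket", "was", "closed",
}

def tokenize(text):
    """Lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def split_domain_and_noise(documents, min_count=3):
    """Out-of-dictionary tokens seen at least min_count times are treated
    as domain terms; the rest are treated as noise."""
    counts = Counter(tok for doc in documents for tok in tokenize(doc))
    oov = {tok: c for tok, c in counts.items() if tok not in STANDARD_DICTIONARY}
    domain_terms = {tok for tok, c in oov.items() if c >= min_count}
    return domain_terms, set(oov) - domain_terms

def edit_distance(a, b):
    """Plain Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def correct_noise(token, vocabulary, max_distance=2):
    """Replace a noise token with the closest dictionary word, if close enough."""
    best = min(vocabulary, key=lambda w: edit_distance(token, w))
    return best if edit_distance(token, best) <= max_distance else token

if __name__ == "__main__":
    docs = [
        "srvr123 failed after the updte",
        "srvr123 restart closed the ticket",
        "customer report on srvr123 was closed",
    ]
    domain, noise = split_domain_and_noise(docs)
    print("domain terms kept as-is:", domain)  # {'srvr123'}
    print("noise corrections:",
          {t: correct_noise(t, STANDARD_DICTIONARY) for t in noise})  # {'updte': 'update'}
```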

The video of this talk cannot be embedded. You can watch it here:
https://underline.io/lecture/6120-learning-domain-terms---empirical-methods-to-enhance-enterprise-text-analytics-performance
The talk and the accompanying paper were published at the COLING 2020 Workshops virtual conference.
