Misspelling Detection from Noisy Product Images

08/12/2020

Varun Nagaraj Rao, Mingwei Shen

Abstract: Misspellings are introduced on products either through negligence or in a deliberate attempt to deceive stakeholders. This leads to revenue loss for online sellers and fosters customer mistrust. Existing spelling research has focused primarily on advances in misspelling correction, while misspelling detection has continued to rely on lookup in a large dictionary. Dictionary lookup incorrectly flags many non-dictionary words as misspellings. In this paper, we propose a method to automatically detect misspellings from product images in an attempt to reduce false positive detections. We curate a large-scale corpus, define a rich set of features, and propose a novel model that leverages importance weighting to account for within-class distributional variance. Finally, we experimentally validate this approach on both the curated corpus and an out-of-domain public dataset and show that it leads to a relative improvement of up to 20% in F1 score. The approach thus yields a more robust, generalized, deployable solution and reduces reliance on the large-scale custom dictionaries used today.
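
To make the false positive problem with dictionary lookup concrete, here is a minimal Python sketch of the baseline the abstract describes. It is not the authors' model. The word list, the function name `flag_misspellings`, and the sample tokens (including the brand name "Zuvora") are all hypothetical and chosen only for illustration.

```python
# Hypothetical toy dictionary; a real system would use a far larger word list.
DICTIONARY = {"organic", "green", "tea", "bags"}


def flag_misspellings(tokens, dictionary=DICTIONARY):
    """Return the tokens that are absent from the dictionary (case-insensitive)."""
    return [t for t in tokens if t.lower() not in dictionary]


# Made-up tokens, as they might be read off a product label.
tokens = ["Zuvora", "Organic", "Grean", "Tea", "Bags"]
flagged = flag_misspellings(tokens)
print(flagged)  # "Grean" is a true misspelling; "Zuvora" is a brand name flagged in error
```

Pure lookup cannot tell the genuine misspelling from the valid non-dictionary word, which is the kind of false positive the proposed method aims to reduce.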

The video of this talk is available here:

https://underline.io/lecture/6108-misspelling-detection-from-noisy-product-images

The talk and the corresponding paper were published at the COLING Workshops 2020 virtual conference.

Similar Papers

03/05/2021

No Cost Likelihood Manipulation at Test Time for Making Better Mistakes in Deep Networks

Shyamgopal Karthik, Ameya Prabhu, Puneet Dokania, Vineet Gandhi

Keywords: Conditional Risk Minimization, Hierarchy-Aware Classification, Post-Hoc Correction

4:53

03/05/2021

Active Contrastive Learning of Audio-Visual Video Representations

Shuang Ma, Zhaoyang Zeng, Daniel McDuff, Yale Song

Keywords: video recognition, audio-visual representation, self-supervised learning, active learning, contrastive representation learning

5:22

19/10/2020

Prospective modeling of users for online display advertising via deep time-aware model

Djordje Gligorijevic, Jelena Gligorijevic, Aaron Flores

Keywords: time-aware prediction, prospective advertising, deep learning

8:57

22/09/2020

Doubly robust estimator for ranking metrics with post-click conversions

Yuta Saito

Keywords: inverse propensity score, post-click conversions, ranking metrics, selection bias, doubly robust

3:19

04/07/2020

Moving Down the Long Tail of Word Sense Disambiguation with Gloss Informed Bi-encoders

Terra Blevins, Luke Zettlemoyer

Keywords: Word Disambiguation, Word WSD, WSD, sense disambiguation

11:18

03/05/2021

Contemplating Real-World Object Classification

Ali Borji

Keywords: Robustness, object recognition, deep learning, ObjectNet

5:12

19/04/2021

Reanalyzing the most probable sentence problem: A case study in explicating the role of entropy in algorithmic complexity

Eric Corlett, Gerald Penn

11:08

06/12/2021

Understanding the Limits of Unsupervised Domain Adaptation via Data Poisoning

Akshay Mehra, Bhavya Kailkhura, Pin-Yu Chen, Jihun Hamm

Keywords: robustness, domain adaptation

13:34

04/07/2020

Learning Robust Models for e-Commerce Product Search

Thanh Nguyen, Nikhil Rao, Karthik Subbian

Keywords: e-Commerce Search, Mitigating problem, ranking algorithms, deep model

7:34

03/05/2021

Tomographic Auto-Encoder: Unsupervised Bayesian Recovery of Corrupted Data

Francesco Tonolini, Pablo Garcia Moreno, Andreas Damianou, Roderick Murray-Smith

Keywords: Missing value imputation, variational auto-encoders, variational inference

5:09

04/07/2020

Improving Massively Multilingual Neural Machine Translation and Zero-Shot Translation

Biao Zhang, Philip Williams, Ivan Titov, Rico Sennrich

Keywords: Massively Translation, Zero-Shot Translation, neural translation, NMT

11:47

16/11/2020

Detecting Word Sense Disambiguation Biases in Machine Translation for Model-Agnostic Adversarial Attacks

Denis Emelin, Ivan Titov, Rico Sennrich

Keywords: word disambiguation, nmt, prediction errors, adversarial strategy

12:57

06/12/2021

FastCorrect: Fast Error Correction with Edit Alignment for Automatic Speech Recognition

Yichong Leng, Xu Tan, Linchen Zhu, Jin Xu, Renqian Luo, Linquan Liu, Tao Qin, Xiangyang Li, Edward Lin, Tie-Yan Liu

13:44

03/05/2021

Empirical Analysis of Unlabeled Entity Problem in Named Entity Recognition

Yangming Li, Lemao Liu, Shuming Shi

Keywords: Negative Sampling, Unlabeled Entity Problem, Named Entity Recognition

4:49

02/02/2021

Robustness to Spurious Correlations in Text Classification via Automatically Generated Counterfactuals

Zhao Wang, Aron Culotta

17:39

19/04/2021

From toxicity in online comments to incivility in American news: Proceed with caution

Anushree Hede, Oshin Agarwal, Linda Lu, Diana C. Mutz, Ani Nenkova

10:10

02/02/2021

Capturing Delayed Feedback in Conversion Rate Prediction via Elapsed-Time Sampling

Jia-Qi Yang, Xiang Li, Shuguang Han, Tao Zhuang, De-Chuan Zhan, Xiaoyi Zeng, Bin Tong

14:48

03/08/2020

Regret Analysis of Bandit Problems with Causal Background Knowledge

Yangyi Lu, Amirhossein Meisami, Ambuj Tewari, William Yan

7:32

04/07/2020

ASSET: A Dataset for Tuning and Evaluation of Sentence Simplification Models with Multiple Rewriting Transformations

Fernando Alva-Manchego, Louis Martin, Antoine Bordes, Carolina Scarton, Benoît Sagot, Lucia Specia

Keywords: Tuning Models, rewriting transformations, automatic simplification, splitting

12:11

23/08/2020

On sampled metrics for item recommendation

Walid Krichene, Steffen Rendle

Keywords: item recommendation, sampled metric, evaluation, metrics

16:46

08/12/2020

Learning Domain Terms - Empirical Methods to Enhance Enterprise Text Analytics Performance

Gargi Roy, Lipika Dey, Mohammad Shakir, Tirthankar Dasgupta

14:36

08/12/2020

Infusing Sequential Information into Conditional Masked Translation Model with Self-Review Mechanism

Pan Xie, Zhi Cui, Xiuying Chen, XiaoHui Hu, Jianwei Cui, Bin Wang

6:43

08/12/2020

Is it Great or Terrible? Preserving Sentiment in Neural Machine Translation of Arabic Reviews

Hadeel Saadany, Constantin Orasan

14:35

06/12/2020

Recursive Inference for Variational Autoencoders

Minyoung Kim, Vladimir Pavlovic

3:24

06/12/2021

On Component Interactions in Two-Stage Recommender Systems

Jiri Hron, Karl Krauth, Michael Jordan, Niki Kilbertus

Keywords: bandits

13:25

19/04/2021

Multilingual neural machine translation with deep encoder and multiple shallow decoders

Xiang Kong, Adithya Renduchintala, James Cross, Yuqing Tang, Jiatao Gu, Xian Li

10:26

12/07/2020

More Information Supervised Probabilistic Deep Face Embedding Learning

Ying Huang, Shangfeng Qiu, Wenwei Zhang, Xianghui Luo, Jinzhuo Wang

Keywords: Applications - Computer Vision

12:10

08/12/2020

Mitigating Silence in Compliance Terminology during Parsing of Utterances

Esme Manandise, Conrad de Peuter

17:48

05/04/2021

ByzShield: An Efficient and Robust System for Distributed Training

Konstantinos Konstantinidis, Aditya Ramamoorthy

17:51

14/09/2020

Calibrating user response predictions in online advertising

Chao Deng, Hao Wang, Qing Tan, Jian Xu, Kun Gai

Keywords: online advertising, calibration, click-through rate prediction, conversion rate prediction

9:01

19/04/2021

We need to talk about random splits

Anders Søgaard, Sebastian Ebert, Jasmijn Bastings, Katja Filippova

7:49

18/07/2021

Adapting to misspecification in contextual bandits with offline regression oracles

Sanath Kumar Krishnamurthy, Vitor Hadad, Susan Athey

Keywords: Reinforcement Learning and Planning, Bandits

4:17

06/12/2021

The Out-of-Distribution Problem in Explainability and Search Methods for Feature Importance Explanations

Peter Hase, Harry Xie, Mohit Bansal

Keywords: machine learning, interpretability

15:05

26/04/2020

Learning The Difference That Makes A Difference With Counterfactually-Augmented Data

Divyansh Kaushik, Eduard Hovy, Zachary Lipton

Keywords: humans in the loop, annotation artifacts, text classification, sentiment analysis, natural language inference

4:25

25/07/2020

Sampling bias due to near-duplicates in learning to rank

Maik Fröbe, Janek Bevendorff, Jan Heinrich Reimer, Martin Potthast, Matthias Hagen

Keywords: near-duplicate-detection, selection bias, learning to rank, novelty principle

10:59

16/11/2020

Generationary or “How We Went beyond Word Sense Inventories and Learned to Gloss”

Michele Bevilacqua, Marco Maru, Roberto Navigli

Keywords: generative modeling, definition modeling, discriminative tasks, word disambiguation

11:49

12/07/2020

When are Non-Parametric Methods Robust?

Robi Bhattacharjee, Kamalika Chaudhuri

Keywords: Learning Theory

15:17

18/07/2021

Order-Agnostic Cross Entropy for Non-Autoregressive Machine Translation

Cunxiao Du, Zhaopeng Tu, Jing Jiang

Keywords: Applications, Natural Language Processing

17:21

08/12/2020

Understanding the effects of word-level linguistic annotations in under-resourced neural machine translation

Víctor M. Sánchez-Cartagena, Juan Antonio Pérez-Ortiz, Felipe Sánchez-Martínez

14:59

12/07/2020

Dynamic Knapsack Optimization Towards Efficient Multi-Channel Sequential Advertising

Xiaotian Hao, Zhaoqing Peng, Yi Ma, Guan Wang, Junqi Jin, Jianye Hao, Shan Chen, Rongquan Bai, Mingzhou Xie, Miao Xu, Zhenzhe Zheng, Chuan Yu, Han Li, Jian Xu, Kun Gai

Keywords: Applications - Other

15:13