hinglishNorm - A Corpus of Hindi-English Code Mixed Sentences for Text Normalization

08/12/2020

Piyush Makhija, Ankit Kumar, Anuj Gupta

Abstract: We present hinglishNorm, a human-annotated corpus of Hindi-English code-mixed sentences for the text normalization task. Each sentence in the corpus is aligned to its corresponding human-annotated normalized form. To the best of our knowledge, no corpus of Hindi-English code-mixed sentences for text normalization is publicly available, and our work is the first attempt in this direction. The corpus contains 13,494 segments annotated for text normalization. We also present baseline normalization results on this corpus: a Word Error Rate (WER) of 15.55, a BiLingual Evaluation Understudy (BLEU) score of 71.2, and a Metric for Evaluation of Translation with Explicit ORdering (METEOR) score of 0.50.
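
For readers unfamiliar with the first metric, the following is a minimal sketch of how a Word Error Rate is commonly computed: the word-level edit distance between the reference and the system output, divided by the number of reference words and expressed as a percentage. The abstract does not describe the evaluation script, so this is an assumption about the standard definition, not the authors' implementation. The function name `word_error_rate` and the example sentences are hypothetical and are not taken from the corpus.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER as a percentage, using whitespace tokenization.

    Assumption: WER = 100 * (word-level Levenshtein distance) / (reference length).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words seen so far
    # (one fewer than the current row) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return 100.0 * prev[-1] / len(ref)


# Hypothetical example: one of four reference words is left unnormalized.
print(word_error_rate("kya haal hai bhai", "kya hal hai bhai"))  # 25.0
```

Under this definition, a corpus-level score is usually obtained by summing edit distances and reference lengths over all sentences before dividing, rather than averaging per-sentence percentages. Which of the two the reported 15.55 uses is not stated in the abstract.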

The video of this talk cannot be embedded. You can watch it here:

https://underline.io/lecture/6109-hinglishnorm---a-corpus-of-hindi-english-code-mixed-sentences-for-text-normalization

The talk and the corresponding paper were published at the COLING Workshops 2020 virtual conference.

Similar Papers

08/12/2020

SpanAlign: Sentence Alignment Method based on Cross-Language Span Prediction and ILP

Katsuki Chousa, Masaaki Nagata, Masaaki Nishino

14:39

08/12/2020

Exploring Cross-sentence Contexts for Named Entity Recognition with BERT

Jouni Luoma, Sampo Pyysalo

14:39

04/07/2020

Facet-Aware Evaluation for Extractive Summarization

Yuning Mao, Liyuan Liu, Qi Zhu, Xiang Ren, Jiawei Han

Keywords: Facet-Aware Evaluation, Extractive Summarization, fine-grained evaluation, comparative analysis

11:43

16/11/2020

Multilingual AMR-to-Text Generation

Angela Fan, Claire Gardent

Keywords: multilingual generation, cross-lingual embeddings, pretraining, multilingual models

12:06

19/04/2021

A large-scale evaluation of neural machine transliteration for indic languages

Anoop Kunchukuttan, Siddharth Jain, Rahul Kejriwal

7:33

06/12/2021

BARTScore: Evaluating Generated Text as Text Generation

Weizhe Yuan, Graham Neubig, Pengfei Liu

13:47

04/07/2020

A Simple and Effective Unified Encoder for Document-Level Machine Translation

Shuming Ma, Dongdong Zhang, Ming Zhou

Keywords: Document-Level Translation, Unified Encoder, encoders, pre-training models

7:04

19/08/2021

Exemplification Modeling: Can You Give Me an Example, Please?

Edoardo Barba, Luigi Procopio, Caterina Lacerra, Tommaso Pasini, Roberto Navigli

Keywords: Natural Language Processing, Natural Language Semantics, Resources and Evaluation

14:47

05/12/2020

STIL - simultaneous slot filling, translation, intent classification, and language identification: Initial results using mBART on MultiATIS++

Jack FitzGerald

9:46

04/07/2020

Self-Attention with Cross-Lingual Position Representation

Liang Ding, Longyue Wang, Dacheng Tao

Keywords: natural tasks, WMT'17 tasks, Cross-Lingual Representation, Position encoding

7:46

02/02/2021

Listen, Understand and Translate: Triple Supervision Decouples End-to-end Speech-to-text Translation

Qianqian Dong, Rong Ye, Mingxuan Wang, Hao Zhou, Shuang Xu, Bo Xu, Lei Li

14:09

19/04/2021

Towards a decomposable metric for explainable evaluation of text generation from AMR

Juri Opitz, Anette Frank

11:02

26/04/2020

Neural Machine Translation with Universal Visual Representation

Zhuosheng Zhang, Kehai Chen, Rui Wang, Masao Utiyama, Eiichiro Sumita, Zuchao Li, Hai Zhao

Keywords: Neural Machine Translation, Visual Representation, Multimodal Machine Translation, Language Representation

4:50

19/04/2021

Language models for lexical inference in context

Martin Schmitt, Hinrich Schütze

11:39

19/04/2021

Multilingual entity and relation extraction dataset and model

Alessandro Seganti, Klaudia Firląg, Helena Skowronska, Michał Satława, Piotr Andruszkiewicz

10:48

19/04/2021

WER-BERT: Automatic WER estimation with BERT in a balanced ordinal classification paradigm

Akshay Krishna Sheshadri, Anvesh Rao Vijjini, Sukhdeep Kharbanda

11:45

04/07/2020

Unsupervised Multilingual Sentence Embeddings for Parallel Corpus Mining

Ivana Kvapilíková, Mikel Artetxe, Gorka Labaka, Eneko Agirre, Ondřej Bojar

Keywords: Unsupervised Embeddings, Parallel Mining, multilingual embeddings, parallel tasks

11:30

04/07/2020

Unsupervised Cross-lingual Representation Learning at Scale

Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer, Veselin Stoyanov

Keywords: cross-lingual tasks, XNLI, MLQA, NER

12:15

02/02/2021

UWSpeech: Speech to Speech Translation for Unwritten Languages

Chen Zhang, Xu Tan, Yi Ren, Tao Qin, Kejun Zhang, Tie-Yan Liu

15:14

04/07/2020

Automatic Machine Translation Evaluation using Source Language Inputs and Cross-lingual Language Model

Kosuke Takahashi, Katsuhito Sudoh, Satoshi Nakamura

Keywords: Automatic Evaluation, machine translation, Cross-lingual Model, regression model

7:17

02/02/2021

Consecutive Decoding for Speech-to-text Translation

Qianqian Dong, Mingxuan Wang, Hao Zhou, Shuang Xu, Bo Xu, Lei Li

14:20

16/11/2020

BLiMP: The Benchmark of Linguistic Minimal Pairs for English

Alex Warstadt, Alicia Parrish, Haokun Liu, Anhad Mohananey, Wei Peng, Sheng-Fu Wang, Samuel Bowman

Keywords: linguistic, blimp, lms, linguist-crafted templates

11:42

04/07/2020

Investigating the effect of auxiliary objectives for the automated grading of learner English speech transcriptions

Hannah Craighead, Andrew Caines, Paula Buttery, Helen Yannakoudakis

Keywords: automated transcriptions, automatically speech, multi-task learning, inductive transfer

11:37

16/11/2020

The Secret is in the Spectra: Predicting Cross-lingual Task Performance with Spectral Similarity Measures

Haim Dubossarsky, Ivan Vulić, Roi Reichart, Anna Korhonen

Keywords: cross-lingual tasks, large-scale study, bli, parsing

12:18

16/11/2020

Automatic Machine Translation Evaluation in Many Languages via Zero-Shot Paraphrasing

Brian Thompson, Matt Post

Keywords: machine evaluation, zero-shot task, wmt task, quality estimation

11:02

08/12/2020

A Human Evaluation of AMR-to-English Generation Systems

Emma Manning, Shira Wein, Nathan Schneider

15:12

04/07/2020

Sentence Meta-Embeddings for Unsupervised Semantic Textual Similarity

Nina Poerner, Ulli Waltinger, Hinrich Schütze

Keywords: Unsupervised Similarity, unsupervised STS, dimensionality reduction, pre-trained encoders

7:14

01/07/2020

BIT’s system for the AutoSimTrans 2020

Minqin Li, Haodong Cheng, Yuanjie Wang, Sijia Zhang, Liting Wu, Yuhang Guo

9:19

04/07/2020

Multimodal and Multiresolution Speech Recognition with Transformers

Georgios Paraskevopoulos, Srinivas Parthasarathy, Aparna Khare, Shiva Sundaram

Keywords: Multimodal Recognition, ASR, multiresolution ASR, Transformers

6:48

16/11/2020

The role of context in neural pitch accent detection in English

Elizabeth Nielsen, Mark Steedman, Sharon Goldwater

Keywords: pitch detection, cnn-based model, phenomena, contrast

6:41

08/12/2020

PASTRIE: A Corpus of Prepositions Annotated with Supersense Tags in Reddit International English

Michael Kranzlein, Emma Manning, Siyao Peng, Shira Wein, Aryaman Arora, Nathan Schneider

13:28

04/07/2020

How Accents Confound: Probing for Accent Information in End-to-End Speech Recognition Systems

Archiki Prasad, Preethi Jyothi

Keywords: Probing, Accent Information, End-to-End Systems, end-to-end system

11:41

16/11/2020

XGLUE: A New Benchmark Dataset for Cross-lingual Pre-training, Understanding and Generation

Yaobo Liang, Nan Duan, Yeyun Gong, Ning Wu, Fenfei Guo, Weizhen Qi, Ming Gong, Linjun Shou, Daxin Jiang, Guihong Cao, Xiaodong Fan, Ruofei Zhang, Rahul Agrawal, Edward Cui, Sining Wei, Taroon Bharti, Ying Qiao, Jiun-Hung Chen, Winnie Wu, Shuguang Liu, Fan Yang, Daniel Campos, Rangan Majumder, Ming Zhou

Keywords: large-scale models, cross-lingual tasks, natural tasks, cross-lingual pre-training

10:06

08/12/2020

Morphologically Aware Word-Level Translation

Paula Czarnowska, Sebastian Ruder, Ryan Cotterell, Ann Copestake

14:12

16/11/2020

A Supervised Word Alignment Method based on Cross-Language Span Prediction using Multilingual BERT

Masaaki Nagata, Katsuki Chousa, Masaaki Nishino

Keywords: cross-language prediction, word problem, squad task, alignment

11:13

04/07/2020

SpanBERT: Improving Pre-training by Representing and Predicting Spans

Mandar Joshi, Danqi Chen, Yinhan Liu, Daniel S. Weld, Luke Zettlemoyer, Omer Levy

Keywords: span tasks, question answering, coreference resolution, OntoNotes task

14:14

08/12/2020

AlphaMWE: Construction of Multilingual Parallel Corpora with MWE Annotations

Lifeng Han, Gareth Jones, Alan Smeaton

14:26

04/07/2020

Dynamic Programming Encoding for Subword Segmentation in Neural Machine Translation

Xuanli He, Gholamreza Haffari, Mohammad Norouzi

Keywords: Subword Segmentation, Neural Translation, learning, inference

10:49

16/11/2020

Multilingual Denoising Pre-training for Neural Machine Translation

Jiatao Gu, Yinhan Liu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer

Keywords: machine tasks, pre-training, multilingual pre-training, mbart

10:32

01/07/2020

IlliniMet: Illinois System for Metaphor Detection with Contextual and Linguistic Information

Hongyu Gong, Kshitij Gupta, Akriti Jain, Suma Bhat

4:09