Handling out-of-vocabulary problem in hangeul word embeddings

19/04/2021

Handling out-of-vocabulary problem in hangeul word embeddings

Ohjoon Kwon, Dohyun Kim, Soo-Ryeon Lee, Junyoung Choi, SangKeun Lee

Keywords:

Abstract Paper Similar Papers

Abstract: Word embedding is considered an essential factor in improving the performance of various Natural Language Processing (NLP) models. However, it is hardly applicable in real-world datasets as word embedding is generally studied with a well-refined corpus. Notably, in Hangeul (Korean writing system), which has a unique writing system, various kinds of Out-Of-Vocabulary (OOV) appear from typos. In this paper, we propose a robust Hangeul word embedding model against typos, while maintaining high performance. The proposed model utilizes a Convolutional Neural Network (CNN) architecture with a channel attention mechanism that learns to infer the original word embeddings. The model train with a dataset that consists of a mix of typos and correct words. To demonstrate the effectiveness of the proposed model, we conduct three kinds of intrinsic and extrinsic tasks. While the existing embedding models fail to maintain stable performance as the noise level increases, the proposed model shows stable performance.

0

0

0

0

Share

This is an embedded video. Talk and the respective paper are published at EACL 2021 virtual conference. If you are one of the authors of the paper and want to manage your upload, see the question "My papertalk has been externally embedded..." in the FAQ section.

Comments

Post Comment

no comments yet

Similar Papers

12/07/2020

Robustness to Programmable String Transformations via Augmented Abstract Training

Yuhao Zhang, Aws Albarghouthi, Loris D'Antoni

Keywords Paper

Adversarial Examples

0

0

0

0

14:49

19/04/2021

Evaluating neural model robustness for machine comprehension

Winston Wu, Dustin Arendt, Svitlana Volkova

Keywords Paper

0

0

0

0

11:41

08/12/2020

Multi-task Learning of Spoken Language Understanding by Integrating N-Best Hypotheses with Hierarchical Attention

Mingda Li, Xinyue Liu, Weitong Ruan and
Luca Soldaini, Wael Hamza, Chengwei Su

Keywords Paper

0

0

0

0

14:43

02/02/2021

Decision-Guided Weighted Automata Extraction from Recurrent Neural Networks

Xiyue Zhang, Xiaoning Du, Xiaofei Xie and
Lei Ma, Yang Liu, Meng Sun

Keywords Paper

0

0

0

0

16:44

04/07/2020

Cross-Lingual Unsupervised Sentiment Classification with Multi-View Transfer Learning

Hongliang Fei, Ping Li

Keywords Paper

Cross-Lingual Classification, sentiment classification, unsupervised system, classification

0

0

0

0

12:23

06/12/2021

Grad2Task: Improved Few-shot Text Classification Using Gradients for Task Representation

Jixuan Wang, Kuan-Chieh Wang, Frank Rudzicz, Michael Brudno

Keywords Paper

machine learning, transformers, meta learning, language, transfer learning

0

0

0

0

14:45

06/12/2021

Learning to Generate Visual Questions with Noisy Supervision

Shen Kai, Lingfei Wu, Siliang Tang and
Yueting Zhuang, zhen he, Zhuoye Ding, Yun Xiao, Bo Long

Keywords Paper

generative model

0

0

0

0

14:54

04/07/2020

On the Linguistic Representational Power of Neural Machine Translation Models

Yonatan Belinkov, Nadir Durrani, Fahim Dalvi and
Hassan Sajjad, James Glass

Keywords Paper

Linguistic Models, natural processing, artificial intelligence, translating languages

0

0

0

0

19:17

02/02/2021

Do Response Selection Models Really Know What’s Next? Utterance Manipulation Strategies for Multi-turn Response Selection

Taesun Whang, Dongyub Lee, Dongsuk Oh and
Chanhee Lee, Kijong Han, Dong-hun Lee, Saebyeok Lee

Keywords Paper

0

0

0

0

17:37

06/12/2021

Why Do Pretrained Language Models Help in Downstream Tasks? An Analysis of Head and Prompt Tuning

Colin Wei, Sang Michael Xie, Tengyu Ma

Keywords Paper

theory, machine learning, self-supervised learning, generative model, representation learning, language

0

0

0

0

14:53

16/11/2020

Calibrated Language Model Fine-Tuning for In- and Out-of-Distribution Data

Lingkai Kong, Haoming Jiang, Yuchen Zhuang and
Jie Lyu, Tuo Zhao, Chao Zhang

Keywords Paper

augmented training, in-distribution calibration, text classification, expectation error

0

0

0

0

11:47

08/12/2020

ContraCAT: Contrastive Coreference Analytical Templates for Machine Translation

Dario Stojanovski, Benno Krojer, Denis Peskov, Alexander Fraser

Keywords Paper

0

0

0

0

14:09

02/02/2021

IsoBN: Fine-Tuning BERT with Isotropic Batch Normalization

Wenxuan Zhou, Bill Yuchen Lin, Xiang Ren

Keywords Paper

0

0

0

0

16:25

04/07/2020

Towards Robustifying NLI Models Against Lexical Dataset Biases

Xiang Zhou, Mohit Bansal

Keywords Paper

Natural Inference, data augmentation, Robustifying Models, deep models

0

0

0

0

11:34

08/12/2020

Emergent Communication Pretraining for Few-Shot Machine Translation

Yaoyiran Li, Edoardo Maria Ponti, Ivan Vulić, Anna Korhonen

Keywords Paper

0

0

0

0

14:42

04/07/2020

SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization

Haoming Jiang, Pengcheng He, Weizhu Chen and
Xiaodong Liu, Jianfeng Gao, Tuo Zhao

Keywords Paper

NLP, generalization, NLP tasks, SMART

0

0

0

0

11:43

06/12/2021

How Should Pre-Trained Language Models Be Fine-Tuned Towards Adversarial Robustness?

Xinshuai Dong, Anh Tuan Luu, Min Lin and
Shuicheng Yan, Hanwang Zhang

Keywords Paper

robustness, adversarial robustness and security, language

0

0

0

0

10:26

04/07/2020

Rigid Formats Controlled Text Generation

Piji Li, Haisong Zhang, Xiaojiang Liu, Shuming Shi

Keywords Paper

Rigid Generation, Neural generation, text generation, rhyming schemes

0

0

0

0

12:06

06/12/2021

Refining Language Models with Compositional Explanations

Huihan Yao, Ying Chen, Qinyuan Ye and
Xisen Jin, Xiang Ren

Keywords Paper

machine learning, fairness, language

0

0

0

0

13:17

02/02/2021

MASKER: Masked Keyword Regularization for Reliable Text Classification

Seung Jun Moon, Sangwoo Mo, Kimin Lee and
Jaeho Lee, Jinwoo Shin

Keywords Paper

0

0

0

0

15:05

04/07/2020

End-to-End Bias Mitigation by Modelling Biases in Corpora

Rabeeh Karimi Mahabadi, Yonatan Belinkov, James Henderson

Keywords Paper

End-to-End Mitigation, real-world scenarios, training, large-scale benchmarks

0

0

0

0

10:57

04/07/2020

How does BERT's attention change when you fine-tune? An analysis methodology and a case study in negation scope

Yiyun Zhao, Steven Bethard

Keywords Paper

downstream task, NLP problems, knowledge-related tasks, downstream tasks

0

0

0

0

11:43

12/07/2020

Educating Text Autoencoders: Latent Representation Guidance via Denoising

Tianxiao Shen, Jonas Mueller, Regina Barzilay, Tommi Jaakkola

Keywords Paper

Deep Learning - Generative Models and Autoencoders

0

0

0

0

17:06

03/05/2021

Modelling Hierarchical Structure between Dialogue Policy and Natural Language Generator with Option Framework for Task-oriented Dialogue System

Jianhong Wang, Yuan Zhang, Tae-Kyun Kim, Yunjie Gu

Keywords Paper

Task-oriented Dialogue System, Hierarchical Reinforcement Learning, Policy Optimization, Natural Language Processing

0

0

0

0

5:44

16/11/2020

Named Entity Recognition Only from Word Embeddings

Ying Luo, Hai Zhao, Junlang Zhan

Keywords Paper

named recognition, entity detection, type prediction, deep models

0

0

0

0

9:54

03/05/2021

InfoBERT: Improving Robustness of Language Models from An Information Theoretic Perspective

Boxin Wang, Shuohang Wang, Yu Cheng and
Zhe Gan, Ruoxi Jia, Bo Li, Jingjing Liu

Keywords Paper

adversarial training, QA, NLI, BERT, information theory, adversarial robustness

0

0

0

0

5:21

16/11/2020

BERT-ATTACK: Adversarial Attack Against BERT Using BERT

Linyang Li, Ruotian Ma, Qipeng Guo and
Xiangyang Xue, Xipeng Qiu

Keywords Paper

adversarial attacks, downstream tasks, calculation, gradient-based methods

0

0

0

0

11:36

16/11/2020

Multi-resolution Annotations for Emoji Prediction

Weicheng Ma, Ruibo Liu, Lili Wang, Soroush Vosoughi

Keywords Paper

natural tasks, emojis, linguistic components, multi-class setting

0

0

0

0

11:52

04/07/2020

Does Multi-Encoder Help? A Case Study on Context-Aware Neural Machine Translation

Bei Li, Hui Liu, Ziyang Wang and
Yufan Jiang, Tong Xiao, Jingbo Zhu, Tongran Liu, Changliang Li

Keywords Paper

Context-Aware Translation, document-level translation, document-level NMT, document-level

0

0

0

0

6:42

02/02/2021

Token-Aware Virtual Adversarial Training in Natural Language Understanding

Linyang Li, Xipeng Qiu

Keywords Paper

0

0

0

0

12:49

04/07/2020

Max-Margin Incremental CCG Parsing

Miloš Stanojević, Mark Steedman

Keywords Paper

Incremental parsing, human processing, ASR, MT

0

0

0

0

11:39

06/12/2021

NxMTransformer: Semi-Structured Sparsification for Natural Language Understanding via ADMM

Connor Holmes, Minjia Zhang, Yuxiong He, Bo Wu

Keywords Paper

optimization, transformers, language

0

0

0

0

10:53

02/02/2021

Revisiting Mahalanobis Distance for Transformer-Based Out-of-Domain Detection

Alexander Podolskiy, Dmitry Lipin, Andrey Bout and
Ekaterina Artemova, Irina Piontkovskaya

Keywords Paper

0

0

0

0

16:08

04/07/2020

Spelling Error Correction with Soft-Masked BERT

Shaohua Zhang, Haoran Huang, Jicong Liu, Hang Li

Keywords Paper

Spelling Correction, Chinese correction, Chinese CSC, error detection

0

0

0

0

11:34

19/04/2021

Generating syntactically controlled paraphrases without using annotated parallel pairs

Kuan-Hao Huang, Kai-Wei Chang

Keywords Paper

0

0

0

1

10:41

03/05/2021

FairFil: Contrastive Neural Debiasing Method for Pretrained Text Encoders

Pengyu Cheng, Weituo Hao, Siyang Yuan and
Shijing Si, Lawrence Carin

Keywords Paper

Mutual Information, Pretrained Text Encoders, Contrastive Learning, Fairness

0

0

0

0

4:43

26/04/2020

Residual Energy-Based Models for Text Generation

Yuntian Deng, Anton Bakhtin, Myle Ott and
Arthur Szlam, Marc'Aurelio Ranzato

Keywords Paper

energy-based models, text generation

0

0

0

0

4:59

26/04/2020

Improving Neural Language Generation with Spectrum Control

Lingxiao Wang, Jing Huang, Kevin Huang and
Ziniu Hu, Guangtao Wang, Quanquan Gu

Keywords Paper

0

0

0

0

4:58

06/12/2020

A Benchmark for Systematic Generalization in Grounded Language Understanding

Laura Ruis, Jacob Andreas, Marco Baroni and
Diane Bouchacourt, Brenden Lake

Keywords Paper

0

0

0

0

3:15

04/07/2020

Parallel Corpus Filtering via Pre-trained Language Models

Boliang Zhang, Ajay Nagesh, Kevin Knight

Keywords Paper

machine models, WMT task, Parallel Filtering, Pre-trained Models

0

0

0

0

12:04