Entity Enhanced BERT Pre-training for Chinese NER

16/11/2020

Chen Jia, Yuefeng Shi, Qinrong Yang, Yue Zhang

Keywords: Chinese NER, pre-training, NER fine-tuning, NER

Abstract: Character-level BERT pre-trained on Chinese text lacks lexicon information, which has been shown to be effective for Chinese NER. To integrate a lexicon into pre-trained LMs for Chinese NER, we investigate a semi-supervised, entity-enhanced BERT pre-training method. In particular, we first extract an entity lexicon from relevant raw text using a new-word discovery method. We then integrate the entity information into BERT using a Char-Entity-Transformer, which augments self-attention with a combination of character and entity representations. In addition, an entity classification task helps inject the entity information into the model parameters during pre-training. The pre-trained models are then fine-tuned for NER. Experiments on a news dataset and two datasets that we annotated for long-text NER show that our method is highly effective and achieves the best results.
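The idea of attending over a combination of character and entity representations can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function names `match_entities` and `char_entity_attention` are hypothetical, greedy longest-match lookup stands in for the new-word discovery step, the character and entity vectors are simply averaged, learned projections are omitted, and the entity classification task is not shown.

```python
import math


def match_entities(chars, lexicon):
    """Greedy longest-match lookup (an assumed stand-in for lexicon matching).

    Returns, for each character, the lexicon entity covering it, or None.
    """
    labels = [None] * len(chars)
    max_len = max((len(entry) for entry in lexicon), default=0)
    i = 0
    while i < len(chars):
        matched = None
        # Try the longest candidate first, then shrink.
        for n in range(min(max_len, len(chars) - i), 0, -1):
            candidate = "".join(chars[i:i + n])
            if candidate in lexicon:
                matched = candidate
                break
        if matched is None:
            i += 1
            continue
        for j in range(i, i + len(matched)):
            labels[j] = matched
        i += len(matched)
    return labels


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def char_entity_attention(char_vecs, entity_vecs):
    """Single-head scaled dot-product attention without learned projections.

    Queries are the character vectors. Keys and values are the character
    vector averaged with its entity vector when the character lies inside a
    matched entity (assumed combination), and the character vector otherwise.
    """
    dim = len(char_vecs[0])
    keys_values = []
    for h, e in zip(char_vecs, entity_vecs):
        if e is None:
            keys_values.append(list(h))
        else:
            keys_values.append([(a + b) / 2 for a, b in zip(h, e)])
    outputs = []
    for q in char_vecs:
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(dim)
            for k in keys_values
        ]
        weights = softmax(scores)
        outputs.append([
            sum(weights[j] * keys_values[j][c] for j in range(len(keys_values)))
            for c in range(dim)
        ])
    return outputs


if __name__ == "__main__":
    labels = match_entities(list("我在北京大学读书"), {"北京大学"})
    print(labels)
    toy = char_entity_attention([[1.0, 0.0], [0.0, 1.0]], [None, [1.0, 1.0]])
    print(toy)
```

In the toy sentence, the four characters of 北京大学 are tagged with that lexicon entry and the remaining characters are tagged `None`; the attention output for each character is then a weighted mix of entity-aware key/value vectors.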

The talk and the corresponding paper were published at the EMNLP 2020 virtual conference.

Similar Papers

04/07/2020

Spelling Error Correction with Soft-Masked BERT

Shaohua Zhang, Haoran Huang, Jicong Liu, Hang Li

Keywords: Spelling Correction, Chinese correction, Chinese CSC, error detection

11:34

04/07/2020

A Complete Shift-Reduce Chinese Discourse Parser with Robust Dynamic Oracle

Shyh-Shiun Hung, Hen-Hsen Huang, Hsin-Hsi Chen

Keywords: Chinese parsing, rhetorical recognition, Shift-Reduce Parser, Robust Oracle

6:26

16/11/2020

Multi-Stage Pre-training for Automated Chinese Essay Scoring

Wei Song, Kai Zhang, Ruiji Fu, Lizhen Liu, Ting Liu, Miaomiao Cheng

Keywords: supervised fine-tuning, pre-training method, weakly pre-training, essay scorer

10:45

04/07/2020

Enhancing Pre-trained Chinese Character Representation with Word-aligned Attention

Yanzeng Li, Bowen Yu, Xue Mengge, Tingwen Liu

Keywords: segmentation propagation, Pre-trained Representation, Chinese models, word-aligned attention

6:13

04/07/2020

Joint Chinese Word Segmentation and Part-of-speech Tagging via Two-way Attentions of Auto-analyzed Knowledge

Yuanhe Tian, Yan Song, Xiang Ao, Fei Xia, Xiaojun Quan, Tong Zhang, Yonggang Wang

Keywords: Chinese Segmentation, Part-of-speech Tagging, Chinese processing, joint tagging

11:53

05/01/2021

Handwritten Chinese Font Generation With Collaborative Stroke Refinement

Chuan Wen, Yujie Pan, Jie Chang, Ya Zhang, Siheng Chen, Yanfeng Wang, Mei Han, Qi Tian

5:01

30/11/2020

Self-supervised Learning of Orc-Bert Augmentator for Recognizing Few-Shot Oracle Characters

Wenhui Han, Xinlin Ren, Hangyu Lin, Yanwei Fu, Xiangyang Xue

7:38

05/12/2020

UnihanLM: Coarse-to-fine Chinese-Japanese language model pretraining with the unihan database

Canwen Xu, Tao Ge, Chenliang Li, Furu Wei


8:52

04/07/2020

Simplify the Usage of Lexicon in Chinese NER

Ruotian Ma, Minlong Peng, Qi Zhang, Zhongyu Wei, Xuanjing Huang

Keywords: Chinese recognition, NER, Lattice-LSTM, complex architecture

11:07

05/12/2020

Chinese grammatical correction using BERT-based pre-trained model

Hongfei Wang, Michiki Kurosawa, Satoru Katsumata, Mamoru Komachi


8:39

04/07/2020

Camouflaged Chinese Spam Content Detection with Semi-supervised Generative Active Learning

Zhuoren Jiang, Zhe Gao, Yu Duan, Yangyang Kang, Changlong Sun, Qiong Zhang, Xiaozhong Liu

Keywords: Camouflaged Detection, text problems, Chinese task, annotation

6:48

02/02/2021

FontRL: Chinese Font Synthesis via Deep Reinforcement Learning

Yitian Liu, Zhouhui Lian


13:49

06/12/2020

MPNet: Masked and Permuted Pre-training for Language Understanding

Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu

3:23

04/07/2020

A Graph-based Model for Joint Chinese Word Segmentation and Dependency Parsing

Hang Yan, Xipeng Qiu, Xuanjing Huang

Keywords: Joint Segmentation, Joint Parsing, Chinese segmentation, dependency parsing

8:15

08/12/2020

A Sentence Cloze Dataset for Chinese Machine Reading Comprehension

Yiming Cui, Ting Liu, Ziqing Yang, Zhipeng Chen, Wentao Ma, Wanxiang Che, Shijin Wang, Guoping Hu

10:02

02/02/2021

Ideography Leads Us to the Field of Cognition: A Radical-Guided Associative Model for Chinese Text Classification

Hanqing Tao, Shiwei Tong, Kun Zhang, Tong Xu, Qi Liu, Enhong Chen, Min Hou

14:26

08/12/2020

Does Chinese BERT Encode Word Structure?

Yile Wang, Leyang Cui, Yue Zhang


9:38

04/07/2020

2kenize: Tying Subword Sequences for Chinese Script Conversion

Pranav A, Isabelle Augenstein

Keywords: Chinese Conversion, Chinese NLP, mapping sequences, topic classification

10:32

19/04/2021

CLiMP: A benchmark for Chinese language model evaluation

Beilei Xiang, Changbing Yang, Yu Li, Alex Warstadt, Katharina Kann

7:04

08/12/2020

Synonym Knowledge Enhanced Reader for Chinese Idiom Reading Comprehension

Siyu Long, Ran Wang, Kun Tao, Jiali Zeng, Xinyu Dai

9:58

04/07/2020

Improving Disfluency Detection by Self-Training a Self-Attentive Model

Paria Jamshid Lou, Mark Johnson

Keywords: Disfluency Detection, joint parsing, Self-Attentive Model, Self-attentive parsers

12:37

08/12/2020

Try to Substitute: An Unsupervised Chinese Word Sense Disambiguation Method Based on HowNet

Bairu Hou, Fanchao Qi, Yuan Zang, Xurui Zhang, Zhiyuan Liu, Maosong Sun

7:54

01/07/2020

A Deep Reinforced Model for Zero-Shot Cross-Lingual Summarization with Bilingual Semantic Similarity Rewards

Zi-Yi Dou, Sachin Kumar, Yulia Tsvetkov


4:35

19/04/2021

Don’t change me! User-controllable selective paraphrase generation

Mohan Zhang, Luchen Tan, Zihang Fu, Kun Xiong, Jimmy Lin, Ming Li, Zhengkai Tu

6:03

08/12/2020

Monolingual and Multilingual Reduction of Gender Bias in Contextualized Representations

Sheng Liang, Philipp Dufter, Hinrich Schütze


14:20

02/02/2021

StrokeGAN: Reducing Mode Collapse in Chinese Font Generation via Stroke Encoding

Jinshan Zeng, Qi Chen, Yunxin Liu, Mingwen Wang, Yuan Yao

17:02

04/07/2020

Improving Chinese Word Segmentation with Wordhood Memory Networks

Yuanhe Tian, Yan Song, Fei Xia, Tong Zhang, Yonggang Wang

Keywords: Chinese Segmentation, character-based segmenters, cross-domain experiments, Wordhood Networks

11:35

04/07/2020

Coupling Distant Annotation and Adversarial Training for Cross-Domain Chinese Word Segmentation

Ning Ding, Dingkun Long, Guangwei Xu, Muhua Zhu, Pengjun Xie, Xiaobin Wang, Haitao Zheng

Keywords: Coupling Annotation, Cross-Domain Segmentation, Chinese segmentation, Chinese CWS

8:35

22/06/2020

How Context Affects Language Models' Factual Predictions

Fabio Petroni, Patrick Lewis, Aleksandra Piktus, Tim Rocktäschel, Yuxiang Wu, Alexander H. Miller, Sebastian Riedel

10:16

02/02/2021

Dynamic Modeling Cross- and Self-Lattice Attention Network for Chinese NER

Shan Zhao, Minghao Hu, Zhiping Cai, Haiwen Chen, Fang Liu

15:11

08/12/2020

Joint Chinese Word Segmentation and Part-of-speech Tagging via Multi-channel Attention of Character N-grams

Yuanhe Tian, Yan Song, Fei Xia


14:53

19/04/2021

ENPAR: enhancing entity and entity pair representations for joint entity relation extraction

Yijun Wang, Changzhi Sun, Yuanbin Wu, Hao Zhou, Lei Li, Junchi Yan

7:23

02/02/2021

LET: Linguistic Knowledge Enhanced Graph Transformer for Chinese Short Text Matching

Boer Lyu, Lu Chen, Su Zhu, Kai Yu


15:57

04/07/2020

Span Selection Pre-training for Question Answering

Michael Glass, Alfio Gliozzo, Rishav Chakravarti, Anthony Ferritto, Lin Pan, G P Shrivatsa Bhargav, Dinesh Garg, Avi Sil

Keywords: Question Answering, language tasks, Next Prediction, pre-training task

13:16

02/02/2021

Towards Robust Visual Information Extraction in Real World: New Dataset and Novel Solution

Jiapeng Wang, Chongyu Liu, Lianwen Jin, Guozhi Tang, Jiaxin Zhang, Shuaitao Zhang, Qianying Wang, Yaqiang Wu, Mingxiang Cai

16:18

16/11/2020

A Joint Multiple Criteria Model in Transfer Learning for Cross-domain Chinese Word Segmentation

Kaiyu Huang, Degen Huang, Zhuang Liu, Fengran Mo

Keywords: natural, Chinese segmentation, Chinese, Chinese tasks

10:49

04/07/2020

SpellGCN: Incorporating Phonological and Visual Similarities into Language Models for Chinese Spelling Check

Xingyi Cheng, Weidi Xu, Kunlong Chen, Shaohua Jiang, Feng Wang, Taifeng Wang, Wei Chu, Yuan Qi

Keywords: Chinese Check, spelling errors, spelling language, CSC

10:27

01/07/2020

Robust Neural Machine Translation with ASR Errors

Haiyang Xue, Yang Feng, Shuhao Gu, Wei Chen


8:15

04/07/2020

Encoder-Decoder Models Can Benefit from Pre-trained Masked Language Models in Grammatical Error Correction

Masahiro Kaneko, Masato Mita, Shun Kiyono, Jun Suzuki, Kentaro Inui

Keywords: Grammatical Correction, GEC, Encoder-Decoder Models, Pre-trained Models

6:44

04/07/2020

On the Robustness of Language Encoders against Grammatical Errors

Fan Yin, Quanyu Long, Tao Meng, Kai-Wei Chang

Keywords: downstream applications, linguistic task, Language Encoders, pre-trained encoders

11:09