CLiMP: A benchmark for Chinese language model evaluation

19/04/2021

Beilei Xiang, Changbing Yang, Yu Li, Alex Warstadt, Katharina Kann

Abstract: Linguistically informed analyses of language models (LMs) contribute to the understanding and improvement of such models. Here, we introduce the corpus of Chinese linguistic minimal pairs (CLiMP) to investigate what knowledge Chinese LMs acquire. CLiMP consists of sets of 1000 minimal pairs (MPs) for 16 syntactic contrasts in Chinese, covering 9 major Chinese linguistic phenomena. The MPs are semi-automatically generated, and human agreement with the labels in CLiMP is 95.8%. We evaluate 11 different LMs on CLiMP, covering n-grams, LSTMs, and Chinese BERT. We find that classifier–noun agreement and verb complement selection are the phenomena on which models generally perform best. However, models struggle the most with the ba construction, binding, and filler-gap dependencies. Overall, Chinese BERT achieves an average accuracy of 81.8%, while the performance of LSTMs and 5-grams is only moderately above chance level.
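
As a rough illustration of how a minimal-pair benchmark of this kind is commonly scored (the abstract itself does not spell out the scoring rule), the sketch below assumes that a model is counted as correct on a pair when it assigns a strictly higher score to the acceptable sentence than to the unacceptable one. The function name `minimal_pair_accuracy`, the toy scores, and the placeholder sentences are hypothetical and are not taken from CLiMP; only the counts of 16 contrasts and 1000 pairs per contrast come from the abstract.

```python
from typing import Callable, Iterable, Tuple

# Figures stated in the abstract: 16 syntactic contrasts, 1000 pairs each.
N_CONTRASTS = 16
PAIRS_PER_CONTRAST = 1000
TOTAL_PAIRS = N_CONTRASTS * PAIRS_PER_CONTRAST  # 16000 pairs overall


def minimal_pair_accuracy(
    pairs: Iterable[Tuple[str, str]],
    score: Callable[[str], float],
) -> float:
    """Fraction of (acceptable, unacceptable) pairs where the acceptable
    sentence receives the strictly higher score. Ties count as errors."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one minimal pair")
    correct = sum(1 for good, bad in pairs if score(good) > score(bad))
    return correct / len(pairs)


# Hypothetical sentence scores (e.g. log-probabilities from some LM).
toy_scores = {
    "acceptable A": -10.0,
    "unacceptable A": -12.5,
    "acceptable B": -9.0,
    "unacceptable B": -8.0,
}
pairs = [
    ("acceptable A", "unacceptable A"),  # scored correctly
    ("acceptable B", "unacceptable B"),  # scored incorrectly
]

accuracy = minimal_pair_accuracy(pairs, toy_scores.__getitem__)
print(f"accuracy = {accuracy:.2f} over {len(pairs)} toy pairs")
```

Under this assumed scoring rule each pair has two possible outcomes, so a model that guesses at random would land near 50%, which is the chance level that the LSTM and 5-gram results are compared against.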

The talk and the corresponding paper were published at the EACL 2021 virtual conference.

Similar Papers

04/07/2020

A Graph-based Model for Joint Chinese Word Segmentation and Dependency Parsing

Hang Yan, Xipeng Qiu, Xuanjing Huang

Keywords: Joint Segmentation, Joint Parsing, Chinese segmentation, dependency parsing

8:15

08/12/2020

Multi-grained Chinese Word Segmentation with Weakly Labeled Data

Chen Gong, Zhenghua Li, Bowei Zou, Min Zhang

14:48

08/12/2020

Does Chinese BERT Encode Word Structure?

Yile Wang, Leyang Cui, Yue Zhang

9:38

16/11/2020

Improving the Efficiency of Grammatical Error Correction with Erroneous Span Detection and Correction

Mengyun Chen, Tao Ge, Xingxing Zhang, Furu Wei, Ming Zhou

Keywords: erroneous detection, erroneous correction, inference, language-independent approach

6:27

08/12/2020

Text Classification by Contrastive Learning and Cross-lingual Data Augmentation for Alzheimer’s Disease Detection

Zhiqiang Guo, Zhaoci Liu, Zhenhua Ling, Shijin Wang, Lingjing Jin, Yunxia Li

13:12

02/02/2021

LET: Linguistic Knowledge Enhanced Graph Transformer for Chinese Short Text Matching

Boer Lyu, Lu Chen, Su Zhu, Kai Yu

15:57

04/07/2020

Enhancing Pre-trained Chinese Character Representation with Word-aligned Attention

Yanzeng Li, Bowen Yu, Xue Mengge, Tingwen Liu

Keywords: segmentation propagation, Pre-trained Representation, Chinese models, word-aligned attention

6:13

04/07/2020

Pre-training via Leveraging Assisting Languages for Neural Machine Translation

Haiyue Song, Raj Dabre, Zhuoyuan Mao, Fei Cheng, Sadao Kurohashi, Eiichiro Sumita

Keywords: Neural Translation, S2S tasks, LOI, low-resource translation

12:04

08/12/2020

Monolingual and Multilingual Reduction of Gender Bias in Contextualized Representations

Sheng Liang, Philipp Dufter, Hinrich Schütze

14:20

04/07/2020

Glyph2Vec: Learning Chinese Out-of-Vocabulary Word Embedding from Glyphs

Hong-You Chen, SZ-HAN YU, Shou-de Lin

Keywords: Chinese Embedding, Chinese applications, Chinese problem, out-of-vocabulary embedding

7:22

08/12/2020

Joint Chinese Word Segmentation and Part-of-speech Tagging via Multi-channel Attention of Character N-grams

Yuanhe Tian, Yan Song, Fei Xia

14:53

08/12/2020

Synonym Knowledge Enhanced Reader for Chinese Idiom Reading Comprehension

Siyu Long, Ran Wang, Kun Tao, Jiali Zeng, Xinyu Dai

9:58

16/11/2020

A Joint Multiple Criteria Model in Transfer Learning for Cross-domain Chinese Word Segmentation

Kaiyu Huang, Degen Huang, Zhuang Liu, Fengran Mo

Keywords: natural, chinese segmentation, chinese, chinese tasks

10:49

04/07/2020

Bootstrapping Techniques for Polysynthetic Morphological Analysis

William Lane, Steven Bird

Keywords: Polysynthetic Analysis, Bootstrapping Techniques, natural technologies, linguistically-informed approaches

12:12

08/12/2020

Anaphoric Zero Pronoun Identification: A Multilingual Approach

Abdulrahman Aloraini, Massimo Poesio

12:29

01/07/2020

Incorporating Uncertain Segmentation Information into Chinese NER for Social Media Text

Shengbin Jia, Ling Ding, Xiaojun Chen, Shijia E, Yang Xiang

18:51

25/07/2020

Chinese document classification with bi-directional convolutional language model

Bin Liu, Guosheng Yin

Keywords: text classification, CNN, neural language model

9:17

02/02/2021

Towards Robust Visual Information Extraction in Real World: New Dataset and Novel Solution

Jiapeng Wang, Chongyu Liu, Lianwen Jin, Guozhi Tang, Jiaxin Zhang, Shuaitao Zhang, Qianying Wang, Yaqiang Wu, Mingxiang Cai

16:18

02/02/2021

Commonsense Knowledge Augmentation for Low-Resource Languages via Adversarial Learning

Bosung Kim, Juae Kim, Youngjoong Ko, Jungyun Seo

19:38

04/07/2020

It’s Morphin’ Time! Combating Linguistic Discrimination with Inflectional Perturbations

Samson Tan, Shafiq Joty, Min-Yen Kan, Richard Socher

Keywords: Linguistic Discrimination, Inflectional Perturbations, pre-trained networks, NLP models

8:23

04/07/2020

Spelling Error Correction with Soft-Masked BERT

Shaohua Zhang, Haoran Huang, Jicong Liu, Hang Li

Keywords: Spelling Correction, Chinese correction, Chinese CSC, error detection

11:34

16/11/2020

Visually Grounded Compound PCFGs

Yanpeng Zhao, Ivan Titov

Keywords: exploiting groundings, language understanding, gradient estimates, fully-differentiable learning

12:24

04/07/2020

Syntax-Aware Opinion Role Labeling with Dependency Graph Convolutional Networks

Bo Zhang, Yue Zhang, Rui Wang, Zhenghua Li, Min Zhang

Keywords: Syntax-Aware Labeling, Opinion labeling, ORL, opinion task

11:47

04/07/2020

Improving Low-Resource Named Entity Recognition using Joint Sentence and Token Labeling

Canasai Kruengkrai, Thien Hai Nguyen, Sharifah Mahani Aljunied, Lidong Bing

Keywords: Low-Resource Recognition, low-resource NER, NER, binary classification

6:23

04/07/2020

Cross-Linguistic Syntactic Evaluation of Word Prediction Models

Aaron Mueller, Garrett Nicolai, Panayiota Petrou-Zeniou, Natalia Talmina, Tal Linzen

Keywords: Cross-Linguistic Syntax, Syntax, Cross-Linguistic Models, neural models

10:48

02/02/2021

Ideography Leads Us to the Field of Cognition: A Radical-Guided Associative Model for Chinese Text Classification

Hanqing Tao, Shiwei Tong, Kun Zhang, Tong Xu, Qi Liu, Enhong Chen, Min Hou

14:26

04/07/2020

Simplify the Usage of Lexicon in Chinese NER

Ruotian Ma, Minlong Peng, Qi Zhang, Zhongyu Wei, Xuanjing Huang

Keywords: Chinese recognition, NER, Lattice-LSTM, complex architecture

11:07

16/11/2020

Entity Enhanced BERT Pre-training for Chinese NER

Chen Jia, Yuefeng Shi, Qinrong Yang, Yue Zhang

Keywords: chinese ner, pre-training, ner fine-tuning, ner

9:39

04/07/2020

Joint Chinese Word Segmentation and Part-of-speech Tagging via Two-way Attentions of Auto-analyzed Knowledge

Yuanhe Tian, Yan Song, Xiang Ao, Fei Xia, Xiaojun Quan, Tong Zhang, Yonggang Wang

Keywords: Chinese Segmentation, Part-of-speech Tagging, Chinese processing, joint tagging

11:53

19/04/2021

Handling out-of-vocabulary problem in hangeul word embeddings

Ohjoon Kwon, Dohyun Kim, Soo-Ryeon Lee, Junyoung Choi, SangKeun Lee

8:54

08/12/2020

Try to Substitute: An Unsupervised Chinese Word Sense Disambiguation Method Based on HowNet

Bairu Hou, Fanchao Qi, Yuan Zang, Xurui Zhang, Zhiyuan Liu, Maosong Sun

7:54

08/12/2020

Specializing Unsupervised Pretraining Models for Word-Level Semantic Similarity

Anne Lauscher, Ivan Vulić, Edoardo Maria Ponti, Anna Korhonen, Goran Glavaš

13:01

16/11/2020

A Supervised Word Alignment Method based on Cross-Language Span Prediction using Multilingual BERT

Masaaki Nagata, Katsuki Chousa, Masaaki Nishino

Keywords: cross-language prediction, word problem, squad task, alignment

11:13

08/12/2020

Fine-grained Information Status Classification Using Discourse Context-Aware BERT

Yufang Hou

13:13

03/05/2021

Variational Information Bottleneck for Effective Low-Resource Fine-Tuning

Rabeeh Karimi Mahabadi, Yonatan Belinkov, James Henderson

Keywords: variational information bottleneck, biases, robust, over-fitting, large-scale pre-trained language models, NLP, Transfer learning

5:07

05/12/2020

High-order refining for end-to-end Chinese semantic role labeling

Hao Fei, Yafeng Ren, Donghong Ji

5:38

05/12/2020

Mixed-lingual pre-training for cross-lingual summarization

Ruochen Xu, Chenguang Zhu, Yu Shi, Michael Zeng, Xuedong Huang

11:49

04/07/2020

Interpreting Pretrained Contextualized Representations via Reductions to Static Embeddings

Rishi Bommasani, Kelly Davis, Claire Cardie

Keywords: Interpreting Representations, downstream applications, static embeddings, Pretrained Representations

12:07

08/12/2020

SpanAlign: Sentence Alignment Method based on Cross-Language Span Prediction and ILP

Katsuki Chousa, Masaaki Nagata, Masaaki Nishino

14:39

16/11/2020

Improving Multilingual Models with Language-Clustered Vocabularies

Hyung Won Chung, Dan Garrette, Kiat Chuan Tan, Jason Riesa

Keywords: massively applications, multilingual generation, cross-lingual sharing, multilingual models

6:59