A Supervised Word Alignment Method based on Cross-Language Span Prediction using Multilingual BERT

16/11/2020

Masaaki Nagata, Katsuki Chousa, Masaaki Nishino

Keywords: cross-language prediction, word problem, squad task, alignment

Abstract: We present a novel supervised word alignment method based on cross-language span prediction. We first formalize a word alignment problem as a collection of independent predictions from a token in the source sentence to a span in the target sentence. Since this step is equivalent to a SQuAD v2.0 style question answering task, we solve it using multilingual BERT, which is fine-tuned on manually created gold word alignment data. It is nontrivial to obtain accurate alignment from a set of independently predicted spans. We greatly improved the word alignment accuracy by adding the source token's context to the question and symmetrizing the predictions of the two directions. In experiments using five word alignment datasets from among Chinese, Japanese, German, Romanian, French, and English, we show that our proposed method significantly outperformed previous supervised and unsupervised word alignment methods without any bitexts for pretraining. For example, we achieved an F1 score of 86.7 for the Chinese-English data, which is 13.3 points higher than the previous state-of-the-art supervised method.
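
The two ideas named in the abstract, putting the source token's context into the question and symmetrizing two directional predictions, can be illustrated with a small sketch. This is not the authors' implementation: the function names, the marker character, the averaging rule, and the 0.5 threshold are all assumptions made for illustration, and the per-pair probabilities are taken as given rather than produced by a fine-tuned multilingual BERT model.

```python
from typing import Dict, List, Set, Tuple

# (source token index, target token index)
Pair = Tuple[int, int]


def build_question(source_tokens: List[str], index: int, marker: str = "¶") -> str:
    """Return the whole source sentence with the queried token wrapped in markers.

    The marker character is a hypothetical choice; the point is only that the
    question carries the token's context, not the bare token.
    """
    marked = (
        source_tokens[:index]
        + [marker, source_tokens[index], marker]
        + source_tokens[index + 1:]
    )
    return " ".join(marked)


def symmetrize(
    src_to_tgt: Dict[Pair, float],
    tgt_to_src: Dict[Pair, float],
    threshold: float = 0.5,
) -> Set[Pair]:
    """Combine two directional predictions into one alignment.

    Both dictionaries map (source index, target index) to a probability.
    A pair is kept when the average of the two directions reaches the
    threshold; a pair missing in one direction counts as probability 0.
    Averaging and the threshold value are assumptions of this sketch.
    """
    candidates = set(src_to_tgt) | set(tgt_to_src)
    return {
        pair
        for pair in candidates
        if (src_to_tgt.get(pair, 0.0) + tgt_to_src.get(pair, 0.0)) / 2 >= threshold
    }
```

In this sketch, a pair predicted confidently in only one direction can still be dropped, which is one simple way of reconciling independently predicted spans.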

The talk and the respective paper were published at the EMNLP 2020 virtual conference.

Similar Papers

08/12/2020

SpanAlign: Sentence Alignment Method based on Cross-Language Span Prediction and ILP

Katsuki Chousa, Masaaki Nagata, Masaaki Nishino

14:39

04/07/2020

A Novel Cascade Binary Tagging Framework for Relational Triple Extraction

Zhepei Wei, Jianlin Su, Yue Wang, Yuan Tian, Yi Chang

Keywords: Relational Extraction, large-scale construction, overlapping problem, relational task

11:05

16/11/2020

Nested Named Entity Recognition via Second-best Sequence Learning and Decoding

Takashi Shibuya, Eduard Hovy

Keywords: inference, flat tasks, neural model, decoding method

12:04

08/12/2020

Specializing Unsupervised Pretraining Models for Word-Level Semantic Similarity

Anne Lauscher, Ivan Vulić, Edoardo Maria Ponti, Anna Korhonen, Goran Glavaš

13:01

19/04/2021

Coordinate constructions in English enhanced Universal Dependencies: Analysis and computational modeling

Stefan Grünewald, Prisca Piccirilli, Annemarie Friedrich

12:44

16/11/2020

Pre-training Multilingual Neural Machine Translation by Leveraging Alignment Information

Zehui Lin, Xiao Pan, Mingxuan Wang, Xipeng Qiu, Jiangtao Feng, Hao Zhou, Lei Li

Keywords: machine mt, mt, rich mt, universal model

12:00

04/07/2020

SpellGCN: Incorporating Phonological and Visual Similarities into Language Models for Chinese Spelling Check

Xingyi Cheng, Weidi Xu, Kunlong Chen, Shaohua Jiang, Feng Wang, Taifeng Wang, Wei Chu, Yuan Qi

Keywords: Chinese Check, spelling errors, spelling language, CSC

10:27

03/05/2021

On Learning Universal Representations Across Languages

Xiangpeng Wei, Rongxiang Weng, Yue Hu, Luxi Xing, Heng Yu, Weihua Luo

Keywords: hierarchical contrastive learning, cross-lingual pretraining, universal representation learning

3:51

08/12/2020

SentiX: A Sentiment-Aware Pre-Trained Model for Cross-Domain Sentiment Analysis

Jie Zhou, Junfeng Tian, Rui Wang, Yuanbin Wu, Wenming Xiao, Liang He

12:42

03/05/2021

On Position Embeddings in BERT

Wang Benyou, Lifeng Shang, Christina Lioma, Xin Jiang, Hao Yang, Qun Liu, Jakob Simonsen

Keywords: pretrained language model, Position Embedding, BERT

6:28

16/11/2020

DAGA: Data Augmentation with a Generation Approach for Low-resource Tagging Tasks

Bosheng Ding, Linlin Liu, Lidong Bing, Canasai Kruengkrai, Thien Hai Nguyen, Shafiq Joty, Luo Si, Chunyan Miao

Keywords: machine learning, generalization, low-resource tasks, named recognition

11:09

06/12/2020

Incorporating BERT into Parallel Sequence Decoding with Adapters

Junliang Guo, Zhirui Zhang, Linli Xu, Hao-Ran Wei, Boxing Chen, Enhong Chen

3:17

04/07/2020

SpanBERT: Improving Pre-training by Representing and Predicting Spans

Mandar Joshi, Danqi Chen, Yinhan Liu, Daniel S. Weld, Luke Zettlemoyer, Omer Levy

Keywords: span tasks, question answering, coreference resolution, OntoNotes task

14:14

01/07/2020

Expand and Filter: CUNI and LMU Systems for the WNGT 2020 Duolingo Shared Task

Jindřich Libovický, Zdeněk Kasner, Jindřich Helcl, Ondřej Dušek

4:59

02/02/2021

Encoding Syntactic Knowledge in Transformer Encoder for Intent Detection and Slot Filling

Jixuan Wang, Kai Wei, Martin Radfar, Weiwei Zhang, Clement Chung

19:31

16/11/2020

SeqMix: Augmenting Active Sequence Labeling via Sequence Mixup

Rongzhi Zhang, Yue Yu, Chao Zhang

Keywords: low-resource tasks, active labeling, mixup, sequence mixup

11:16

02/02/2021

Commonsense Knowledge Augmentation for Low-Resource Languages via Adversarial Learning

Bosung Kim, Juae Kim, Youngjoong Ko, Jungyun Seo

19:38

22/11/2021

From Seq2Seq Recognition to Handwritten Word Embeddings

George Retsinas, Giorgos Sfikas, Christophoros Nikou, Petros Maragos

Keywords: keyword spotting, handwritten text recognition, sequence-to-sequence

2:59

16/11/2020

Improving the Efficiency of Grammatical Error Correction with Erroneous Span Detection and Correction

Mengyun Chen, Tao Ge, Xingxing Zhang, Furu Wei, Ming Zhou

Keywords: erroneous detection, erroneous correction, inference, language-independent approach

6:27

04/07/2020

Parallel Corpus Filtering via Pre-trained Language Models

Boliang Zhang, Ajay Nagesh, Kevin Knight

Keywords: machine models, WMT task, Parallel Filtering, Pre-trained Models

12:04

08/12/2020

Federated Learning for Spoken Language Understanding

Zhiqi Huang, Fenglin Liu, Yuexian Zou

14:05

16/11/2020

An Unsupervised Sentence Embedding Method by Mutual Information Maximization

Yan Zhang, Ruidan He, Zuozhu Liu, Kwan Hui Lim, Lidong Bing

Keywords: sentence-pair tasks, clustering, semantic search, downstream tasks

12:22

26/04/2020

Cross-Lingual Ability of Multilingual BERT: An Empirical Study

Karthikeyan K, Zihan Wang, Stephen Mayhew, Dan Roth

Keywords: Cross-Lingual Learning, Multilingual BERT

4:31

01/07/2020

RobertNLP at the IWPT 2020 Shared Task: Surprisingly Simple Enhanced UD Parsing for English

Stefan Grünewald, Annemarie Friedrich

7:40

06/12/2021

FastCorrect: Fast Error Correction with Edit Alignment for Automatic Speech Recognition

Yichong Leng, Xu Tan, Linchen Zhu, Jin Xu, Renqian Luo, Linquan Liu, Tao Qin, Xiangyang Li, Edward Lin, Tie-Yan Liu

13:44

02/02/2021

Multilingual Transfer Learning for QA using Translation as Data Augmentation

Mihaela Bornea, Lin Pan, Sara Rosenthal, Radu Florian, Avirup Sil

15:44

06/12/2020

Multi-label Contrastive Predictive Coding

Jiaming Song, Stefano Ermon

3:10

04/07/2020

Cross-Lingual Unsupervised Sentiment Classification with Multi-View Transfer Learning

Hongliang Fei, Ping Li

Keywords: Cross-Lingual Classification, sentiment classification, unsupervised system, classification

12:23

16/11/2020

LAReQA: Language-Agnostic Answer Retrieval from a Multilingual Pool

Uma Roy, Noah Constant, Rami Al-Rfou, Aditya Barua, Aaron Phillips, Yinfei Yang

Keywords: language-agnostic retrieval, cross-lingual tasks, cross-lingual retrieval, alignment

12:07

04/07/2020

AdvAug: Robust Adversarial Augmentation for Neural Machine Translation

Yong Cheng, Lu Jiang, Wolfgang Macherey, Jacob Eisenstein

Keywords: Robust Augmentation, Neural Translation, Neural NMT, Neural

12:16

19/08/2021

MultiMirror: Neural Cross-lingual Word Alignment for Multilingual Word Sense Disambiguation

Luigi Procopio, Edoardo Barba, Federico Martelli, Roberto Navigli

Keywords: Natural Language Processing, Natural Language Semantics, Resources and Evaluation

12:25

04/07/2020

Unsupervised Cross-lingual Representation Learning at Scale

Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer, Veselin Stoyanov

Keywords: cross-lingual tasks, XNLI, MLQA, NER

12:15

02/02/2021

ShapeNet: A Shapelet-Neural Network Approach for Multivariate Time Series Classification

Guozhong Li, Byron Choi, Jianliang Xu, Sourav S Bhowmick, Kwok-Pan Chun, Grace Lai-Hung Wong

15:03

16/11/2020

XL-WiC: A Multilingual Benchmark for Evaluating Semantic Contextualization

Alessandro Raganato, Tommaso Pasini, Jose Camacho-Collados, Mohammad Taher Pilehvar

Keywords: disambiguation task, binary problem, evaluation scenarios, zero-shot transfer

10:08

08/12/2020

Fine-grained Information Status Classification Using Discourse Context-Aware BERT

Yufang Hou

13:13

19/04/2021

Modeling context in answer sentence selection systems on a latency budget

Rujun Han, Luca Soldaini, Alessandro Moschitti

7:03

04/07/2020

Pretraining with Contrastive Sentence Objectives Improves Discourse Performance of Language Models

Dan Iter, Kelvin Guu, Larry Lansing, Dan Jurafsky

Keywords: Discourse, unsupervised text, contextual representations, discourse-level representations

9:14

08/12/2020

Exploring Cross-sentence Contexts for Named Entity Recognition with BERT

Jouni Luoma, Sampo Pyysalo

14:39

08/12/2020

Multi-grained Chinese Word Segmentation with Weakly Labeled Data

Chen Gong, Zhenghua Li, Bowei Zou, Min Zhang

14:48

16/11/2020

Simulated multiple reference training improves low-resource machine translation

Huda Khayrallah, Brian Thompson, Matt Post, Philipp Koehn

Keywords: machine mt, mt, simulated training, simulated

6:56