08/12/2020

Composing Byte-Pair Encodings for Morphological Sequence Classification

Adam Ek, Jean-Philippe Bernardy


Abstract: Byte-pair encoding is a method for splitting a word into sub-word tokens; a language model then assigns contextual representations separately to each of these tokens. In this paper, we evaluate four different methods of composing such sub-word representations into word representations. We evaluate the methods on morphological sequence classification, the task of predicting grammatical features of a word. Our experiments reveal that using an RNN to compute word representations is consistently more effective than the other methods tested across a sample of eight languages with different typology and varying numbers of byte-pair tokens per word.
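
To make the idea of composing sub-word representations concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. The abstract names only the RNN method, so the sum and mean functions below are assumed baselines chosen for illustration, and the function names, toy vectors, and random weights are all hypothetical. The RNN shown is a simple Elman-style recurrence whose final hidden state serves as the word representation.

```python
import math
import random


def compose_sum(subword_vecs):
    """Element-wise sum of the sub-word vectors (assumed baseline)."""
    return [sum(dims) for dims in zip(*subword_vecs)]


def compose_mean(subword_vecs):
    """Element-wise mean of the sub-word vectors (assumed baseline)."""
    n = len(subword_vecs)
    return [s / n for s in compose_sum(subword_vecs)]


def compose_rnn(subword_vecs, w_in, w_rec):
    """Run a simple Elman RNN over the sub-word vectors.

    w_in  : hidden_size x input_size input weights
    w_rec : hidden_size x hidden_size recurrent weights
    Returns the final hidden state as the word representation.
    """
    hidden = [0.0] * len(w_rec)
    for x in subword_vecs:
        hidden = [
            math.tanh(
                sum(w * xi for w, xi in zip(w_in[j], x))
                + sum(w * hi for w, hi in zip(w_rec[j], hidden))
            )
            for j in range(len(w_rec))
        ]
    return hidden


# Toy contextual vectors for one word split into three byte-pair tokens.
# The values are made up; a real language model would supply them.
subwords = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

# Untrained random weights, for illustration only (hidden size 3, input size 2).
rng = random.Random(0)
w_in = [[rng.uniform(-0.1, 0.1) for _ in range(2)] for _ in range(3)]
w_rec = [[rng.uniform(-0.1, 0.1) for _ in range(3)] for _ in range(3)]

word_sum = compose_sum(subwords)    # [9.0, 12.0]
word_mean = compose_mean(subwords)  # [3.0, 4.0]
word_rnn = compose_rnn(subwords, w_in, w_rec)  # 3 values, each in [-1, 1]
```

One design difference worth noting: sum and mean ignore the order of the tokens, while the recurrent composition processes them in sequence, so its output can depend on token order. In practice the RNN weights would be trained jointly with the classifier rather than sampled at random.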

The video of this talk cannot be embedded. You can watch it here:
https://underline.io/lecture/6550-composing-byte-pair-encodings-for-morphological-sequence-classification
The talk and the respective paper are published at the COLING Workshops 2020 virtual conference.
