08/12/2020

Automatic Topological Field Identification in (Historical) German Texts

Katrin Ortmann

Abstract: For the study of certain linguistic phenomena and their development over time, large amounts of textual data must be enriched with relevant annotations. Since the manual creation of such annotations requires a lot of effort, automating the process with NLP methods would be convenient. But the required amounts of training data are usually not available for non-standard or historical language. The present study investigates whether models trained on modern newspaper text can be used to automatically identify topological fields, i.e. syntactic structures, in different modern and historical German texts. The evaluation shows that, in general, it is possible to transfer a parser model to other registers or time periods with overall F1-scores >92%. However, an error analysis makes clear that additional rules and domain-specific training data would be beneficial if sentence structures differ significantly from the training data, e.g. in the case of Early New High German.

The video of this talk cannot be embedded. You can watch it here:
https://underline.io/lecture/6443-automatic-topological-field-identification-in-(historical)-german-texts
The talk and the corresponding paper were published at the COLING Workshops 2020 virtual conference.
