08/12/2020

Mitigating Silence in Compliance Terminology during Parsing of Utterances

Esme Manandise, Conrad de Peuter

Keywords:

Abstract: This paper reports on an approach to increase multi-token-term recall in a parsing task. We use a compliance-domain parser to extract, during the process of parsing raw text, terms that are unlisted in the terminology. The parser uses a similarity measure (Generalized Dice Coefficient) between listed terms and unlisted term candidates to (i) determine term status, (ii) serve putative terms to the parser, (iii) decrease parsing complexity by glomming multi-tokens as lexical singletons, and (iv) automatically augment the terminology after parsing of an utterance completes. We illustrate a small experiment with examples from the tax-and-regulations domain. Bootstrapping the parsing process to detect out- of-vocabulary terms at runtime increases parsing accuracy in addition to producing other benefits to a natural-language-processing pipeline, which translates arithmetic calculations written in English into computer-executable operations.

The video of this talk cannot be embedded. You can watch it here:
https://underline.io/lecture/6521-mitigating-silence-in-compliance-terminology-during-parsing-of-utterances
(Link will open in new window)
 0
 0
 0
 0
This is an embedded video. Talk and the respective paper are published at COLING Workshops 2020 virtual conference. If you are one of the authors of the paper and want to manage your upload, see the question "My papertalk has been externally embedded..." in the FAQ section.

Comments

Post Comment
no comments yet
code of conduct: tbd Characters remaining: 140

Similar Papers