15/06/2020

Faster general parsing through context-free memoization

Grzegorz Herman

Keywords: Earley, context-free, generalized parsing, GLL, GLR, memoization

Abstract: We present a novel parsing algorithm for all context-free languages. The algorithm features a clean mathematical formulation: parsing is expressed as a series of standard operations on regular languages and relations. Parsing complexity w.r.t. input length matches the state of the art: it is worst-case cubic, quadratic for unambiguous grammars, and linear for LR-regular grammars. What distinguishes our approach is that parsing can be implemented using only immutable, acyclic data structures. We also propose a parsing optimization technique called context-free memoization. It allows handling an overwhelming majority of input symbols using a simple stack and a lookup table, similarly to the operation of a deterministic LR(1) parser. This allows our proof-of-concept implementation to outperform the best current implementations of common generalized parsing algorithms (Earley, GLR, and GLL). Tested on a large Java source corpus, parsing is 3–5 times faster, while recognition—35 times faster.

 0
 0
 0
 0
This is an embedded video. Talk and the respective paper are published at PLDI 2020 virtual conference. If you are one of the authors of the paper and want to manage your upload, see the question "My papertalk has been externally embedded..." in the FAQ section.

Comments

Post Comment
no comments yet
code of conduct: tbd Characters remaining: 140

Similar Papers