Abstract:
In this work, we propose a system for automatically extracting handwritten word embeddings using the encoding module of a Sequence-to-Sequence (Seq2Seq) recognition network. These embeddings are shown to be highly discriminative: they can be used effectively for Keyword Spotting, and they can also be fully decoded into the target string, following the Seq2Seq rationale. Architecture-wise, the proposed system incorporates several novel modules (e.g., an auto-encoder path and a non-recurrent CTC branch) that assist the training procedure and boost performance. We also show how to further process these embeddings with a binarization scheme that yields compact and highly efficient descriptors suitable for Keyword Spotting. Numerical results validate the usefulness of the proposed architecture: our method outperforms the previous state of the art in Keyword Spotting.
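To illustrate how binarized descriptors can serve Keyword Spotting, the following is a minimal sketch and not the binarization scheme proposed in this work. It assumes a simple per-dimension threshold at zero and ranks candidate word images by Hamming distance to a query. The function names (`binarize`, `hamming`, `rank_by_hamming`) and the toy embedding values are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple


def binarize(embedding: Sequence[float], threshold: float = 0.0) -> int:
    """Pack a real-valued embedding into an integer bit code, one bit per dimension.

    Assumption: a fixed threshold at zero; the actual scheme may differ.
    """
    code = 0
    for value in embedding:
        code = (code << 1) | (1 if value > threshold else 0)
    return code


def hamming(a: int, b: int) -> int:
    """Count the bits in which two codes differ."""
    return bin(a ^ b).count("1")


def rank_by_hamming(query: int, codes: Sequence[int]) -> List[Tuple[int, int]]:
    """Return (index, distance) pairs sorted by increasing distance to the query."""
    distances = [(i, hamming(query, c)) for i, c in enumerate(codes)]
    return sorted(distances, key=lambda pair: pair[1])


# Toy 4-dimensional embeddings (made-up values, not taken from the paper).
word_embeddings = [
    [0.9, -0.2, 0.4, -0.7],   # word image 0
    [-0.5, 0.3, -0.1, 0.8],   # word image 1
    [0.7, -0.1, -0.3, -0.6],  # word image 2
]
query_embedding = [0.8, -0.4, 0.5, -0.9]

codes = [binarize(e) for e in word_embeddings]
query_code = binarize(query_embedding)
ranking = rank_by_hamming(query_code, codes)
print(ranking)
```

In this toy setting, word image 0 shares the query's sign pattern and is ranked first. Comparing packed bit codes with XOR and a bit count is what makes such descriptors compact to store and cheap to match.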