Abstract:
Recognizing scene text in natural images is challenging because many text instances have irregular or distorted shapes. In this paper, we propose a novel adaptive rectification model for robust recognition of arbitrary-shaped scene text. The model approximates the complex non-uniform deformation required to rectify the text with a group of localized linear projective transformations, which preserve the shape characteristics of the text better than non-linear deformations such as thin-plate splines (TPS). Trained end-to-end with a text recognition network, the rectification model learns to transform the input text image into a more regular form that simplifies subsequent recognition. Experimental results on benchmarks demonstrate the effectiveness of the proposed rectification model for scene text recognition.
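
To make the idea of a group of localized projective transformations concrete, the following is a minimal NumPy sketch, not the model proposed in this paper. It assumes the image is split into equal-width vertical strips, each with its own 3x3 projective matrix; the function names `apply_homography` and `piecewise_projective`, the strip partition, and the hand-written matrices are all illustrative assumptions. In the actual model the transformations are learned end-to-end rather than specified by hand.

```python
import numpy as np

def apply_homography(H, pts):
    """Map an (N, 2) array of points through a 3x3 projective matrix H."""
    pts = np.asarray(pts, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))])  # homogeneous coordinates
    mapped = homog @ np.asarray(H, dtype=float).T
    return mapped[:, :2] / mapped[:, 2:3]  # divide out the projective scale

def piecewise_projective(pts, homographies, width):
    """Map each point with the homography of the vertical strip it falls in.

    Assumption for illustration: an image of the given width is split into
    len(homographies) equal-width strips, one projective matrix per strip.
    """
    pts = np.asarray(pts, dtype=float)
    n = len(homographies)
    # Index of the strip containing each point, clamped to the valid range.
    strip = np.clip((pts[:, 0] / width * n).astype(int), 0, n - 1)
    out = np.empty_like(pts)
    for k, H in enumerate(homographies):
        mask = strip == k
        if mask.any():
            out[mask] = apply_homography(H, pts[mask])
    return out

# Usage: the left strip is left unchanged, the right strip is shifted down by 5.
identity = np.eye(3)
shift_down = np.array([[1.0, 0.0, 0.0],
                       [0.0, 1.0, 5.0],
                       [0.0, 0.0, 1.0]])
points = np.array([[10.0, 0.0], [90.0, 0.0]])
rectified = piecewise_projective(points, [identity, shift_down], width=100)
```

Within a single strip, a projective map sends straight lines to straight lines, which is one way to read the claim that such transformations preserve shape characteristics better than a freely bending warp such as TPS. This sketch maps point coordinates only; resampling pixel values and keeping adjacent strips consistent at their boundaries are left out.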