Abstract:
By fixing a typeface, each character of a Chinese text can be rendered as a glyph pixel matrix. We propose to classify text from these glyph features using bi-directional convolution. Although such pixel embeddings can be applied to any language, they are particularly convenient for Chinese scripts because Chinese characters are square. We extract both forward and backward n-gram features of the text with bi-directional convolutional operations and concatenate them. One-dimensional max-over-time pooling is then applied to the bi-directional feature maps, followed by three fully connected layers that perform the classification. The proposed model has a lightweight architecture containing only a single-layer convolutional neural network. Experiments on several Chinese text classification datasets show that the proposed model trains quickly and outperforms traditional methods.
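The pipeline described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not define the bi-directional convolution precisely, so the sketch assumes the backward branch applies a separate filter bank to the reversed character sequence. The glyph size (16x16), n-gram width, filter count, hidden sizes, class count, and all function names are hypothetical, and random binary matrices stand in for glyphs rendered from a real typeface.

```python
import numpy as np


def init_params(rng, glyph_size=16, n=3, filters=8, hidden=(32, 16), classes=4):
    """Random weights; every size here is an illustrative assumption."""
    d = glyph_size * glyph_size  # flattened glyph dimension
    dims = [2 * filters, *hidden, classes]  # three fully connected layers
    return {
        "w_fwd": rng.normal(0.0, 0.1, size=(n, d, filters)),
        "b_fwd": np.zeros(filters),
        "w_bwd": rng.normal(0.0, 0.1, size=(n, d, filters)),
        "b_bwd": np.zeros(filters),
        "fc": [
            (rng.normal(0.0, 0.1, size=(dims[i], dims[i + 1])), np.zeros(dims[i + 1]))
            for i in range(len(dims) - 1)
        ],
    }


def conv1d_valid(x, w, b):
    """1-D convolution over time with ReLU.

    x: (T, D) sequence, w: (n, D, F) filters, b: (F,) -> (T - n + 1, F).
    """
    n = w.shape[0]
    steps = x.shape[0] - n + 1
    windows = np.stack([x[t:t + n] for t in range(steps)])  # (steps, n, D)
    out = np.tensordot(windows, w, axes=([1, 2], [0, 1])) + b  # (steps, F)
    return np.maximum(out, 0.0)


def classify(glyphs, params):
    """glyphs: (T, H, W) pixel matrices, one per character -> class logits."""
    x = glyphs.reshape(glyphs.shape[0], -1).astype(np.float64)  # (T, H*W)
    fwd = conv1d_valid(x, params["w_fwd"], params["b_fwd"])  # forward n-grams
    # Assumption: backward n-grams come from the reversed sequence.
    bwd = conv1d_valid(x[::-1], params["w_bwd"], params["b_bwd"])
    feats = np.concatenate([fwd, bwd], axis=1)  # (T - n + 1, 2F)
    h = feats.max(axis=0)  # max-over-time pooling -> (2F,)
    last = len(params["fc"]) - 1
    for i, (w, b) in enumerate(params["fc"]):
        h = h @ w + b
        if i < last:
            h = np.maximum(h, 0.0)  # ReLU on the hidden layers only
    return h


rng = np.random.default_rng(0)
params = init_params(rng)
# Stand-in for 10 rendered characters; real glyphs would come from a font.
glyphs = rng.integers(0, 2, size=(10, 16, 16))
logits = classify(glyphs, params)
print(logits.shape)  # (4,)
```

Because pooling takes the maximum over the time axis, the sketch does not need to realign the backward feature map with the forward one before concatenating; a model that used the per-position features directly would have to.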