Chinese document classification with bi-directional convolutional language model

25/07/2020

Bin Liu, Guosheng Yin

Keywords: text classification, CNN, neural language model

Abstract: Once a typeface is fixed, each character of a Chinese text can be converted to a glyph pixel matrix. We propose to conduct text classification on such glyph features using bi-directional convolution. Although pixel embedding can be applied to any language, it is particularly convenient for representing Chinese script because Chinese characters are square. We extract both the forward and backward n-gram features of the text via bi-directional convolutional operations and then concatenate them. A subsequent 1-dimensional max-over-time pooling is applied to the bi-directional feature maps, and three fully connected layers then perform the classification. The proposed model has a light-weight architecture that contains only a single convolutional layer. Experiments on several Chinese text classification datasets show that the proposed model trains quickly and outperforms traditional methods.
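The feature-extraction pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes that "bi-directional" means convolving the character sequence once in reading order and once in reversed order with separate filter sets, it uses toy 2x2 glyphs in place of rendered typeface bitmaps, and the names `flatten_glyph`, `conv1d_over_time`, and `bidirectional_features` are hypothetical. The three fully connected layers are omitted.

```python
def flatten_glyph(glyph):
    """Flatten an h x w glyph pixel matrix into one feature vector."""
    return [px for row in glyph for px in row]


def conv1d_over_time(seq, filters, n):
    """Slide each filter over windows of n consecutive character vectors.

    seq:     list of T vectors, each of length d (one per character)
    filters: list of weight vectors, each of length n * d
    Returns one ReLU-activated feature map of length T - n + 1 per filter.
    """
    maps = []
    for w in filters:
        fmap = []
        for t in range(len(seq) - n + 1):
            # Concatenate the n character vectors in this window (an n-gram).
            window = [x for vec in seq[t:t + n] for x in vec]
            act = sum(a * b for a, b in zip(w, window))
            fmap.append(max(0.0, act))  # ReLU
        maps.append(fmap)
    return maps


def bidirectional_features(glyphs, fwd_filters, bwd_filters, n):
    """Forward and backward n-gram features, concatenated and max-pooled.

    Assumes the text has at least n characters.
    """
    seq = [flatten_glyph(g) for g in glyphs]
    fwd = conv1d_over_time(seq, fwd_filters, n)        # reading order
    bwd = conv1d_over_time(seq[::-1], bwd_filters, n)  # reversed order
    # Concatenate the feature maps, then take the max over time per map.
    return [max(fmap) for fmap in fwd + bwd]


# Toy input: four "characters", each a 2x2 glyph with a single lit pixel.
glyphs = [
    [[1, 0], [0, 0]],
    [[0, 1], [0, 0]],
    [[0, 0], [1, 0]],
    [[0, 0], [0, 1]],
]
# One bigram filter per direction (illustrative weights, not learned ones).
bigram_filter = [1, 0, 0, 0, 0, 1, 0, 0]
features = bidirectional_features(glyphs, [bigram_filter], [bigram_filter], n=2)
```

The resulting `features` vector has one entry per filter per direction and a fixed length regardless of document length, which is what allows it to be passed to fully connected layers for classification.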
