Abstract:
We propose a novel convolutional neural network (ConvNet) equipped with two new semantic calibration and refinement approaches for automatic polyp segmentation from colonoscopy videos. While ConvNets set state-of-the-are performance for this task, it is still difficult to achieve satisfactory results in a real-time manner, which is a necessity in clinical practice. The main obstacle is the huge semantic gap between high-level features and low-level features, making it difficult to take full advantage of complementary semantic information contained in these hierarchical features. Compared with existing solutions, which either directly aggregate these features without considering the semantic gap or employ sophisticated non-local modeling techniques to refine semantic information by introduce many extra computational costs, the proposed ConvNet is able to more precisely yet efficiently calibrate and refine semantic information for better segmentation performance without increasing model complexity; we call the proposed ConvNet as SCR-Net, which has two key modules. We first propose a semantic calibration module (SCM) to effectively transmit the semantic information from high-level layers to low-level layers by learning the semantic-spatial relations during the training procedure. We then propose a semantic refinement module (SRM) to, based on the features calibrated by SCM, enhance the discrimination capability of the features for targeting objects. Extensive experiments on the Kvasir-SEG dataset demonstrate that the proposed SCR-Net is capable of achieving better segmentation accuracy than state-of-the-art approaches with a faster speed. The proposed techniques are general enough to be applied to similar applications where precise and efficient multi-level feature fusion is critical. The code is available at https://github.com/jiafuz/SCR-Net.