Abstract:
State-of-the-art one-stage detectors usually use a Feature Pyramid Network (FPN) as the neck. FPN fuses multi-scale feature information so that the detector can better handle objects of different scales. However, FPN suffers from information loss caused by feature dimension reduction. In this paper, we introduce a new feature enhancement architecture named Multi-scale Feature Enhancement (MFE). MFE consists of three modules: Scale Fusion, CombineFPN, and Pixel-Region Attention. Scale Fusion supplements high-level features with low-level information without being affected by the semantic gap. CombineFPN further combines top-down and bottom-up structures to reduce the information loss of features at all scales. Together, Scale Fusion and CombineFPN fully fuse features from different levels to enhance the multi-scale features. Finally, the Pixel-Region Attention module, a lightweight non-local attention method, enhances features with information from distant neighborhoods. For FCOS, RetinaNet, and Mask R-CNN with ResNet50 as the backbone, MFE increases Average Precision (AP) by 1.2, 1.1, and 1.0 points, respectively, on MS COCO test-dev. For ATSS and FSAF with ResNet101 as the backbone, MFE increases AP by 1.2 and 1.3 points, respectively. Our method also performs well on the Pascal VOC dataset.
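
To make the idea of combining top-down and bottom-up fusion concrete, the following is a minimal generic sketch and not the CombineFPN design itself, whose details are not given in this abstract. It assumes 1-D lists as stand-ins for feature maps, nearest-neighbour upsampling, stride-2 max pooling, and element-wise addition; all function names (`top_down`, `bottom_up`, `fuse_both`) are hypothetical.

```python
from typing import List

Feature = List[float]  # 1-D stand-in for a feature map (assumption for brevity)


def upsample2x(x: Feature) -> Feature:
    """Nearest-neighbour upsampling by a factor of 2."""
    return [v for v in x for _ in range(2)]


def downsample2x(x: Feature) -> Feature:
    """Stride-2 max pooling; assumes an even-length input."""
    return [max(x[i], x[i + 1]) for i in range(0, len(x), 2)]


def add(a: Feature, b: Feature) -> Feature:
    """Element-wise sum of two equally sized features."""
    return [p + q for p, q in zip(a, b)]


def top_down(levels: List[Feature]) -> List[Feature]:
    """Propagate coarse (high-level) information to finer levels.

    levels[0] is the finest scale; each level is half the size of the previous one.
    """
    out = list(levels)
    for i in range(len(out) - 2, -1, -1):
        out[i] = add(out[i], upsample2x(out[i + 1]))
    return out


def bottom_up(levels: List[Feature]) -> List[Feature]:
    """Propagate fine (low-level) information to coarser levels."""
    out = list(levels)
    for i in range(1, len(out)):
        out[i] = add(out[i], downsample2x(out[i - 1]))
    return out


def fuse_both(levels: List[Feature]) -> List[Feature]:
    """Sum the outputs of the two pathways at every scale (one possible combination)."""
    td = top_down(levels)
    bu = bottom_up(levels)
    return [add(t, b) for t, b in zip(td, bu)]


if __name__ == "__main__":
    pyramid = [[1.0, 2.0, 3.0, 4.0], [10.0, 20.0], [100.0]]
    for level in fuse_both(pyramid):
        print(level)
```

In this toy setting every output level receives contributions from both finer and coarser levels, which is the property that a combined top-down and bottom-up neck aims for. A real neck would operate on 2-D tensors and would typically apply learned convolutions around each fusion step.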