Dual Super-Resolution Learning for Semantic Segmentation

Abstract: Current state-of-the-art semantic segmentation methods often apply high-resolution input to attain high performance, which brings large computation budgets and limits their applications on resource-constrained devices. In this paper, we propose a simple and flexible two-stream framework named Dual Super-Resolution Learning (DSRL) to effectively improve the segmentation accuracy without introducing extra computation costs. Specifically, the proposed method consists of three parts: Semantic Segmentation Super-Resolution (SSSR), Single Image Super-Resolution (SISR) and Feature Affinity (FA) module, which can keep high-resolution representations with low-resolution input while simultaneously reducing the model computation complexity. Moreover, it can be easily generalized to other tasks, e.g., human pose estimation. This simple yet effective method leads to strong representations and is evidenced by promising performance on both semantic segmentation and human pose estimation. Specifically, for semantic segmentation on CityScapes, we can achieve $\geq$2\% higher mIoU with similar FLOPs, and keep the performance with 70\% FLOPs. For human pose estimation, we can gain $\geq$2\% mAP with the same FLOPs and maintain mAP with $30\%$ fewer FLOPs. Code and models are available at \url{https://github.com/wanglixilinx/DSRL}.