Abstract:
Exploiting global contextual information has been shown to be useful for improving scene parsing performance and is therefore widely used. In this paper, unlike previous work that captures long-range dependencies with multi-scale feature fusion or attention mechanisms, we address scene parsing by aggregating rich contextual information through graph reasoning. Specifically, we propose two graph reasoning modules, in which features are aggregated over the coordinate space and projected into the feature and probabilistic spaces, respectively. The feature graph reasoning module adaptively constructs pyramid graphs as multi-scale feature representations and then performs graph reasoning over them to model global context. In the probabilistic graph reasoning module, graph reasoning is performed over a graph of class-dependent representations generated by aggregating the pixels that belong to the same class. Extensive experiments on popular scene parsing datasets, including Cityscapes, PASCAL Context, and ADE20K, demonstrate that our approach achieves state-of-the-art performance.
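To make the project-reason-reproject idea concrete, the following is a minimal sketch of a generic feature-space graph reasoning block: pixels are softly assigned to a small set of graph nodes, one round of graph propagation mixes information across nodes, and the node features are projected back to the coordinate space. This is an illustrative, GloRe-style sketch only; the class names, dimensions, and single-scale design are assumptions and do not reproduce the paper's pyramid or probabilistic modules.

```python
# Illustrative sketch of feature-space graph reasoning:
# project pixels to graph nodes, reason over the node graph,
# then re-project back to the coordinate space.
# All module names and dimensions are assumptions for demonstration.
import torch
import torch.nn as nn


class FeatureGraphReasoning(nn.Module):
    def __init__(self, in_channels: int, node_channels: int = 64, num_nodes: int = 32):
        super().__init__()
        # 1x1 convs: soft node assignment (coordinate space -> nodes)
        # and channel reduction for the node features.
        self.assign = nn.Conv2d(in_channels, num_nodes, kernel_size=1)
        self.reduce = nn.Conv2d(in_channels, node_channels, kernel_size=1)
        # Graph reasoning on a fully connected node graph:
        # mix information across nodes, then update each node's state
        # (1x1 Conv1d layers act as linear maps over nodes / channels).
        self.node_mix = nn.Conv1d(num_nodes, num_nodes, kernel_size=1)
        self.state_update = nn.Conv1d(node_channels, node_channels, kernel_size=1)
        # Re-projection of node features back to pixel features.
        self.expand = nn.Conv2d(node_channels, in_channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        # Soft assignment of each pixel to every node: (B, K, H*W)
        assign = torch.softmax(self.assign(x).view(b, -1, h * w), dim=-1)
        feats = self.reduce(x).view(b, -1, h * w)           # (B, C', H*W)
        nodes = torch.bmm(assign, feats.transpose(1, 2))    # (B, K, C')
        # Graph reasoning: exchange information across nodes, update states.
        nodes = self.node_mix(nodes)                                   # mix over nodes
        nodes = torch.relu(self.state_update(nodes.transpose(1, 2)))   # (B, C', K)
        # Re-project node features to pixels using the same assignment.
        pixels = torch.bmm(nodes, assign).view(b, -1, h, w)  # (B, C', H, W)
        return x + self.expand(pixels)                       # residual fusion


# Usage: refine a hypothetical backbone feature map of shape (B, 512, 64, 64).
if __name__ == "__main__":
    grm = FeatureGraphReasoning(in_channels=512)
    out = grm(torch.randn(2, 512, 64, 64))
    print(out.shape)  # torch.Size([2, 512, 64, 64])
```

In this sketch the reasoning step is a single dense propagation over a fully connected node graph; the paper's modules instead build pyramid graphs over multiple scales and a class-dependent graph in the probabilistic space, so this code should be read only as a schematic of the general pattern.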