Abstract:
Detecting 3D objects in point clouds is an important yet challenging problem in many applications. While most existing approaches leverage the geometric information of point clouds, few account for the inherent semantic characteristics of each point or for the consistency between geometric and semantic cues. In this work, we propose a novel semantic consistency network (SCNet) driven by a natural principle: the class of a predicted 3D bounding box should be consistent with the classes of all the points inside that box. Specifically, SCNet consists of a feature extraction structure, a detection decision structure, and a semantic segmentation structure. At inference time, only the feature extraction and detection decision structures are used to detect 3D objects. During training, the semantic segmentation structure is trained jointly with the other two structures to produce more robust and more widely applicable model parameters. A novel semantic consistency loss is proposed to constrain the output 3D object boxes and the segmented points, boosting detection performance. Our model is evaluated on two challenging datasets and achieves results comparable to state-of-the-art methods.
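
The consistency principle above can be illustrated with a minimal sketch. This is not the loss formulation used in SCNet, which the abstract does not specify; it is one plausible reading, under the assumptions that boxes are axis-aligned and that the penalty is the mean negative log-probability that the points inside a box carry that box's predicted class. The names `points_in_box` and `consistency_loss` are hypothetical and chosen for illustration only.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
# Assumption: axis-aligned boxes given as (min corner, max corner).
Box = Tuple[Point, Point]


def points_in_box(points: Sequence[Point], box: Box) -> List[int]:
    """Return the indices of the points that lie inside an axis-aligned box."""
    lo, hi = box
    return [
        i for i, p in enumerate(points)
        if all(lo[d] <= p[d] <= hi[d] for d in range(3))
    ]


def consistency_loss(
    points: Sequence[Point],
    point_probs: Sequence[Sequence[float]],
    boxes: Sequence[Box],
    box_classes: Sequence[int],
    eps: float = 1e-8,
) -> float:
    """Illustrative consistency penalty (hypothetical, not the SCNet loss).

    For each predicted box, average the negative log-probability that the
    points inside it belong to the box's predicted class, then average
    over all boxes that contain at least one point.
    """
    per_box = []
    for box, cls in zip(boxes, box_classes):
        idx = points_in_box(points, box)
        if not idx:
            continue  # an empty box contributes nothing in this sketch
        nll = [-math.log(point_probs[i][cls] + eps) for i in idx]
        per_box.append(sum(nll) / len(nll))
    return sum(per_box) / len(per_box) if per_box else 0.0


# Toy example: two points near the origin are segmented as class 0,
# one distant point as class 1.
points = [(0.0, 0.0, 0.0), (0.5, 0.5, 0.5), (5.0, 5.0, 5.0)]
point_probs = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
box = ((-1.0, -1.0, -1.0), (1.0, 1.0, 1.0))

agree = consistency_loss(points, point_probs, [box], [0])
disagree = consistency_loss(points, point_probs, [box], [1])
print(agree < disagree)
```

In this toy setup, a box labeled with the class that its interior points were segmented as incurs a near-zero penalty, while a box labeled with the other class incurs a large one. A trainable version would operate on differentiable tensors and oriented boxes; those details are outside what the abstract states.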