Abstract:
We present a novel approach that converts partial and noisy RGB-D scans into high-quality 3D scene reconstructions by inferring unobserved scene geometry. Our approach is fully self-supervised and can hence be trained solely on incomplete, real-world scans. To achieve self-supervision, we remove frames from a given (incomplete) 3D scan in order to make it even more incomplete. Self-supervision is then formulated by correlating the two levels of partialness of the same scan while masking out regions that have never been observed. Through generalization across a large training set, we can then predict 3D scene completions even without seeing any entirely complete 3D scan. Combined with a new sparse, generative 3D convolutional neural network architecture, our method predicts highly detailed surfaces in a coarse-to-fine hierarchical fashion, outperforming existing state-of-the-art methods by a significant margin in reconstruction quality.
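To make the self-supervision formulation concrete, the sketch below shows how such a masked training objective might be set up: the prediction produced from the further-subsampled (more incomplete) input is compared against the less-incomplete target scan, with never-observed regions excluded from the loss. The dense TSDF-style tensors, the L1 loss, and the names (masked_completion_loss, observed_mask, model) are illustrative assumptions, not the paper's exact formulation.

```python
import torch

def masked_completion_loss(pred_sdf: torch.Tensor,
                           target_sdf: torch.Tensor,
                           observed_mask: torch.Tensor) -> torch.Tensor:
    """Self-supervised loss correlating two levels of partialness:
    compare the prediction (from the more-incomplete input scan)
    against the less-incomplete target scan, only in voxels the
    target scan actually observed."""
    diff = torch.abs(pred_sdf - target_sdf)   # per-voxel L1 error (assumed loss)
    diff = diff * observed_mask               # mask out never-observed regions
    return diff.sum() / observed_mask.sum().clamp(min=1)

# Usage sketch: target_sdf is fused from all frames of the scan;
# input_sdf is fused from a random subset of those frames.
# `model` stands in for the sparse generative network (assumed interface).
# pred_sdf = model(input_sdf)
# loss = masked_completion_loss(pred_sdf, target_sdf, observed_mask)
```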