30/11/2020

Dynamic Depth Fusion and Transformation for Monocular 3D Object Detection

Erli Ouyang, Li Zhang, Mohan Chen, Anurag Arnab, Yanwei Fu

Abstract: Removing particular objects from a video and filling the resulting blank regions with a plausible background is a challenging and often ill-posed task. In this paper, we propose a framework that solves this difficult problem in complex, dynamic scenes by combining multi-view geometry with convolutional neural network-based approaches. Given an input video with masks over the undesired objects, we first estimate a depth map and relative camera pose for each input frame. We then fuse the estimated depth and pose into a global 3D scene reconstruction. By projecting point clouds from the reconstructed grid volume, we can fill in most of the masked regions of the original input. Finally, we use learning-based inpainting to complete the remaining pixels that could not be resolved by the 3D reconstruction. Compared with previous video inpainting approaches, our system generates superior qualitative results on the DAVIS 2016 and KITTI datasets, particularly in scenes where multiple large objects are removed.
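The geometric core of the pipeline described above is back-projecting each frame's depth map into a 3D point cloud and reprojecting those points into the view of another frame, so background pixels visible elsewhere can fill the masked region. The following is a minimal sketch of that step under a standard pinhole camera model; the intrinsics `K` and the relative pose `(R, t)` are assumed inputs here (the paper's actual system builds a reconstructed grid volume rather than projecting raw per-frame clouds).

```python
import numpy as np

def backproject(depth, K):
    """Lift a depth map to a 3D point cloud in camera coordinates.

    depth: (H, W) array of metric depths; K: (3, 3) pinhole intrinsics.
    Returns a (3, N) array of points, N = H * W.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    # Homogeneous pixel coordinates, one column per pixel.
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T
    rays = np.linalg.inv(K) @ pix          # unit-depth viewing rays
    return rays * depth.reshape(-1)        # scale each ray by its depth

def project(points, K, R, t):
    """Project 3D points into another view given relative pose (R, t).

    Returns (2, N) pixel coordinates and (N,) depths in the target view.
    """
    cam = R @ points + t[:, None]          # transform into target camera frame
    uv = K @ cam
    return uv[:2] / uv[2], uv[2]
```

In the full system, pixels that land inside the target frame's mask with a valid (smallest) depth supply the background color, and only the leftover pixels go to the learning-based inpainting stage.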

The video of this talk cannot be embedded. You can watch it here:
https://accv2020.github.io/miniconf/poster_684.html
