Attentional Separation-and-Aggregation Network for Self-supervised Depth-Pose Learning in Dynamic Scenes

Abstract: Learning depth and ego-motion from unlabeled videos via self-supervision from epipolar projection can improve the robustness and accuracy of 3D perception and localization for vision-based robots. However, the rigid projection computed from ego-motion cannot represent all scene points, such as points on moving objects, which leads to false guidance in these regions. To address this problem, we propose an Attentional Separation-and-Aggregation Network (ASANet), which learns to distinguish and extract the scene's static and dynamic characteristics via an attention mechanism. We further propose a novel MotionNet, with ASANet as the encoder followed by two separate decoders, to estimate the camera's ego-motion and the scene's dynamic motion field. We then introduce an auto-selecting approach that automatically detects moving objects for dynamic-aware learning. Experiments demonstrate that our method achieves state-of-the-art performance on the KITTI benchmark.
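To make the failure mode in the abstract concrete, here is a minimal sketch of rigid (ego-motion-only) reprojection with a pinhole camera, p_s ~ K (R · D(p_t) · K^-1 · p_t + t). This is the standard view-synthesis geometry, not ASANet or MotionNet code. The intrinsics K, the pose (R, t), the depth, and the hypothetical per-object displacement `object_motion` are toy assumptions chosen for readability. The sketch shows that a static point lands exactly where the rigid prediction says, while a point on a moving object drifts away from it. That leftover displacement is the kind of residual a separate dynamic motion field would need to explain.

```python
import math


def mat_vec(m, v):
    """Multiply a 3x3 matrix (list of rows) by a length-3 vector."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]


def back_project(u, v, depth, K):
    """Lift pixel (u, v) with known depth to a 3D point in the target camera frame."""
    fx, fy, cx, cy = K[0][0], K[1][1], K[0][2], K[1][2]
    return [(u - cx) / fx * depth, (v - cy) / fy * depth, depth]


def project(point, K):
    """Pinhole projection of a 3D camera-frame point to pixel coordinates."""
    x, y, z = mat_vec(K, point)
    return x / z, y / z


def to_source_frame(point, R, t):
    """Apply the camera ego-motion (target -> source): X_s = R X_t + t."""
    return [a + b for a, b in zip(mat_vec(R, point), t)]


def rigid_warp(u, v, depth, K, R, t):
    """Predict where (u, v) appears in the source view using ego-motion only."""
    return project(to_source_frame(back_project(u, v, depth, K), R, t), K)


# Toy setup (assumed values): 100 px focal length, principal point (50, 50),
# no rotation, and a 0.1-unit sideways camera translation.
K = [[100.0, 0.0, 50.0], [0.0, 100.0, 50.0], [0.0, 0.0, 1.0]]
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = [0.1, 0.0, 0.0]

# Static point: the rigid prediction is where the point actually appears.
static_pred = rigid_warp(50.0, 50.0, 10.0, K, R, t)  # ~ (51.0, 50.0)

# Moving point: same pixel and depth, but the object itself also shifts by a
# hypothetical object_motion (in the source camera frame) between frames.
object_motion = [0.2, 0.0, 0.0]
p_src = to_source_frame(back_project(50.0, 50.0, 10.0, K), R, t)
true_moving = project([a + b for a, b in zip(p_src, object_motion)], K)

# Pixel gap between the rigid prediction and the object's true location.
residual = math.dist(static_pred, true_moving)  # ~ 2.0 px
print(static_pred, true_moving, residual)
```

A photometric loss computed at `static_pred` would compare the wrong source pixel for the moving point. This is the "false guidance" the abstract refers to, and it is why moving regions need separate handling rather than the rigid warp alone.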

16/11/2020

Applications, Computer Vision; Deep Learning, Deep Autoencoders; Deep Learning, Generative Models; Probabilistic Methods; Reinforcement Learning and Planning, Deep RL

5:13

16/11/2020

3D human pose estimation, 3D global motion estimation, unpaired training, 2d to 3d pose regression, monocular motion capture, human time sequence modeling

8:32

14/06/2020

ACNMP: Skill Transfer and Task Extrapolation through Learning from Demonstration and Reinforcement Learning via Representation Sharing

Mel Vecerik, Jean-Baptiste Regli, Oleg Sushkov, David Barker, Rugile Pevceviciute, Thomas Rothörl, Raia Hadsell, Lourdes Agapito and Jonathan Scholz

Andy Zeng, Pete Florence, Jonathan Tompson, Stefan Welker, Jonathan Chien, Maria Attarian, Travis Armstrong, Ivan Krasin, Dan Duong, Vikas Sindhwani and Johnny Lee

Roland Hafner, Tim Hertweck, Philipp Kloeppner, Michael Bloesch, Michael Neunert, Markus Wulfmeier, Saran Tunyasuvunakool, Nicolas Heess and Martin Riedmiller

self-supervised, unsupervised, keypoints, landmarks, pose, videos, adversarial, gan, disentanglement, factorizations

5:01

14/06/2020

6d pose estimation, keypoints detection, relative pose, object detection, deep learning, computer vision, multi-task learning, metric learning, multi-view learning, epipolar geometry

1:01

14/06/2020

autolabeling, differentiable rendering, pose and shape optimization, curriculum learning, object detection, autonomous driving, 3d shape modeling

4:59

16/11/2020

Deep Reactive Planning in Dynamic Environments

Kei Ota, Devesh Jha, Tadashi Onishi, Asako Kanezaki, Yusuke Yoshiyasu, Yoko Sasaki, Toshisada Mariyama and Daniel Nikovski


5:05

03/05/2021

CausalWorld: A Robotic Manipulation Benchmark for Causal Structure and Transfer Learning

Ossama Ahmed, Frederik Träuble, Anirudh Goyal, Alexander Neitz, Manuel Wuthrich, Yoshua Bengio, Bernhard Schoelkopf and Stefan Bauer

3d human pose estimation, self-supervised learning, disentangling factors of variation, human puppet model, pose transfer, novel view synthesis, human part segmentation

5:00

16/11/2020

instance completion, shape completion, instance segmentation, 3d scene understanding, rgb-d, multi-view geometry, segmentation

1:01