Unsupervised Learning of Visual 3D Keypoints for Control

Abstract: Learning sensorimotor control policies from high-dimensional images crucially relies on the quality of the underlying visual representations. Prior work shows that structured latent spaces, such as visual keypoints, often outperform unstructured representations for robotic control. However, most of these representations, whether structured or unstructured, are learned in 2D space even though control tasks are usually performed in a 3D environment. In this work, we propose a framework to learn such 3D geometric structure directly from images in an end-to-end unsupervised manner. The input images are embedded into latent 3D keypoints via a differentiable encoder, which is trained to optimize both a multi-view consistency loss and the downstream task objective. The discovered 3D keypoints tend to meaningfully capture robot joints as well as object movements, consistently across both time and 3D space. The proposed approach outperforms prior state-of-the-art methods across a variety of reinforcement learning benchmarks. Code and videos at https://buoyancy99.github.io/unsup-3d-keypoints/.
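
To make the multi-view consistency idea in the abstract concrete, the following is a minimal sketch of one plausible form of such a loss, not the authors' implementation. It assumes that each camera view predicts the same set of keypoints in its own camera frame, that camera extrinsics are known, and that a camera-to-world transform has the form p_world = R p_cam + t. The function names `to_world` and `multiview_consistency_loss` are hypothetical. The loss is zero when all views agree on the world-frame keypoint positions and grows with their disagreement.

```python
from typing import List, Sequence

Vec3 = Sequence[float]
Mat3 = Sequence[Sequence[float]]


def to_world(p_cam: Vec3, R: Mat3, t: Vec3) -> List[float]:
    """Map one 3D point from a camera frame to the world frame.

    Assumed convention (hypothetical): p_world = R @ p_cam + t.
    """
    return [sum(R[i][j] * p_cam[j] for j in range(3)) + t[i] for i in range(3)]


def multiview_consistency_loss(
    kps_cam: Sequence[Sequence[Vec3]],
    rotations: Sequence[Mat3],
    translations: Sequence[Vec3],
) -> float:
    """Mean squared distance of each view's world-frame keypoints
    from the cross-view mean of the same keypoint.

    kps_cam[v][k] is keypoint k as predicted in camera v's frame.
    """
    # Express every view's keypoints in the shared world frame.
    world = [
        [to_world(p, R, t) for p in kps]
        for kps, R, t in zip(kps_cam, rotations, translations)
    ]
    n_views, n_kp = len(world), len(world[0])
    total = 0.0
    for k in range(n_kp):
        # Cross-view mean position of keypoint k.
        mean = [sum(world[v][k][i] for v in range(n_views)) / n_views
                for i in range(3)]
        for v in range(n_views):
            total += sum((world[v][k][i] - mean[i]) ** 2 for i in range(3))
    return total / (n_views * n_kp)
```

In a learned setting, a term of this kind would typically be computed on the encoder's outputs with an automatic-differentiation library and added to the downstream task objective; the pure-Python version here only illustrates the geometry.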

16/11/2020

Mel Vecerik, Jean-Baptiste Regli, Oleg Sushkov, David Barker, Rugile Pevceviciute, Thomas Rothörl, Raia Hadsell, Lourdes Agapito, Jonathan Scholz

Ossama Ahmed, Frederik Träuble, Anirudh Goyal, Alexander Neitz, Manuel Wuthrich, Yoshua Bengio, Bernhard Schoelkopf, Stefan Bauer

sparse features, reinforcement learning, key point detection, feature description, feature matching, relative pose estimation, ransac, essential matrix, sift, superpoint

5:01

03/05/2021

Martin Sundermeyer, Maximilian Durner, En Yen Puang, Zoltan-Csaba Marton, Narunas Vaskevicius, Kai O. Arras, Rudolph Triebel

object pose estimation, encodings, multi object, synthetic data, symmetries, autoencoder, embedding, 6d object detection, t-less, relative pose estimation

1:01

30/11/2020

6d pose estimation, keypoints detection, relative pose, object detection, deep learning, computer vision, multi-task learning, metric learning, multi-view learning, epipolar geometry

1:01

14/06/2020

3d human pose estimation, self-supervised learning, disentangling factors of variation, human puppet model, pose transfer, novel view synthesis, human part segmentation

5:00

26/04/2020

Andy Zeng, Pete Florence, Jonathan Tompson, Stefan Welker, Jonathan Chien, Maria Attarian, Travis Armstrong, Ivan Krasin, Dan Duong, Vikas Sindhwani, Johnny Lee

5:01