22/11/2021

Bird’s Eye View Segmentation Using Lifted 2D Semantic Features

Isht Dwivedi, Srikanth Malla, Yi-Ting Chen, Behzad Dariush

Keywords: segmentation, bird's eye view, pseudo-lidar, video understanding, autonomous driving, monocular camera, depth estimation

Abstract: We consider the problem of Bird's Eye View (BEV) segmentation with a perspective monocular camera view as input. An effective solution to this problem is important in many autonomous navigation tasks such as behavior prediction and planning, since the BEV segmented image provides an explainable intermediate representation that captures both the geometry and the layout of the surrounding scene. Our approach involves a novel view transformation layer that effectively exploits depth maps to transform 2D image features into the BEV space. The framework includes the design of a neural network architecture that produces BEV segmentation maps using the proposed transformation layer. Of particular interest is the evaluation of the proposed method in complex scenarios involving highly unstructured scenes that are not represented in static maps. In the absence of an appropriate dataset for this task, we introduce the EPOSH road-scene dataset, which consists of 560 video clips of highly unstructured construction scenes, annotated with unique labels in both perspective and BEV. For evaluation, we compare our approach with several competitive baselines and recently published works, and show improvements over the state of the art on the nuScenes and EPOSH datasets. We plan to release the dataset, code, and trained models used in the paper at https://usa.honda-ri.com/eposh
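
The abstract's core idea is to use depth to lift perspective-view image features into a BEV grid. The sketch below is only a minimal illustration of that general (pseudo-lidar-style) lifting, not the paper's actual transformation layer: it unprojects each pixel with a metric depth map and pinhole intrinsics, then average-pools pixel features into ground-plane cells. The function name, grid extents, and cell size are assumptions made for illustration.

```python
# Minimal sketch of depth-based feature lifting to a BEV grid (not the authors'
# exact layer). Assumes a pinhole camera with intrinsics K, per-pixel features
# `feats` of shape (C, H, W), and a metric depth map `depth` of shape (H, W).
import numpy as np

def lift_features_to_bev(feats, depth, K, bev_range=50.0, cell_size=0.5):
    C, H, W = feats.shape
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]

    # Pixel grid -> 3D points in camera coordinates (x right, y down, z forward).
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    z = depth
    x = (u - cx) * z / fx

    # BEV grid covers x in [-bev_range, bev_range] and z in [0, bev_range];
    # the vertical coordinate y is ignored (flattened onto the ground plane).
    n_x = int(2 * bev_range / cell_size)
    n_z = int(bev_range / cell_size)
    bev = np.zeros((C, n_z, n_x), dtype=np.float64)
    counts = np.zeros((n_z, n_x), dtype=np.int64)

    ix = ((x + bev_range) / cell_size).astype(int)
    iz = (z / cell_size).astype(int)
    valid = (ix >= 0) & (ix < n_x) & (iz >= 0) & (iz < n_z) & (z > 0)

    # Average-pool features from all pixels that fall into the same BEV cell.
    for c in range(C):
        np.add.at(bev[c], (iz[valid], ix[valid]), feats[c][valid])
    np.add.at(counts, (iz[valid], ix[valid]), 1)
    bev /= np.maximum(counts, 1)
    return bev
```

In practice such a lifting step would be implemented as a differentiable layer inside the network; this NumPy version only shows the geometry of the unprojection and pooling.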

Talk and paper published at BMVC 2021 (virtual conference).
