14/06/2020

Video Instance Segmentation Tracking With a Modified VAE Architecture

Chung-Ching Lin, Ying Hung, Rogerio Feris, Linglin He

Keywords: video instance segmentation, video object tracking, variational autoencoder, vae, gaussian process, multi-task learning

Abstract: We propose a modified variational autoencoder (VAE) architecture built on top of Mask R-CNN for instance-level video segmentation and tracking. The method builds a shared encoder and three parallel decoders, yielding three disjoint branches for predictions of future frames, object detection boxes, and instance segmentation masks. To effectively solve multiple learning tasks, we introduce a Gaussian Process model to enhance the statistical representation of VAE by relaxing the prior strong independent and identically distributed (iid) assumption of conventional VAEs and allowing potential correlations among extracted latent variables. The network learns embedded spatial interdependence and motion continuity in video data and creates a representation that is effective to produce high-quality segmentation masks and track multiple instances in diverse and unstructured videos. Evaluation on a variety of recently introduced datasets shows that our model outperforms previous methods and achieves the new best in class performance.

 0
 0
 0
 0
This is an embedded video. Talk and the respective paper are published at CVPR 2020 virtual conference. If you are one of the authors of the paper and want to manage your upload, see the question "My papertalk has been externally embedded..." in the FAQ section.

Comments

Post Comment
no comments yet
code of conduct: tbd Characters remaining: 140

Similar Papers