3D self-attention for unsupervised video quantization

25/07/2020

Jingkuan Song, Ruimin Lang, Xiaosu Zhu, Xing Xu, Lianli Gao, Heng Tao Shen

Keywords: quantization, video retrieval, ann search

Abstract: Unsupervised video quantization compresses original videos into compact binary codes so that video retrieval can be conducted efficiently. In this paper, we make a first attempt to combine quantization with video retrieval in a method called 3D-UVQ, which obtains high retrieval accuracy at low storage cost. The proposed framework addresses two main problems: 1) how to design an effective pipeline that perceives video contextual information for video feature extraction; and 2) how to quantize these features for efficient retrieval. To tackle these problems, we propose a 3D self-attention module that exploits spatial and temporal contextual information, where each pixel is influenced by its surrounding pixels. Through a further recurrent operation, each pixel can finally capture the global context from all pixels. We then propose gradient-based residual quantization, which consists of several quantization blocks that approximate the features gradually. Extensive experimental results on three benchmark datasets demonstrate that our method significantly outperforms state-of-the-art methods. An ablation study shows that both the 3D self-attention module and the gradient-based residual quantization improve retrieval performance. Our model is publicly available at https://github.com/brownwolf/3D-UVQ.
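
The abstract describes residual quantization only at a high level. As a rough illustration, the sketch below shows plain residual quantization with k-means codebooks in pure Python: each block quantizes whatever the previous blocks left unexplained, so the features are approximated gradually. This is not the authors' gradient-based variant, and every name here (`kmeans`, `train_residual_quantizer`, `encode`, `decode`) is hypothetical rather than taken from the 3D-UVQ repository.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(vec, codebook):
    """Index of the codeword closest to vec (first index wins on ties)."""
    return min(range(len(codebook)), key=lambda i: sq_dist(vec, codebook[i]))


def kmeans(points, k, iters=10, seed=0):
    """Basic Lloyd's k-means; returns k centroids as lists of floats."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        for j, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                dim = len(members[0])
                centroids[j] = [
                    sum(m[d] for m in members) / len(members) for d in range(dim)
                ]
    return centroids


def train_residual_quantizer(features, num_blocks=3, codebook_size=4):
    """Fit one codebook per block on the residuals left by earlier blocks."""
    residuals = [list(f) for f in features]
    codebooks = []
    for block in range(num_blocks):
        codebook = kmeans(residuals, codebook_size, seed=block)
        codebooks.append(codebook)
        # Subtract the chosen codeword so the next block sees only the error.
        residuals = [
            [x - c for x, c in zip(r, codebook[nearest(r, codebook)])]
            for r in residuals
        ]
    return codebooks


def encode(vec, codebooks):
    """Greedy encoding: one codeword index per quantization block."""
    residual = list(vec)
    codes = []
    for codebook in codebooks:
        j = nearest(residual, codebook)
        codes.append(j)
        residual = [x - c for x, c in zip(residual, codebook[j])]
    return codes


def decode(codes, codebooks):
    """Reconstruct a vector as the sum of the selected codewords."""
    out = [0.0] * len(codebooks[0][0])
    for j, codebook in zip(codes, codebooks):
        out = [x + c for x, c in zip(out, codebook[j])]
    return out
```

With a codebook of size 2^b per block, each index fits in b bits, so a feature encoded by M blocks occupies M·b bits. That is one way to read "compact binary codes" here, though the paper's exact code layout is not stated in the abstract.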

The video of this talk cannot be embedded. You can watch it here:

https://dl.acm.org/doi/10.1145/3397271.3401122#sec-supp

The talk and the respective paper were published at the SIGIR 2020 virtual conference.

Similar Papers

22/11/2021

GTA: Global Temporal Attention for Video Action Understanding

Bo He, Xitong Yang, Zuxuan Wu, Hao Chen, Ser-Nam Lim, Abhinav Shrivastava

Keywords: action recognition, self-attention, temporal modeling

2:55

14/06/2020

Softmax Splatting for Video Frame Interpolation

Simon Niklaus, Feng Liu

Keywords: video frame interpolation, softmax splatting, differentiable forward warping, feature pyramids for image synthesis

1:00

06/12/2021

Compressed Video Contrastive Learning

Yuqi Huo, Mingyu Ding, Haoyu Lu, Nanyi Fei, Zhiwu Lu, Ji-Rong Wen, Ping Luo

Keywords: self-supervised learning, contrastive learning, representation learning

9:07

30/11/2020

Robust High Dynamic Range (HDR) Imaging with Complex Motion and Parallax

Zhiyuan Pu, Peiyao Guo, M. Salman Asif, Zhan Ma

7:38

30/11/2020

Mask-Ranking Network for Semi-Supervised Video Object Segmentation

Wenjing Li, Xiang Zhang, Yujie Hu, Yingqi Tang

5:36

22/11/2021

Efficient Video Super Resolution by Gated Local Self Attention

Davide Abati, Amir Ghodrati, Amirhossein Habibian

Keywords: video super resolution, video efficiency, super resolution

2:51

02/02/2021

Semantic Grouping Network for Video Captioning

Hobin Ryu, Sunghun Kang, Haeyong Kang, Chang D. Yoo

17:41

06/12/2021

Contrast and Mix: Temporal Contrastive Video Domain Adaptation with Background Mixing

Aadarsh Sahoo, Rutav Shah, Rameswar Panda, Kate Saenko, Abir Das

Keywords: domain adaptation, contrastive learning

13:20

07/09/2020

Making a Case for 3D Convolutions for Object Segmentation in Videos

Sabarinath Mahadevan, Ali Athar, Aljosa Osep, Laura Leal-Taixé, Bastian Leibe, Sebastian Hennen

Keywords: object tracking, video segmentation, video object segmentation, video scene understanding, object segmentation

8:16

14/06/2020

Efficient Dynamic Scene Deblurring Using Spatially Variant Deconvolution Network With Optical Flow Guided Training

Yuan Yuan, Wei Su, Dandan Ma

Keywords: dynamic scene deblurring, deconvolution neural network, bi-directional optical flow, deformable convolution, deep learning, image restoration

0:57

22/11/2021

V3GAN: Decomposing Background, Foreground and Motion for Video Generation

Arti Keshari, Sonam Gupta, Sukhendu Das

Keywords: video generation, unconditional video generation, shuffling loss, feature level masking, unsupervised learning, GAN, foreground, background, motion decomposition

3:02

02/02/2021

Proposal-Free Video Grounding with Contextual Pyramid Network

Kun Li, Dan Guo, Meng Wang

14:19

14/06/2020

Learning Event-Based Motion Deblurring

Zhe Jiang, Yu Zhang, Dongqing Zou, Jimmy Ren, Jiancheng Lv, Yebin Liu

Keywords: deblur, event camera, video reconstruction, image restoration, low-level vision, neural networks, adversarial training, adaptive sampling, supervised learning, dynamic vision sensor

1:01

02/02/2021

Motion-blurred Video Interpolation and Extrapolation

Dawit Mureja Argaw, Junsik Kim, Francois Rameau, In So Kweon

17:28

03/05/2021

Self-Supervised Learning of Compressed Video Representations

Youngjae Yu, Sangho Lee, Gunhee Kim, Yale Song

Keywords: self-supervised learning, Compressed videos

4:34

14/06/2020

Attention Mechanism Exploits Temporal Contexts: Real-Time 3D Human Pose Reconstruction

Ruixu Liu, Ju Shen, He Wang, Chen Chen, Sen-ching Cheung, Vijayan Asari

Keywords: 3d human pose, attention mechanism, multi-scale dilation convolution, monocular motion reconstruction

5:01

17/08/2020

Radiative backpropagation: An adjoint method for lightning-fast differentiable rendering

Merlin Nimier-David, Sébastien Speierer, Benoît Ruiz, Wenzel Jakob

Keywords: ray tracing, global illumination, differentiable rendering

17:54

14/06/2020

Blurry Video Frame Interpolation

Wang Shen, Wenbo Bao, Guangtao Zhai, Li Chen, Xiongkuo Min, Zhiyong Gao

Keywords: video frame interpolation, frame-rate up-conversion, video deblurring, pyramid framework, spatial and temporal optimization

5:01

14/06/2020

Video Instance Segmentation Tracking With a Modified VAE Architecture

Chung-Ching Lin, Ying Hung, Rogerio Feris, Linglin He

Keywords: video instance segmentation, video object tracking, variational autoencoder, vae, gaussian process, multi-task learning

1:01

14/06/2020

Zooming Slow-Mo: Fast and Accurate One-Stage Space-Time Video Super-Resolution

Xiaoyu Xiang, Yapeng Tian, Yulun Zhang, Yun Fu, Jan P. Allebach, Chenliang Xu

Keywords: space-time video super-resolution, high-resolution, slow motion, one-stage, fast and accurate, feature temporal interpolation, deformable convlstm, temporal alignment, temporal aggregation, video restoration

1:00

02/02/2021

MAMBA: Multi-level Aggregation via Memory Bank for Video Object Detection

Guanxiong Sun, Yang Hua, Guosheng Hu, Neil Robertson

16:48

06/12/2021

ViSER: Video-Specific Surface Embeddings for Articulated 3D Shape Reconstruction

Gengshan Yang, Deqing Sun, Varun Jampani, Daniel Vlasic, Forrester Cole, Ce Liu, Deva Ramanan

10:42

14/06/2020

Evolving Losses for Unsupervised Video Representation Learning

AJ Piergiovanni, Anelia Angelova, Michael S. Ryoo

Keywords: unsupervised, video, representation learning, multi-task, multimodal

5:01

14/06/2020

A2dele: Adaptive and Attentive Depth Distiller for Efficient RGB-D Salient Object Detection

Yongri Piao, Zhengkun Rong, Miao Zhang, Weisong Ren, Huchuan Lu

Keywords: rgb-d, salient object detection, knowledge distillation, attention, computer vision, cnn

1:00

14/06/2020

Probability Weighted Compact Feature for Domain Adaptive Retrieval

Fuxiang Huang, Lei Zhang, Yang Yang, Xichuan Zhou

Keywords: domain adaptive retrieval, bayesian formulation, learning to hash, transfer learning, focal-triplet loss, histogram feature of neighbors

1:03

02/02/2021

Query-Memory Re-Aggregation for Weakly-supervised Video Object Segmentation

Fanchao Lin, Hongtao Xie, Yan Li, Yongdong Zhang

14:19

14/06/2020

Exploring Spatial-Temporal Multi-Frequency Analysis for High-Fidelity and Temporal-Consistency Video Prediction

Beibei Jin, Yu Hu, Qiankun Tang, Jingyu Niu, Zhiping Shi, Yinhe Han, Xiaowei Li

Keywords: video prediction, video generation, human visual system, wavelet analysis, high fidelity, discrete wavelet transform, multi-frequency analysis, encoder-decoder, residual-in-residual dense block

1:01

06/12/2021

Deep Contextual Video Compression

Jiahao Li, Bin Li, Yan Lu

6:33

14/06/2020

Anisotropic Convolutional Networks for 3D Semantic Scene Completion

Jie Li, Kai Han, Peng Wang, Yu Liu, Xia Yuan

Keywords: semantic scene completion, dense voxel prediction, shape completion, semantic segmentation, rgb-d, anisotropic convolution, voxel-wise receptive fields, 3d convolution

1:01

06/12/2021

Temporal-attentive Covariance Pooling Networks for Video Recognition

Zilin Gao, Qilong Wang, Bingbing Zhang, Qinghua Hu, Peihua Li

8:13

18/07/2021

Is Space-Time Attention All You Need for Video Understanding?

Gedas Bertasius, Heng Wang, Lorenzo Torresani

Keywords: Algorithms, AutoML, Deep Learning, Architectures

5:15

06/12/2020

ARMA Nets: Expanding Receptive Field for Dense Prediction

Jiahao Su, Shiqi Wang, Furong Huang

3:36

02/02/2021

Spatial-temporal Causal Inference for Partial Image-to-video Adaptation

Jin Chen, Xinxiao Wu, Yao Hu, Jiebo Luo

20:01

30/11/2020

Dynamic Depth Fusion and Transformation for Monocular 3D Object Detection

Erli Ouyang, Li Zhang, Mohan Chen, Anurag Arnab, Yanwei Fu

6:30

06/12/2021

TransformerFusion: Monocular RGB Scene Reconstruction using Transformers

Aljaz Bozic, Pablo Palafox, Justus Thies, Angela Dai, Matthias Niessner

Keywords: transformers

7:14

14/06/2020

Learning Fused Pixel and Feature-Based View Reconstructions for Light Fields

Jinglei Shi, Xiaoran Jiang, Christine Guillemot

Keywords: light field, view synthesis, feature-based reconstruction, pixel-based reconstruction, deep learning, angular super-resolution

4:56

07/09/2020

Align-and-Attend Network for Globally and Locally Coherent Video Inpainting

Sanghyun Woo, Dahun Kim, KwanYong Park, Joon-Young Lee, In So Kweon

Keywords: Video Inpainting, Video Processing, Spatio-Temporal Alignment, Spatio-Temporal Non-local Attention

5:17

14/06/2020

JA-POLS: A Moving-Camera Background Model via Joint Alignment and Partially-Overlapping Local Subspaces

Irit Chelly, Vlad Winter, Dor Litvak, David Rosen, Oren Freifeld

Keywords: background subtraction, video analysis, computer vision, machine learning, robust pca, deep learning, moving camera, transfer learning, video surveillance, lie groups

1:00

05/01/2021

Cinematic-L1 Video Stabilization With a Log-Homography Model

Arwen Bradley, Jason Klivington, Joseph Triscari, Rudolph van der Merwe

5:01

30/11/2020

MIX'EM: Unsupervised Image Classification using a Mixture of Embeddings

Ali Varamesh, Tinne Tuytelaars

6:40