Keeping Your Eye on the Ball: Trajectory Attention in Video Transformers

06/12/2021

Keeping Your Eye on the Ball: Trajectory Attention in Video Transformers

Mandela Patrick, Dylan Campbell, Yuki Asano, Ishan Misra, Florian Metze, Christoph Feichtenhofer, Andrea Vedaldi, João Henriques

Keywords: transformers

Abstract Paper Similar Papers

Abstract: In video transformers, the time dimension is often treated in the same way as the two spatial dimensions. However, in a scene where objects or the camera may move, a physical point imaged at one location in frame $t$ may be entirely unrelated to what is found at that location in frame $t+k$. These temporal correspondences should be modeled to facilitate learning about dynamic scenes. To this end, we propose a new drop-in block for video transformers - trajectory attention - that aggregates information along implicitly determined motion paths. We additionally propose a new method to address the quadratic dependence of computation and memory on the input size, which is particularly important for high resolution or long videos. While these ideas are useful in a range of settings, we apply them to the specific task of video action recognition with a transformer model and obtain state-of-the-art results on the Kinetics, Something-Something V2, and Epic-Kitchens datasets.

0

0

0

0

Share

This is an embedded video. Talk and the respective paper are published at NeurIPS 2021 virtual conference. If you are one of the authors of the paper and want to manage your upload, see the question "My papertalk has been externally embedded..." in the FAQ section.

Comments

Post Comment

no comments yet

Similar Papers

06/12/2020

Convolutional Tensor-Train LSTM for Spatio-Temporal Learning

Jiahao Su, Wonmin Byeon, Jean Kossaifi and
Furong Huang, Jan Kautz, Anima Anandkumar

Keywords Paper

0

0

0

0

3:29

14/06/2020

Time Flies: Animating a Still Image With Time-Lapse Video As Reference

Chia-Chi Cheng, Hung-Yu Chen, Wei-Chen Chiu

Keywords Paper

time-lapse video animation, self-supervised learning, style transfer, temporal consistency

0

0

0

0

1:01

14/06/2020

AdaCoF: Adaptive Collaboration of Flows for Video Frame Interpolation

Hyeongmin Lee, Taeoh Kim, Tae-young Chung and
Daehyun Pak, Yuseok Ban, Sangyoun Lee

Keywords Paper

video frame interpolation, video temporal super-resolution, frame rate up conversion, frame synthesis, motion estimation, motion compensation, frame warping

0

0

0

0

1:01

06/12/2021

Space-time Mixing Attention for Video Transformer

Adrian Bulat, Juan Manuel Perez Rua, Swathikiran Sudhakaran and
Brais Martinez, Georgios Tzimiropoulos

Keywords Paper

transformers

0

0

0

0

10:25

05/01/2021

Coarse Temporal Attention Network (CTA-Net) for Driver's Activity Recognition

Zachary Wharton, Ardhendu Behera, Yonghuai Liu, Nik Bessis

Keywords Paper

0

0

0

0

5:30

18/07/2021

Generative Video Transformer: Can Objects be the Words?

Yi-Fu Wu, Jaesik Yoon, Sungjin Ahn

Keywords Paper

Deep Learning, Generative Models

0

0

0

0

5:15

02/02/2021

Vid-ODE: Continuous-Time Video Generation with Neural Ordinary Differential Equation

Sunghyun Park, Kangyeol Kim, Junsoo Lee and
Jaegul Choo, Joonseok Lee, Sookyung Kim, Edward Choi

Keywords Paper

0

0

0

0

18:25

06/12/2020

Video Frame Interpolation without Temporal Priors

Youjian Zhang, Chaoyue Wang, Dacheng Tao

Keywords Paper

0

0

0

0

3:18

05/01/2021

Only Time Can Tell: Discovering Temporal Data for Temporal Modeling

Laura Sevilla-Lara, Shengxin Zha, Zhicheng Yan and
Vedanuj Goswami, Matt Feiszli, Lorenzo Torresani

Keywords Paper

0

0

0

0

4:14

26/04/2020

Physics-as-Inverse-Graphics: Unsupervised Physical Parameter Estimation from Video

Miguel Jaques, Michael Burke, Timothy Hospedales

Keywords Paper

0

0

0

0

4:41

22/11/2021

Knowing What, Where and When to Look: Video Action modelling with Attention

Juan-Manuel Perez-Rua, Brais Martinez, Xiatian Zhu and
Antoine S Toisoul, Victor A Escorcia, Tao Xiang

Keywords Paper

Action recognition, Fine-grained action, video attention, Spatial attention, Channel attention, Temporal attention, Spatio-temporal attention, Feature refinement

0

0

0

0

2:46

22/11/2021

Continuous-Time Video Generation via Learning Motion Dynamics with Neural ODE

Kangyeol Kim, Sunghyun Park, Junsoo Lee and
Joonseok Lee, Sookyung Kim, Jaegul Choo, Edward Choi

Keywords Paper

video generation, generative model, neural ODE, motion estimation

0

0

0

0

2:46

02/02/2021

MVFNet: Multi-View Fusion Network for Efficient Video Recognition

Wenhao Wu, Dongliang He, Tianwei Lin and
Fu Li, Chuang Gan, Errui Ding

Keywords Paper

0

0

0

0

14:02

14/06/2020

JA-POLS: A Moving-Camera Background Model via Joint Alignment and Partially-Overlapping Local Subspaces

Irit Chelly, Vlad Winter, Dor Litvak and
David Rosen, Oren Freifeld

Keywords Paper

background subtraction, video analysis, computer vision, machine learning, robust pca, deep learning, moving camera, transfer learning, video surveillance, lie groups

0

0

0

0

1:00

03/05/2021

A Good Image Generator Is What You Need for High-Resolution Video Synthesis

Yu Tian, Jian Ren, Menglei Chai and
Kyle Olszewski, Xi Peng, Dimitris Metaxas, Sergey Tulyakov

Keywords Paper

contrastive learning, cross-domain video generation, high-resolution video generation

0

0

0

0

10:03

05/01/2021

Adaptive Streaming of 360-Degree Videos With Reinforcement Learning

Sohee Park, Minh Hoai, Arani Bhattacharya, Samir R. Das

Keywords Paper

0

0

0

0

4:51

06/12/2021

Shifted Chunk Transformer for Spatio-Temporal Representational Learning

Xuefan Zha, Wentao Zhu, Lv Xun and
Sen Yang, Ji Liu

Keywords Paper

machine learning, transformers, vision, language

0

0

0

0

6:14

02/02/2021

Temporal ROI Align for Video Object Recognition

Tao Gong, Kai Chen, Xinjiang Wang and
Qi Chu, Feng Zhu, Dahua Lin, Nenghai Yu, Huamin Feng

Keywords Paper

0

0

0

0

14:29

14/06/2020

Attention Mechanism Exploits Temporal Contexts: Real-Time 3D Human Pose Reconstruction

Ruixu Liu, Ju Shen, He Wang and
Chen Chen, Sen-ching Cheung, Vijayan Asari

Keywords Paper

3d human pose, attention mechanism, multi-scale dilation convolution, monocular motion reconstruction

0

0

0

0

5:01

12/07/2020

Video Prediction via Example Guidance

Jingwei Xu, Harry (Huazhe) Xu, Bingbing Ni and
Xiaokang Yang, Trevor Darrell

Keywords Paper

Applications - Computer Vision

0

0

0

0

12:20

06/12/2021

Temporal-attentive Covariance Pooling Networks for Video Recognition

Zilin Gao, Qilong Wang, Bingbing Zhang and
Qinghua Hu, Peihua Li

Keywords Paper

0

0

0

1

8:13

22/11/2021

Temporal Meta-Adaptor for Video Object Detection

Chi Wang, Yang Hua, ZHENG LU and
Jian Gao, Neil Robertson

Keywords Paper

video object detection, temporal aggregation, meta-learning, ImageNet VID

0

0

0

0

6:58

22/11/2021

LARNet: Latent Action Representation for Human Action Synthesis

Naman Biyani, Aayush Jung Bahadur Rana, Shruti Vyas, Yogesh Rawat

Keywords Paper

action synthesis, video synthesis, joint generative model, human action generation, end-to-end learning, conditional video generation

0

0

0

0

3:02

02/02/2021

MAMBA: Multi-level Aggregation via Memory Bank for Video Object Detection

Guanxiong Sun, Yang Hua, Guosheng Hu, Neil Robertson

Keywords Paper

0

0

0

0

16:48

03/05/2021

VA-RED$^2$: Video Adaptive Redundancy Reduction

Bowen Pan, Rameswar Panda, Camilo L Fosco and
Chung-Ching Lin, Alex J Andonian, Yue Meng, Kate Saenko, Aude Oliva, Rogerio Feris

Keywords Paper

0

0

0

0

5:02

14/06/2020

TransMoMo: Invariance-Driven Unsupervised Video Motion Retargeting

Zhuoqian Yang, Wentao Zhu, Wayne Wu and
Chen Qian, Qiang Zhou, Bolei Zhou, Chen Change Loy

Keywords Paper

motion retargeting, disentanglement, representation learning, video generation

0

0

0

0

1:02

22/11/2021

Conditional Model Selection for Efficient Video Understanding

Mihir Jain, Haitam Ben Yahia, Amir Ghodrati and
Amirhossein Habibian, Fatih Porikli

Keywords Paper

action recognition, efficient classification, efficient localization, conditional compute

0

0

0

0

2:49

17/08/2020

Example-driven virtual cinematography by learning camera behaviors

Hongda Jiang, Bin Wang, Xi Wang and
Marc Christie, Baoquan Chen

Keywords Paper

camera behaviors, machine learning, virtual cinematography

0

0

0

0

14:17

06/12/2021

End-to-end Multi-modal Video Temporal Grounding

Yi-Wen Chen, Yi-Hsuan Tsai, Ming-Hsuan Yang

Keywords Paper

self-supervised learning, transformers, vision, contrastive learning

0

0

0

0

8:46

18/07/2021

Is Space-Time Attention All You Need for Video Understanding?

Gedas Bertasius, Heng Wang, Lorenzo Torresani

Keywords Paper

, Algorithms, AutoML, Deep Learning, Architectures

0

0

0

0

5:15

30/11/2020

Mask-Ranking Network for Semi-Supervised Video Object Segmentation

Wenjing Li, Xiang Zhang, Yujie Hu, Yingqi Tang

Keywords Paper

0

0

0

0

5:36

14/06/2020

Video Modeling With Correlation Networks

Heng Wang, Du Tran, Lorenzo Torresani, Matt Feiszli

Keywords Paper

action recognition, video classification, motion, correlation, temporal information, kinetics, something-something.

0

0

0

0

1:05

06/12/2021

Exploring Cross-Video and Cross-Modality Signals for Weakly-Supervised Audio-Visual Video Parsing

Yan-Bo Lin, Hung-Yu Tseng, Hsin-Ying Lee and
Yen-Yu Lin, Ming-Hsuan Yang

Keywords Paper

0

0

0

0

14:06

06/12/2021

Contrast and Mix: Temporal Contrastive Video Domain Adaptation with Background Mixing

Aadarsh Sahoo, Rutav Shah, Rameswar Panda and
Kate Saenko, Abir Das

Keywords Paper

domain adaptation, contrastive learning

0

0

0

0

13:20

02/02/2021

CompFeat: Comprehensive Feature Aggregation for Video Instance Segmentation

Yang Fu, Linjie Yang, Ding Liu and
Thomas S. Huang, Humphrey Shi

Keywords Paper

0

0

0

0

16:24

14/06/2020

Syntax-Aware Action Targeting for Video Captioning

Qi Zheng, Chaoyue Wang, Dacheng Tao

Keywords Paper

video and language, video captioning, action predicting

0

0

0

0

1:01

16/11/2020

Learning an Expert Skill-Space for Replanning Dynamic Quadruped Locomotion over Obstacles

David Surovik, Oliwier Melon, Mathieu Geisert and
Maurice Fallon, Ioannis Havoutis

Keywords Paper

0

0

0

0

5:01

02/02/2021

RSPNet: Relative Speed Perception for Unsupervised Video Representation Learning

Peihao Chen, Deng Huang, Dongliang He and
Xiang Long, Runhao Zeng, Shilei Wen, Mingkui Tan, Chuang Gan

Keywords Paper

0

0

0

0

14:14

06/12/2021

Low-Fidelity Video Encoder Optimization for Temporal Action Localization

Mengmeng Xu, Juan Manuel Perez Rua, Xiatian Zhu and
Bernard Ghanem, Brais Martinez

Keywords Paper

optimization, machine learning, transfer learning

0

0

0

0

14:34

07/09/2020

Align-and-Attend Network for Globally and Locally Coherent Video Inpainting

Sanghyun Woo, Dahun Kim, KwanYong Park and
Joon-Young Lee, In So Kweon

Keywords Paper

Video Inpainting, Video Processing, Spatio-Temporal Alignment, Spatio-Temporal Non-local Attention

0

0

0

0

5:17