Hierarchical Contrastive Motion Learning for Video Action Recognition

22/11/2021

Xitong Yang, Xiaodong Yang, Sifei Liu, Deqing Sun, Larry Davis, Jan Kautz

Keywords: action recognition, motion hierarchy, motion representation, contrastive learning

Abstract: One central question for video action recognition is how to model motion. In this paper, we present hierarchical contrastive motion learning, a new self-supervised learning framework that extracts effective motion representations from raw video frames. Our approach progressively learns a hierarchy of motion features that correspond to different abstraction levels in a network. This hierarchical design bridges the semantic gap between low-level motion cues and high-level recognition tasks, and promotes the fusion of appearance and motion information at multiple levels. At each level, explicit motion self-supervision is provided via contrastive learning, which requires the motion features at the current level to predict the future features at the previous level. Thus, the motion features at higher levels are trained to gradually capture semantic dynamics and become more discriminative for action recognition. Our motion learning module is lightweight and flexible enough to be embedded into various backbone networks. Extensive experiments on four benchmarks show that our approach compares favorably with state-of-the-art methods without requiring optical flow or supervised pre-training.
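
The level-wise self-supervision described in the abstract can be illustrated with a generic InfoNCE-style contrastive objective. The NumPy sketch below is not the authors' implementation, and every name in it (`info_nce`, `hierarchical_motion_loss`, `predictors`) is hypothetical. It assumes that each level's motion features for one clip arrive as a (T, D) array, that a linear map projects level k+1 features onto level k, that the level-k feature at the next time step is the positive, and that the other time steps in the clip serve as negatives; the temperature of 0.1 is also an assumption.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale each row to unit L2 norm so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def info_nce(pred, target, temperature=0.1):
    """Generic InfoNCE loss (assumed form, not taken from the paper).

    Row i of `pred` should match row i of `target` (positive pair); every other
    row of `target` acts as a negative. Both arrays have shape (N, D).
    """
    logits = l2_normalize(pred) @ l2_normalize(target).T / temperature  # (N, N)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))


def hierarchical_motion_loss(features, predictors, temperature=0.1):
    """Sum of per-level contrastive losses (hypothetical illustration).

    features[k]   : (T, D_k) motion features of one clip at level k (0 = lowest).
    predictors[k] : (D_{k+1}, D_k) hypothetical linear map from level k+1 to level k.
    Level k+1 at time t is asked to predict level k at time t + 1.
    """
    total = 0.0
    for k in range(len(features) - 1):
        current = features[k + 1][:-1]     # level k+1, times 0 .. T-2
        future_previous = features[k][1:]  # level k,   times 1 .. T-1
        pred = current @ predictors[k]     # predicted future features at level k
        total += info_nce(pred, future_previous, temperature)
    return total


# Toy usage: random features for three levels and 8 time steps.
rng = np.random.default_rng(0)
T, dims = 8, [16, 32, 64]
features = [rng.standard_normal((T, d)) for d in dims]
predictors = [0.1 * rng.standard_normal((dims[k + 1], dims[k])) for k in range(len(dims) - 1)]
loss = hierarchical_motion_loss(features, predictors)
print(f"toy hierarchical contrastive loss: {loss:.4f}")
```

Drawing negatives from the same clip keeps the example small; negatives could equally be drawn from other clips in a batch, and in training the predictors would be learned jointly with the features rather than fixed at random.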

The talk and the paper were published at the BMVC 2021 virtual conference.

Similar Papers

07/09/2020

Attention Distillation for Learning Video Representations

Miao Liu, Xin Chen, Yun Zhang, Yin Li, James Rehg

Keywords: Action Recognition, Deep Learning, Representation Learning

14/06/2020

Deep Homography Estimation for Dynamic Scenes

Hoang Le, Feng Liu, Shu Zhang, Aseem Agarwala

Keywords: homography estimation, dynamic scenes, motion estimation, multi-task learning, deep learning

14/06/2020

Syntax-Aware Action Targeting for Video Captioning

Qi Zheng, Chaoyue Wang, Dacheng Tao

Keywords: video and language, video captioning, action predicting

05/01/2021

3D Human Pose and Shape Estimation Through Collaborative Learning and Multi-View Model-Fitting

Zhongguo Li, Magnus Oskarsson, Anders Heyden

06/12/2020

Self-Supervised MultiModal Versatile Networks

Jean-Baptiste Alayrac, Adria Recasens, Rosalia Schneider, Relja Arandjelović, Jason Ramapuram, Jeffrey De Fauw, Lucas Smaira, Sander Dieleman, Andrew Zisserman

22/11/2021

Dynamic Graph Warping Transformer for Video Alignment

Junyan Wang, Yang Long, Maurice Pagnucco, Yang Song

Keywords: Video alignment, Transformer, Graph Neural Network

02/02/2021

SeCo: Exploring Sequence Supervision for Unsupervised Representation Learning

Ting Yao, Yiheng Zhang, Zhaofan Qiu, Yingwei Pan, Tao Mei

14/06/2020

Self-Supervised Monocular Trained Depth Estimation Using Self-Attention and Discrete Disparity Volume

Adrian Johnston, Gustavo Carneiro

Keywords: self-supervised depth estimation, self-supervised learning, self-attention, depth estimation, uncertainty

03/05/2021

Grounding Physical Concepts of Objects and Events Through Dynamic Visual Reasoning

Zhenfang Chen, Jiayuan Mao, Jiajun Wu, Kwan-Yee K Wong, Joshua B Tenenbaum, Chuang Gan

Keywords: Visual Reasoning, Video Reasoning, Neuro-Symbolic Learning, Concept Learning

06/12/2021

H-NeRF: Neural Radiance Fields for Rendering and Temporal Reconstruction of Humans in Motion

Hongyi Xu, Thiemo Alldieck, Cristian Sminchisescu

Keywords: robustness

14/06/2020

Single Image Reflection Removal With Physically-Based Training Images

Soomin Kim, Yuchi Huo, Sung-Eui Yoon

Keywords: reflection removal, physical-based rendering, deep learning, layer decomposition, image processing

14/06/2020

A Unified Object Motion and Affinity Model for Online Multi-Object Tracking

Junbo Yin, Wenguan Wang, Qinghao Meng, Ruigang Yang, Jianbing Shen

Keywords: mot, multi-task learning, motion, affinity, attention, online

14/06/2020

Multi-Domain Learning for Accurate and Few-Shot Color Constancy

Jin Xiao, Shuhang Gu, Lei Zhang

Keywords: color constancy, multi-domain learning, few-shot

12/07/2020

Feature-map-level Online Adversarial Knowledge Distillation

Inseop Chung, SeongUk Park, Kim Jangho, Nojun Kwak

Keywords: Applications - Computer Vision

12/07/2020

Multi-Agent Determinantal Q-Learning

Yaodong Yang, Ying Wen, Jun Wang, Liheng Chen, Kun Shao, David Mguni, Weinan Zhang

Keywords: Planning, Control, and Multiagent Learning

14/06/2020

Multi-Mutual Consistency Induced Transfer Subspace Learning for Human Motion Segmentation

Tao Zhou, Huazhu Fu, Chen Gong, Jianbing Shen, Ling Shao, Fatih Porikli

Keywords: human motion segmentation, transfer subspace learning, multi-level features, multi-mutual consistency learning

06/12/2020

Self-Learning Transformations for Improving Gaze and Head Redirection

Yufeng Zheng, Seonwook Park, Xucong Zhang, Shalini De Mello, Otmar Hilliges

03/05/2021

Self-Supervised Learning of Compressed Video Representations

Youngjae Yu, Sangho Lee, Gunhee Kim, Yale Song

Keywords: self-supervised learning, Compressed videos

14/06/2020

Learn to Augment: Joint Data Augmentation and Network Optimization for Text Recognition

Canjie Luo, Yuanzhi Zhu, Lianwen Jin, Yongpan Wang

Keywords: data augmentation, text recognition, joint training

14/06/2020

Few-Shot Video Classification via Temporal Alignment

Kaidi Cao, Jingwei Ji, Zhangjie Cao, Chien-Yi Chang, Juan Carlos Niebles

Keywords: video classification, few-shot learning, action recognition, temporal alignment

14/06/2020

Video Modeling With Correlation Networks

Heng Wang, Du Tran, Lorenzo Torresani, Matt Feiszli

Keywords: action recognition, video classification, motion, correlation, temporal information, kinetics, something-something

14/06/2020

Learning Identity-Invariant Motion Representations for Cross-ID Face Reenactment

Po-Hsiang Huang, Fu-En Yang, Yu-Chiang Frank Wang

Keywords: face reenactment, video retargeting, representation learning, video generation, adversarial learning, self-supervised learning

18/07/2021

Unsupervised Co-part Segmentation through Assembly

Qingzhe Gao, Bin Wang, Libin Liu, Baoquan Chen

Keywords: Applications, Computer Vision

05/01/2021

Data-Efficient Alignment of Multimodal Sequences by Aligning Gradient Updates and Internal Feature Distributions

Jianan Wang, Boyang Li, Xiangyu Fan, Jing Lin, Yanwei Fu

16/11/2020

Learning Predictive Representations for Deformable Objects Using Contrastive Estimation

Wilson Yan, Ashwin Vangipuram, Pieter Abbeel, Lerrel Pinto

06/12/2020

Boosting Adversarial Training with Hypersphere Embedding

Tianyu Pang, Xiao Yang, Yinpeng Dong, Kun Xu, Jun Zhu, Hang Su

02/02/2021

Learning to Sit: Synthesizing Human-Chair Interactions via Hierarchical Control

Yu-Wei Chao, Jimei Yang, Weifeng Chen, Jia Deng

14/06/2020

Focus on Defocus: Bridging the Synthetic to Real Domain Gap for Depth Estimation

Maxim Maximov, Kevin Galim, Laura Leal-Taixé

Keywords: depth estimation, generalisation, depth from focus, blur estimation, depth

06/12/2021

End-to-end Multi-modal Video Temporal Grounding

Yi-Wen Chen, Yi-Hsuan Tsai, Ming-Hsuan Yang

Keywords: self-supervised learning, transformers, vision, contrastive learning

18/07/2021

Optimization Planning for 3D ConvNets

Zhaofan Qiu, Ting Yao, Chong-Wah Ngo, Tao Mei

Keywords: Applications, Activity and Event Recognition

12/07/2020

When Does Self-Supervision Help Graph Convolutional Networks?

Yuning You, Tianlong Chen, Zhangyang Wang, Yang Shen

Keywords: Unsupervised and Semi-Supervised Learning

14/06/2020

Distilled Semantics for Comprehensive Scene Understanding from Videos

Fabio Tosi, Filippo Aleotti, Pierluigi Zama Ramirez, Matteo Poggi, Samuele Salti, Luigi Di Stefano, Stefano Mattoccia

Keywords: monocular depth estimation, optical flow, semantic segmentation, motion segmentation, knowledge distillation

14/06/2020

Disentangled and Controllable Face Image Generation via 3D Imitative-Contrastive Learning

Yu Deng, Jiaolong Yang, Dong Chen, Fang Wen, Xin Tong

Keywords: face image synthesis, disentangled representation learning, controllable generation, gan, 3d

05/01/2021

Coarse Temporal Attention Network (CTA-Net) for Driver's Activity Recognition

Zachary Wharton, Ardhendu Behera, Yonghuai Liu, Nik Bessis

17/08/2020

Consistent video depth estimation

Xuan Luo, Jia-Bin Huang, Richard Szeliski, Kevin Matzen, Johannes Kopf

Keywords: video, depth estimation

07/09/2020

Adversarial Concurrent Training: Optimizing Robustness and Accuracy Trade-off of Deep Neural Networks

Elahe Arani, Fahad Sarfraz, Bahram Zonooz

Keywords: Adversarial Robustness, Generalization, Adversarial Training, Deep Learning, Collaborative Learning

06/12/2021

Look at What I’m Doing: Self-Supervised Spatial Grounding of Narrations in Instructional Videos

Reuben Tan, Bryan Plummer, Kate Saenko, Hailin Jin, Bryan Russell

Keywords: optimization

06/12/2021

Learning from Inside: Self-driven Siamese Sampling and Reasoning for Video Question Answering

Weijiang Yu, Haoteng Zheng, Mengfei Li, Lei Ji, Lijun Wu, Nong Xiao, Nan Duan

Keywords: transformers

22/11/2021

Self-Supervised Monocular Depth Estimation with Internal Feature Fusion

Hang Zhou, David Greenwood, Sarah Taylor

Keywords: depth estimation, structure from motion

05/01/2021

Self-Supervised 4D Spatio-Temporal Feature Learning via Order Prediction of Sequential Point Cloud Clips

Haiyan Wang, Liang Yang, Xuejian Rong, Jinglun Feng, Yingli Tian
