TEAM-Net: Multi-modal Learning for Video Action Recognition with Partial Decoding

22/11/2021

Zhengwei Wang, Qi She, Aljosa Smolic

Keywords: video action recognition, partially decoded video, multi-modal fusion

Abstract: Most existing video action recognition models ingest raw RGB frames. However, the raw video stream requires enormous storage and contains significant temporal redundancy. Video compression (e.g., H.264, MPEG-4) reduces this redundancy by representing the raw video stream as a sequence of Groups of Pictures (GOPs). Each GOP consists of an initial I-frame (i.e., a full RGB image) followed by a number of P-frames, which are represented by motion vectors and residuals and can be regarded and used as pre-extracted features. In this work, we 1) introduce sampling the network input from partially decoded videos at the GOP level, and 2) propose a plug-and-play mulTi-modal lEArning Module (TEAM) for training the network on information from I-frames and P-frames in an end-to-end manner. We demonstrate the superior performance of TEAM-Net compared to the baseline using RGB only. TEAM-Net also achieves state-of-the-art performance in video action recognition with partial decoding.
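
The GOP structure and GOP-level sampling described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a fixed GOP length (real encoders may vary it), and the function names `split_into_gops` and `sample_gop_level`, as well as the rule of picking the centre GOP of each segment and a mid-GOP P-frame, are hypothetical choices made only for illustration.

```python
from typing import List, Optional, Tuple


def split_into_gops(num_frames: int, gop_size: int) -> List[range]:
    """Group frame indices into GOPs; the first index of each range is the I-frame.

    Assumption: every GOP has the same length, except possibly the last one.
    """
    if num_frames <= 0 or gop_size <= 0:
        raise ValueError("num_frames and gop_size must be positive")
    return [
        range(start, min(start + gop_size, num_frames))
        for start in range(0, num_frames, gop_size)
    ]


def sample_gop_level(
    num_frames: int, gop_size: int, num_segments: int
) -> List[Tuple[int, Optional[int]]]:
    """Pick one (I-frame, P-frame) index pair per temporal segment.

    The video is divided into `num_segments` equal parts at GOP granularity,
    and the GOP at the centre of each part is selected. From that GOP we take
    the I-frame (full RGB image) and one P-frame from the middle of the GOP
    (which would supply motion vectors and residuals). The P-frame index is
    None when the selected GOP contains only an I-frame.
    """
    if num_segments <= 0:
        raise ValueError("num_segments must be positive")
    gops = split_into_gops(num_frames, gop_size)
    samples: List[Tuple[int, Optional[int]]] = []
    for k in range(num_segments):
        # Index of the GOP at the centre of the k-th segment, clamped to range.
        centre = (2 * k + 1) * len(gops) // (2 * num_segments)
        gop = gops[min(centre, len(gops) - 1)]
        i_frame = gop[0]
        p_frame = gop[len(gop) // 2] if len(gop) > 1 else None
        samples.append((i_frame, p_frame))
    return samples
```

For a 48-frame clip with 12-frame GOPs and four segments, `sample_gop_level(48, 12, 4)` returns `[(0, 6), (12, 18), (24, 30), (36, 42)]`: only the I-frame of each selected GOP would need full decoding, while the paired P-frame would be read as motion vectors and residuals.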

The talk and the respective paper were published at the BMVC 2021 virtual conference.

Similar Papers

03/05/2021

Self-Supervised Learning of Compressed Video Representations

Youngjae Yu, Sangho Lee, Gunhee Kim, Yale Song

Keywords: self-supervised learning, Compressed videos

4:34

06/12/2021

Compressed Video Contrastive Learning

Yuqi Huo, Mingyu Ding, Haoyu Lu, Nanyi Fei, Zhiwu Lu, Ji-Rong Wen, Ping Luo

Keywords: self-supervised learning, contrastive learning, representation learning

9:07

03/05/2021

VA-RED$^2$: Video Adaptive Redundancy Reduction

Bowen Pan, Rameswar Panda, Camilo L Fosco, Chung-Ching Lin, Alex J Andonian, Yue Meng, Kate Saenko, Aude Oliva, Rogerio Feris

5:02

14/06/2020

AdaCoF: Adaptive Collaboration of Flows for Video Frame Interpolation

Hyeongmin Lee, Taeoh Kim, Tae-young Chung, Daehyun Pak, Yuseok Ban, Sangyoun Lee

Keywords: video frame interpolation, video temporal super-resolution, frame rate up conversion, frame synthesis, motion estimation, motion compensation, frame warping

1:01

06/12/2020

Cycle-Contrast for Self-Supervised Video Representation Learning

Quan Kong, Wenpeng Wei, Ziwei Deng, Tomoaki Yoshinaga, Tomokazu Murakami

3:13

22/11/2021

Inter-intra Variant Dual Representations for Self-supervised Video Recognition

Lin ZHANG, Qi She, Zhengyang Shen, Changhu Wang

Keywords: video action recognition, self-supervised learning, contrastive learning, representation learning

2:55

06/12/2021

Shifted Chunk Transformer for Spatio-Temporal Representational Learning

Xuefan Zha, Wentao Zhu, Lv Xun, Sen Yang, Ji Liu

Keywords: machine learning, transformers, vision, language

6:14

22/11/2021

Back to the Future: Cycle Encoding Prediction for Self-supervised Video Representation Learning

Xinyu Yang, Majid Mirmehdi, Tilo Burghardt

Keywords: unsupervised learning, self-supervised learning, video self-supervised learning, contrastive learning, representation learning, cycle consistency, temporal prediction, action recognition

2:59

05/01/2021

Set Augmented Triplet Loss for Video Person Re-Identification

Pengfei Fang, Pan Ji, Lars Petersson, Mehrtash Harandi

4:56

14/06/2020

Non-Adversarial Video Synthesis With Learned Priors

Abhishek Aich, Akash Gupta, Rameswar Panda, Rakib Hyder, M. Salman Asif, Amit K. Roy-Chowdhury

Keywords: video synthesis, non-adversarial learning, generative network, latent space, triplet condition

0:58

06/12/2021

Look at What I’m Doing: Self-Supervised Spatial Grounding of Narrations in Instructional Videos

Reuben Tan, Bryan Plummer, Kate Saenko, Hailin Jin, Bryan Russell

Keywords: optimization

12:28

14/06/2020

M-LVC: Multiple Frames Prediction for Learned Video Compression

Jianping Lin, Dong Liu, Houqiang Li, Feng Wu

Keywords: learned video compression, video prediction, video coding, deep learning

1:01

18/11/2020

AARM: Action attention recalibration module for action recognition

Li Zhonghong, Yi Yang, She Ying, Song Jialun, Wu Yukun

13:27

06/12/2020

Self-Supervised MultiModal Versatile Networks

Jean-Baptiste Alayrac, Adria Recasens, Rosalia Schneider, Relja Arandjelović, Jason Ramapuram, Jeffrey De Fauw, Lucas Smaira, Sander Dieleman, Andrew Zisserman

3:25

07/09/2020

Tripping through time: Efficient Localization of Activities in Videos

Meera Hahn, Asim Kadav, James Rehg, Hans Peter Graf

Keywords: Activity Localization, Reinforcement learning, Vision and Language

10:10

05/01/2021

Temporal Context Aggregation for Video Retrieval With Contrastive Learning

Jie Shao, Xin Wen, Bingchen Zhao, Xiangyang Xue

4:50

14/06/2020

Searching for Actions on the Hyperbole

Teng Long, Pascal Mettes, Heng Tao Shen, Cees G. M. Snoek

Keywords: video retrieval, hyperbolic learning, hierarchical, zero-shot learning, action recognition, hyperbolic geometry

1:00

06/12/2021

Dynamic Normalization and Relay for Video Action Recognition

Dongqi Cai, Anbang Yao, Yurong Chen

Keywords: deep learning, representation learning

10:42

05/01/2021

Weakly Supervised Deep Reinforcement Learning for Video Summarization With Semantically Meaningful Reward

Zutong Li, Lei Yang

4:54

06/12/2021

Learning from Inside: Self-driven Siamese Sampling and Reasoning for Video Question Answering

Weijiang Yu, Haoteng Zheng, Mengfei Li, Lei Ji, Lijun Wu, Nong Xiao, Nan Duan

Keywords: transformers

13:47

06/12/2021

Deep Contextual Video Compression

Jiahao Li, Bin Li, Yan Lu

6:33

06/12/2020

Learning Representations from Audio-Visual Spatial Alignment

Pedro Morgado, Yi Li, Nuno Vasconcelos

3:21

14/06/2020

Evolving Losses for Unsupervised Video Representation Learning

AJ Piergiovanni, Anelia Angelova, Michael S. Ryoo

Keywords: unsupervised, video, representation learning, multi-task, multimodal

5:01

14/06/2020

Straight to the Point: Fast-Forwarding Videos via Reinforcement Learning Using Textual Data

Washington Ramos, Michel Silva, Edson Araujo, Leandro Soriano Marcolino, Erickson Nascimento

Keywords: video fast-forwarding, vision and language, reinforcement learning, multi-modal embedding, hyperlapse, video processing, video acceleration, textual-visual embedding space, reinforce, instructional videos

1:01

06/12/2021

CLIP-It! Language-Guided Video Summarization

Medhini Narasimhan, Anna Rohrbach, Trevor Darrell

Keywords: transformers

6:14

02/02/2021

SeCo: Exploring Sequence Supervision for Unsupervised Representation Learning

Ting Yao, Yiheng Zhang, Zhaofan Qiu, Yingwei Pan, Tao Mei

16:17

22/11/2021

Fine-grained Multi-Modal Self-Supervised Learning

Duo Wang, Salah Karout

Keywords: self-supervised learning, multi-modal learning

2:46

07/09/2020

Attention Distillation for Learning Video Representations

Miao Liu, Xin Chen, Yun Zhang, Yin Li, James Rehg

Keywords: Action Recognition, Deep Learning, Representation Learning

9:50

02/02/2021

Augmented Partial Mutual Learning with Frame Masking for Video Captioning

Ke Lin, Zhuoxin Gan, Liwei Wang

16:57

18/07/2021

Optimization Planning for 3D ConvNets

Zhaofan Qiu, Ting Yao, Chong-Wah Ngo, Tao Mei

Keywords: Applications, Activity and Event Recognition

5:13

14/06/2020

Video Playback Rate Perception for Self-Supervised Spatio-Temporal Representation Learning

Yuan Yao, Chang Liu, Dezhao Luo, Yu Zhou, Qixiang Ye

Keywords: self-supervised spatio-temporal representation learning, multi-temporal resolution characteristic, playback rate perception, motion attention mechanism

1:01

22/11/2021

StyleVideoGAN: A Temporal Generative Model using a Pretrained StyleGAN

Gereon Fox, Ayush Tewari, Mohamed Elgharib, Christian Theobalt

Keywords: video generation, StyleGAN, GAN, embedding, faces, hands, cars, RNN

8:07

22/11/2021

Deep Video Decaptioning

Pengpeng Chu, Weize Quan, Tong Wang, Pan Wang, Peiran Ren, Dong-Ming Yan

Keywords: video decaptioning, caption mask extraction, frame attention, real time

2:59

05/01/2021

Data-Efficient Alignment of Multimodal Sequences by Aligning Gradient Updates and Internal Feature Distributions

Jianan Wang, Boyang Li, Xiangyu Fan, Jing Lin, Yanwei Fu

4:49

05/01/2021

DynaVSR: Dynamic Adaptive Blind Video Super-Resolution

Suyoung Lee, Myungsub Choi, Kyoung Mu Lee

4:56

14/06/2020

ActionBytes: Learning From Trimmed Videos to Localize Actions

Mihir Jain, Amir Ghodrati, Cees G. M. Snoek

Keywords: action localization, weakly-supervised, self-supervised learning, action proposals, zero-shot, thumos14, activitynet, multithumos, self-training, temporal segmentation

1:01

14/06/2020

Cross-Modality Person Re-Identification With Shared-Specific Feature Transfer

Yan Lu, Yue Wu, Bin Liu, Tianzhu Zhang, Baopu Li, Qi Chu, Nenghai Yu

Keywords: person re-identification, cross modality

0:56

02/02/2021

Semantic Grouping Network for Video Captioning

Hobin Ryu, Sunghun Kang, Haeyong Kang, Chang D. Yoo

17:41

22/11/2021

Stacked Temporal Attention: Improving First-person Action Recognition by Emphasizing Discriminative Clips

Lijin Yang, Yifei Huang, Yusuke Sugano, Yoichi Sato

Keywords: Egocentric action recognition, Action recognition, Temporal attention

3:01

22/11/2021

Hierarchical Contrastive Motion Learning for Video Action Recognition

Xitong Yang, Xiaodong Yang, Sifei Liu, Deqing Sun, Larry Davis, Jan Kautz

Keywords: action recognition, motion hierarchy, motion representation, contrastive learning

8:29