ActionBytes: Learning From Trimmed Videos to Localize Actions

14/06/2020

Mihir Jain, Amir Ghodrati, Cees G. M. Snoek

Keywords: action localization, weakly-supervised, self-supervised learning, action proposals, zero-shot, thumos14, activitynet, multithumos, self-training, temporal segmentation

Abstract: This paper tackles the problem of localizing actions in long untrimmed videos. Unlike existing works, which all use annotated untrimmed videos during training, we learn only from short trimmed videos. This enables learning from large-scale datasets originally designed for action classification. We propose a method to train an action localization network that segments a video into interpretable fragments, which we call ActionBytes. Our method jointly learns to cluster ActionBytes and trains the localization network using the cluster assignments as pseudo-labels. By doing so, we train on short trimmed videos that become untrimmed for ActionBytes. In isolation, or when merged, the ActionBytes also serve as effective action proposals. Experiments demonstrate that our boundary-guided training generalizes to unknown action classes and localizes actions in long videos of Thumos14, MultiThumos, and ActivityNet1.2. Furthermore, we show the advantage of ActionBytes for zero-shot localization as well as for traditional weakly supervised localization, which trains on long videos, achieving state-of-the-art results.

The talk and the corresponding paper were published at the CVPR 2020 virtual conference.

Similar Papers

22/11/2021

Few-Shot Temporal Action Localization with Query Adaptive Transformer

Sauradip Nag, Xiatian Zhu, Tao Xiang

Keywords: temporal action localization, few shot learning, transformer, class imbalance, meta learning, action detection

2:56

14/06/2020

Set-Constrained Viterbi for Set-Supervised Action Segmentation

Jun Li, Sinisa Todorovic

Keywords: weakly supervised learning, action segmentation, set-constrained viterbi

1:01

14/06/2020

Searching for Actions on the Hyperbole

Teng Long, Pascal Mettes, Heng Tao Shen, Cees G. M. Snoek

Keywords: video retrieval, hyperbolic learning, hierarchical, zero-shot learning, action recognition, hyperbolic geometry

1:00

02/02/2021

Weakly Supervised Temporal Action Localization Through Learning Explicit Subspaces for Action and Context

Ziyi Liu, Le Wang, Wei Tang, Junsong Yuan, Nanning Zheng, Gang Hua

19:49

14/06/2020

SCT: Set Constrained Temporal Transformer for Set Supervised Action Segmentation

Mohsen Fayyaz, Jürgen Gall

Keywords: action segmentation, action recognition, weakly supervised, set

1:01

22/11/2021

Temporal Alignment via Event Boundary for Few-shot Action Recognition

Shuyuan Li, Huabin Liu, Mengjuan Fei, Xiaoyuan Yu, Weiyao Lin

Keywords: few-shot action recognition, temporal alignment, event boundary

2:32

02/02/2021

A Hybrid Attention Mechanism for Weakly-Supervised Temporal Action Localization

Ashraful Islam, Chengjiang Long, Richard Radke

16:53

14/06/2020

ZSTAD: Zero-Shot Temporal Activity Detection

Lingling Zhang, Xiaojun Chang, Jun Liu, Minnan Luo, Sen Wang, Zongyuan Ge, Alexander Hauptmann

Keywords: zero-shot learning, temporal activity detection, r-c3d, super class

1:01

06/12/2021

Self-Supervised Multi-Object Tracking with Cross-input Consistency

Favyen Bastani, Songtao He, Samuel Madden

Keywords: self-supervised learning

14:59

05/01/2021

Action Duration Prediction for Segment-Level Alignment of Weakly-Labeled Videos

Reza Ghoddoosian, Saif Sayed, Vassilis Athitsos

5:00

06/12/2020

Hierarchical Patch VAE-GAN: Generating Diverse Videos from a Single Sample

Shir Gur, Sagie Benaim, Lior Wolf

3:20

06/12/2021

CLIP-It! Language-Guided Video Summarization

Medhini Narasimhan, Anna Rohrbach, Trevor Darrell

Keywords: transformers

6:14

14/06/2020

Video Playback Rate Perception for Self-Supervised Spatio-Temporal Representation Learning

Yuan Yao, Chang Liu, Dezhao Luo, Yu Zhou, Qixiang Ye

Keywords: self-supervised spatio-temporal representation learning, multi-temporal resolution characteristic, playback rate perception, motion attention mechanism

1:01

02/02/2021

ACSNet: Action-Context Separation Network for Weakly Supervised Temporal Action Localization

Ziyi Liu, Le Wang, Qilin Zhang, Wei Tang, Junsong Yuan, Nanning Zheng, Gang Hua

18:34

19/08/2021

Self-Supervised Video Action Localization with Adversarial Temporal Transforms

Guoqiang Gong, Liangfeng Zheng, Wenhao Jiang, Yadong Mu

Keywords: Computer Vision, Action Recognition, Video

14:39

14/06/2020

METAL: Minimum Effort Temporal Activity Localization in Untrimmed Videos

Da Zhang, Xiyang Dai, Yuan-Fang Wang

Keywords: temporal activity localization, few-shot learning, weakly supervised learning, video understanding, multi-scale relation network, temporal feature pyramid, action recognition, 3d convolutional network

5:00

14/06/2020

Few-Shot Video Classification via Temporal Alignment

Kaidi Cao, Jingwei Ji, Zhangjie Cao, Chien-Yi Chang, Juan Carlos Niebles

Keywords: video classification, few-shot learning, action recognition, temporal alignment

0:57

19/08/2021

Learning Implicit Temporal Alignment for Few-shot Video Classification

Songyang Zhang, Jiale Zhou, Xuming He

Keywords: Computer Vision, Action Recognition, Deep Learning

6:20

14/06/2020

Action Modifiers: Learning From Adverbs in Instructional Videos

Hazel Doughty, Ivan Laptev, Walterio Mayol-Cuevas, Dima Damen

Keywords: vision and language, video understanding, action recognition, action retrieval, instructional videos, weakly-supervised videos, action and behaviour, attributes, attention, adverbs

1:01

14/06/2020

Syntax-Aware Action Targeting for Video Captioning

Qi Zheng, Chaoyue Wang, Dacheng Tao

Keywords: video and language, video captioning, action predicting

1:01

07/09/2020

A Better Use of Audio-Visual Cues: Dense Video Captioning with Bi-modal Transformer

Vladimir Iashin, Esa Rahtu

Keywords: Dense Video Captioning, Temporal Action Proposal Generation, Bi-modal Transformer, Audio-visual Training, Cross-modal, Multi-modal, ActivityNet Captions

9:11

14/06/2020

Straight to the Point: Fast-Forwarding Videos via Reinforcement Learning Using Textual Data

Washington Ramos, Michel Silva, Edson Araujo, Leandro Soriano Marcolino, Erickson Nascimento

Keywords: video fast-forwarding, vision and language, reinforcement learning, multi-modal embedding, hyperlapse, video processing, video acceleration, textual-visual embedding space, reinforce, instructional videos

1:01

05/01/2021

Data-Efficient Alignment of Multimodal Sequences by Aligning Gradient Updates and Internal Feature Distributions

Jianan Wang, Boyang Li, Xiangyu Fan, Jing Lin, Yanwei Fu

4:49

17/08/2020

Unpaired motion style transfer from video to animation

Kfir Aberman, Yijia Weng, Dani Lischinski, Daniel Cohen-Or, Baoquan Chen

Keywords: style transfer, motion analysis

16:08

14/06/2020

Learning a Weakly-Supervised Video Actor-Action Segmentation Model With a Wise Selection

Jie Chen, Zhiheng Li, Jiebo Luo, Chenliang Xu

Keywords: video object segmentation, video actor action segmentation, weakly-supervised learning, action recognition, non-reference metric, attention map, self-supervised learning, video understanding, action localization, pseudo-annotation

5:00

07/09/2020

Procedure Completion by Learning from Partial Summaries

Ehsan Elhamifar, Zwe Naing

Keywords: procedure learning, instructional videos, summarization, subset selection, representation learning, partial summaries

7:34

18/07/2021

Unsupervised Co-part Segmentation through Assembly

Qingzhe Gao, Bin Wang, Libin Liu, Baoquan Chen

Keywords: Applications, Computer Vision

5:01

22/11/2021

Few-shot Action Recognition with Prototype-centered Attentive Learning

Xiatian Zhu, Antoine S Toisoul, Juan-Manuel Perez-Rua, Li Zhang, Brais Martinez, Tao Xiang

Keywords: Few-shot learning, Video recognition, Action classification, Small training data, Model pre-training, Meta-learning, Transformer, Self-attention learning, Cross-attention learning, Prototype learning, Prototype-centered learning, Hybrid-attention learning

2:22

22/11/2021

Deep Video Decaptioning

Pengpeng Chu, Weize Quan, Tong Wang, Pan Wang, Peiran Ren, Dong-Ming Yan

Keywords: video decaptioning, caption mask extraction, frame attention, real time

2:59

14/06/2020

Rethinking Zero-Shot Video Classification: End-to-End Training for Realistic Applications

Biagio Brattoli, Joseph Tighe, Fedor Zhdanov, Pietro Perona, Krzysztof Chalupka

Keywords: zero-shot learning, video classification, end-to-end, word2vec, visual to semantic, limited supervision, r3d, kinetics, sun, ucf101

1:01

05/01/2021

Set Augmented Triplet Loss for Video Person Re-Identification

Pengfei Fang, Pan Ji, Lars Petersson, Mehrtash Harandi

4:56

22/11/2021

StyleVideoGAN: A Temporal Generative Model using a Pretrained StyleGAN

Gereon Fox, Ayush Tewari, Mohamed Elgharib, Christian Theobalt

Keywords: video generation, StyleGAN, GAN, embedding, faces, hands, cars, RNN

8:07

18/11/2020

AARM: Action attention recalibration module for action recognition

Li Zhonghong, Yi Yang, She Ying, Song Jialun, Wu Yukun

13:27

14/06/2020

Non-Adversarial Video Synthesis With Learned Priors

Abhishek Aich, Akash Gupta, Rameswar Panda, Rakib Hyder, M. Salman Asif, Amit K. Roy-Chowdhury

Keywords: video synthesis, non-adversarial learning, generative network, latent space, triplet condition

0:58

02/02/2021

RSPNet: Relative Speed Perception for Unsupervised Video Representation Learning

Peihao Chen, Deng Huang, Dongliang He, Xiang Long, Runhao Zeng, Shilei Wen, Mingkui Tan, Chuang Gan

14:14

05/01/2021

Towards Contextual Learning in Few-Shot Object Classification

Mathieu Page Fortin, Brahim Chaib-draa

4:57

30/11/2020

Semi-supervised Facial Action Unit Intensity Estimation with Contrastive Learning

Enrique Sanchez, Adrian Bulat, Anestis Zaganidis, Georgios Tzimiropoulos

8:47

18/07/2021

Compositional Video Synthesis with Action Graphs

Amir Bar, Roei Herzig, Xiaolong Wang, Anna Rohrbach, Gal Chechik, Trevor Darrell, Amir Globerson

Keywords: Applications, Computer Vision

4:55

05/01/2021

How to Make a BLT Sandwich? Learning VQA Towards Understanding Web Instructional Videos

Shaojie Wang, Wentian Zhao, Ziyi Kou, Jing Shi, Chenliang Xu

4:33

18/07/2021

Optimization Planning for 3D ConvNets

Zhaofan Qiu, Ting Yao, Chong-Wah Ngo, Tao Mei

Keywords: Applications, Activity and Event Recognition

5:13