Tripping through time: Efficient Localization of Activities in Videos

07/09/2020

Meera Hahn, Asim Kadav, James Rehg, Hans Peter Graf

Keywords: Activity Localization, Reinforcement learning, Vision and Language

Abstract: Localizing moments in untrimmed videos via language queries is a new and interesting task that requires the ability to accurately ground language in video. Previous works have approached this task by processing the entire video, often more than once, to localize relevant activities. In the real-world applications that this task lends itself to, such as surveillance, efficiency is a pivotal trait of a system. In this paper, we present TripNet, an end-to-end system that uses a gated attention architecture to model fine-grained textual and visual representations in order to align text and video content. Furthermore, TripNet uses reinforcement learning to efficiently localize relevant activity clips in long videos by learning how to intelligently skip around the video. It extracts visual features for only a few frames to perform activity classification. In our evaluation on the Charades-STA, ActivityNet Captions, and TACoS datasets, we find that TripNet achieves high accuracy and saves processing time by only looking at 32-41% of the entire video.
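To make the efficiency claim concrete, the following is a minimal sketch of how skipping around a video limits the fraction of frames that ever get inspected. It is not TripNet's method: the fixed forward jump is a hand-written stand-in for the learned reinforcement learning policy, and `localize_by_skipping`, `toy_score`, the window size, and the jump size are all hypothetical names and values chosen for illustration.

```python
from typing import Callable, Set, Tuple


def localize_by_skipping(
    num_frames: int,
    window: int,
    jump: int,
    score_window: Callable[[int, int], float],
    threshold: float = 0.5,
    max_steps: int = 50,
) -> Tuple[Tuple[int, int], float]:
    """Move a fixed-size window in coarse jumps until it scores above threshold.

    Returns the final (start, end) window and the fraction of frames inspected.
    Assumes window <= num_frames. The fixed jump is a hypothetical stand-in for
    a learned policy that would choose where to look next.
    """
    seen: Set[int] = set()
    start = 0
    end = min(window, num_frames)
    for _ in range(max_steps):
        end = min(start + window, num_frames)
        seen.update(range(start, end))  # only these frames are ever processed
        if score_window(start, end) >= threshold:
            break  # analogous to a "terminate" decision
        if end >= num_frames:
            break  # reached the end of the video without a match
        start = max(0, min(start + jump, num_frames - window))  # skip forward
    return (start, end), len(seen) / num_frames


def toy_score(start: int, end: int) -> float:
    """Toy alignment score: share of the window inside a made-up target span."""
    target_start, target_end = 60, 70  # hypothetical ground-truth activity
    overlap = max(0, min(end, target_end) - max(start, target_start))
    return overlap / (end - start)


found, fraction = localize_by_skipping(
    num_frames=100, window=10, jump=30, score_window=toy_score
)
print(found, fraction)
```

In this toy run only the frames inside the visited windows count as inspected, which is the quantity the abstract reports as a percentage of the entire video. A learned policy would replace the fixed jump with a decision conditioned on the query and the current window.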

The talk and the respective paper are published at the BMVC 2020 virtual conference.

Similar Papers

03/05/2021

Self-Supervised Learning of Compressed Video Representations

Youngjae Yu, Sangho Lee, Gunhee Kim, Yale Song

Keywords: self-supervised learning, Compressed videos

4:34

03/05/2021

VA-RED$^2$: Video Adaptive Redundancy Reduction

Bowen Pan, Rameswar Panda, Camilo L Fosco, Chung-Ching Lin, Alex J Andonian, Yue Meng, Kate Saenko, Aude Oliva, Rogerio Feris

5:02

22/11/2021

TEAM-Net: Multi-modal Learning for Video Action Recognition with Partial Decoding

Zhengwei Wang, Qi She, Aljosa Smolic

Keywords: video action recognition, partially decoded video, multi-modal fusion

3:24

06/12/2021

CLIP-It! Language-Guided Video Summarization

Medhini Narasimhan, Anna Rohrbach, Trevor Darrell

Keywords: transformers

6:14

06/12/2021

Dynamic Normalization and Relay for Video Action Recognition

Dongqi Cai, Anbang Yao, Yurong Chen

Keywords: deep learning, representation learning

10:42

14/06/2020

End-to-End Learning of Visual Representations From Uncurated Instructional Videos

Antoine Miech, Jean-Baptiste Alayrac, Lucas Smaira, Ivan Laptev, Josef Sivic, Andrew Zisserman

Keywords: video, language, representation, self-supervised, instructional, narrated, nce, text, retrieval, contrastive

5:00

22/11/2021

Fine-grained Multi-Modal Self-Supervised Learning

Duo Wang, Salah Karout

Keywords: self-supervised learning, multi-modal learning

2:46

22/11/2021

Stacked Temporal Attention: Improving First-person Action Recognition by Emphasizing Discriminative Clips

Lijin Yang, Yifei Huang, Yusuke Sugano, Yoichi Sato

Keywords: Egocentric action recognition, Action recognition, Temporal attention

3:01

05/01/2021

DORi: Discovering Object Relationships for Moment Localization of a Natural Language Query in a Video

Cristian Rodriguez-Opazo, Edison Marrese-Taylor, Basura Fernando, Hongdong Li, Stephen Gould

5:02

02/02/2021

Semantic Grouping Network for Video Captioning

Hobin Ryu, Sunghun Kang, Haeyong Kang, Chang D. Yoo

17:41

22/11/2021

StyleVideoGAN: A Temporal Generative Model using a Pretrained StyleGAN

Gereon Fox, Ayush Tewari, Mohamed Elgharib, Christian Theobalt

Keywords: video generation, StyleGAN, GAN, embedding, faces, hands, cars, RNN

8:07

06/12/2021

MERLOT: Multimodal Neural Script Knowledge Models

Rowan Zellers, Ximing Lu, Jack Hessel, Youngjae Yu, Jae Sung Park, Jize Cao, Ali Farhadi, Yejin Choi

Keywords: representation learning

18:15

05/01/2021

Intro and Recap Detection for Movies and TV Series

Xiang Hao, Kripa Chettiar, Ben Cheung, Vernon Germano, Raffay Hamid

5:01

22/11/2021

Temporal Meta-Adaptor for Video Object Detection

Chi Wang, Yang Hua, Zheng Lu, Jian Gao, Neil Robertson

Keywords: video object detection, temporal aggregation, meta-learning, ImageNet VID

6:58

19/08/2021

Learning Implicit Temporal Alignment for Few-shot Video Classification

Songyang Zhang, Jiale Zhou, Xuming He

Keywords: Computer Vision, Action Recognition, Deep Learning

6:20

18/11/2020

AARM: Action attention recalibration module for action recognition

Li Zhonghong, Yi Yang, She Ying, Song Jialun, Wu Yukun

13:27

02/02/2021

Weakly Supervised Temporal Action Localization Through Learning Explicit Subspaces for Action and Context

Ziyi Liu, Le Wang, Wei Tang, Junsong Yuan, Nanning Zheng, Gang Hua

19:49

06/12/2021

Look at What I’m Doing: Self-Supervised Spatial Grounding of Narrations in Instructional Videos

Reuben Tan, Bryan Plummer, Kate Saenko, Hailin Jin, Bryan Russell

Keywords: optimization

12:28

07/09/2020

Procedure Completion by Learning from Partial Summaries

Ehsan Elhamifar, Zwe Naing

Keywords: procedure learning, instructional videos, summarization, subset selection, representation learning, partial summaries

7:34

14/06/2020

Listen to Look: Action Recognition by Previewing Audio

Ruohan Gao, Tae-Hyun Oh, Kristen Grauman, Lorenzo Torresani

Keywords: action recognition, audio-visual learning, multi-modal learning, cross-modal learning, video understanding

1:01

14/06/2020

Few-Shot Video Classification via Temporal Alignment

Kaidi Cao, Jingwei Ji, Zhangjie Cao, Chien-Yi Chang, Juan Carlos Niebles

Keywords: video classification, few-shot learning, action recognition, temporal alignment

0:57

14/06/2020

Unsupervised Learning From Video With Deep Neural Embeddings

Chengxu Zhuang, Tianwei She, Alex Andonian, Max Sobol Mark, Daniel Yamins

Keywords: unsupervised learning, self-supervised learning, video learning, contrastive learning, deep neural networks, action recognition, object recognition, two-pathway models

1:01

03/05/2021

Learning Invariant Representations for Reinforcement Learning without Reconstruction

Amy Zhang, Rowan T McAllister, Roberto Calandra, Yarin Gal, Sergey Levine

Keywords: state abstractions, bisimulation metrics, rich observations, representation learning

14:36

14/06/2020

TransMoMo: Invariance-Driven Unsupervised Video Motion Retargeting

Zhuoqian Yang, Wentao Zhu, Wayne Wu, Chen Qian, Qiang Zhou, Bolei Zhou, Chen Change Loy

Keywords: motion retargeting, disentanglement, representation learning, video generation

1:02

14/06/2020

ActionBytes: Learning From Trimmed Videos to Localize Actions

Mihir Jain, Amir Ghodrati, Cees G. M. Snoek

Keywords: action localization, weakly-supervised, self-supervised learning, action proposals, zero-shot, thumos14, activitynet, multithumos, self-training, temporal segmentation

1:01

18/07/2021

Learning Transferable Visual Models From Natural Language Supervision

Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever

Keywords: Algorithms, Multitask, Transfer, and Meta Learning

19:40

22/11/2021

Spatial-Temporal Residual Aggregation for High Resolution Video Inpainting

Vishnu Sanjay Ramiya Srinivasan, Rui Ma, Qiang Tang, Zili Yi, Zhan Xu

Keywords: high resolution video inpainting, spatial-temporal aggregation, residual aggregation, spatial-temporal attention, image alignment

2:58

06/12/2020

Self-Supervised MultiModal Versatile Networks

Jean-Baptiste Alayrac, Adria Recasens, Rosalia Schneider, Relja Arandjelović, Jason Ramapuram, Jeffrey De Fauw, Lucas Smaira, Sander Dieleman, Andrew Zisserman

3:25

14/06/2020

An End-to-End Edge Aggregation Network for Moving Object Segmentation

Prashant W. Patil, Kuldeep M. Biradar, Akshay Dudhane, Subrahmanyam Murala

Keywords: edge extraction mechanism, bridge network, training-testing configurations, moving object segmentation

1:00

14/06/2020

Straight to the Point: Fast-Forwarding Videos via Reinforcement Learning Using Textual Data

Washington Ramos, Michel Silva, Edson Araujo, Leandro Soriano Marcolino, Erickson Nascimento

Keywords: video fast-forwarding, vision and language, reinforcement learning, multi-modal embedding, hyperlapse, video processing, video acceleration, textual-visual embedding space, reinforce, instructional videos

1:01

22/11/2021

TNT: Text-Conditioned Network with Transductive Inference for Few-Shot Video Classification

Andrés Villa, Juan-Manuel Perez-Rua, Vladimir Araujo, Juan Carlos Niebles, Victor A Escorcia, Alvaro Soto

Keywords: Few-Shot Learning, Adaptive Network, Multimodal Information, Action Classification, Transductive Classification

3:00

06/12/2021

Exploring Cross-Video and Cross-Modality Signals for Weakly-Supervised Audio-Visual Video Parsing

Yan-Bo Lin, Hung-Yu Tseng, Hsin-Ying Lee, Yen-Yu Lin, Ming-Hsuan Yang

14:06

14/06/2020

Action Genome: Actions As Compositions of Spatio-Temporal Scene Graphs

Jingwei Ji, Ranjay Krishna, Li Fei-Fei, Juan Carlos Niebles

Keywords: action recognition, scene graph, video understanding, relationships, composition, action, activity, video

1:01

06/12/2020

SMYRF - Efficient Attention using Asymmetric Clustering

Giannis Daras, Nikita Kitaev, Augustus Odena, Alex Dimakis

3:28

02/02/2021

RSPNet: Relative Speed Perception for Unsupervised Video Representation Learning

Peihao Chen, Deng Huang, Dongliang He, Xiang Long, Runhao Zeng, Shilei Wen, Mingkui Tan, Chuang Gan

14:14

02/02/2021

SMART Frame Selection for Action Recognition

Shreyank N Gowda, Marcus Rohrbach, Laura Sevilla-Lara

14:10

06/12/2021

Compressed Video Contrastive Learning

Yuqi Huo, Mingyu Ding, Haoyu Lu, Nanyi Fei, Zhiwu Lu, Ji-Rong Wen, Ping Luo

Keywords: self-supervised learning, contrastive learning, representation learning

9:07

14/06/2020

AdaCoF: Adaptive Collaboration of Flows for Video Frame Interpolation

Hyeongmin Lee, Taeoh Kim, Tae-young Chung, Daehyun Pak, Yuseok Ban, Sangyoun Lee

Keywords: video frame interpolation, video temporal super-resolution, frame rate up conversion, frame synthesis, motion estimation, motion compensation, frame warping

1:01

14/06/2020

Non-Adversarial Video Synthesis With Learned Priors

Abhishek Aich, Akash Gupta, Rameswar Panda, Rakib Hyder, M. Salman Asif, Amit K. Roy-Chowdhury

Keywords: video synthesis, non-adversarial learning, generative network, latent space, triplet condition

0:58

06/12/2021

Align before Fuse: Vision and Language Representation Learning with Momentum Distillation

Junnan Li, Ramprasaath Selvaraju, Akhilesh Gotmare, Shafiq Joty, Caiming Xiong, Steven Chu Hong Hoi

Keywords: transformers, vision, representation learning

9:40