05/01/2021

We Don't Need Thousand Proposals: Single Shot Actor-Action Detection in Videos

Aayush J. Rana, Yogesh S. Rawat


Abstract: We propose SSA2D, a simple yet effective end-to-end deep network for actor-action detection in videos. Existing methods take a top-down approach based on region proposal networks (RPN), where actions are estimated from the detected proposals and then post-processed, for example with non-maximal suppression. While effective in terms of performance, these methods scale poorly to dense video scenes, with high memory requirements for thousands of proposals. We propose to solve this problem from a different perspective, where we don't need any proposals. SSA2D is a unified network that performs pixel-level joint actor-action detection in a single shot, where every pixel of a detected actor is assigned an action label. SSA2D has two main advantages: 1) it is a fully convolutional network that requires no proposals or post-processing, making it both memory- and time-efficient; 2) it scales easily to dense video scenes, since its memory requirement is independent of the number of actors present in the scene. We evaluate the proposed method on the Actor-Action dataset (A2D) and the Video Object Relation (VidOR) dataset, demonstrating its effectiveness in detecting multiple actors and their actions in a video. SSA2D is 11x faster at inference, with comparable (sometimes better) performance and fewer network parameters than prior works. Code is available at https://github.com/aayushjr/ssa2d
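To illustrate the proposal-free, single-shot idea described in the abstract, the sketch below (in PyTorch) shows a fully convolutional head that produces dense per-pixel actor and action class maps in one forward pass, with no proposals or non-maximal suppression. This is only a minimal illustration under assumptions, not the authors' implementation from the linked repository; the module name, backbone channel size, and class counts are all hypothetical.

# Minimal sketch (not the authors' code) of a proposal-free, fully
# convolutional actor-action head: every pixel receives an actor class
# and an action class in a single shot. Channel sizes and class counts
# are illustrative assumptions.
import torch
import torch.nn as nn

class SingleShotActorActionHead(nn.Module):
    def __init__(self, in_channels=256, num_actors=8, num_actions=10):
        super().__init__()
        # Shared convolutional trunk over backbone features.
        self.shared = nn.Sequential(
            nn.Conv2d(in_channels, 128, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        # Dense per-pixel classification maps (no proposals, no NMS).
        self.actor_head = nn.Conv2d(128, num_actors, kernel_size=1)
        self.action_head = nn.Conv2d(128, num_actions, kernel_size=1)

    def forward(self, feats):
        # feats: (B, C, H, W) spatio-temporally pooled backbone features.
        x = self.shared(feats)
        actor_logits = self.actor_head(x)    # (B, num_actors, H, W)
        action_logits = self.action_head(x)  # (B, num_actions, H, W)
        return actor_logits, action_logits

# Example usage: memory is fixed by the feature-map size, not by the
# number of actors in the scene, since predictions are dense maps
# rather than per-proposal outputs.
feats = torch.randn(1, 256, 56, 56)
actor_logits, action_logits = SingleShotActorActionHead()(feats)

This kind of dense-prediction design is what makes the memory footprint independent of how many actors appear in the scene, in contrast to proposal-based pipelines whose cost grows with the number of candidate regions.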

This talk and the corresponding paper were published at the WACV 2021 virtual conference.
