Listen to Look: Action Recognition by Previewing Audio

14/06/2020

Ruohan Gao, Tae-Hyun Oh, Kristen Grauman, Lorenzo Torresani

Keywords: action recognition, audio-visual learning, multi-modal learning, cross-modal learning, video understanding

Abstract: In the face of the video data deluge, today's expensive clip-level classifiers are increasingly impractical. We propose a framework for efficient action recognition in untrimmed video that uses audio as a preview mechanism to eliminate both short-term and long-term visual redundancies. First, we devise an ImgAud2Vid framework that hallucinates clip-level features by distilling from lighter modalities (a single frame and its accompanying audio), reducing short-term temporal redundancy for efficient clip-level recognition. Second, building on ImgAud2Vid, we further propose ImgAud-Skimming, an attention-based long short-term memory network that iteratively selects useful moments in untrimmed videos, reducing long-term temporal redundancy for efficient video-level recognition. Extensive experiments on four action recognition datasets demonstrate that our method achieves state-of-the-art results in terms of both recognition accuracy and speed.
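
The iterative selection of useful moments described in the abstract can be illustrated with a heavily simplified sketch. The abstract gives no implementation details, so everything below is an assumption made for illustration: the function names (softmax, dot, skim) are hypothetical, the cheap per-moment "preview" features stand in for image-audio features, and a plain running average replaces the LSTM state update. It is not the authors' model.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def skim(preview_feats, query, steps=3):
    """Select up to `steps` distinct moments by attention over preview features.

    preview_feats: one cheap feature vector per moment (hypothetical stand-in
                   for image-audio features).
    query:         initial query vector (hypothetical stand-in for a query
                   produced by a recurrent network).
    """
    state = list(query)
    selected = []
    for _ in range(min(steps, len(preview_feats))):
        # Attention weights: softmax of query-feature similarity.
        weights = softmax([dot(state, f) for f in preview_feats])
        # Pick the most-attended moment that has not been selected yet.
        candidates = [i for i in range(len(preview_feats)) if i not in selected]
        best = max(candidates, key=lambda i: weights[i])
        selected.append(best)
        # Assumption: a running average replaces the LSTM state update.
        state = [0.5 * s + 0.5 * f for s, f in zip(state, preview_feats[best])]
    return selected


# Four moments with 2-D preview features; the query favors the first dimension.
feats = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.0, 0.2]]
print(skim(feats, [1.0, 0.0], steps=2))
```

In this toy setting, only the selected moments would be passed to an expensive recognizer, which is the source of the efficiency gain the abstract refers to.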

The talk and the corresponding paper were published at the CVPR 2020 virtual conference.

Similar Papers

02/02/2021

SMART Frame Selection for Action Recognition

Shreyank N Gowda, Marcus Rohrbach, Laura Sevilla-Lara

03/05/2021

Self-Supervised Learning of Compressed Video Representations

Youngjae Yu, Sangho Lee, Gunhee Kim, Yale Song

Keywords: self-supervised learning, Compressed videos

22/11/2021

Knowing What, Where and When to Look: Video Action modelling with Attention

Juan-Manuel Perez-Rua, Brais Martinez, Xiatian Zhu, Antoine S Toisoul, Victor A Escorcia, Tao Xiang

Keywords: Action recognition, Fine-grained action, video attention, Spatial attention, Channel attention, Temporal attention, Spatio-temporal attention, Feature refinement

03/05/2021

VA-RED$^2$: Video Adaptive Redundancy Reduction

Bowen Pan, Rameswar Panda, Camilo L Fosco, Chung-Ching Lin, Alex J Andonian, Yue Meng, Kate Saenko, Aude Oliva, Rogerio Feris

05/01/2021

A Variational Information Bottleneck Based Method to Compress Sequential Networks for Human Action Recognition

Ayush Srivastava, Oshin Dutta, Jigyasa Gupta, Sumeet Agarwal, Prathosh AP

06/12/2021

Low-Fidelity Video Encoder Optimization for Temporal Action Localization

Mengmeng Xu, Juan Manuel Perez Rua, Xiatian Zhu, Bernard Ghanem, Brais Martinez

Keywords: optimization, machine learning, transfer learning

18/07/2021

Is Space-Time Attention All You Need for Video Understanding?

Gedas Bertasius, Heng Wang, Lorenzo Torresani

Keywords: Algorithms, AutoML, Deep Learning, Architectures

14/06/2020

Scale-Space Flow for End-to-End Optimized Video Compression

Eirikur Agustsson, David Minnen, Nick Johnston, Johannes Ballé, Sung Jin Hwang, George Toderici

Keywords: learned video compression, scale-space flow, bilinear warping

06/12/2021

Compressed Video Contrastive Learning

Yuqi Huo, Mingyu Ding, Haoyu Lu, Nanyi Fei, Zhiwu Lu, Ji-Rong Wen, Ping Luo

Keywords: self-supervised learning, contrastive learning, representation learning

06/12/2021

Dynamic Normalization and Relay for Video Action Recognition

Dongqi Cai, Anbang Yao, Yurong Chen

Keywords: deep learning, representation learning

14/06/2020

Attention Mechanism Exploits Temporal Contexts: Real-Time 3D Human Pose Reconstruction

Ruixu Liu, Ju Shen, He Wang, Chen Chen, Sen-ching Cheung, Vijayan Asari

Keywords: 3d human pose, attention mechanism, multi-scale dilation convolution, monocular motion reconstruction

14/06/2020

Zooming Slow-Mo: Fast and Accurate One-Stage Space-Time Video Super-Resolution

Xiaoyu Xiang, Yapeng Tian, Yulun Zhang, Yun Fu, Jan P. Allebach, Chenliang Xu

Keywords: space-time video super-resolution, high-resolution, slow motion, one-stage, fast and accurate, feature temporal interpolation, deformable convlstm, temporal alignment, temporal aggregation, video restoration

06/12/2021

Exploring Cross-Video and Cross-Modality Signals for Weakly-Supervised Audio-Visual Video Parsing

Yan-Bo Lin, Hung-Yu Tseng, Hsin-Ying Lee, Yen-Yu Lin, Ming-Hsuan Yang

22/11/2021

Stacked Temporal Attention: Improving First-person Action Recognition by Emphasizing Discriminative Clips

Lijin Yang, Yifei Huang, Yusuke Sugano, Yoichi Sato

Keywords: Egocentric action recognition, Action recognition, Temporal attention

06/12/2021

Improved Transformer for High-Resolution GANs

Long Zhao, Zizhao Zhang, Ting Chen, Dimitris Metaxas, Han Zhang

Keywords: transformers, generative model

14/06/2020

Unsupervised Learning From Video With Deep Neural Embeddings

Chengxu Zhuang, Tianwei She, Alex Andonian, Max Sobol Mark, Daniel Yamins

Keywords: unsupervised learning, self-supervised learning, video learning, contrastive learning, deep neural networks, action recognition, object recognition, two-pathway models

06/12/2021

Temporal-attentive Covariance Pooling Networks for Video Recognition

Zilin Gao, Qilong Wang, Bingbing Zhang, Qinghua Hu, Peihua Li

22/11/2021

ERA: Entity–relationship Aware Video Summarization with Wasserstein GAN

Guande Wu, Jianzhe Peter Lin, Claudio Silva

Keywords: video summarization, spatio-temporal graph neural network

02/02/2021

Semantic Grouping Network for Video Captioning

Hobin Ryu, Sunghun Kang, Haeyong Kang, Chang D. Yoo

22/11/2021

Spatial-Temporal Residual Aggregation for High Resolution Video Inpainting

Vishnu Sanjay Ramiya Srinivasan, Rui Ma, Qiang Tang, Zili Yi, Zhan Xu

Keywords: high resolution video inpainting, spatial-temporal aggregation, residual aggregation, spatial-temporal attention, image alignment

02/02/2021

MAMBA: Multi-level Aggregation via Memory Bank for Video Object Detection

Guanxiong Sun, Yang Hua, Guosheng Hu, Neil Robertson

06/12/2020

Convolutional Tensor-Train LSTM for Spatio-Temporal Learning

Jiahao Su, Wonmin Byeon, Jean Kossaifi, Furong Huang, Jan Kautz, Anima Anandkumar

22/11/2021

Fine-grained Multi-Modal Self-Supervised Learning

Duo Wang, Salah Karout

Keywords: self-supervised learning, multi-modal learning

06/12/2021

CLIP-It! Language-Guided Video Summarization

Medhini Narasimhan, Anna Rohrbach, Trevor Darrell

Keywords: transformers

05/01/2021

PDAN: Pyramid Dilated Attention Network for Action Detection

Rui Dai, Srijan Das, Luca Minciullo, Lorenzo Garattoni, Gianpiero Francesca, Francois Bremond

14/06/2020

Learning Event-Based Motion Deblurring

Zhe Jiang, Yu Zhang, Dongqing Zou, Jimmy Ren, Jiancheng Lv, Yebin Liu

Keywords: deblur, event camera, video reconstruction, image restoration, low-level vision, neural networks, adversarial training, adaptive sampling, supervised learning, dynamic vision sensor

05/01/2021

Splatty- a Unified Image Demosaicing and Rectification Method

Pranav Verma, Dominique E. Meyer, Hanyang Xu, Falko Kuester

03/05/2021

AdaFuse: Adaptive Temporal Fusion Network for Efficient Action Recognition

Yue Meng, Rameswar Panda, Chung-Ching Lin, Prasanna Sattigeri, Leonid Karlinsky, Kate Saenko, Aude Oliva, Rogerio Feris

16/11/2020

Multistage Fusion with Forget Gate for Multimodal Summarization in Open-Domain Videos

Nayu Liu, Xian Sun, Hongfeng Yu, Wenkai Zhang, Guangluan Xu

Keywords: multimodal summarization, multimodal tasks, multiencoder-decoder frameworks, multistage network

30/11/2020

Lossless Image Compression Using a Multi-Scale Progressive Statistical Model

Honglei Zhang, Francesco Cricri, Hamed R. Tavakoli, Nannan Zou, Emre Aksu, Miska M. Hannuksela

05/01/2021

DynaVSR: Dynamic Adaptive Blind Video Super-Resolution

Suyoung Lee, Myungsub Choi, Kyoung Mu Lee

05/01/2021

Unsupervised Multimodal Video-to-Video Translation via Self-Supervised Learning

Kangning Liu, Shuhang Gu, Andres Romero, Radu Timofte

22/11/2021

Faster-FCoViAR: Faster Frequency-Domain Compressed Video Action Recognition

Lu Xiong, Xia Jia, Yue Ming, Jiangwan Zhou, Fan Feng, Nan nan Hu

Keywords: action recognition, frequency-domain, compressed videos, teacher-student network

02/02/2021

RSPNet: Relative Speed Perception for Unsupervised Video Representation Learning

Peihao Chen, Deng Huang, Dongliang He, Xiang Long, Runhao Zeng, Shilei Wen, Mingkui Tan, Chuang Gan

06/12/2021

VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text

Hassan Akbari, Liangzhe Yuan, Rui Qian, Wei-Hong Chuang, Shih-Fu Chang, Yin Cui, Boqing Gong

Keywords: machine learning, self-supervised learning, transformers, vision, contrastive learning

14/06/2020

Context-Aware and Scale-Insensitive Temporal Repetition Counting

Huaidong Zhang, Xuemiao Xu, Guoqiang Han, Shengfeng He

Keywords: repetition counting, computer vision, deep learning, regression network, video processing

22/11/2021

Temporal Meta-Adaptor for Video Object Detection

Chi Wang, Yang Hua, Zheng Lu, Jian Gao, Neil Robertson

Keywords: video object detection, temporal aggregation, meta-learning, ImageNet VID

14/06/2020

Memory Enhanced Global-Local Aggregation for Video Object Detection

Yihong Chen, Yue Cao, Han Hu, Liwei Wang

Keywords: video object detection, video analysis, object detection, memory, global-local aggregation

03/05/2021

Cross-Attentional Audio-Visual Fusion for Weakly-Supervised Action Localization

Juntae Lee, Mihir Jain, Hyoungwoo Park, Sungrack Yun

Keywords: Action localization, Multimodal Attention, Audio-Visual, Weak-supervision, Event localization

14/06/2020

TransMoMo: Invariance-Driven Unsupervised Video Motion Retargeting

Zhuoqian Yang, Wentao Zhu, Wayne Wu, Chen Qian, Qiang Zhou, Bolei Zhou, Chen Change Loy

Keywords: motion retargeting, disentanglement, representation learning, video generation