Gate-Shift Networks for Video Action Recognition

14/06/2020

Swathikiran Sudhakaran, Sergio Escalera, Oswald Lanz

Keywords: action recognition, video representation learning, spatio-temporal interactions, video classification

Abstract: Deep 3D CNNs for video action recognition are designed to learn powerful representations in the joint spatio-temporal feature space. In practice, however, their large parameter counts and computational cost mean they may under-perform when sufficiently large datasets are not available to train them at scale. In this paper we introduce spatial gating in the spatio-temporal decomposition of 3D kernels. We implement this concept with the Gate-Shift Module (GSM). GSM is lightweight and turns a 2D-CNN into a highly efficient spatio-temporal feature extractor. With GSM plugged in, a 2D-CNN learns to adaptively route features through time and combine them, with almost no additional parameters or computational overhead. We perform an extensive evaluation of the proposed module to study its effectiveness in video action recognition, achieving state-of-the-art results on the Something-Something-V1 and Diving48 datasets and obtaining competitive results on EPIC-Kitchens with far less model complexity.
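
To make the gate-and-shift idea concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes features of shape (T, C, H, W), two channel groups, and a one-frame shift; the names `shift_time`, `gate_shift`, and the `weights` argument are hypothetical. For brevity the gating plane is a tanh of a pointwise channel-weighted sum, standing in for the learned spatial convolution the abstract refers to, and adding back the ungated remainder of the features is an assumption of this sketch.

```python
import numpy as np


def shift_time(y, step):
    """Shift y along axis 0 (time) by `step` frames, zero-filling the gap."""
    out = np.zeros_like(y)
    if step > 0:
        out[step:] = y[:-step]
    elif step < 0:
        out[:step] = y[-step:]
    else:
        out[...] = y
    return out


def gate_shift(x, weights):
    """Gate, shift, and fuse features.

    x:       array of shape (T, C, H, W) with even C.
    weights: array of shape (2, C // 2); hypothetical gate weights,
             one vector per channel group.
    """
    half = x.shape[1] // 2
    groups = (x[:, :half], x[:, half:])
    outputs = []
    # Group 0 is shifted forward in time, group 1 backward.
    for group, w_g, step in zip(groups, weights, (1, -1)):
        # One gating plane per group: tanh of a channel-weighted sum
        # (a stand-in for a learned spatial convolution).
        gate = np.tanh(np.einsum("c,tchw->thw", w_g, group))[:, None]
        gated = gate * group        # part of the features routed through time
        residual = group - gated    # part that stays at its own time step
        outputs.append(shift_time(gated, step) + residual)
    return np.concatenate(outputs, axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((4, 6, 2, 2))
    w = rng.standard_normal((2, 3))
    print(gate_shift(x, w).shape)
```

With all-zero gate weights the gates are zero, nothing is shifted, and the output equals the input; with saturated gates each group is shifted by one frame in its own direction.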

This talk and the corresponding paper were published at the CVPR 2020 virtual conference.

Similar Papers

05/01/2021

Temporal Stochastic Softmax for 3D CNNs: An Application in Facial Expression Recognition

Theo Ayral, Marco Pedersoli, Simon Bacon, Eric Granger

5:00

06/12/2021

Dynamic Normalization and Relay for Video Action Recognition

Dongqi Cai, Anbang Yao, Yurong Chen

Keywords: deep learning, representation learning

10:42

07/09/2020

Few-Shot Learning with Complex-valued Neural Networks

Zhen Liu, Baochang Zhang, Guodong Guo

Keywords: few-shot learning, complex-valued network, metric-learning, image classification

7:15

07/09/2020

Making a Case for 3D Convolutions for Object Segmentation in Videos

Sabarinath Mahadevan, Ali Athar, Aljosa Osep, Laura Leal-Taixé, Bastian Leibe, Sebastian Hennen

Keywords: object tracking, video segmentation, video object segmentation, video scene understanding, object segmentation

8:16

26/04/2020

Efficient and Information-Preserving Future Frame Prediction and Beyond

Wei Yu, Yichao Lu, Steve Easterbrook, Sanja Fidler

Keywords: self-supervised learning, generative pre-training, video prediction, reversible architecture

4:18

14/06/2020

Cascaded Deep Video Deblurring Using Temporal Sharpness Prior

Jinshan Pan, Haoran Bai, Jinhui Tang

Keywords: video deblurring, deep convolutional neural network, motion blur estimation, optical flow, temporal sharpness prior, image restoration

0:53

07/09/2020

A Simple and Scalable Shape Representation for 3D Reconstruction

Mateusz Michalkiewicz, Eugene Belilovsky, Mahsa Baktashmotlagh, Anders Eriksson

Keywords: shape from x, 3d reconstruction from a single image, implicit shape representation, deep level sets

9:57

14/06/2020

Discrete Model Compression With Resource Constraint for Deep Neural Networks

Shangqian Gao, Feihu Huang, Jian Pei, Heng Huang

Keywords: convolutional neural networks, model compression, channel pruning, discrete optimization

1:01

14/06/2020

Improving Convolutional Networks With Self-Calibrated Convolutions

Jiang-Jiang Liu, Qibin Hou, Ming-Ming Cheng, Changhu Wang, Jiashi Feng

Keywords: self-calibrated, feature transformation, image classification, network architecture, convolutional neural networks

1:00

06/12/2021

Efficient Training of Visual Transformers with Small Datasets

Yahui Liu, Enver Sangineto, Wei Bi, Nicu Sebe, Bruno Lepri, Marco Nadai

Keywords: robustness, transformers, vision

8:23

14/06/2020

Adaptive Fractional Dilated Convolution Network for Image Aesthetics Assessment

Qiuyu Chen, Wei Zhang, Ning Zhou, Peng Lei, Yi Xu, Yu Zheng, Jianping Fan

Keywords: image aesthetics assessment, kernel embedding, adaptive convolution, parameter-free, aspect ratio

1:01

14/06/2020

Temporally Distributed Networks for Fast Video Semantic Segmentation

Ping Hu, Fabian Caba, Oliver Wang, Zhe Lin, Stan Sclaroff, Federico Perazzi

Keywords: video semantic segmentation, semantic segmentation, low-latency video processing, temporally distributed computation, attention propagation, grouped knowledge distillation

1:00

02/02/2021

Near Lossless Transfer Learning for Spiking Neural Networks

Zhanglu Yan, Jun Zhou, Weng-Fai Wong

16:34

14/06/2020

Factorized Higher-Order CNNs With an Application to Spatio-Temporal Emotion Estimation

Jean Kossaifi, Antoine Toisoul, Adrian Bulat, Yannis Panagakis, Timothy M. Hospedales, Maja Pantic

Keywords: tensor methods, deep learning, spatiotemporal, emotion, cnn, tensor decomposition, low-rank, valence, arousal

1:01

14/06/2020

Deep Facial Non-Rigid Multi-View Stereo

Ziqian Bai, Zhaopeng Cui, Jamal Ahmed Rahim, Xiaoming Liu, Ping Tan

Keywords: multi-view 3d, non-rigid reconstruction, face reconstruction, deep learning

1:01

02/02/2021

MVFNet: Multi-View Fusion Network for Efficient Video Recognition

Wenhao Wu, Dongliang He, Tianwei Lin, Fu Li, Chuang Gan, Errui Ding

14:02

14/06/2020

Self-Supervised Monocular Trained Depth Estimation Using Self-Attention and Discrete Disparity Volume

Adrian Johnston, Gustavo Carneiro

Keywords: self-supervised depth estimation, self-supervised learning, self-attention, depth estimation, uncertainty

1:01

14/06/2020

Self-Supervised Monocular Scene Flow Estimation

Junhwa Hur, Stefan Roth

Keywords: monocular scene flow, self-supervised learning, 3d scene flow, optical flow, monocular depth estimation

5:00

03/05/2021

Domain Generalization with MixStyle

Kaiyang Zhou, Yongxin Yang, Yu Qiao, Tao Xiang

Keywords: Style Mixing, Domain Generalization

4:28

14/06/2020

Deep Optics for Single-Shot High-Dynamic-Range Imaging

Christopher A. Metzler, Hayato Ikoma, Yifan Peng, Gordon Wetzstein

Keywords: high-dynamic-range imaging, point-spread-function engineering, end-to-end learning, computational imaging, deep learning, optics, photography

5:01

06/12/2021

A self consistent theory of Gaussian Processes captures feature learning effects in finite CNNs

Gadi Naveh, Zohar Ringel

Keywords: theory, deep learning, optimization, kernel methods

9:13

03/05/2021

Attentional Constellation Nets for Few-Shot Learning

Weijian Xu, Yifan Xu, Huaijin Wang, Zhuowen Tu

Keywords: few-shot learning, constellation models

5:10

14/06/2020

Attention Mechanism Exploits Temporal Contexts: Real-Time 3D Human Pose Reconstruction

Ruixu Liu, Ju Shen, He Wang, Chen Chen, Sen-ching Cheung, Vijayan Asari

Keywords: 3d human pose, attention mechanism, multi-scale dilation convolution, monocular motion reconstruction

5:01

06/12/2021

Neural Routing by Memory

Kaipeng Zhang, Zhenqiang Li, Zhifeng Li, Wei Liu, Yoichi Sato

Keywords: deep learning

6:41

14/06/2020

A Unified Object Motion and Affinity Model for Online Multi-Object Tracking

Junbo Yin, Wenguan Wang, Qinghao Meng, Ruigang Yang, Jianbing Shen

Keywords: mot, multi-task learning, motion, affinity, attention, online

1:03

06/12/2021

Container: Context Aggregation Networks

Peng Gao, Jiasen Lu, Hongsheng Li, Roozbeh Mottaghi, Aniruddha Kembhavi

Keywords: deep learning, self-supervised learning, transformers, vision, language

8:50

06/12/2021

Generalized Depthwise-Separable Convolutions for Adversarially Robust and Efficient Neural Networks

Hassan Dbouk, Naresh Shanbhag

Keywords: deep learning, robustness, adversarial robustness and security

15:01

22/11/2021

MVT: Multi-view Vision Transformer for 3D Object Recognition

Shuo Chen, Tan Yu, Ping Li

Keywords: 3D object recognition, Transformer-based methods

2:51

02/02/2021

DenserNet: Weakly Supervised Visual Localization Using Multi-Scale Feature Aggregation

Dongfang Liu, Yiming Cui, Liqi Yan, Christos Mousas, Baijian Yang, Yingjie Chen

16:15

14/06/2020

Gated Channel Transformation for Visual Recognition

Zongxin Yang, Linchao Zhu, Yu Wu, Yi Yang

Keywords: visual recognition, normalization methods, attention mechanisms

1:01

06/12/2021

MLP-Mixer: An all-MLP Architecture for Vision

Ilya Tolstikhin, Neil Houlsby, Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, Thomas Unterthiner, Jessica Yung, Andreas Steiner, Daniel Keysers, Jakob Uszkoreit, Mario Lucic, Alexey Dosovitskiy

Keywords: deep learning, machine learning, transformers, vision, transfer learning

11:18

05/01/2021

Phase-Wise Parameter Aggregation for Improving SGD Optimization

Takumi Kobayashi

4:36

22/11/2021

GhostShiftAddNet: More Features from Energy-Efficient Operations

Jia Bi, Jonathon Hare, Geoff V Merrett

Keywords: efficient convolutional neural network, embedded platform, feature redundancy, image classifier

3:37

22/11/2021

Contextual Convolution Blocks

David Marwood, Shumeet Baluja

Keywords: spatially selective features, convolutional layer, cc-block, self-attention, se-block, squeeze and excitation, excitation map

2:45

02/02/2021

Learning Comprehensive Motion Representation for Action Recognition

Mingyu Wu, Boyuan Jiang, Donghao Luo, Junchi Yan, Yabiao Wang, Ying Tai, Chengjie Wang, Jilin Li, Feiyue Huang, Xiaokang Yang

15:15

05/01/2021

OverNet: Lightweight Multi-Scale Super-Resolution With Overscaling Network

Parichehr Behjati, Pau Rodriguez, Armin Mehri, Isabelle Hupont, Carles Fernandez Tena, Jordi Gonzalez

4:24

05/01/2021

3D Human Pose and Shape Estimation Through Collaborative Learning and Multi-View Model-Fitting

Zhongguo Li, Magnus Oskarsson, Anders Heyden

5:13

14/06/2020

ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks

Qilong Wang, Banggu Wu, Pengfei Zhu, Peihua Li, Wangmeng Zuo, Qinghua Hu

Keywords: channel attention, efficient, adaptive 1d convolution, deep cnns, image classification, object detection, instance segmentation

0:57

06/12/2021

Learning Compact Representations of Neural Networks using DiscriminAtive Masking (DAM)

Jie Bu, Arka Daw, M. Maruf, Anuj Karpatne

Keywords: deep learning, machine learning, vision, graph learning, representation learning

13:59

22/11/2021

Separable Batch Normalization for Robust Facial Landmark Localization

Shuangping Jin, Zhenhua Feng, Wankou Yang, Josef Kittler

Keywords: face alignment, batch normalization, dynamic network

3:00