AssembleNet: Searching for Multi-Stream Neural Connectivity in Video Architectures

26/04/2020

Michael S. Ryoo, AJ Piergiovanni, Mingxing Tan, Anelia Angelova

Keywords: video representation learning, video understanding, activity recognition, neural architecture search

Abstract: Learning to represent videos is a very challenging task both algorithmically and computationally. Standard video CNN architectures have been designed by directly extending architectures devised for image understanding to include the time dimension, using modules such as 3D convolutions, or by using a two-stream design to capture both appearance and motion in videos. We interpret a video CNN as a collection of multi-stream convolutional blocks connected to each other, and propose the approach of automatically finding neural architectures with better connectivity and spatio-temporal interactions for video understanding. This is done by evolving a population of overly-connected architectures guided by connection weight learning. The search targets architectures that combine representations abstracting different input types (i.e., RGB and optical flow) at multiple temporal resolutions, allowing different types or sources of information to interact with each other. Our method, referred to as AssembleNet, outperforms prior approaches on public video datasets, in some cases by a large margin. We obtain 58.6% mAP on Charades and 34.27% accuracy on Moments-in-Time.
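The connectivity idea in the abstract can be illustrated with a small, hedged sketch. It assumes that each block's input is a sum of earlier node outputs, with each output scaled by sigmoid(w) of a learned per-connection weight. It also assumes that the evolutionary mutation prunes connections whose learned weight is small and then adds one random new connection. The sketch is not the authors' implementation. The function names `fully_connected`, `block_input`, `forward` and `mutate` are hypothetical, and so are the 0.2 pruning threshold, the zero initialization of new connection weights and the ReLU used in place of a convolutional block. Real blocks operate on video tensors, and each mutated child would be retrained before its fitness is measured.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def fully_connected(num_nodes, num_stems):
    """Overly-connected starting graph: every block receives every earlier node.

    Nodes 0..num_stems-1 are input stems (e.g. RGB and optical flow). The
    returned dict maps each block id to {source_id: connection_logit}, with
    all logits initialized to 0.0 (an assumed initialization).
    """
    return {n: {s: 0.0 for s in range(n)} for n in range(num_stems, num_nodes)}


def block_input(features, logits):
    """Combine incoming feature vectors as sum_i sigmoid(w_i) * x_i."""
    out = [0.0] * len(features[0])
    for x, w in zip(features, logits):
        gate = sigmoid(w)
        for k, v in enumerate(x):
            out[k] += gate * v
    return out


def forward(graph, stem_outputs):
    """Evaluate blocks in id order and return the last block's output."""
    feats = dict(enumerate(stem_outputs))
    for node in sorted(graph):
        srcs = list(graph[node])
        x = block_input([feats[s] for s in srcs], [graph[node][s] for s in srcs])
        feats[node] = [max(0.0, v) for v in x]  # ReLU as a stand-in for a conv block
    return feats[max(graph)]


def mutate(graph, threshold=0.2, rng=random):
    """Hypothetical connection-guided mutation: prune weak edges, add one random edge."""
    child = {}
    for node, inputs in graph.items():
        kept = {s: w for s, w in inputs.items() if sigmoid(w) >= threshold}
        if not kept:  # keep the strongest edge so no block is left disconnected
            best = max(inputs, key=inputs.get)
            kept = {best: inputs[best]}
        child[node] = kept
    node = rng.choice(sorted(child))
    candidates = [s for s in range(node) if s not in child[node]]
    if candidates:
        child[node][rng.choice(candidates)] = 0.0  # new edge starts at sigmoid(0) = 0.5
    return child
```

For instance, `fully_connected(5, 2)` builds two stems and three blocks, with every possible forward connection present. After training has updated the logits, `mutate` drops connections that contribute little and proposes one new one. Gating with a sigmoid keeps every connection weight in (0, 1), so a weight near zero is a direct signal that the search can remove that edge.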

The talk and the accompanying paper were published at the ICLR 2020 virtual conference.

Similar Papers

06/12/2021

Dynamic Normalization and Relay for Video Action Recognition

Dongqi Cai, Anbang Yao, Yurong Chen

Keywords: deep learning, representation learning

14/06/2020

AdaCoF: Adaptive Collaboration of Flows for Video Frame Interpolation

Hyeongmin Lee, Taeoh Kim, Tae-young Chung, Daehyun Pak, Yuseok Ban, Sangyoun Lee

Keywords: video frame interpolation, video temporal super-resolution, frame rate up conversion, frame synthesis, motion estimation, motion compensation, frame warping

05/01/2021

High-Quality Frame Interpolation via Tridirectional Inference

Jinsoo Choi, Jaesik Park, In So Kweon

06/12/2021

Relational Self-Attention: What's Missing in Attention for Video Understanding

Manjin Kim, Heeseung Kwon, Chunyu Wang, Suha Kwak, Minsu Cho

Keywords: deep learning, transformers

14/06/2020

Deep Optics for Single-Shot High-Dynamic-Range Imaging

Christopher A. Metzler, Hayato Ikoma, Yifan Peng, Gordon Wetzstein

Keywords: high-dynamic-range imaging, point-spread-function engineering, end-to-end learning, computational imaging, deep learning, optics, photography

05/04/2021

IOS: Inter-Operator Scheduler for CNN Acceleration

Yaoyao Ding, Ligeng Zhu, Zhihao Jia, Gennady Pekhimenko, Song Han

14/06/2020

Cascaded Deep Video Deblurring Using Temporal Sharpness Prior

Jinshan Pan, Haoran Bai, Jinhui Tang

Keywords: video deblurring, deep convolutional neural network, motion blur estimation, optical flow, temporal sharpness prior, image restoration

05/04/2021

sensAI: ConvNets Decomposition via Class Parallelism for Fast Inference on Live Data

Guanhua Wang, Zhuang Liu, Brandon Hsieh, Siyuan Zhuang, Joseph Gonzalez, Trevor Darrell, Ion Stoica

30/11/2020

Video-Based Crowd Counting Using a Multi-Scale Optical Flow Pyramid Network

Mohammad Asiful Hossain, Kevin Cannons, Daesik Jang, Fabio Cuzzolin, Zhan Xu

22/11/2021

Dynamic Graph Warping Transformer for Video Alignment

Junyan Wang, Yang Long, Maurice Pagnucco, Yang Song

Keywords: Video alignment, Transformer, Graph Neural Network

12/07/2020

VideoOneNet: Bidirectional Convolutional Recurrent OneNet with Trainable Data Steps for Video Processing

Zoltán Milacski, Barnabás Póczos, Andras Lorincz

Keywords: Sequential, Network, and Time-Series Modeling

26/04/2020

On the Relationship between Self-Attention and Convolutional Layers

Jean-Baptiste Cordonnier, Andreas Loukas, Martin Jaggi

Keywords: self-attention, attention, transformers, convolution, CNN, image, expressivity, capacity

14/06/2020

Online Depth Learning Against Forgetting in Monocular Videos

Zhenyu Zhang, Stéphane Lathuilière, Elisa Ricci, Nicu Sebe, Yan Yan, Jian Yang

Keywords: depth estimation, online adaptation, domain adaptation, meta-learning, online learning

02/02/2021

CompFeat: Comprehensive Feature Aggregation for Video Instance Segmentation

Yang Fu, Linjie Yang, Ding Liu, Thomas S. Huang, Humphrey Shi

06/12/2021

MLP-Mixer: An all-MLP Architecture for Vision

Ilya Tolstikhin, Neil Houlsby, Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, Thomas Unterthiner, Jessica Yung, Andreas Steiner, Daniel Keysers, Jakob Uszkoreit, Mario Lucic, Alexey Dosovitskiy

Keywords: deep learning, machine learning, transformers, vision, transfer learning

14/06/2020

Temporally Distributed Networks for Fast Video Semantic Segmentation

Ping Hu, Fabian Caba, Oliver Wang, Zhe Lin, Stan Sclaroff, Federico Perazzi

Keywords: video semantic segmentation, semantic segmentation, low-latency video processing, temporally distributed computation, attention propagation, grouped knowledge distillation

05/01/2021

Adaptive Streaming of 360-Degree Videos With Reinforcement Learning

Sohee Park, Minh Hoai, Arani Bhattacharya, Samir R. Das

19/10/2020

Deep adaptive feature aggregation in multi-task convolutional neural networks

Zhen Shen, Chaoran Cui, Jin Huang, Jian Zong, Meng Chen, Yilong Yin

Keywords: convolutional neural networks, multi-task learning, adaptive feature aggregation

14/06/2020

Learning Video Stabilization Using Optical Flow

Jiyang Yu, Ravi Ramamoorthi

Keywords: video stabilization, optical flow, deep learning, video processing, computational photography

22/11/2021

Knowing What, Where and When to Look: Video Action modelling with Attention

Juan-Manuel Perez-Rua, Brais Martinez, Xiatian Zhu, Antoine S Toisoul, Victor A Escorcia, Tao Xiang

Keywords: Action recognition, Fine-grained action, video attention, Spatial attention, Channel attention, Temporal attention, Spatio-temporal attention, Feature refinement

05/01/2021

Temporal Context Aggregation for Video Retrieval With Contrastive Learning

Jie Shao, Xin Wen, Bingchen Zhao, Xiangyang Xue

14/06/2020

Unsupervised Learning From Video With Deep Neural Embeddings

Chengxu Zhuang, Tianwei She, Alex Andonian, Max Sobol Mark, Daniel Yamins

Keywords: unsupervised learning, self-supervised learning, video learning, contrastive learning, deep neural networks, action recognition, object recognition, two-pathway models

05/01/2021

VideoSSL: Semi-Supervised Learning for Video Classification

Longlong Jing, Toufiq Parag, Zhe Wu, Yingli Tian, Hongcheng Wang

03/05/2021

Self-Supervised Learning of Compressed Video Representations

Youngjae Yu, Sangho Lee, Gunhee Kim, Yale Song

Keywords: self-supervised learning, Compressed videos

07/09/2020

Making a Case for 3D Convolutions for Object Segmentation in Videos

Sabarinath Mahadevan, Ali Athar, Aljosa Osep, Laura Leal-Taixé, Bastian Leibe, Sebastian Hennen

Keywords: object tracking, video segmentation, video object segmentation, video scene understanding, object segmentation

03/05/2021

Attentional Constellation Nets for Few-Shot Learning

Weijian Xu, Yifan Xu, Huaijin Wang, Zhuowen Tu

Keywords: few-shot learning, constellation models

22/11/2021

GTA: Global Temporal Attention for Video Action Understanding

Bo He, Xitong Yang, Zuxuan Wu, Hao Chen, Ser-Nam Lim, Abhinav Shrivastava

Keywords: action recognition, self-attention, temporal modeling

02/02/2021

MAMBA: Multi-level Aggregation via Memory Bank for Video Object Detection

Guanxiong Sun, Yang Hua, Guosheng Hu, Neil Robertson

14/06/2020

Video Instance Segmentation Tracking With a Modified VAE Architecture

Chung-Ching Lin, Ying Hung, Rogerio Feris, Linglin He

Keywords: video instance segmentation, video object tracking, variational autoencoder, vae, gaussian process, multi-task learning

30/11/2020

Interpreting Video Features: A Comparison of 3D Convolutional Networks and Convolutional LSTM Networks

Joonatan Mänttäri, Sofia Broomé, John Folkesson, Hedvig Kjellström

07/09/2020

High-speed Light-weight CNN Inference via Strided Convolutions on a Pixel Processor Array

Yanan Liu, Laurie Bose, Jianing Chen, Stephen Carey, Piotr Dudek, Walterio Mayol-Cuevas

Keywords: Binary CNN, CNN on embedded system, Pixel Processor Array, SCAMP, high-speed CNN, Light-weight CNN

03/05/2021

VA-RED$^2$: Video Adaptive Redundancy Reduction

Bowen Pan, Rameswar Panda, Camilo L Fosco, Chung-Ching Lin, Alex J Andonian, Yue Meng, Kate Saenko, Aude Oliva, Rogerio Feris

05/01/2021

Joint Visual-Temporal Embedding for Unsupervised Learning of Actions in Untrimmed Sequences

Rosaura G. VidalMata, Walter J. Scheirer, Anna Kukleva, David Cox, Hilde Kuehne

02/02/2021

Classifying Sequences of Extreme Length with Constant Memory Applied to Malware Detection

Edward Raff, William Fleshman, Richard Zak, Hyrum S. Anderson, Bobby Filar, Mark McLean

14/06/2020

Video Modeling With Correlation Networks

Heng Wang, Du Tran, Lorenzo Torresani, Matt Feiszli

Keywords: action recognition, video classification, motion, correlation, temporal information, kinetics, something-something

06/07/2020

Locating Cephalometric X-Ray Landmarks with Foveated Pyramid Attention

Logan Gilmour, Nilanjan Ray

14/06/2020

SCATTER: Selective Context Attentional Scene Text Recognizer

Ron Litman, Oron Anschel, Shahar Tsiper, Roee Litman, Shai Mazor, R. Manmatha

Keywords: text recognition, scene text, irregular text, stacked decoders, repetitive processing, intermediate supervision, attention decoder, two-step attention, sequence modeling, deep lstm

05/01/2021

Intro and Recap Detection for Movies and TV Series

Xiang Hao, Kripa Chettiar, Ben Cheung, Vernon Germano, Raffay Hamid
