CATER: A diagnostic dataset for Compositional Actions & TEmporal Reasoning

Abstract: Computer vision has undergone a dramatic revolution in performance, driven in large part by deep features trained on large-scale supervised datasets. However, most of these improvements have focused on static image analysis; video understanding has seen rather modest improvements. Even though new datasets and spatiotemporal models have been proposed, simple frame-by-frame classification methods often remain competitive. We posit that current video datasets are plagued with implicit biases over scene and object structure that can dwarf variations in temporal structure. In this work, we build a video dataset with fully observable and controllable object and scene bias, one that truly requires spatiotemporal understanding in order to be solved. Our dataset, named CATER, is rendered synthetically using a library of standard 3D objects, and tests the ability to recognize compositions of object movements that require long-term reasoning. In addition to being a challenging dataset, CATER also provides a plethora of diagnostic tools to analyze modern spatiotemporal video architectures by being completely observable and controllable. Using CATER, we provide insights into some of the most recent state-of-the-art deep video architectures.

Rohit Girdhar, Deva Ramanan

Similar Papers

AdaCoF: Adaptive Collaboration of Flows for Video Frame Interpolation

Hyeongmin Lee, Taeoh Kim, Tae-young Chung, Daehyun Pak, Yuseok Ban, Sangyoun Lee

Keywords: video frame interpolation, video temporal super-resolution, frame rate up conversion, frame synthesis, motion estimation, motion compensation, frame warping

Making a Case for 3D Convolutions for Object Segmentation in Videos

Sabarinath Mahadevan, Ali Athar, Aljosa Osep, Laura Leal-Taixé, Bastian Leibe, Sebastian Hennen

Keywords: object tracking, video segmentation, video object segmentation, video scene understanding, object segmentation

Attention Mechanism Exploits Temporal Contexts: Real-Time 3D Human Pose Reconstruction

Ruixu Liu, Ju Shen, He Wang, Chen Chen, Sen-ching Cheung, Vijayan Asari

Keywords: 3d human pose, attention mechanism, multi-scale dilation convolution, monocular motion reconstruction

Knowing What, Where and When to Look: Video Action modelling with Attention

Juan-Manuel Perez-Rua, Brais Martinez, Xiatian Zhu, Antoine S Toisoul, Victor A Escorcia, Tao Xiang

Keywords: Action recognition, Fine-grained action, video attention, Spatial attention, Channel attention, Temporal attention, Spatio-temporal attention, Feature refinement

Deep Homography Estimation for Dynamic Scenes

Hoang Le, Feng Liu, Shu Zhang, Aseem Agarwala

Keywords: homography estimation, dynamic scenes, motion estimation, multi-task learning, deep learning

DeepFaceFlow: In-the-Wild Dense 3D Facial Motion Estimation

Mohammad Rami Koujan, Anastasios Roussos, Stefanos Zafeiriou

Keywords: 3d flow, dense 3d facial motion capture, optical flow, scene flow, 3d reconstruction and tracking, in-the-wild monocular tracking, facial reenactment, expression recognition, performance capture, non-rigid facial deformations

JA-POLS: A Moving-Camera Background Model via Joint Alignment and Partially-Overlapping Local Subspaces

Irit Chelly, Vlad Winter, Dor Litvak, David Rosen, Oren Freifeld

Keywords: background subtraction, video analysis, computer vision, machine learning, robust pca, deep learning, moving camera, transfer learning, video surveillance, lie groups

Deep 3D Pan via Local adaptive "t-shaped" convolutions with global and local adaptive dilations

Juan Luis Gonzalez Bello, Munchurl Kim

Keywords: Deep learning, Stereoscopic view synthesis, Monocular depth, Deep 3D Pan

S3CNet: A Sparse Semantic Scene Completion Network for LiDAR Point Clouds

Ran Cheng, Christopher Agia, Yuan Ren, Xinhai Li, Liu Bingbing

X3D: Expanding Architectures for Efficient Video Recognition

Christoph Feichtenhofer

Keywords: video classification, action recognition, video detection, video understanding, deep learning, neural networks

DynaVSR: Dynamic Adaptive Blind Video Super-Resolution

Suyoung Lee, Myungsub Choi, Kyoung Mu Lee

Unsupervised Learning From Video With Deep Neural Embeddings

Chengxu Zhuang, Tianwei She, Alex Andonian, Max Sobol Mark, Daniel Yamins

Keywords: unsupervised learning, self-supervised learning, video learning, contrastive learning, deep neural networks, action recognition, object recognition, two-pathway models

CT-Net: Channel Tensorization Network for Video Classification

Kunchang Li, Xianhang Li, Yali Wang, Jun Wang, Yu Qiao

Keywords: 3D Convolution, Video Classification, Channel Tensorization

Video Region Annotation with Sparse Bounding Boxes

Yuzheng Xu, Yang Wu, Nur Sabrina binti Zuraimi, Shohei Nobuhara, Ko Nishino

Keywords: video annotation, semi-automatic annotation, graph convolutional network, region boundaries, sparse bounding boxes, automatic boundary finding

MLIFeat: Multi-level information fusion based deep local features

Yuyang Zhang (Institute of Automation, Chinese Academy of Sciences; University of Chinese Academy of Sciences), Jinge Wang, Shibiao Xu, Xiao Liu, Xiaopeng Zhang

Stabilizing Deep Q-Learning with ConvNets and Vision Transformers under Data Augmentation

Nicklas Hansen, Hao Su, Xiaolong Wang

Keywords: reinforcement learning and planning, transformers

gradSim: Differentiable simulation for system identification and visuomotor control

Krishna Murthy Jatavallabhula, Miles Macklin, Florian Golemo, Vikram Voleti, Linda Petrini, Martin Weiss, Breandan Considine, Jérôme Parent-Lévesque, Kevin Xie, Kenny Erleben, Liam Paull, Florian Shkurti, Derek Nowrouzezahrai, Sanja Fidler

Keywords: 3D scene understanding, Physical parameter estimation, System identification, Differentiable simulation, Differentiable physics, Differentiable rendering, 3D vision

Neural supersampling for real-time rendering

Lei Xiao, Salah Nouri, Matt Chapman, Alexander Fix, Douglas Lanman, Anton Kaplanyan

Keywords: virtual reality, rendering, deep learning, superresolution, upsampling

Self-Learning Transformations for Improving Gaze and Head Redirection

Yufeng Zheng, Seonwook Park, Xucong Zhang, Shalini De Mello, Otmar Hilliges

Self-Supervised 4D Spatio-Temporal Feature Learning via Order Prediction of Sequential Point Cloud Clips

Haiyan Wang, Liang Yang, Xuejian Rong, Jinglun Feng, Yingli Tian

Self-Supervised Monocular Trained Depth Estimation Using Self-Attention and Discrete Disparity Volume
