Spatio-Temporal Graph for Video Captioning With Knowledge Distillation

14/06/2020

Boxiao Pan, Haoye Cai, De-An Huang, Kuan-Hui Lee, Adrien Gaidon, Ehsan Adeli, Juan Carlos Niebles

Keywords: video captioning, spatio-temporal graph, video understanding, vision and language, knowledge distillation, transformer, computer vision.

Abstract: Video captioning is a challenging task that requires a deep understanding of visual scenes. State-of-the-art methods generate captions using either scene-level or object-level information but without explicitly modeling object interactions. Thus, they often fail to make visually grounded predictions, and are sensitive to spurious correlations. In this paper, we propose a novel spatio-temporal graph model for video captioning that exploits object interactions in space and time. Our model builds interpretable links and is able to provide explicit visual grounding. To avoid unstable performance caused by the variable number of objects, we further propose an object-aware knowledge distillation mechanism, in which local object information is used to regularize global scene features. We demonstrate the efficacy of our approach through extensive experiments on two benchmarks, showing our approach yields competitive performance with interpretable predictions.
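The abstract names two components: a spatio-temporal graph over detected objects, and an object-aware distillation term in which the object branch regularizes the scene branch. The sketch below illustrates both ideas in pure Python. It makes several assumptions that are not stated in the abstract. Spatial edges connect objects in the same frame and are weighted by softmax-normalized box IoU. Temporal edges connect objects in adjacent frames and are weighted by softmax-normalized cosine similarity of their features. The distillation term is a KL divergence from the object branch's word distribution to the scene branch's. The function names (`spatial_adjacency`, `temporal_adjacency`, `propagate`, `distillation_loss`) are hypothetical, and this is not the authors' released implementation.

```python
import math


def box_iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def cosine(u, v):
    """Cosine similarity of two feature vectors (0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def softmax_rows(scores):
    """Normalize each row of a score matrix into edge weights that sum to 1."""
    out = []
    for row in scores:
        m = max(row)
        exps = [math.exp(s - m) for s in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def spatial_adjacency(boxes):
    """Assumed spatial edges: objects in one frame, weighted by box overlap."""
    return softmax_rows([[box_iou(a, b) for b in boxes] for a in boxes])


def temporal_adjacency(feats_t, feats_next):
    """Assumed temporal edges: row i is object i at frame t, column j is
    object j at frame t+1, weighted by appearance similarity."""
    return softmax_rows([[cosine(u, v) for v in feats_next] for u in feats_t])


def propagate(adj, feats):
    """One message-passing step: row i of the result is the edge-weighted sum
    of the feature vectors of the nodes indexed by adj's columns."""
    dim = len(feats[0])
    return [[sum(w * f[d] for w, f in zip(row, feats)) for d in range(dim)]
            for row in adj]


def distillation_loss(p_object, p_scene, eps=1e-12):
    """KL(p_object || p_scene) over the vocabulary: the object branch's word
    distribution is used as a soft target for the scene branch (assumed)."""
    return sum(p * math.log((p + eps) / (q + eps))
               for p, q in zip(p_object, p_scene) if p > 0)


# Usage: three boxes in one frame, two objects matched into the next frame.
boxes = [(0, 0, 2, 2), (1, 1, 3, 3), (10, 10, 12, 12)]
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
A_space = spatial_adjacency(boxes)
updated = propagate(A_space, feats)
A_time = temporal_adjacency([[1.0, 0.0], [0.0, 1.0]], [[0.9, 0.1], [0.1, 0.9]])
loss = distillation_loss([0.7, 0.2, 0.1], [0.5, 0.3, 0.2])
```

Overlapping boxes end up with larger spatial edge weights than distant ones. Each object at frame t attends most strongly to its most similar object at frame t+1. In a full model, the distillation term would be added to the scene branch's captioning loss with some weight. That weight, the KL direction, and the edge definitions above are all illustrative assumptions.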

The talk and the paper were published at the CVPR 2020 virtual conference.

Similar Papers

05/01/2021

Exploration of Spatial and Temporal Modeling Alternatives for HOI

Rishabh Dabral, Srijon Sarkar, Sai Praneeth Reddy, Ganesh Ramakrishnan

4:48

03/05/2021

gradSim: Differentiable simulation for system identification and visuomotor control

Krishna Murthy Jatavallabhula, Miles Macklin, Florian Golemo, Vikram Voleti, Linda Petrini, Martin Weiss, Breandan Considine, Jérôme Parent-Lévesque, Kevin Xie, Kenny Erleben, Liam Paull, Florian Shkurti, Derek Nowrouzezahrai, Sanja Fidler

Keywords: 3D scene understanding, Physical parameter estimation, System identification, Differentiable simulation, Differentiable physics, Differentiable rendering, 3D vision

5:01

05/01/2021

Multi-Frame Recurrent Adversarial Network for Moving Object Segmentation

Prashant W. Patil, Akshay Dudhane, Subrahmanyam Murala

5:00

06/12/2020

Learning Physical Graph Representations from Visual Scenes

Daniel Bear, Chaofei Fan, Damian Mrowca, Yunzhu Li, Seth Alter, Aran Nayebi, Jeremy Schwartz, Li Fei-Fei, Jiajun Wu, Josh Tenenbaum, Daniel Yamins

3:19

06/12/2021

Knowledge-inspired 3D Scene Graph Prediction in Point Cloud

Shoulong Zhang, Shuai Li, Aimin Hao, Hong Qin

Keywords: deep learning, graph learning

11:20

22/11/2021

Planar Shape Based Registration for Multi-modal Geometry

Muxingzi Li, Florent Lafarge

Keywords: global registration, energy minimization, geometric primitives, point cloud, polygonal mesh

3:00

14/06/2020

Learning Visual Motion Segmentation Using Event Surfaces

Anton Mitrokhin, Zhiyuan Hua, Cornelia Fermüller, Yiannis Aloimonos

Keywords: event camera, dynamic vision sensors, graph convolutional network, motion segmentation, surface normals, 3d point cloud segmentation, semantic segmentation, real time segmentation, event cloud processing, asynchronous signal

1:01

02/02/2021

Anticipating Future Relations via Graph Growing for Action Prediction

Xinxiao Wu, Jianwei Zhao, Ruiqi Wang

14:44

30/11/2020

MLIFeat: Multi-level information fusion based deep local features

Yuyang Zhang (Institute of Automation, Chinese Academy of Sciences; University of Chinese Academy of Sciences), Jinge Wang, Shibiao Xu, Xiao Liu, Xiaopeng Zhang

5:28

02/02/2021

Object-Centric Image Generation from Layouts

Tristan Sylvain, Pengchuan Zhang, Yoshua Bengio, R Devon Hjelm, Shikhar Sharma

17:44

14/06/2020

Joint Spatial-Temporal Optimization for Stereo 3D Object Tracking

Peiliang Li, Jieqi Shi, Shaojie Shen

Keywords: 3d object tracking, stereo cameras, autonomous driving

1:01

14/06/2020

Self-Supervised Monocular Scene Flow Estimation

Junhwa Hur, Stefan Roth

Keywords: monocular scene flow, self-supervised learning, 3d scene flow, optical flow, monocular depth estimation

5:00

14/06/2020

Local Class-Specific and Global Image-Level Generative Adversarial Networks for Semantic-Guided Scene Generation

Hao Tang, Dan Xu, Yan Yan, Philip H.S. Torr, Nicu Sebe

Keywords: generative adversarial networks, local, global, semantic guided, scene generation, semantic image synthesis, cross-view image generation, class-specific feature representation, attention fusion

1:00

05/01/2021

Towards Visually Explaining Video Understanding Networks With Perturbation

Zhenqiang Li, Weimin Wang, Zuoyue Li, Yifei Huang, Yoichi Sato

4:53

05/01/2021

Only Time Can Tell: Discovering Temporal Data for Temporal Modeling

Laura Sevilla-Lara, Shengxin Zha, Zhicheng Yan, Vedanuj Goswami, Matt Feiszli, Lorenzo Torresani

4:14

30/11/2020

Dynamic Depth Fusion and Transformation for Monocular 3D Object Detection

Erli Ouyang, Li Zhang, Mohan Chen, Anurag Arnab, Yanwei Fu

6:30

06/12/2020

Learning Object-Centric Representations of Multi-Object Scenes from Multiple Views

Nanbo Li, Cian Eastwood, Robert Fisher

3:19

02/02/2021

Optical Flow Estimation from a Single Motion-blurred Image

Dawit Mureja Argaw, Junsik Kim, Francois Rameau, Jae Won Cho, In So Kweon

14:12

22/11/2021

Wide and Narrow: Video Prediction from Context and Motion

Jaehoon Cho, Jiyoung Lee, Changjae Oh, Wonil Song, Kwanghoon Sohn

Keywords: video prediction, local filter memory networks, adaptive filter kernels, global context propagation networks, non-local neighboring representations

2:50

26/04/2020

Neural Outlier Rejection for Self-Supervised Keypoint Learning

Jiexiong Tang, Hanme Kim, Vitor Guizilini, Sudeep Pillai, Rares Ambrus

Keywords: Self-Supervised Learning, Keypoint Detection, Outlier Rejection, Deep Learning

4:55

26/04/2020

SPACE: Unsupervised Object-Oriented Scene Representation via Spatial Attention and Decomposition

Zhixuan Lin, Yi-Fu Wu, Skand Vishwanath Peri, Weihao Sun, Gautam Singh, Fei Deng, Jindong Jiang, Sungjin Ahn

Keywords: Generative models, Unsupervised scene representation, Object-oriented representation, spatial attention

4:55

06/12/2021

Neural Production Systems

Anirudh Goyal ALIAS PARTH GOYAL, Aniket Didolkar, Nan Rosemary Ke, Charles Blundell, Philippe Beaudoin, Nicolas Heess, Michael Mozer, Yoshua Bengio

Keywords: deep learning, graph learning

12:47

07/09/2020

Attribute-Guided Image Generation from Layout

Ke Ma, Bo Zhao, Leonid Sigal

Keywords: conditional image generation, GAN

9:41

06/12/2020

Targeted Adversarial Perturbations for Monocular Depth Prediction

Alex Wong, Safa Cicek, Stefano Soatto

3:21

06/12/2020

Self-Learning Transformations for Improving Gaze and Head Redirection

Yufeng Zheng, Seonwook Park, Xucong Zhang, Shalini De Mello, Otmar Hilliges

3:20

06/12/2021

Object-Centric Representation Learning with Generative Spatial-Temporal Factorization

Nanbo Li, Muhammad Ahmed Raza, Wenbin Hu, Zhaole Sun, Robert Fisher

Keywords: vision, generative model, representation learning

12:25

02/02/2021

Exploiting Relationship for Complex-scene Image Generation

Tianyu Hua, Hongdong Zheng, Yalong Bai, Wei Zhang, Xiao-Ping Zhang, Tao Mei

15:01

14/06/2020

Instance-Aware Image Colorization

Jheng-Wei Su, Hung-Kuo Chu, Jia-Bin Huang

Keywords: colorization, instance-aware, deep learning, computer vision

1:01

07/09/2020

Mixup-CAM: Weakly-supervised Semantic Segmentation via Uncertainty Regularization

Yu-Ting Chang, Qiaosong Wang, Wei-Chih Hung, Robinson Piramuthu, Yi-Hsuan Tsai, Ming-Hsuan Yang

Keywords: semantic segmentation, weakly-supervised learning, class activation map, mixup augmentation, entropy regularization

8:22

22/11/2021

Duplicate Latent Representation Suppression for Multi-object Variational Autoencoders

Li Nanbo, Robert B Fisher

Keywords: object-centric representation learning, variational autoencoders, scene representation

2:58

02/02/2021

Temporal Relational Modeling with Self-Supervision for Action Segmentation

Dong Wang, Di Hu, Xingjian Li, Dejing Dou

17:10

05/01/2021

Regional Attention Networks With Context-Aware Fusion for Group Emotion Recognition

Ahmed Shehab Khan, Zhiyuan Li, Jie Cai, Yan Tong

5:00

26/04/2020

Contrastive Learning of Structured World Models

Thomas Kipf, Elise van der Pol, Max Welling

Keywords: state representation learning, graph neural networks, model-based reinforcement learning, relational learning, object discovery

14:51

30/11/2020

SDP-Net: Scene Flow Based Real-time Object Detection and Prediction from Sequential 3D Point Clouds

Yi Zhang, Yuwen Ye, Zhiyu Xiang, Jiaqi Gu

9:45

14/06/2020

Progressive Mirror Detection

Jiaying Lin, Guodong Wang, Rynson W.H. Lau

Keywords: low-level and physics-based vision, recognition (detection, categorization)

1:02

26/04/2020

Physics-as-Inverse-Graphics: Unsupervised Physical Parameter Estimation from Video

Miguel Jaques, Michael Burke, Timothy Hospedales

4:41

05/12/2020

Are scene graphs good enough to improve image captioning?

Victor Siemen Janusz Milewski, Marie-Francine Moens, Iacer Calixto

13:20

05/01/2021

Supervoxel Attention Graphs for Long-Range Video Modeling

Yang Wang, Gedas Bertasius, Tae-Hyun Oh, Abhinav Gupta, Minh Hoai, Lorenzo Torresani

2:01

05/01/2021

Triangle-Net: Towards Robustness in Point Cloud Learning

Chenxi Xiao, Juan Wachs

4:58

14/06/2020

Associate-3Ddet: Perceptual-to-Conceptual Association for 3D Point Cloud Object Detection

Liang Du, Xiaoqing Ye, Xiao Tan, Jianfeng Feng, Zhenbo Xu, Errui Ding, Shilei Wen

Keywords: 3d object detection, domain adaptation, associative recognition, lidar, point cloud, convolutional neural network, autonomous driving

1:01