Video Object Grounding Using Semantic Roles in Language Description

14/06/2020

Arka Sadhu, Kan Chen, Ram Nevatia

Keywords: video grounding, object relation, video understanding, activity net, vision and language, visual semantic role, transformer, activity recognition, natural language processing

Abstract: We explore the task of Video Object Grounding (VOG), which grounds objects in videos referred to in natural language descriptions. Previous methods apply image-grounding-based algorithms to VOG; they fail to exploit object relation information and suffer from limited generalization. Here, we investigate the role of object relations in VOG and propose a novel framework, VOGNet, to encode multi-modal object relations via self-attention with relative position encoding. To evaluate VOGNet, we propose novel contrasting sampling methods to generate more challenging grounding input samples, and construct a new dataset called ActivityNet-SRL (ASRL) based on existing caption and grounding datasets. Experiments on ASRL validate the need for encoding object relations in VOG, and VOGNet outperforms competitive baselines by a significant margin.

The talk and the corresponding paper were published at the CVPR 2020 virtual conference.

Similar Papers

02/02/2021

Mind-the-Gap! Unsupervised Domain Adaptation for Text-Video Retrieval

Qingchao Chen, Yang Liu, Samuel Albanie

15:19

06/12/2021

Contrast and Mix: Temporal Contrastive Video Domain Adaptation with Background Mixing

Aadarsh Sahoo, Rutav Shah, Rameswar Panda, Kate Saenko, Abir Das

Keywords: domain adaptation, contrastive learning

13:20

06/12/2021

Class-agnostic Reconstruction of Dynamic Objects from Videos

Zhongzheng Ren, Xiaoming Zhao, Alex Schwing

13:29

02/02/2021

Contrastive Transformation for Self-supervised Correspondence Learning

Ning Wang, Wengang Zhou, Houqiang Li

13:41

30/11/2020

Mask-Ranking Network for Semi-Supervised Video Object Segmentation

Wenjing Li, Xiang Zhang, Yujie Hu, Yingqi Tang

5:36

06/12/2021

End-to-end Multi-modal Video Temporal Grounding

Yi-Wen Chen, Yi-Hsuan Tsai, Ming-Hsuan Yang

Keywords: self-supervised learning, transformers, vision, contrastive learning

8:46

22/11/2021

FlowVOS: Weakly-Supervised Visual Warping for Detail-Preserving and Temporally Consistent Single-Shot Video Object Segmentation

Julia Gong, F. Christopher Holsinger, Serena Yeung

Keywords: video object segmentation, single shot video object segmentation, segmentation, object tracking, optical flow, motion tracking, visual warping, weak supervision, video analysis, object segmentation

2:57

14/06/2020

Syntax-Aware Action Targeting for Video Captioning

Qi Zheng, Chaoyue Wang, Dacheng Tao

Keywords: video and language, video captioning, action predicting

1:01

22/11/2021

Deep Video Inpainting Detection

Peng Zhou, Ning Yu, Zuxuan Wu, Larry Davis, Abhinav Shrivastava, Ser-Nam Lim

Keywords: Video Inpainting Detection, Manipulation Detection, DeepFake Detection

3:01

02/02/2021

F2Net: Learning to Focus on the Foreground for Unsupervised Video Object Segmentation

Daizong Liu, Dongdong Yu, Changhu Wang, Pan Zhou

16:59

05/01/2021

DORi: Discovering Object Relationships for Moment Localization of a Natural Language Query in a Video

Cristian Rodriguez-Opazo, Edison Marrese-Taylor, Basura Fernando, Hongdong Li, Stephen Gould

5:02

04/07/2020

Span-based Localizing Network for Natural Language Video Localization

Hao Zhang, Aixin Sun, Wei Jing, Joey Tianyi Zhou

Keywords: Natural Localization, NLVL, ranking task, regression task

11:23

14/06/2020

Learning Video Object Segmentation From Unlabeled Videos

Xiankai Lu, Wenguan Wang, Jianbing Shen, Yu-Wing Tai, David J. Crandall, Steven C. H. Hoi

Keywords: unsupervised/weakly supervised vos, four granularity, video pattern learning

1:01

14/06/2020

Telling Left From Right: Learning Spatial Correspondence of Sight and Sound

Karren Yang, Bryan Russell, Justin Salamon

Keywords: audio-visual learning in video, self-supervision, video dataset, spatial audio, localization, spatialization, upmixing, source separation

4:41

05/01/2021

Temporal Context Aggregation for Video Retrieval With Contrastive Learning

Jie Shao, Xin Wen, Bingchen Zhao, Xiangyang Xue

4:50

22/11/2021

Hierarchical Interaction Network for Video Object Segmentation from Referring Expressions

Zhao Yang, Yansong Tang, Luca Bertinetto, Hengshuang Zhao, Philip Torr

Keywords: segmentation, video object segmentation, referring segmentation, referring video object segmentation, video object segmentation from referring expressions, referring image segmentation, referring image comprehension, optical flow, visual grounding

2:57

02/02/2021

CompFeat: Comprehensive Feature Aggregation for Video Instance Segmentation

Yang Fu, Linjie Yang, Ding Liu, Thomas S. Huang, Humphrey Shi

16:24

14/06/2020

Classifying, Segmenting, and Tracking Object Instances in Video with Mask Propagation

Gedas Bertasius, Lorenzo Torresani

Keywords: instance segmentation, object detection, object tracking, video analysis

4:59

02/02/2021

Proposal-Free Video Grounding with Contextual Pyramid Network

Kun Li, Dan Guo, Meng Wang

14:19

04/07/2020

Dense-Caption Matching and Frame-Selection Gating for Temporal Localization in VideoQA

Hyounghun Kim, Zineng Tang, Mohit Bansal

Keywords: Dense-Caption Matching, Temporal VideoQA, answering questions, frame problem

10:56

22/11/2021

Revisiting spatio-temporal layouts for compositional action recognition

Gorjan Radevski, Marie-Francine Moens, Tinne Tuytelaars

Keywords: compositional action recognition, video understanding, something-something, action genome, charades, video transformer, multimodal fusion, spatial reasoning, spatio-temporal action recognition, revisiting spatio-temporal layouts

9:58

14/06/2020

LiDAR-Based Online 3D Video Object Detection With Graph-Based Message Passing and Spatiotemporal Transformer Attention

Junbo Yin, Jianbing Shen, Chenye Guan, Dingfu Zhou, Ruigang Yang

Keywords: 3d object detection, point cloud, video, graph, attention, autonomous driving

1:02

02/02/2021

Activity Image-to-Video Retrieval by Disentangling Appearance and Motion

Liu Liu, Jiangtong Li, Li Niu, Ruicong Xu, Liqing Zhang

14:34

02/02/2021

Temporal ROI Align for Video Object Recognition

Tao Gong, Kai Chen, Xinjiang Wang, Qi Chu, Feng Zhu, Dahua Lin, Nenghai Yu, Huamin Feng

14:29

02/02/2021

Visualization of Supervised and Self-Supervised Neural Networks via Attribution Guided Factorization

Shir Gur, Ameen Ali, Lior Wolf

14:14

02/02/2021

Query-Memory Re-Aggregation for Weakly-supervised Video Object Segmentation

Fanchao Lin, Hongtao Xie, Yan Li, Yongdong Zhang

14:19

05/01/2021

Coarse Temporal Attention Network (CTA-Net) for Driver's Activity Recognition

Zachary Wharton, Ardhendu Behera, Yonghuai Liu, Nik Bessis

5:30

14/06/2020

Visual-Textual Capsule Routing for Text-Based Video Segmentation

Bruce McIntosh, Kevin Duarte, Yogesh S Rawat, Mubarak Shah

Keywords: segmentation, localization, video, capsule, natural language, action, a2d, routing

4:58

22/11/2021

V3GAN: Decomposing Background, Foreground and Motion for Video Generation

Arti Keshari, Sonam Gupta, Sukhendu Das

Keywords: video generation, unconditional video generation, shuffling loss, feature level masking, unsupervised learning, GAN, foreground, background, motion decomposition

3:02

22/11/2021

Paying Attention to Varying Receptive Fields: Object Detection with Atrous Filters and Vision Transformers

Arthur Jian Shun Lam, Jun Yi Lim, Ricky Sutopo, Vishnu Monn Baskaran

Keywords: object detection, atrous convolution, vision transformers, attention mechanism

3:01

14/06/2020

Local-Global Video-Text Interactions for Temporal Grounding

Jonghwan Mun, Minsu Cho, Bohyung Han

Keywords: temporal grounding, temporal moment retrieval, localization by natural language, video understanding, vision and language

1:01

25/07/2020

3D self-attention for unsupervised video quantization

Jingkuan Song, Ruimin Lang, Xiaosu Zhu, Xing Xu, Lianli Gao, Heng Tao Shen

Keywords: quantization, video retrieval, ann search

9:44

22/11/2021

Duplicate Latent Representation Suppression for Multi-object Variational Autoencoders

Li Nanbo, Robert B Fisher

Keywords: object-centric representation learning, variational autoencoders, scene representation

2:58

14/06/2020

Learning to Discriminate Information for Online Action Detection

Hyunjun Eun, Jinyoung Moon, Jongyoul Park, Chanho Jung, Changick Kim

Keywords: online action detection, information discrimination unit, information discrimination network, early embedding module, ongoing action, temporal action detection, video understanding, gated recurrent unit, streaming video, relevant chunks

1:04

19/08/2021

Weakly-Supervised Spatio-Temporal Anomaly Detection in Surveillance Video

Jie Wu, Wei Zhang, Guanbin Li, Wenhao Wu, Xiao Tan, Yingying Li, Errui Ding, Liang Lin

Keywords: Computer Vision, Video, Weakly Supervised Learning

12:10

14/06/2020

Learning Invariant Representation for Unsupervised Image Restoration

Wenchao Du, Hu Chen, Hongyu Yang

Keywords: unsupervised image restoration, representation learning, adversarial domain adaption, self-supervised constraints

0:59

05/01/2021

Interpretable and Trustworthy Deepfake Detection via Dynamic Prototypes

Loc Trinh, Michael Tsang, Sirisha Rambhatla, Yan Liu

5:00

30/11/2020

Image Captioning through Image Transformer

Sen He, Wentong Liao, Hamed R. Tavakoli, Michael Yang, Bodo Rosenhahn, Nicolas Pugeault

9:49

22/11/2021

Recurrence-in-Recurrence Networks for Video Deblurring

JoonKyu Park, Seungjun Nah, Kyoung Mu Lee

Keywords: video deblurring, recurrence-in-recurrence, inner-recurrence, recurrent neural networks, attention

2:58

14/06/2020

Video Playback Rate Perception for Self-Supervised Spatio-Temporal Representation Learning

Yuan Yao, Chang Liu, Dezhao Luo, Yu Zhou, Qixiang Ye

Keywords: self-supervised spatio-temporal representation learning, multi-temporal resolution characteristic, playback rate perception, motion attention mechanism

1:01