30/11/2020

Condensed Movies: Story Based Retrieval with Contextual Embeddings

Max Bain, Arsha Nagrani, Andrew Brown, Andrew Zisserman


Abstract: Our objective in this work is the long-range understanding of the narrative structure of movies. Instead of considering the entire movie, we propose to learn from the 'key scenes' of the movie, providing a condensed look at the full storyline. To this end, we make the following three contributions: (i) We create the Condensed Movie Dataset (CMD), consisting of the key scenes from over 3K movies: each key scene is accompanied by a high-level semantic description of the scene, character face tracks, and metadata about the movie. Our dataset is scalable, obtained automatically from YouTube, and is freely available for anybody to download and use. It is also an order of magnitude larger than existing movie datasets in the number of movies; (ii) We provide a deep network baseline for text-to-video retrieval on our dataset, combining character, speech and visual cues into a single video embedding; and finally (iii) We demonstrate how the addition of context from other video clips improves retrieval performance.
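To illustrate the kind of text-to-video retrieval baseline the abstract describes, here is a minimal sketch of fusing per-modality features (character, speech, visual) into a single video embedding and ranking videos by cosine similarity against a text-query embedding. All function names and the simple weighted-average fusion are assumptions for illustration; the paper's actual model learns the embeddings and fusion with a deep network.

```python
import numpy as np

def l2norm(x, axis=-1):
    # Normalize vectors to unit length (small epsilon avoids divide-by-zero).
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + 1e-8)

def fuse_modalities(char_feat, speech_feat, visual_feat, weights=(1.0, 1.0, 1.0)):
    # Hypothetical fusion: assume each modality feature is already projected
    # to a shared dimension, then take a weighted average and re-normalize.
    feats = np.stack([l2norm(char_feat), l2norm(speech_feat), l2norm(visual_feat)])
    w = np.asarray(weights, dtype=float)[:, None]
    return l2norm((w * feats).sum(axis=0))

def rank_videos(text_emb, video_embs):
    # Cosine similarity between the text query and each video embedding;
    # return video indices sorted from best to worst match.
    sims = video_embs @ l2norm(text_emb)
    return np.argsort(-sims)
```

A query description is embedded into the same space, and `rank_videos` returns the retrieval order; the "contextual embeddings" contribution would further refine `video_embs` using features from neighbouring clips of the same movie.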

The video of this talk cannot be embedded. You can watch it here:
https://accv2020.github.io/miniconf/poster_13.html
The talk and the respective paper are published at the ACCV 2020 virtual conference.

