Sentence Guided Temporal Modulation for Dynamic Video Thumbnail Generation

07/09/2020

Mrigank Rochan, Mahesh Kumar Krishna Reddy, Yang Wang

Keywords: video thumbnail generation, conditional normalization

Abstract: We consider the problem of sentence specified dynamic video thumbnail generation. Given an input video and a user query sentence, the goal is to generate a video thumbnail that not only provides the preview of the video content, but also semantically corresponds to the sentence. In this paper, we propose a sentence guided temporal modulation (SGTM) mechanism that utilizes the sentence embedding to modulate the normalized temporal activations of the video thumbnail generation network. Unlike the existing state-of-the-art method that uses recurrent architectures, we propose a non-recurrent framework that is simple and allows much more parallelization. Extensive experiments and analysis on a large-scale dataset demonstrate the effectiveness of our framework.
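
The modulation idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the use of plain dense layers to predict a per-channel scale and shift, and the choice to normalize each channel across time are all assumptions made for illustration. The sketch only shows the general pattern of conditional normalization, where a sentence embedding rescales and shifts normalized temporal activations.

```python
import math


def linear(vec, weight, bias):
    """Dense layer: weight has shape (out_dim, in_dim), bias has shape (out_dim,)."""
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weight, bias)]


def normalize_over_time(features, eps=1e-5):
    """Give each channel zero mean and unit variance across the T time steps."""
    num_steps, num_channels = len(features), len(features[0])
    normalized = [[0.0] * num_channels for _ in range(num_steps)]
    for c in range(num_channels):
        column = [features[t][c] for t in range(num_steps)]
        mean = sum(column) / num_steps
        var = sum((x - mean) ** 2 for x in column) / num_steps
        for t in range(num_steps):
            normalized[t][c] = (column[t] - mean) / math.sqrt(var + eps)
    return normalized


def sentence_guided_modulation(features, sentence, w_gamma, b_gamma, w_beta, b_beta):
    """Modulate normalized temporal activations with a sentence embedding.

    features: T x C list of per-clip activations (hypothetical layout).
    sentence: sentence embedding of length D.
    The four weight arguments are hypothetical dense-layer parameters that map
    the sentence embedding to a per-channel scale (gamma) and shift (beta).
    """
    gamma = linear(sentence, w_gamma, b_gamma)
    beta = linear(sentence, w_beta, b_beta)
    normalized = normalize_over_time(features)
    # Every time step is modulated independently, so no recurrence is involved.
    return [[g * x + b for x, g, b in zip(step, gamma, beta)] for step in normalized]


if __name__ == "__main__":
    video = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]   # T=3 clips, C=2 channels
    query = [0.5, -1.0]                                # D=2 sentence embedding
    w_gamma = [[0.2, 0.0], [0.0, 0.1]]
    w_beta = [[0.0, 0.3], [0.4, 0.0]]
    for row in sentence_guided_modulation(video, query, w_gamma, [1.0, 1.0], w_beta, [0.0, 0.0]):
        print(row)
```

Because the scale and shift depend only on the sentence, the same video features yield different modulated activations for different queries, and all time steps can be processed in parallel once the per-channel statistics are known.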

The talk and the corresponding paper were published at the BMVC 2020 virtual conference.

Similar Papers

05/01/2021

DORi: Discovering Object Relationships for Moment Localization of a Natural Language Query in a Video

Cristian Rodriguez-Opazo, Edison Marrese-Taylor, Basura Fernando, Hongdong Li, Stephen Gould

5:02

14/06/2020

ActBERT: Learning Global-Local Video-Text Representations

Linchao Zhu, Yi Yang

Keywords: actbert, cross-modal pretraining, video and language, transformer, tangled transformer, instructional videos

4:58

22/11/2021

Back to the Future: Cycle Encoding Prediction for Self-supervised Video Representation Learning

Xinyu Yang, Majid Mirmehdi, Tilo Burghardt

Keywords: unsupervised learning, self-supervised learning, video self-supervised learning, contrastive learning, representation learning, cycle consistency, temporal prediction, action recognition

2:59

14/06/2020

Hierarchical Conditional Relation Networks for Video Question Answering

Thao Minh Le, Vuong Le, Svetha Venkatesh, Truyen Tran

Keywords: video question answering, visual question answering, conditional relation network, vision-language neural network

5:00

02/02/2021

Dynamic Graph Representation Learning for Video Dialog via Multi-Modal Shuffled Transformers

Shijie Geng, Peng Gao, Moitreya Chatterjee, Chiori Hori, Jonathan Le Roux, Yongfeng Zhang, Hongsheng Li, Anoop Cherian

19:36

04/07/2020

MART: Memory-Augmented Recurrent Transformer for Coherent Video Paragraph Captioning

Jie Lei, Liwei Wang, Yelong Shen, Dong Yu, Tamara Berg, Mohit Bansal

Keywords: Coherent Captioning, Generating descriptions, captioning tasks, coherent generation

10:51

06/12/2020

COOT: Cooperative Hierarchical Transformer for Video-Text Representation Learning

Simon Ging, Mohammadreza Zolfaghari, Hamed Pirsiavash, Thomas Brox

3:16

14/06/2020

Time Flies: Animating a Still Image With Time-Lapse Video As Reference

Chia-Chi Cheng, Hung-Yu Chen, Wei-Chen Chiu

Keywords: time-lapse video animation, self-supervised learning, style transfer, temporal consistency

1:01

02/02/2021

Mind-the-Gap! Unsupervised Domain Adaptation for Text-Video Retrieval

Qingchao Chen, Yang Liu, Samuel Albanie

15:19

06/12/2021

Learning from Inside: Self-driven Siamese Sampling and Reasoning for Video Question Answering

Weijiang Yu, Haoteng Zheng, Mengfei Li, Lei Ji, Lijun Wu, Nong Xiao, Nan Duan

Keywords: transformers

13:47

06/12/2020

Delving into the Cyclic Mechanism in Semi-supervised Video Object Segmentation

Yuxi Li, Ning Xu, Jinlong Peng, John See, Weiyao Lin

2:56

14/06/2020

Image Search With Text Feedback by Visiolinguistic Attention Learning

Yanbei Chen, Shaogang Gong, Loris Bazzani

Keywords: vision and language, image search, text feedback, attention mechanism, transformer, multimodal learning, representation learning, composition, image retrieval, interactive image search

1:00

07/09/2020

A Better Use of Audio-Visual Cues: Dense Video Captioning with Bi-modal Transformer

Vladimir Iashin, Esa Rahtu

Keywords: Dense Video Captioning, Temporal Action Proposal Generation, Bi-modal Transformer, Audio-visual Training, Cross-modal, Multi-modal, ActivityNet Captions

9:11

06/12/2021

MERLOT: Multimodal Neural Script Knowledge Models

Rowan Zellers, Ximing Lu, Jack Hessel, Youngjae Yu, Jae Sung Park, Jize Cao, Ali Farhadi, Yejin Choi

Keywords: representation learning

18:15

02/02/2021

Write-a-speaker: Text-based Emotional and Rhythmic Talking-head Generation

Lincheng Li, Suzhen Wang, Zhimeng Zhang, Yu Ding, Yixing Zheng, Xin Yu, Changjie Fan

15:58

14/06/2020

Syntax-Aware Action Targeting for Video Captioning

Qi Zheng, Chaoyue Wang, Dacheng Tao

Keywords: video and language, video captioning, action predicting

1:01

02/02/2021

SeCo: Exploring Sequence Supervision for Unsupervised Representation Learning

Ting Yao, Yiheng Zhang, Zhaofan Qiu, Yingwei Pan, Tao Mei

16:17

06/12/2021

Deep Contextual Video Compression

Jiahao Li, Bin Li, Yan Lu

6:33

08/12/2020

A hierarchical approach to vision-based language generation: from simple sentences to complex natural language

Simion-Vlad Bogolin, Ioana Croitoru, Marius Leordeanu

12:15

04/07/2020

Video-Grounded Dialogues with Pretrained Generation Language Models

Hung Le, Steven C.H. Hoi

Keywords: downstream tasks, video-grounded tasks, sequence-to-sequence task, Pretrained Models

7:22

14/06/2020

Attention Mechanism Exploits Temporal Contexts: Real-Time 3D Human Pose Reconstruction

Ruixu Liu, Ju Shen, He Wang, Chen Chen, Sen-ching Cheung, Vijayan Asari

Keywords: 3d human pose, attention mechanism, multi-scale dilation convolution, monocular motion reconstruction

5:01

22/11/2021

LARNet: Latent Action Representation for Human Action Synthesis

Naman Biyani, Aayush Jung Bahadur Rana, Shruti Vyas, Yogesh Rawat

Keywords: action synthesis, video synthesis, joint generative model, human action generation, end-to-end learning, conditional video generation

3:02

05/01/2021

Temporal Context Aggregation for Video Retrieval With Contrastive Learning

Jie Shao, Xin Wen, Bingchen Zhao, Xiangyang Xue

4:50

06/12/2020

Unsupervised Text Generation by Learning from Search

Jingjing Li, Zichao Li, Lili Mou, Xin Jiang, Michael Lyu, Irwin King

3:24

14/06/2020

Learning Video Object Segmentation From Unlabeled Videos

Xiankai Lu, Wenguan Wang, Jianbing Shen, Yu-Wing Tai, David J. Crandall, Steven C. H. Hoi

Keywords: unsupervised/weakly supervised vos, four granularity, video pattern learning

1:01

06/12/2021

End-to-end Multi-modal Video Temporal Grounding

Yi-Wen Chen, Yi-Hsuan Tsai, Ming-Hsuan Yang

Keywords: self-supervised learning, transformers, vision, contrastive learning

8:46

02/02/2021

Semantic Grouping Network for Video Captioning

Hobin Ryu, Sunghun Kang, Haeyong Kang, Chang D. Yoo

17:41

07/09/2020

Attention Distillation for Learning Video Representations

Miao Liu, Xin Chen, Yun Zhang, Yin Li, James Rehg

Keywords: Action Recognition, Deep Learning, Representation Learning

9:50

06/12/2020

Self-Supervised MultiModal Versatile Networks

Jean-Baptiste Alayrac, Adria Recasens, Rosalia Schneider, Relja Arandjelović, Jason Ramapuram, Jeffrey De Fauw, Lucas Smaira, Sander Dieleman, Andrew Zisserman

3:25

04/07/2020

Improving Image Captioning with Better Use of Caption

Zhan Shi, Xu Zhou, Xipeng Qiu, Xiaodan Zhu

Keywords: Image Captioning, multimodal problem, natural processing, computer community

11:11

19/08/2021

Weakly Supervised Dense Video Captioning via Jointly Usage of Knowledge Distillation and Cross-modal Matching

Bofeng Wu, Guocheng Niu, Jun Yu, Xinyan Xiao, Jian Zhang, Hua Wu

Keywords: Computer Vision, Language and Vision, Multi-instance; Multi-label; Multi-view learning

12:03

14/06/2020

Local-Global Video-Text Interactions for Temporal Grounding

Jonghwan Mun, Minsu Cho, Bohyung Han

Keywords: temporal grounding, temporal moment retrieval, localization by natural language, video understanding, vision and language

1:01

26/04/2020

Neural Machine Translation with Universal Visual Representation

Zhuosheng Zhang, Kehai Chen, Rui Wang, Masao Utiyama, Eiichiro Sumita, Zuchao Li, Hai Zhao

Keywords: Neural Machine Translation, Visual Representation, Multimodal Machine Translation, Language Representation

4:50

14/06/2020

Video Playback Rate Perception for Self-Supervised Spatio-Temporal Representation Learning

Yuan Yao, Chang Liu, Dezhao Luo, Yu Zhou, Qixiang Ye

Keywords: self-supervised spatio-temporal representation learning, multi-temporal resolution characteristic, playback rate perception, motion attention mechanism

1:01

06/12/2020

A Simple Language Model for Task-Oriented Dialogue

Ehsan Hosseini-Asl, Bryan McCann, Chien-Sheng Wu, Semih Yavuz, Richard Socher

3:21

03/05/2021

Self-Supervised Learning of Compressed Video Representations

Youngjae Yu, Sangho Lee, Gunhee Kim, Yale Song

Keywords: self-supervised learning, Compressed videos

4:34

06/12/2021

Multi-modal Dependency Tree for Video Captioning

Wentian Zhao, Xinxiao Wu, Jiebo Luo

Keywords: reinforcement learning and planning, graph learning, language

6:02

06/12/2021

CLIP-It! Language-Guided Video Summarization

Medhini Narasimhan, Anna Rohrbach, Trevor Darrell

Keywords: transformers

6:14

05/01/2021

Improving Video Captioning With Temporal Composition of a Visual-Syntactic Embedding

Jesus Perez-Martin, Benjamin Bustos, Jorge Perez

5:01

04/07/2020

Dense-Caption Matching and Frame-Selection Gating for Temporal Localization in VideoQA

Hyounghun Kim, Zineng Tang, Mohit Bansal

Keywords: Dense-Caption Matching, Temporal VideoQA, answering questions, frame problem

10:56