Uncovering Hidden Challenges in Query-Based Video Moment Retrieval

07/09/2020

Mayu Otani, Yuta Nakashima, Esa Rahtu, Janne Heikkila

Keywords: video moment retrieval, temporal sentence grounding, dataset analysis, negative result

Abstract: Query-based moment retrieval is the problem of localising a specific clip in an untrimmed video according to a query sentence. This is a challenging task that requires interpreting both the natural language query and the video content. As in many other areas of computer vision and machine learning, progress in query-based moment retrieval is heavily driven by benchmark datasets, so their quality has a significant impact on the field. In this paper, we present a series of experiments assessing how well the benchmark results reflect true progress in solving the moment retrieval task. Our results indicate substantial biases in the popular datasets and unexpected behaviour of the state-of-the-art models. Moreover, we present new sanity check experiments and approaches for visualising the results. Finally, we suggest possible directions for improving temporal sentence grounding in the future.
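The abstract does not say how a localised clip is scored. As a rough illustration only, the sketch below assumes the temporal intersection-over-union (IoU) measure that is commonly used for this kind of task, together with a recall-at-1 summary at an IoU threshold. The function names, the (start, end) tuple format in seconds, and the 0.5 threshold are all assumptions made for this example and are not taken from the paper.

```python
def temporal_iou(pred, gt):
    """Temporal IoU between two (start, end) moments given in seconds."""
    pred_start, pred_end = pred
    gt_start, gt_end = gt
    # Overlap is zero when the two moments do not intersect.
    intersection = max(0.0, min(pred_end, gt_end) - max(pred_start, gt_start))
    union = (pred_end - pred_start) + (gt_end - gt_start) - intersection
    return intersection / union if union > 0 else 0.0


def recall_at_1(predictions, ground_truths, threshold=0.5):
    """Fraction of queries whose top prediction reaches the IoU threshold.

    The default threshold of 0.5 is an assumed value for illustration.
    """
    hits = sum(
        temporal_iou(pred, gt) >= threshold
        for pred, gt in zip(predictions, ground_truths)
    )
    return hits / len(ground_truths)


if __name__ == "__main__":
    # Hypothetical predictions and annotations for two queries.
    preds = [(0.0, 10.0), (0.0, 5.0)]
    gts = [(0.0, 10.0), (10.0, 15.0)]
    print(temporal_iou((0.0, 10.0), (5.0, 15.0)))
    print(recall_at_1(preds, gts))
```

A measure of this kind only compares time intervals, so a predictor that ignores the video and guesses typical start and end times could still score well if the annotations are concentrated in a few regions. This is one way the dataset biases mentioned in the abstract could surface in benchmark numbers.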

The talk and the respective paper are published at the BMVC 2020 virtual conference.

Similar Papers

16/11/2020

What is More Likely to Happen Next? Video-and-Language Future Event Prediction

Jie Lei, Licheng Yu, Tamara Berg, Mohit Bansal

Keywords: video-and-language prediction, ai models, vlep, adversarial procedure

11:58

05/01/2021

Only Time Can Tell: Discovering Temporal Data for Temporal Modeling

Laura Sevilla-Lara, Shengxin Zha, Zhicheng Yan, Vedanuj Goswami, Matt Feiszli, Lorenzo Torresani

4:14

06/12/2020

Video Frame Interpolation without Temporal Priors

Youjian Zhang, Chaoyue Wang, Dacheng Tao

3:18

22/11/2021

Diagnosing Errors in Video Relation Detectors

Shuo Chen, Pascal Mettes, Cees Snoek

Keywords: video relation detection, error diagnosis

3:02

16/11/2020

BiST: Bi-directional Spatio-Temporal Reasoning for Video-Grounded Dialogues

Hung Le, Doyen Sahoo, Nancy Chen, Steven C.H. Hoi

Keywords: video-grounded dialogues, high-resolution queries, video setting, bi-directional learning

11:05

05/01/2021

Coarse Temporal Attention Network (CTA-Net) for Driver's Activity Recognition

Zachary Wharton, Ardhendu Behera, Yonghuai Liu, Nik Bessis

5:30

22/11/2021

Gradient Frequency Modulation for Visually Explaining Video Understanding Models

Xin Miao Lin, Wentao Bao, Matthew Wright, Yu Kong

Keywords: model explanation, model explainability, explainable AI, video action recognition, Discrete Fourier Transform, video perturbation, interpretable machine learning, video model explanation, frequency modulation, spatiotemporal consistency

2:53

12/07/2020

Stochastic Latent Residual Video Prediction

Jean-Yves Franceschi, Edouard Delasalles, Mickael Chen, Sylvain Lamprier, Patrick Gallinari

Keywords: Sequential, Network, and Time-Series Modeling

14:36

22/11/2021

SVD-GAN for Real-Time Unsupervised Video Anomaly Detection

Dinesh Jackson Samuel, Fabio Cuzzolin

Keywords: Unsupervised anomaly detection, SVD-GAN, depth-wise separable convolutions, spatiotemporal features, GAN convergence, Singular Value Decomposition loss, GAN reconstruction, lightweight GAN model, minimized KL divergence

2:54

05/01/2021

Temporal Context Aggregation for Video Retrieval With Contrastive Learning

Jie Shao, Xin Wen, Bingchen Zhao, Xiangyang Xue

4:50

14/06/2020

Unsupervised Learning From Video With Deep Neural Embeddings

Chengxu Zhuang, Tianwei She, Alex Andonian, Max Sobol Mark, Daniel Yamins

Keywords: unsupervised learning, self-supervised learning, video learning, contrastive learning, deep neural networks, action recognition, object recognition, two-pathway models

1:01

14/06/2020

Probabilistic Video Prediction From Noisy Data With a Posterior Confidence

Yunbo Wang, Jiajun Wu, Mingsheng Long, Joshua B. Tenenbaum

Keywords: video prediction, predictive learning, bayesian predictive networks, spatiotemporal modeling

1:00

02/02/2021

Temporal ROI Align for Video Object Recognition

Tao Gong, Kai Chen, Xinjiang Wang, Qi Chu, Feng Zhu, Dahua Lin, Nenghai Yu, Huamin Feng

14:29

02/02/2021

Mind-the-Gap! Unsupervised Domain Adaptation for Text-Video Retrieval

Qingchao Chen, Yang Liu, Samuel Albanie

15:19

19/08/2021

Detecting Deepfake Videos with Temporal Dropout 3DCNN

Daichi Zhang, Chenyu Li, Fanzhao Lin, Dan Zeng, Shiming Ge

Keywords: Computer Vision, Biometrics, Face and Gesture Recognition, Fairness, Surveillance, Manipulation of People

8:30

02/02/2021

BSN++: Complementary Boundary Regressor with Scale-Balanced Relation Modeling for Temporal Action Proposal Generation

Haisheng Su, Weihao Gan, Wei Wu, Yu Qiao, Junjie Yan

11:34

02/02/2021

Spatial-temporal Causal Inference for Partial Image-to-video Adaptation

Jin Chen, Xinxiao Wu, Yao Hu, Jiebo Luo

20:01

12/07/2020

Video Prediction via Example Guidance

Jingwei Xu, Harry (Huazhe) Xu, Bingbing Ni, Xiaokang Yang, Trevor Darrell

Keywords: Applications - Computer Vision

12:20

26/04/2020

Physics-as-Inverse-Graphics: Unsupervised Physical Parameter Estimation from Video

Miguel Jaques, Michael Burke, Timothy Hospedales

4:41

04/07/2020

Reverse Engineering Configurations of Neural Text Generation Models

Yi Tay, Dara Bahri, Che Zheng, Clifford Brunk, Donald Metzler, Andrew Tomkins

Keywords: Reverse Models, neural modeling, Neural Models, generative models

6:16

06/12/2021

Exploring Cross-Video and Cross-Modality Signals for Weakly-Supervised Audio-Visual Video Parsing

Yan-Bo Lin, Hung-Yu Tseng, Hsin-Ying Lee, Yen-Yu Lin, Ming-Hsuan Yang

14:06

14/06/2020

Time Flies: Animating a Still Image With Time-Lapse Video As Reference

Chia-Chi Cheng, Hung-Yu Chen, Wei-Chen Chiu

Keywords: time-lapse video animation, self-supervised learning, style transfer, temporal consistency

1:01

04/07/2020

Dense-Caption Matching and Frame-Selection Gating for Temporal Localization in VideoQA

Hyounghun Kim, Zineng Tang, Mohit Bansal

Keywords: Dense-Caption Matching, Temporal VideoQA, answering questions, frame problem

10:56

22/11/2021

Knowing What, Where and When to Look: Video Action modelling with Attention

Juan-Manuel Perez-Rua, Brais Martinez, Xiatian Zhu, Antoine S Toisoul, Victor A Escorcia, Tao Xiang

Keywords: Action recognition, Fine-grained action, video attention, Spatial attention, Channel attention, Temporal attention, Spatio-temporal attention, Feature refinement

2:46

05/01/2021

Breaking Shortcuts by Masking for Robust Visual Reasoning

Keren Ye, Mingda Zhang, Adriana Kovashka

5:01

05/01/2021

Towards Visually Explaining Video Understanding Networks With Perturbation

Zhenqiang Li, Weimin Wang, Zuoyue Li, Yifei Huang, Yoichi Sato

4:53

02/02/2021

Quantum Cognitively Motivated Decision Fusion for Video Sentiment Analysis

Dimitris Gkoumas, Qiuchi Li, Shahram Dehdashti, Massimo Melucci, Yijun Yu, Dawei Song

15:47

06/12/2020

Convolutional Tensor-Train LSTM for Spatio-Temporal Learning

Jiahao Su, Wonmin Byeon, Jean Kossaifi, Furong Huang, Jan Kautz, Anima Anandkumar

3:29

06/12/2021

CCVS: Context-aware Controllable Video Synthesis

Guillaume Le Moing, Jean Ponce, Cordelia Schmid

Keywords: adversarial robustness and security, self-supervised learning, transformers, generative model

11:59

22/11/2021

Deep Video Inpainting Detection

Peng Zhou, Ning Yu, Zuxuan Wu, Larry Davis, Abhinav Shrivastava, Ser-Nam Lim

Keywords: Video Inpainting Detection, Manipulation Detection, DeepFake Detection

3:01

14/06/2020

Spatio-Temporal Graph for Video Captioning With Knowledge Distillation

Boxiao Pan, Haoye Cai, De-An Huang, Kuan-Hui Lee, Adrien Gaidon, Ehsan Adeli, Juan Carlos Niebles

Keywords: video captioning, spatio-temporal graph, video understanding, vision and language, knowledge distillation, transformer, computer vision

1:01

18/07/2021

Temporal Predictive Coding For Model-Based Planning In Latent Space

Tung Nguyen, Rui Shu, Tuan Pham, Hung Bui, Stefano Ermon

Keywords: Deep Learning, Embedding and Representation learning

5:19

26/04/2020

VideoFlow: A Conditional Flow-Based Model for Stochastic Video Generation

Manoj Kumar, Mohammad Babaeizadeh, Dumitru Erhan, Chelsea Finn, Sergey Levine, Laurent Dinh, Durk Kingma

Keywords: Video generation, flow-based generative models, stochastic video prediction

5:02

14/06/2020

Syntax-Aware Action Targeting for Video Captioning

Qi Zheng, Chaoyue Wang, Dacheng Tao

Keywords: video and language, video captioning, action predicting

1:01

22/11/2021

Local-Global Associative Frame Assemble in Video Re-ID

Qilei Li, Jiabo Huang, Shaogang Gong

Keywords: video Re-ID, local aligned quality, global appearance correlations, associative frame assemble

2:24

05/01/2021

Multi-Frame Recurrent Adversarial Network for Moving Object Segmentation

Prashant W. Patil, Akshay Dudhane, Subrahmanyam Murala

5:00

14/06/2020

AdaCoF: Adaptive Collaboration of Flows for Video Frame Interpolation

Hyeongmin Lee, Taeoh Kim, Tae-young Chung, Daehyun Pak, Yuseok Ban, Sangyoun Lee

Keywords: video frame interpolation, video temporal super-resolution, frame rate up conversion, frame synthesis, motion estimation, motion compensation, frame warping

1:01

02/02/2021

RSPNet: Relative Speed Perception for Unsupervised Video Representation Learning

Peihao Chen, Deng Huang, Dongliang He, Xiang Long, Runhao Zeng, Shilei Wen, Mingkui Tan, Chuang Gan

14:14

30/11/2020

Transforming Multi-Concept Attention into Video Summarization

Yen-Ting Liu, Yu-Jhe Li, Yu-Chiang Frank Wang

7:07

30/11/2020

MLIFeat: Multi-level information fusion based deep local features

Yuyang Zhang (Institute of Automation, Chinese Academy of Sciences; University of Chinese Academy of Sciences), Jinge Wang, Shibiao Xu, Xiao Liu, Xiaopeng Zhang

5:28