14/06/2020

G-TAD: Sub-Graph Localization for Temporal Action Detection

Mengmeng Xu, Chen Zhao, David S. Rojas, Ali Thabet, Bernard Ghanem

Keywords: temporal action detection, adaptive semantic context, sub-graph localization, graph convolution, GCNeXt, graph alignment, THUMOS14, ActivityNet-1.3

Abstract: Temporal action detection is a fundamental yet challenging task in video understanding. Video context is a critical cue to effectively detect actions, but current works mainly focus on temporal context, while neglecting semantic context as well as other important context properties. In this work, we propose a graph convolutional network (GCN) model to adaptively incorporate multi-level semantic context into video features and cast temporal action detection as a sub-graph localization problem. Specifically, we formulate video snippets as graph nodes, snippet-snippet correlations as edges, and actions associated with context as target sub-graphs. With graph convolution as the basic operation, we design a GCN block called GCNeXt, which learns the features of each node by aggregating its context and dynamically updates the edges in the graph. To localize each sub-graph, we also design an SGAlign layer to embed each sub-graph into the Euclidean space. Extensive experiments show that G-TAD is capable of finding effective video context without extra supervision and achieves state-of-the-art performance on two detection benchmarks. On ActivityNet-1.3, it obtains an average mAP of 34.09%; on THUMOS14, it reaches 51.6% at IoU@0.5 when combined with a proposal processing method. The code has been made available at https://github.com/frostinassiky/gtad.
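
To make the graph formulation more concrete, the sketch below shows a minimal PyTorch block in the spirit of GCNeXt as described in the abstract: snippet features are graph nodes, fixed temporal edges connect adjacent snippets, and semantic edges are re-computed dynamically as k-nearest neighbours in feature space. This is not the authors' implementation (see the repository linked above); the feature dimension, k, the grouped convolutions, the edge-feature form, and the max aggregation are all illustrative assumptions.

```python
# Minimal, illustrative sketch of a GCNeXt-style graph block (not the authors' code).
# Assumptions not stated in the abstract: 256-d snippet features, k = 3 semantic
# neighbours, grouped convolutions for aggregation, and max-pooling over edges.

import torch
import torch.nn as nn


def knn_semantic_edges(x, k=3):
    """Dynamically rebuild semantic edges: each snippet (node) is linked to its
    k most similar snippets by feature distance. x: (batch, channels, time)."""
    with torch.no_grad():
        feats = x.transpose(1, 2)                               # (B, T, C)
        dist = torch.cdist(feats, feats)                        # (B, T, T)
        idx = dist.topk(k + 1, largest=False).indices[..., 1:]  # drop self-edge
    return idx                                                  # (B, T, k)


def gather_neighbors(x, idx):
    """Collect neighbour features per node: (B, C, T) + (B, T, k) -> (B, C, T, k)."""
    b, c, t = x.shape
    k = idx.size(-1)
    flat = idx.reshape(b, 1, t * k).expand(b, c, t * k)
    return torch.gather(x, 2, flat).reshape(b, c, t, k)


class GCNeXtSketch(nn.Module):
    """Two-stream graph convolution: a temporal stream over fixed edges to the
    previous/next snippet and a semantic stream over dynamic k-NN edges,
    fused with the input through a residual connection."""

    def __init__(self, channels=256, k=3, groups=32):
        super().__init__()
        self.k = k
        # temporal stream: fixed edges to adjacent snippets ~ grouped 1-D convolution
        self.temporal = nn.Sequential(
            nn.Conv1d(channels, channels, kernel_size=3, padding=1, groups=groups),
            nn.ReLU(inplace=True),
            nn.Conv1d(channels, channels, kernel_size=1),
        )
        # semantic stream: edge features [x_i, x_j - x_i] processed by a shared MLP
        self.semantic = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1, groups=groups),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=1),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):                                # x: (B, C, T) snippet features
        idx = knn_semantic_edges(x, self.k)              # edges updated at every block
        neigh = gather_neighbors(x, idx)                 # (B, C, T, k)
        center = x.unsqueeze(-1).expand_as(neigh)
        edge_feat = torch.cat([center, neigh - center], dim=1)  # (B, 2C, T, k)
        sem = self.semantic(edge_feat).amax(dim=-1)      # aggregate over k neighbours
        tem = self.temporal(x)
        return self.relu(x + tem + sem)                  # residual fusion of both streams


if __name__ == "__main__":
    snippets = torch.randn(2, 256, 100)    # 2 videos, 256-d features, 100 snippets each
    out = GCNeXtSketch()(snippets)
    print(out.shape)                       # torch.Size([2, 256, 100])
```

The sketch only illustrates the adaptive-context idea of combining fixed temporal edges with dynamically updated semantic edges; the SGAlign layer mentioned in the abstract, which embeds candidate sub-graphs into a fixed-size Euclidean representation for localization, is a separate component not shown here.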

Published at: CVPR 2020 (virtual conference).
