05/01/2021

PDAN: Pyramid Dilated Attention Network for Action Detection

Rui Dai, Srijan Das, Luca Minciullo, Lorenzo Garattoni, Gianpiero Francesca, Francois Bremond

Abstract: Handling long and complex temporal information is an important factor in action detection. The challenge is further aggravated by densely distributed actions in untrimmed videos. Previous action detection methods fail to select the key temporal information in long videos. To this end, we introduce the Dilated Attention Layer (DAL). Compared to previous temporal convolution layers, DAL allocates attentional weights to each feature in the kernel, which enables it to learn a better local representation across time. Furthermore, when accompanied by dilated kernels, DAL is able to learn a global representation of videos several minutes long, which is crucial for action detection. Finally, we introduce the Pyramid Dilated Attention Network (PDAN), which is built upon DAL. By combining DAL with dilation and residual links, PDAN can model short-term and long-term temporal relations simultaneously by focusing on local segments at both low and high temporal receptive fields. This property enables PDAN to handle complex temporal relations between different action instances in long untrimmed videos. To corroborate the effectiveness and robustness of our proposed method, we evaluate it on three densely annotated, multi-label datasets: MultiTHUMOS, Charades, and an in-house dataset, where it outperforms the state of the art.
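The abstract describes DAL only at a high level, so the following is a minimal sketch of the idea rather than the authors' implementation. It assumes a kernel of three frames (the center frame and its two neighbors at distance `dilation`), a query taken from the center frame, scaled dot-product scores, zero padding at the sequence borders, and dilations that double with depth (1, 2, 4, ...). None of these choices is stated in the abstract, and all function names and weight matrices (`Wq`, `Wk`, `Wv`) are hypothetical.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a short list of scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def matvec(W, x):
    return [dot(row, x) for row in W]

def dilated_attention_layer(features, Wq, Wk, Wv, dilation, kernel_size=3):
    """features: list of T frame vectors. Returns T attended vectors.

    Assumption: each output frame attends over the kernel_size frames
    at offsets k * dilation around it, instead of using fixed
    convolution weights for those positions.
    """
    T = len(features)
    half = kernel_size // 2
    zero = [0.0] * len(features[0])
    out = []
    for t in range(T):
        # Gather the dilated kernel window, zero-padding out-of-range frames.
        window = []
        for k in range(-half, half + 1):
            idx = t + k * dilation
            window.append(features[idx] if 0 <= idx < T else zero)
        q = matvec(Wq, features[t])
        keys = [matvec(Wk, f) for f in window]
        values = [matvec(Wv, f) for f in window]
        # One attention weight per feature in the kernel.
        weights = softmax([dot(q, k) / math.sqrt(len(q)) for k in keys])
        out.append([sum(w * v[c] for w, v in zip(weights, values))
                    for c in range(len(values[0]))])
    return out

def pdan_stack(features, layers):
    """Stack DAL layers with residual links; layer i uses dilation 2**i (assumed)."""
    x = features
    for i, (Wq, Wk, Wv) in enumerate(layers):
        y = dilated_attention_layer(x, Wq, Wk, Wv, dilation=2 ** i)
        x = [[a + b for a, b in zip(xt, yt)] for xt, yt in zip(x, y)]
    return x

def receptive_field(num_layers, kernel_size=3):
    """Frames covered by a stack whose i-th layer has dilation 2**i."""
    return 1 + (kernel_size - 1) * sum(2 ** i for i in range(num_layers))
```

Under the doubling assumption, the receptive field grows exponentially with depth: `receptive_field(5)` covers 63 frames, while the first layer alone still attends to only three adjacent frames. This is one way to read the abstract's claim that low and high temporal receptive fields are modeled in the same network.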

The talk and the corresponding paper were published at the WACV 2021 virtual conference.
