Temporal Relational Modeling with Self-Supervision for Action Segmentation

02/02/2021


Dong Wang, Di Hu, Xingjian Li, Dejing Dou


Abstract: Temporal relational modeling in video is essential for human action understanding tasks such as action recognition and action segmentation. Although Graph Convolution Networks (GCNs) have shown promising advantages in relation reasoning on many tasks, applying them effectively to long video sequences remains a challenge. The main reason is that the large number of nodes (i.e., video frames) makes it hard for GCNs to capture and model temporal relations in videos. To tackle this problem, we introduce an effective GCN module, the Dilated Temporal Graph Reasoning Module (DTGRM), designed to model temporal relations and dependencies between video frames at various time spans. In particular, we capture and model temporal relations by constructing multi-level dilated temporal graphs in which the nodes represent frames from different moments in the video. Moreover, to enhance the temporal reasoning ability of the proposed model, we propose an auxiliary self-supervised task that encourages the dilated temporal graph reasoning module to find and correct wrong temporal relations in videos. Our DTGRM model outperforms state-of-the-art action segmentation models on three challenging datasets: 50Salads, Georgia Tech Egocentric Activities (GTEA), and the Breakfast dataset. The code is available at https://github.com/redwang/DTGRM.
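To make the idea of multi-level dilated temporal graphs concrete, the following is a minimal sketch and not the authors' implementation. It assumes that, at each level, a frame is linked to itself and to the frames a fixed number of steps (the dilation) before and after it, and that reasoning is a plain neighbor average with no learned weights. The function names `dilated_temporal_adjacency` and `graph_reasoning_step`, the dilation values 1, 2, 4, and the feature sizes are all hypothetical; the abstract does not specify how the graphs are built.

```python
import numpy as np


def dilated_temporal_adjacency(num_frames: int, dilation: int) -> np.ndarray:
    """Hypothetical helper: row-normalized adjacency over video frames.

    Assumption (not from the paper): frame t is connected to itself and
    to frames t - dilation and t + dilation when they exist.
    """
    if num_frames <= 0 or dilation <= 0:
        raise ValueError("num_frames and dilation must be positive")
    adj = np.eye(num_frames)
    # Indices of frames that have a neighbor `dilation` steps ahead.
    idx = np.arange(max(num_frames - dilation, 0))
    adj[idx, idx + dilation] = 1.0
    adj[idx + dilation, idx] = 1.0
    # Normalize each row so aggregation is an average over neighbors.
    return adj / adj.sum(axis=1, keepdims=True)


def graph_reasoning_step(features: np.ndarray, adj: np.ndarray) -> np.ndarray:
    """One weight-free propagation step: average features over neighbors."""
    return adj @ features


# Toy input: 8 frames with 4-dimensional features each.
features = np.random.default_rng(0).normal(size=(8, 4))

# Stack levels with growing dilation so later levels span longer time ranges.
out = features
for dilation in (1, 2, 4):
    out = graph_reasoning_step(out, dilated_temporal_adjacency(8, dilation))
```

Each level keeps the graph sparse (at most three neighbors per frame in this sketch), while stacking levels with larger dilations lets information travel across longer time spans without connecting every pair of frames.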

The video of this talk is available at https://slideslive.com/38947988


The talk and the respective paper were published at the AAAI 2021 virtual conference.


Similar Papers

05/01/2021

Supervoxel Attention Graphs for Long-Range Video Modeling

Yang Wang, Gedas Bertasius, Tae-Hyun Oh, Abhinav Gupta, Minh Hoai, Lorenzo Torresani

2:01

02/02/2021

Structured Co-reference Graph Attention for Video-grounded Dialogue

Junyeong Kim, Sunjae Yoon, Dahyun Kim, Chang D. Yoo


14:54

14/06/2020

G-TAD: Sub-Graph Localization for Temporal Action Detection

Mengmeng Xu, Chen Zhao, David S. Rojas, Ali Thabet, Bernard Ghanem

Keywords: temporal action detection, adaptive semantic context, subgraph localization, graph convolution, gcnext, graph alignment, thumos14, activitynet1.3

1:01

14/06/2020

Improving Action Segmentation via Graph-Based Temporal Reasoning

Yifei Huang, Yusuke Sugano, Yoichi Sato

Keywords: action segmentation, temporal reasoning, graph convolution network

1:00

05/01/2021

Only Time Can Tell: Discovering Temporal Data for Temporal Modeling

Laura Sevilla-Lara, Shengxin Zha, Zhicheng Yan, Vedanuj Goswami, Matt Feiszli, Lorenzo Torresani

4:14

06/12/2020

Learning Physical Graph Representations from Visual Scenes

Daniel Bear, Chaofei Fan, Damian Mrowca, Yunzhu Li, Seth Alter, Aran Nayebi, Jeremy Schwartz, Li Fei-Fei, Jiajun Wu, Josh Tenenbaum, Daniel Yamins

3:19

22/11/2021

GTA: Global Temporal Attention for Video Action Understanding

Bo He, Xitong Yang, Zuxuan Wu, Hao Chen, Ser-Nam Lim, Abhinav Shrivastava

Keywords: action recognition, self-attention, temporal modeling

2:55

22/11/2021

CTRN: Class-Temporal Relational Network for Action Detection

Rui Dai, Srijan Das, Francois Bremond

Keywords: action detection, graph reasoning, graph convolutional network, temporal modelling, multi-label classification

7:02

22/11/2021

Dynamic Graph Warping Transformer for Video Alignment

Junyan Wang, Yang Long, Maurice Pagnucco, Yang Song

Keywords: Video alignment, Transformer, Graph Neural Network

2:45

02/02/2021

Anticipating Future Relations via Graph Growing for Action Prediction

Xinxiao Wu, Jianwei Zhao, Ruiqi Wang


14:44

23/08/2020

Comprehensive information integration modeling framework for video titling

Shengyu Zhang, Ziqi Tan, Zhou Zhao, Jin Yu, Kun Kuang, Tan Jiang, Jingren Zhou, Hongxia Yang, Fei Wu

Keywords: graph neural network, video title generation, mobile e-commerce, video recommendation

4:34

26/04/2020

Relational State-Space Model for Stochastic Multi-Object Systems

Fan Yang, Ling Chen, Fan Zhou, Yusong Gao, Wei Cao

Keywords: state-space model, time series, deep sequential model, graph neural network

4:22

05/01/2021

Adaptive Streaming of 360-Degree Videos With Reinforcement Learning

Sohee Park, Minh Hoai, Arani Bhattacharya, Samir R. Das


4:51

03/05/2021

VA-RED$^2$: Video Adaptive Redundancy Reduction

Bowen Pan, Rameswar Panda, Camilo L Fosco, Chung-Ching Lin, Alex J Andonian, Yue Meng, Kate Saenko, Aude Oliva, Rogerio Feris

5:02

14/06/2020

Fine-Grained Video-Text Retrieval With Hierarchical Graph Reasoning

Shizhe Chen, Yida Zhao, Qin Jin, Qi Wu

Keywords: video-text retrieval, cross-modal matching, graph neural network

1:01

02/02/2021

Spatial-temporal Causal Inference for Partial Image-to-video Adaptation

Jin Chen, Xinxiao Wu, Yao Hu, Jiebo Luo


20:01

14/06/2020

Spatio-Temporal Graph for Video Captioning With Knowledge Distillation

Boxiao Pan, Haoye Cai, De-An Huang, Kuan-Hui Lee, Adrien Gaidon, Ehsan Adeli, Juan Carlos Niebles

Keywords: video captioning, spatio-temporal graph, video understanding, vision and language, knowledge distillation, transformer, computer vision

1:01

14/06/2020

Learning Multi-Granular Hypergraphs for Video-Based Person Re-Identification

Yichao Yan, Jie Qin, Jiaxin Chen, Li Liu, Fan Zhu, Ying Tai, Ling Shao

Keywords: person re-id, hypergraph, graph neural network

1:00

14/06/2020

Learning Visual Motion Segmentation Using Event Surfaces

Anton Mitrokhin, Zhiyuan Hua, Cornelia Fermüller, Yiannis Aloimonos

Keywords: event camera, dynamic vision sensors, graph convolutional network, motion segmentation, surface normals, 3d point cloud segmentation, semantic segmentation, real time segmentation, event cloud processing, asynchronous signal

1:01

06/12/2021

Relational Self-Attention: What's Missing in Attention for Video Understanding

Manjin Kim, Heeseung Kwon, Chunyu Wang, Suha Kwak, Minsu Cho

Keywords: deep learning, transformers

13:31

18/07/2021

Compositional Video Synthesis with Action Graphs

Amir Bar, Roei Herzig, Xiaolong Wang, Anna Rohrbach, Gal Chechik, Prof. Darrell, Amir Globerson

Keywords: Applications, Computer Vision

4:55

05/01/2021

A Multi-Task Learning Approach for Human Activity Segmentation and Ergonomics Risk Assessment

Behnoosh Parsa, Ashis G. Banerjee


4:53

07/09/2020

Refinement of Boundary Regression Using Uncertainty in Temporal Action Localization

Yunze Chen, Mengjuan Chen, Rui Wu, Jiagang Zhu, Zheng Zhu, Qingyi Gu

Keywords: Temporal Action Localization, Temporal Action Detection, Activity recognition and understanding

5:09

05/01/2021

PDAN: Pyramid Dilated Attention Network for Action Detection

Rui Dai, Srijan Das, Luca Minciullo, Lorenzo Garattoni, Gianpiero Francesca, Francois Bremond

5:00

14/06/2020

Beyond Short-Term Snippet: Video Relation Detection With Spatio-Temporal Global Context

Chenchen Liu, Yang Jin, Kehan Xu, Guoqiang Gong, Yadong Mu

Keywords: video visual relation detection, visual relation detection, deep learning

1:01

22/11/2021

Knowing What, Where and When to Look: Video Action modelling with Attention

Juan-Manuel Perez-Rua, Brais Martinez, Xiatian Zhu, Antoine S Toisoul, Victor A Escorcia, Tao Xiang

Keywords: Action recognition, Fine-grained action, video attention, Spatial attention, Channel attention, Temporal attention, Spatio-temporal attention, Feature refinement

2:46

22/11/2021

Gradient Frequency Modulation for Visually Explaining Video Understanding Models

Xin Miao Lin, Wentao Bao, Matthew Wright, Yu Kong

Keywords: model explanation, model explainability, explainable AI, video action recognition, Discrete Fourier Transform, video perturbation, interpretable machine learning, video model explanation, frequency modulation, spatiotemporal consistency

2:53

02/02/2021

Interpretable Graph Capsule Networks for Object Recognition

Jindong Gu


17:40

06/12/2021

Adaptive Data Augmentation on Temporal Graphs

Yiwei Wang, Yujun Cai, Yuxuan Liang, Henghui Ding, Changhu Wang, Siddharth Bhatia, Bryan Hooi

Keywords: deep learning, machine learning, graph learning

8:59

22/11/2021

An attention-driven hierarchical multi-scale representation for visual recognition

Zachary Wharton, Ardhendu Behera, Asish Bera

Keywords: Hierarchical multiscale regions/patches, fine-grained visual classification, graph convolutional network, visual-spatial structural relationships, structure-driven message propagation, graph pooling, gated attention, graph-level prediction

3:03

02/02/2021

Group-Wise Semantic Mining for Weakly Supervised Semantic Segmentation

Xueyi Li, Tianfei Zhou, Jianwu Li, Yi Zhou, Zhaoxiang Zhang

14:47

22/11/2021

Stacked Temporal Attention: Improving First-person Action Recognition by Emphasizing Discriminative Clips

Lijin Yang, Yifei Huang, Yusuke Sugano, Yoichi Sato

Keywords: Egocentric action recognition, Action recognition, Temporal attention

3:01

02/02/2021

Graph-Enhanced Multi-Task Learning of Multi-Level Transition Dynamics for Session-based Recommendation

Chao Huang, Jiahui Chen, Lianghao Xia, Yong Xu, Peng Dai, Yanqing Chen, Liefeng Bo, Jiashu Zhao, Jimmy Xiangji Huang

17:12

02/02/2021

Weakly Supervised Temporal Action Localization Through Learning Explicit Subspaces for Action and Context

Ziyi Liu, Le Wang, Wei Tang, Junsong Yuan, Nanning Zheng, Gang Hua

19:49

26/04/2020

Adaptive Structural Fingerprints for Graph Attention Networks

Kai Zhang, Yaokang Zhu, Jun Wang, Jie Zhang

Keywords: Graph attention networks, graph neural networks, node classification

3:47

02/02/2021

Multi-Scale Spatial Temporal Graph Convolutional Network for Skeleton-Based Action Recognition

Zhan Chen, Sicheng Li, Bing Yang, Qinghan Li, Hong Liu

18:17

22/11/2021

BI-GCN: Boundary-Aware Input-Dependent Graph Convolution Network for Biomedical Image Segmentation

Yanda Meng, Hongrun Zhang, Dongxu Gao, Yitian Zhao, Xiaoyun Yang, Xuesheng Qian, Xiaowei Huang, Yalin Zheng

Keywords: Medical Image Segmentation, Graph Convolution Network

7:43

19/08/2021

Dig into Multi-modal Cues for Video Retrieval with Hierarchical Alignment

Wenzhe Wang, Mengdan Zhang, Runnan Chen, Guanyu Cai, Penghao Zhou, Pai Peng, Xiaowei Guo, Jian Wu, Xing Sun

Keywords: Computer Vision, Language and Vision, Deep Learning

9:07

14/06/2020

Visual-Semantic Matching by Exploring High-Order Attention and Distraction

Yongzhi Li, Duo Zhang, Yadong Mu

Keywords: visual semantic matching, cross modal retrieval, scene graph, visual distraction, graph matching, gcn

1:01

05/01/2021

Towards Visually Explaining Video Understanding Networks With Perturbation

Zhenqiang Li, Weimin Wang, Zuoyue Li, Yifei Huang, Yoichi Sato

4:53