22/11/2021

End-to-End Object Detection with Adaptive Clustering Transformer

Minghang Zheng, Peng Gao, Renrui Zhang, Kunchang Li, Hongsheng Li, Hao Dong

Keywords: transformer, object detection

Abstract: End-to-end Object Detection with Transformer (DETR) performs object detection with Transformer and achieves comparable performance with two-stage object detection like Faster-RCNN. However, DETR needs huge computational resources for training and inference due to the high-resolution spatial inputs. In this paper, a novel variant of transformer named Adaptive Clustering Transformer (ACT) has been proposed to reduce the computation cost for high-resolution input. ACT clusters the query features adaptively using Locality Sensitive Hashing (LSH) and approximates the query-key interaction using the prototype-key interaction. ACT can reduce the quadratic O(N^2) complexity inside self-attention into O(NK) where K is the number of prototypes in each layer. ACT can be a drop-in module replacing the original self-attention module without any training. ACT achieves a good balance between accuracy and computation cost (FLOPs). The code is available as supplementary for the ease of experiment replication and verification.

 0
 0
 0
 0
This is an embedded video. Talk and the respective paper are published at BMVC 2021 virtual conference. If you are one of the authors of the paper and want to manage your upload, see the question "My papertalk has been externally embedded..." in the FAQ section.

Comments

Post Comment
no comments yet
code of conduct: tbd Characters remaining: 140

Similar Papers