Abstract:
In this paper, we investigate the cause of the high false positive rate in Visual Relationship Detection (VRD). We observe that during training, the relationship proposal distribution is highly imbalanced: most negative relationship proposals are easy to identify (e.g., those caused by inaccurate object detection), which leads to under-fitting of the low-frequency, difficult proposals. We present Spatially-Aware Balanced negative pRoposal sAmpling (SABRA), a robust VRD framework that serves as a proof of concept for alleviating the influence of false positives. To optimize the model effectively under this imbalanced distribution, SABRA adopts a Balanced Negative Proposal Sampling (BNPS) strategy for mini-batch sampling. BNPS divides proposals into 5 well-defined sub-classes and generates a balanced training distribution. To further resolve low-frequency, challenging false positive proposals with high spatial ambiguity, we adopt a spatial learning module that implicitly imposes the object-centric spatial configuration through a spatial mask decoder, using global spatial features extracted with Graph Neural Networks. SABRA is conceptually simple and outperforms state-of-the-art methods by a large margin on two human-object interaction (HOI) datasets and one general VRD dataset.
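
As an illustration of what balanced mini-batch sampling over proposal sub-classes could look like, the sketch below draws an (approximately) equal number of proposals from each sub-class. It is not the BNPS procedure itself: the abstract does not define the five sub-classes or the exact sampling rule, so the sub-class labels, the equal-quota rule, and the function name `balanced_sample` are assumptions made for this example only.

```python
import random
from collections import defaultdict


def balanced_sample(proposals, batch_size, rng=None):
    """Draw a mini-batch with a near-equal quota per sub-class.

    proposals  : list of (proposal_id, subclass_label) pairs; the labels
                 are hypothetical stand-ins for the paper's sub-classes.
    batch_size : total number of proposals to draw.
    rng        : optional random.Random instance for reproducibility.
    """
    if not proposals:
        raise ValueError("proposals must be non-empty")
    rng = rng or random.Random()

    # Group proposal ids by their sub-class label.
    by_class = defaultdict(list)
    for pid, sub in proposals:
        by_class[sub].append(pid)

    classes = sorted(by_class)
    per_class, remainder = divmod(batch_size, len(classes))

    batch = []
    for i, sub in enumerate(classes):
        # Spread any remainder over the first few sub-classes.
        k = per_class + (1 if i < remainder else 0)
        pool = by_class[sub]
        if len(pool) >= k:
            chosen = rng.sample(pool, k)      # without replacement
        else:
            chosen = rng.choices(pool, k=k)   # rare class: with replacement
        batch.extend((pid, sub) for pid in chosen)

    rng.shuffle(batch)
    return batch
```

With a heavily skewed pool (for example, 900 easy negatives and 25 proposals in each of four rarer sub-classes), a batch of 20 drawn this way contains 4 proposals per sub-class, whereas uniform sampling would be dominated by the easy negatives.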