Abstract:
Detecting interaction groups is an essential task for understanding human behaviours and social activities. However, identifying social interactions and the resulting crowd groups from purely visual cues, especially single images, remains challenging. Prior works either require additional statistics, such as interpersonal angles and kinaesthetic information, or simply deduce group memberships from the similarity of individual actions. In this paper, we present the Psychology-inspired Relation Network (PRN) to comprehensively understand static social scenes and effectively model the interaction relations between individuals. More concretely, inspired by recent advances in social psychology, we first predict keypoint heatmaps from an image with human regions of interest as visual representations of the key factors determining interaction groups: distance, orientation and postural openness. We then incorporate personal and mutual influences to compute an interaction strength matrix via self-attention, and finally utilise a perceptron to convert this matrix into dyadic interaction probabilities. Moreover, we devise two loss functions: a dyad loss to optimise the dyadic interaction probability, and a group loss to enhance the distinguishability among different social groups. To evaluate the performance of PRN, we introduce a novel dataset containing various scenes with different crowd densities, built by merging representative databases and relabelling the group annotations. Our method achieves outstanding results on the proposed dataset.
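The self-attention and perceptron stages described above can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the feature dimensions, the projection matrices `Wq`/`Wk`, and the logistic weights `w`/`b` are all hypothetical placeholders, and a learned multi-layer perceptron over richer pairwise features would replace the single logistic unit in practice.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def interaction_strength(F, Wq, Wk):
    # F: (N, d) per-person features (e.g. embeddings of keypoint heatmaps
    # encoding distance, orientation and postural openness).
    # Scaled dot-product self-attention yields a mutual-influence
    # matrix S of shape (N, N).
    Q, K = F @ Wq, F @ Wk
    return softmax(Q @ K.T / np.sqrt(K.shape[1]), axis=-1)

def dyadic_probability(S):
    # Hypothetical single-unit perceptron: symmetrise the strength
    # matrix, then squash each entry s_ij into a dyadic interaction
    # probability with a logistic function.
    sym = 0.5 * (S + S.T)
    w, b = 4.0, -1.0  # illustrative constants, not learned values
    return 1.0 / (1.0 + np.exp(-(w * sym + b)))

rng = np.random.default_rng(0)
N, d = 5, 8  # five people, toy feature dimension
F = rng.standard_normal((N, d))
Wq = rng.standard_normal((d, d))
Wk = rng.standard_normal((d, d))
P = dyadic_probability(interaction_strength(F, Wq, Wk))
print(P.shape)  # (N, N) symmetric matrix of pairwise probabilities
```

A dyad loss in this setting would be a binary cross-entropy between each off-diagonal entry of `P` and the ground-truth same-group label for that pair, while a group loss would additionally push probabilities apart across different groups.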