Abstract:
Although great progress has been made in both Image-based person Re-IDentification (IReID) and Video-based person Re-IDentification (VReID), the misalignment problem caused by complex factors, e.g., pose variations and non-rigid deformation of the body, still makes these two tasks very challenging. Some recent IReID studies employ human parsing models or self-attention mechanisms to capture human part features or emphasize Discriminative Clues (DCs), but these methods rely heavily on external annotations, and the DCs are not well aligned. Moreover, such IReID models lack generalization ability and have yet to achieve promising performance on VReID benchmarks. To this end, we propose a Discriminative Clue Alignment Network (DCANet), along with a discrimination constraint, to automatically identify various DCs and then align them into a fixed pattern, without requiring additional annotations. Experiments on three popular VReID benchmarks show that, even with a simple temporal feature aggregation method, DCANet still achieves state-of-the-art performance. Evaluation on public IReID datasets further verifies that our method is highly competitive.