Abstract:
In this work, we introduce the Constrained first nearest neighbour Clustering (C1C) method for video face clustering. Using the premise that the first nearest neighbour (1NN) of an instance is sufficient to discover large chains and groupings, C1C builds upon the hierarchical clustering method FINCH by imposing must-link and cannot-link constraints acquired in a self-supervised manner. We show that adding these constraints improves performance at a low computational cost. C1C scales easily and requires no training. Additionally, we introduce a new Friends dataset for evaluating face clustering algorithms. Most existing video datasets for face clustering are saturated or emphasize only the main characters; in contrast, the Friends dataset is larger, contains identities for several main and secondary characters, and covers more challenging cases because it also labels the 'back of the head'. We evaluate C1C on the Big Bang Theory, Buffy, and Sherlock datasets for video face clustering and show that it achieves a new state of the art, whilst setting the baseline on Friends.
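
As a rough illustration of the premise, the following sketch links each instance to its first nearest neighbour and reads the clusters off as connected components, while respecting must-link and cannot-link pairs. It is not the C1C implementation evaluated in this work. The function name `constrained_first_neighbour_partition`, the squared Euclidean distance, and the greedy rule of skipping any merge that would join a cannot-link pair are all assumptions made for illustration. The sketch produces a single partition rather than the full FINCH hierarchy.

```python
import numpy as np


def constrained_first_neighbour_partition(features, must_link=(), cannot_link=()):
    """Hypothetical sketch: 1NN linking with pairwise constraints."""
    x = np.asarray(features, dtype=float)
    n = len(x)

    # Pairwise squared Euclidean distances (an assumed metric).
    dist = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(dist, np.inf)

    # A cannot-link pair may never be each other's first neighbour.
    for i, j in cannot_link:
        dist[i, j] = dist[j, i] = np.inf

    parent = list(range(n))  # union-find forest over instances

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    def violates(ra, rb):
        # True if merging components ra and rb would join a cannot-link pair.
        return any({find(p), find(q)} == {ra, rb} for p, q in cannot_link)

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb and not violates(ra, rb):
            parent[max(ra, rb)] = min(ra, rb)

    # Must-link pairs are merged first (skipped if they contradict a cannot-link).
    for i, j in must_link:
        union(i, j)

    # Link each instance to its first neighbour, closest pairs first.
    nn = dist.argmin(axis=1)
    order = np.argsort(dist[np.arange(n), nn], kind="stable")
    for i in order:
        i = int(i)
        if np.isfinite(dist[i, nn[i]]):
            union(i, int(nn[i]))

    # Relabel component roots as consecutive cluster ids.
    roots = np.array([find(i) for i in range(n)])
    _, labels = np.unique(roots, return_inverse=True)
    return labels


# Toy 1-D "embeddings": two well-separated pairs.
points = [[0.0], [1.0], [10.0], [11.0]]
print(constrained_first_neighbour_partition(points).tolist())
# -> [0, 0, 1, 1]
print(constrained_first_neighbour_partition(points, cannot_link=[(0, 1)]).tolist())
# -> [0, 1, 1, 1]
```

In the second call the cannot-link pair (0, 1) keeps the first two instances apart even though they are each other's nearest neighbours. In a video setting such pairs would plausibly come from faces that co-occur in a frame, and must-link pairs from faces within one track, although how the constraints are obtained is outside the scope of this sketch.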