Abstract:
Training face recognition models on data generated by generative adversarial networks (GANs) or three-dimensional (3D) technology is a theoretically reasonable solution to the problems of unbalanced data distributions and data scarcity. However, because of the modality difference between synthetic and real data, directly using synthetic data for training often degrades recognition performance, and the effect of synthetic data on recognition remains ambiguous. In this paper, after observing in experiments that modality information has a fixed form, we propose the first demodalizing face recognition training architecture and provide a feasible method for recognition training with synthetic samples. Specifically, we propose three demodalizing training methods, ranging from implicit to explicit. These methods gradually reveal the generated modality, which is difficult to quantify or describe. By removing the modality of the synthetic data, the performance degradation is greatly alleviated. We validate the effectiveness of our approach on various large-scale face recognition benchmarks, where it outperforms previous methods, especially in the low false acceptance rate (FAR) range.