Abstract:
Face biometrics have attracted significant attention in many security-based applications. A presentation attack (PA), or face spoofing, is a cybercriminal attempt to gain illegitimate access to a victim's device using photos, videos, or 3D artificial masks of the victim's face. Various deep learning approaches can tackle particular PAs when tested on standard datasets; however, these methods fail to generalize to complex environments or unseen datasets. We propose a new Multi-Teacher Single-Student (MTSS) visual Transformer with a multi-level attention design to improve the generalizability of face spoofing detection. Then, a novel Multi-Level Attention Module with DropBlock (MAMD) is designed to strengthen discriminative features while dropping irrelevant spatial features to avoid overfitting. Finally, these rich convolutional feature sets are combined and fed into the MTSS network for face spoofing training. With the MAMD module, our method remains robust even when trained on small datasets captured under poor lighting conditions. Experimental results demonstrate the superiority of our method over several anti-spoofing methods on four datasets (CASIA-MFSD, Replay-Attack, MSU-MFSD, and OULU-NPU). Furthermore, our model runs at up to 80 FPS on a Jetson TX2, making it suitable for real-world applications.
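As a rough illustration of the idea behind MAMD (this is not the authors' implementation; the class names, layer sizes, and DropBlock parameters below are assumptions for exposition only), the following PyTorch-style sketch applies per-level channel attention to backbone feature maps and a simplified DropBlock that removes contiguous spatial regions during training, before the re-weighted maps are concatenated for the downstream student network:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DropBlock2d(nn.Module):
    """Simplified DropBlock: zeroes contiguous spatial blocks at training time."""
    def __init__(self, block_size=5, drop_prob=0.1):
        super().__init__()
        self.block_size = block_size
        self.drop_prob = drop_prob

    def forward(self, x):
        if not self.training or self.drop_prob == 0.0:
            return x
        # Sample block centres, then expand each centre into a block_size x block_size region.
        gamma = self.drop_prob / (self.block_size ** 2)
        mask = (torch.rand_like(x[:, :1]) < gamma).float()
        mask = F.max_pool2d(mask, self.block_size, stride=1,
                            padding=self.block_size // 2)
        keep = 1.0 - mask
        # Rescale so the expected activation magnitude is preserved.
        return x * keep * keep.numel() / keep.sum().clamp(min=1.0)

class MultiLevelAttention(nn.Module):
    """Channel attention over features from several backbone levels,
    followed by DropBlock, then concatenation of the re-weighted maps."""
    def __init__(self, channels_per_level=(256, 512, 1024)):
        super().__init__()
        self.gates = nn.ModuleList(
            nn.Sequential(nn.AdaptiveAvgPool2d(1),
                          nn.Conv2d(c, c, kernel_size=1),
                          nn.Sigmoid())
            for c in channels_per_level)
        self.dropblock = DropBlock2d(block_size=5, drop_prob=0.1)

    def forward(self, feats):
        # feats: list of (B, C_i, H, W) maps resized to a common spatial size.
        weighted = [f * g(f) for f, g in zip(feats, self.gates)]
        return self.dropblock(torch.cat(weighted, dim=1))
```

In this sketch, DropBlock is active only in training mode, mirroring the stated goal of discarding irrelevant spatial features to reduce overfitting; the exact attention design and fusion used in MAMD are described in the body of the paper.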