Abstract:
While the abuse of deepfake technology has had a serious impact on society, detecting deepfake videos remains very challenging because each frame is synthesized with a high degree of photorealism. To address this, this paper exploits possible inconsistencies across video frames and proposes a Temporal Dropout 3-Dimensional Convolutional Neural Network (TD-3DCNN) to detect deepfake videos. In this approach, fixed-length frame volumes sampled from a video are fed into a 3-Dimensional Convolutional Neural Network (3DCNN), which extracts features at different scales and classifies each volume as real or fake. In particular, a temporal dropout operation is introduced to randomly sample frames in each batch. It serves as a simple yet effective form of data augmentation that enhances the model's representation and generalization ability, reducing overfitting and improving detection accuracy. The resulting video-level classifier is accurate and effective at identifying deepfake videos. Extensive experiments on benchmarks including Celeb-DF (v2) and DFDC demonstrate the effectiveness and generalization capacity of our approach.
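To make the temporal dropout idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes that temporal dropout amounts to keeping a random subset of frames from a longer candidate clip while preserving their temporal order; the abstract does not specify the exact sampling scheme. The function name `temporal_dropout`, the parameter `keep`, and the frame counts are hypothetical choices made for illustration.

```python
import random


def temporal_dropout(frames, keep, rng=None):
    """Randomly keep `keep` frames from a clip, preserving temporal order.

    Illustrative assumption: `frames` is any indexable sequence of frames
    (for example, a list of image arrays), and the frames that are not
    selected are simply dropped.
    """
    rng = rng or random.Random()
    total = len(frames)
    if not 0 < keep <= total:
        raise ValueError("keep must be between 1 and the number of frames")
    # Sample distinct frame indices, then sort them so that the kept
    # frames stay in their original temporal order.
    kept = sorted(rng.sample(range(total), keep))
    return [frames[i] for i in kept], kept


# Hypothetical clip of 32 frames reduced to a fixed-length volume of 16.
clip = [f"frame_{i:02d}" for i in range(32)]
volume, kept = temporal_dropout(clip, keep=16, rng=random.Random(0))
print(len(volume))  # 16
```

Because a different subset is drawn on each call, the same video can yield many distinct fixed-length volumes during training, which is the sense in which the operation acts as data augmentation.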