Abstract:
3D hand pose estimation from a single RGB image is challenging because of severe depth ambiguity and occlusion. In this paper, we propose a Privileged Modality Distillation Network (PMD-Net), which improves RGB-based hand pose estimation by exploiting privileged information from a depth prior during training. Unlike existing methods, PMD-Net is composed of three sub-networks that regress the X, Y, and Z coordinates respectively, and it distills privileged information from the depth network to the RGB network by transferring constraints between corresponding layers. Furthermore, random block replacement is adopted and a refinement module is added to enhance the robustness of PMD-Net. Experiments on both synthetic and real-world hand pose estimation datasets demonstrate that the proposed PMD-Net achieves state-of-the-art results.
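
To make the layer-wise distillation idea concrete, the following is a minimal NumPy sketch and not the PMD-Net implementation. It assumes a toy stack of linear+ReLU blocks for each branch, a mean-squared-error feature-matching term as the "constraint" between matching layers, and one possible reading of random block replacement in which the RGB branch occasionally continues from the depth branch's feature. All names (`make_blocks`, `run_block`, `forward`, `distillation_loss`, `replace_prob`) and all dimensions are hypothetical.

```python
import numpy as np


def make_blocks(rng, dims):
    """Build a toy stack of linear+ReLU blocks as (weight, bias) pairs."""
    return [(rng.standard_normal((i, o)) * 0.1, np.zeros(o))
            for i, o in zip(dims[:-1], dims[1:])]


def run_block(block, x):
    """Apply one linear+ReLU block."""
    weight, bias = block
    return np.maximum(x @ weight + bias, 0.0)


def forward(blocks, x, teacher_feats=None, replace_prob=0.0, rng=None):
    """Run x through the blocks and return the output of every block.

    If teacher_feats is given, then after each block the running feature
    is swapped for the teacher's feature with probability replace_prob.
    This is an assumed reading of "random block replacement".
    """
    feats = []
    for i, block in enumerate(blocks):
        x = run_block(block, x)
        feats.append(x)  # the branch's own output, used in the loss
        if teacher_feats is not None and rng.random() < replace_prob:
            x = teacher_feats[i]  # next block continues from the teacher
    return feats


def distillation_loss(student_feats, teacher_feats):
    """Mean squared error between matching layers, averaged over layers."""
    return float(np.mean([np.mean((s - t) ** 2)
                          for s, t in zip(student_feats, teacher_feats)]))


# Usage: one coordinate branch (e.g. Z) with 21 outputs, one per joint.
rng = np.random.default_rng(0)
depth_blocks = make_blocks(rng, [4, 16, 16, 21])   # depth (teacher) branch
rgb_blocks = make_blocks(rng, [12, 16, 16, 21])    # RGB (student) branch
x_depth = rng.standard_normal((2, 4))              # toy depth features
x_rgb = rng.standard_normal((2, 12))               # toy RGB features

teacher = forward(depth_blocks, x_depth)
student = forward(rgb_blocks, x_rgb, teacher, replace_prob=0.5, rng=rng)
loss = distillation_loss(student, teacher)
```

In this sketch the depth branch is only needed to produce `teacher`; at test time the RGB branch would be run alone with `teacher_feats=None`, which matches the abstract's statement that depth is used only during training.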