Abstract:
In this work, we extend a method originally devised for 3D body pose estimation to tackle the 3D hand pose estimation task. Due to its compositionality and compact Bio Vision Hierarchy (BVH) output, the resulting method can be combined with the original 3D body pose estimation method. This is achieved through a novel neural network architecture that combines key design characteristics of DenseNets, ResNets and MocapNETs and can be trained to accommodate both bodies and hands. The resulting method is assessed quantitatively on well-established hand and body pose estimation datasets. The results show that the proposed enhancements yield competitive performance for hands, as well as accuracy and performance benefits for the original body pose estimation task. Moreover, we show qualitatively that, due to its real-time performance and easy deployment on off-the-shelf webcam-equipped PCs, the proposed solution can become a valuable perceptual building block supporting a variety of applications.