Abstract by Fanqing Lin
Two-hand 3D Pose Estimation using Monocular RGB
We tackle the challenging task of estimating global 3D joint locations for both hands from monocular RGB input images alone. We propose a novel multi-stage pipeline based on convolutional neural networks (CNNs) that accurately segments and locates hands despite occlusion between the two hands and complex background noise, and that estimates the 2D and canonical 3D joint locations without any depth information. Global joint locations with respect to the camera origin are then computed from the hand pose estimates and the actual length of the key bones. To train the CNNs for this new task, we introduce a large-scale synthetic 3D hand pose dataset. We demonstrate that our system outperforms previous work on benchmark datasets for RGB-only canonical 3D hand pose estimation. Additionally, we present the first work that achieves accurate global 3D hand tracking of both hands using RGB-only inputs, and we provide extensive quantitative and qualitative evaluation.
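As an illustration of how global joint locations might be recovered from 2D joints, canonical 3D joints, and a known bone length, the following is a minimal sketch and not necessarily the method used in this work. It assumes a pinhole camera with known intrinsics (`fx`, `fy`, `cx`, `cy`), a canonical pose that is root-relative up to scale, and a linear least-squares fit for the root translation; the function name and all parameters are hypothetical.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def recover_global_joints(joints_2d, canonical_3d, key_bone, bone_length,
                          fx, fy, cx, cy):
    """Hypothetical helper: place a canonical hand pose in camera coordinates.

    joints_2d    -- list of (u, v) pixel coordinates, one per joint
    canonical_3d -- list of [x, y, z] root-relative joints, arbitrary scale
    key_bone     -- (i, j) joint indices of the bone whose real length is known
    bone_length  -- real length of that bone (e.g. in meters)
    """
    # Assumption: rescaling the canonical pose so the key bone has its real
    # length yields metric, root-relative coordinates.
    a, b = key_bone
    canon_len = sum((p - q) ** 2
                    for p, q in zip(canonical_3d[a], canonical_3d[b])) ** 0.5
    scale = bone_length / canon_len
    rel = [[c * scale for c in joint] for joint in canonical_3d]

    # Pinhole projection u = fx * (x + tx) / (z + tz) + cx rearranges to the
    # linear constraint fx * tx - (u - cx) * tz = (u - cx) * z - fx * x,
    # and likewise for v. Each joint contributes two rows.
    rows, rhs = [], []
    for (u, v), (x, y, z) in zip(joints_2d, rel):
        du, dv = u - cx, v - cy
        rows.append([fx, 0.0, -du])
        rhs.append(du * z - fx * x)
        rows.append([0.0, fy, -dv])
        rhs.append(dv * z - fy * y)

    # Solve the least-squares problem via the 3x3 normal equations
    # (Cramer's rule keeps the sketch dependency-free).
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    atb = [sum(r[i] * h for r, h in zip(rows, rhs)) for i in range(3)]
    d = det3(ata)
    t = []
    for k in range(3):
        m = [row[:] for row in ata]
        for i in range(3):
            m[i][k] = atb[i]
        t.append(det3(m) / d)

    # Global joints are the metric root-relative joints shifted by t.
    return [[x + t[0], y + t[1], z + t[2]] for x, y, z in rel]
```

With noise-free inputs the fit reproduces the original camera-space joints; with noisy network predictions it returns the translation that best agrees with the 2D estimates in the least-squares sense.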