Abstract by Fanqing Lin
Two-hand 3D Tracking using Monocular RGB
We tackle the challenging task of tracking global 3D joint locations for both hands using only a monocular RGB input sequence. We propose a multi-stage network pipeline that accurately segments the hands despite severe self-occlusion between them, estimates the 2D joint locations, and predicts the 3D canonical joint locations without any depth information. As a final step, global joint locations with respect to the camera origin are computed from the hand pose estimates and the actual bone lengths. Because no existing dataset provides sufficient data on both hands, we introduce a large-scale synthetic 3D hand pose dataset for training the networks. To bridge the domain gap between synthetic and real input images, we propose a novel segmentation architecture that enables joint training in both domains. Our system outperforms previous work on RGB-based 3D hand pose estimation benchmark datasets. Additionally, we present the first work to achieve accurate global 3D tracking of both hands using monocular RGB input.