Hand Pose Estimation via 2.5D Latent Heatmap Regression

Umar Iqbal 1,2, Pavlo Molchanov 2, Thomas Breuel 2, Juergen Gall 1 and Jan Kautz 2
1 Computer Vision Group, University of Bonn, Germany
2 NVIDIA Research

Introduction
Problem: 2D and 3D hand pose estimation

Motivation
  VR/AR
  human-machine interactions
  gaming
  sign-language recognition

Challenges
  Large amounts of appearance variation and self-occlusions
  Occlusion due to interaction with objects
  Complex hand articulations
  3D pose estimation is an ill-posed problem due to scale and depth ambiguities

Contributions
  A view-agnostic approach for monocular 2D and 3D hand pose estimation
  A 2.5D pose representation that can be estimated easily from an RGB image
  An exact approach to reconstruct 3D hand pose from 2.5D pose
  A 2.5D heatmap representation to enable accurate keypoint localization
  A CNN architecture to regress 2.5D heatmaps in a latent way

Overview
  Figure labels: Scale Normalization; Scale Recovery

2.5D Pose Representation
  2D pixel coordinates
  root-relative depth
  Normalized relative depth (figure color scale from 1 through 0 to -1)

Latent 2.5D Heatmap Regression
  Figure labels: Latent; Direct
  Equation annotations: learnable parameter controls spread; Hadamard product

Loss function

Results
  Ablative Studies
  Comparison with the state-of-the-art

3D Pose Reconstruction
Given the 2.5D pose, we need to find the depth of the root keypoint to reconstruct the scale-normalized 3D pose.
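Because the poster's equations did not survive extraction, the following is a hedged reconstruction of how a root depth can be derived from a 2.5D pose, not necessarily the authors' exact notation. It assumes normalized image coordinates $(x_k, y_k)$ (camera intrinsics already applied), root-relative depths $Z^r_k$, and one bone $(n, m)$ whose length equals a constant $C$ after scale normalization. All symbol names here are assumptions introduced for illustration.

```latex
% Back-projection of keypoint k with depth Z_k = Z_{root} + Z^r_k:
%   X_k = x_k Z_k, \quad Y_k = y_k Z_k
% Bone-length constraint for the normalization bone (n, m):
\begin{align}
(x_n Z_n - x_m Z_m)^2 + (y_n Z_n - y_m Z_m)^2 + (Z_n - Z_m)^2 &= C^2 \\
% Substituting Z_k = Z_{root} + Z^r_k gives a quadratic in Z_{root}:
a\,Z_{root}^2 + b\,Z_{root} + c &= 0 \\
a &= (x_n - x_m)^2 + (y_n - y_m)^2 \\
b &= 2\big[(x_n - x_m)(x_n Z^r_n - x_m Z^r_m) + (y_n - y_m)(y_n Z^r_n - y_m Z^r_m)\big] \\
c &= (x_n Z^r_n - x_m Z^r_m)^2 + (y_n Z^r_n - y_m Z^r_m)^2 + (Z^r_n - Z^r_m)^2 - C^2 \\
Z_{root} &= \frac{-b + \sqrt{b^2 - 4ac}}{2a}
\end{align}
```

In this sketch $b$ carries the factor 2 of the standard quadratic form; a presentation that folds that factor into the solution formula is equivalent. The larger root is taken on the assumption that the hand lies in front of the camera.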
Given [symbols missing] and [symbol missing], there exists a unique 3D pose that satisfies:
[equation missing]
The equation can be rewritten in terms of the 2D projections [symbols missing] and relative depths [symbols missing] as follows:
[equation missing]
The coefficients of the quadratic equation:
[equations missing]
Equation annotations: mean bone length; kinematic structure of the hand

Results figure panels (plots not preserved)
  2D Heatmaps; 2D Coordinates
  Datasets: Stereo Hand Pose, Ego-Dexter, MPII+NZSL, Dexter-Object
  Comparison with direct heatmaps
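The root-depth recovery described under 3D Pose Reconstruction can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `recover_root_depth` and `reconstruct_3d` are hypothetical, the 2D inputs are assumed to be normalized image coordinates (intrinsics already applied), the normalization bone is assumed to have length `C = 1`, and the larger quadratic root is chosen on the assumption that the hand is in front of the camera.

```python
import numpy as np


def recover_root_depth(xy_n, xy_m, zr_n, zr_m, C=1.0):
    """Solve a*Z^2 + b*Z + c = 0 for the root depth (hypothetical helper).

    xy_n, xy_m: normalized image coordinates (x, y) of the bone endpoints.
    zr_n, zr_m: root-relative depths of the same two keypoints.
    C: assumed length of this bone after scale normalization.
    """
    xn, yn = xy_n
    xm, ym = xy_m
    a = (xn - xm) ** 2 + (yn - ym) ** 2
    b = 2.0 * ((xn - xm) * (xn * zr_n - xm * zr_m)
               + (yn - ym) * (yn * zr_n - ym * zr_m))
    c = ((xn * zr_n - xm * zr_m) ** 2
         + (yn * zr_n - ym * zr_m) ** 2
         + (zr_n - zr_m) ** 2 - C ** 2)
    if a < 1e-12:
        raise ValueError("bone endpoints project to the same point")
    disc = b * b - 4.0 * a * c
    if disc < 0:
        raise ValueError("no real solution for the root depth")
    # Larger root: assumes the hand lies in front of the camera.
    return (-b + np.sqrt(disc)) / (2.0 * a)


def reconstruct_3d(xy, zr, z_root):
    """Back-project 2.5D keypoints to 3D given the recovered root depth."""
    z = z_root + zr
    return np.column_stack([xy[:, 0] * z, xy[:, 1] * z, z])


# Synthetic scale-normalized pose: keypoint 0 is the root, and the bone
# between keypoints 1 and 0 has length 1 (offset (0.6, 0.0, 0.8)).
P = np.array([[0.5, -1.0, 5.0],
              [1.1, -1.0, 5.8],
              [0.2, 0.3, 5.3]])

xy = P[:, :2] / P[:, 2:3]   # normalized 2D projections
zr = P[:, 2] - P[0, 2]      # root-relative depths

z_root = recover_root_depth(xy[1], xy[0], zr[1], zr[0], C=1.0)
P_rec = reconstruct_3d(xy, zr, z_root)
```

On this synthetic pose the recovered root depth matches the depth used to build it, and back-projection reproduces the original keypoints. Recovering metric scale would additionally need the true bone length, which this sketch does not estimate.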