Weakly-supervised 3D Hand Pose Estimation from Monocular RGB Images Presented by: Yujun Cai
Weakly-supervised 3D Hand Pose Estimation from Monocular … · 2019-04-30 · Convolutional Pose Machines [Wei. et al. CVPR 2016] Stacked Hourglass Networks [Newell et al. ECCV 2016]

Jul 17, 2020

Page 1

Weakly-supervised 3D Hand Pose Estimation from Monocular RGB Images

Presented by: Yujun Cai

Page 2

Articulated Hand Pose Estimation

Input: RGB/RGB-D/depth images containing a human hand in a certain gesture

Output: estimated hand joint locations, which represent the hand pose

Figure from [Tompson et al. SIGGRAPH 2014]

Page 3

Depth-based Approach

Recent years have seen a surge in the market for depth cameras and wearable devices.

• Advantages:

Cheap

Provide 2.5D information

Achieve good performance

• Disadvantage:

Limited scenarios

Page 4

RGB-based Approach

2D Pose Estimation from single RGB images

Convolutional Pose Machines [Wei et al. CVPR 2016]

Stacked Hourglass Networks [Newell et al. ECCV 2016]

Page 5

Monocular RGB-based 3D pose estimation

From 2D images to 3D skeleton results

Page 6

Challenge: Insufficient Datasets

For real datasets:

• Multi-view annotation is labor-intensive

• Reconstructed 3D labels may not be perfect

Directly annotate accurate 3D labels

Page 7

Challenge: Insufficient Datasets

For synthetic datasets [Zimmermann et al. ICCV 2017]:

Synthetic data comes with perfect 3D annotations, but it differs from real data.

Page 8

Motivation

For RGB-based approaches

• Absence of real datasets with 3D annotations

• Domain gap between synthetic and real data

For Depth-based approaches

• Relatively better performance

• Limited application scenarios

Can we do RGB-based 3D hand pose estimation without complete 3D annotations and still take advantage of depth-based methods?

Page 9

Motivation

• Is it possible to do 3D hand pose estimation without 3D annotations?

3D hand pose should be supervised by some constraints.

• What else can we leverage to constrain the 3D pose?

Depth maps can serve as weak constraints on the 3D pose.

Page 10

Motivation

Controlled by depth map references

• Weak supervision:

Add loss on reference depth maps instead of directly using 3D annotations.

• Are RGB and depth enough to train a 3D hand pose estimator?

Learn the relationship between the 3D hand pose and the reference depth maps.

Ensure the regression network outputs meaningful 3D joints rather than intermediate features.

Leverage synthetic dataset

Page 11

System Overview

• In this work, we propose a weakly-supervised method that leverages reference depth maps to alleviate the burden of 3D annotation.

• We train jointly on synthetic and real data (fused training).

• RGB-D input is used for training, but only RGB input is needed for testing.

Page 12

System Overview

• During testing, real images pass only through the part of the network inside the dotted-line box.

• Both synthetic and real data are used during the training stage for fused training.

[Figure: network architecture. Two branches (synthetic and real) with shared weights. In each branch, Convolutional Pose Machines produce 2D joint heatmaps (Loss1, L2), a Regression Network produces 3D joint locations (Loss2, "sml1", presumably smooth L1), and a Depth Regularizer follows (Loss3, L1).]

For synth: Loss = Loss1 + Loss2 + Loss3

For real: Loss = Loss1 + Loss3
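
The loss combination on this slide (Loss1 + Loss2 + Loss3 for synthetic data, Loss1 + Loss3 for real data) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes Loss1 is an L2 loss on the 2D heatmaps, Loss2 is a smooth L1 loss on the 3D joints (reading "sml1" as smooth L1), and Loss3 is an L1 loss against the reference depth map. The dictionary keys, the flattened-list inputs, the `beta` parameter, and the unweighted sum are all assumptions made for the example.

```python
def l2_loss(pred, target):
    """Mean squared error, used here for the heatmap term (Loss1)."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def smooth_l1_loss(pred, target, beta=1.0):
    """Smooth L1: quadratic for small errors, linear for large ones (Loss2)."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += 0.5 * d * d / beta if d < beta else d - 0.5 * beta
    return total / len(pred)


def l1_loss(pred, target):
    """Mean absolute error, used here for the depth-map term (Loss3)."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def total_loss(sample):
    """Combine the terms; the 3D joint term applies to synthetic samples only."""
    loss = l2_loss(sample["heatmaps"], sample["gt_heatmaps"])   # Loss1
    loss += l1_loss(sample["depth"], sample["ref_depth"])       # Loss3
    if sample["is_synthetic"]:
        loss += smooth_l1_loss(sample["joints3d"], sample["gt_joints3d"])  # Loss2
    return loss


# Toy values (hypothetical), flattened to plain lists.
synthetic = {
    "is_synthetic": True,
    "heatmaps": [0.0, 1.0], "gt_heatmaps": [0.0, 0.5],
    "depth": [1.0, 2.0], "ref_depth": [1.5, 2.0],
    "joints3d": [0.0, 3.0], "gt_joints3d": [0.5, 1.0],
}
real = dict(synthetic, is_synthetic=False)  # same values, but no 3D labels used

synthetic_loss = total_loss(synthetic)
real_loss = total_loss(real)
```

For a real sample the 3D joint labels are never read, which is the sense in which the method is weakly supervised: only the 2D heatmaps and the reference depth map constrain the prediction.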

Page 13

Depth Regularizer

Inspired by [Oberweger et al. ICCV 2015]

• Generate a depth map from 3D joint locations

• Use transposed convolution to enlarge features
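
How a transposed convolution enlarges a feature map can be illustrated with a naive single-channel version. This is only a sketch of the operation itself, written in dependency-free Python; it is not the depth regularizer network, and the function name, the square kernel, the stride of 2, and the absence of padding are assumptions made for the example.

```python
def transposed_conv2d(x, kernel, stride=2):
    """Naive single-channel transposed convolution with a square kernel, no padding.

    Each input value scatters a scaled copy of the kernel into the output,
    so the output side length is (n - 1) * stride + k.
    """
    h, w = len(x), len(x[0])
    k = len(kernel)
    out_h = (h - 1) * stride + k
    out_w = (w - 1) * stride + k
    out = [[0.0] * out_w for _ in range(out_h)]
    for i in range(h):
        for j in range(w):
            for a in range(k):
                for b in range(k):
                    out[i * stride + a][j * stride + b] += x[i][j] * kernel[a][b]
    return out


# A 2x2 feature map is enlarged to 4x4 with a 2x2 kernel of ones and stride 2.
features = [[1.0, 2.0], [3.0, 4.0]]
kernel = [[1.0, 1.0], [1.0, 1.0]]
upsampled = transposed_conv2d(features, kernel, stride=2)
```

In a trained network the kernel weights are learned, so stacking such layers lets the regularizer grow a low-resolution feature map derived from the 3D joints into a full depth map.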

Page 14

Visualization Analysis

Adding the depth regularizer constraint significantly improves the 3D pose estimates, especially the global orientation.

Page 15

Datasets and Metrics

Synthetic dataset: RHD [Zimmermann et al. ICCV 2017]

• Large variations in gesture and global orientation

• Seems not quite “real”

Real dataset: STB [Zhang et al. ICIP 2017]

• 3D annotations are used for evaluation

Evaluation metrics:

• The area under the curve (AUC) of the percentage of correct keypoints (PCK) score.

• The higher the curve, the better the performance.
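
The two metrics can be computed as in the sketch below. It assumes PCK is the fraction of joints whose Euclidean error falls within a threshold and that the AUC is the trapezoidal area under the PCK curve, normalized by the threshold range. The function names, the toy joints, and the 20 to 50 threshold range (taken to be millimetres) are assumptions for illustration, not values from the slides.

```python
import math


def pck_curve(pred, gt, thresholds):
    """Fraction of joints whose Euclidean error is within each threshold."""
    errors = [math.dist(p, g) for p, g in zip(pred, gt)]
    return [sum(e <= t for e in errors) / len(errors) for t in thresholds]


def auc(thresholds, pck):
    """Trapezoidal area under the PCK curve, normalized to the range [0, 1]."""
    area = 0.0
    for i in range(1, len(thresholds)):
        width = thresholds[i] - thresholds[i - 1]
        area += 0.5 * (pck[i] + pck[i - 1]) * width
    return area / (thresholds[-1] - thresholds[0])


# Toy example: three joints with errors of 0, 30 and 60 units.
gt = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (0.0, 10.0, 0.0)]
pred = [(0.0, 0.0, 0.0), (40.0, 0.0, 0.0), (0.0, 70.0, 0.0)]
thresholds = [20.0, 30.0, 40.0, 50.0]

pck = pck_curve(pred, gt, thresholds)
score = auc(thresholds, pck)
```

Because the area is normalized by the threshold range, a method whose every joint is within the smallest threshold scores 1.0, which makes AUC values comparable across methods evaluated on the same range.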

Page 16

Quantitative Results

• Weakly-supervised results with 2D supervision and depth regularizer

• Fully-supervised method achieves best performance

• Our weakly-supervised method with depth regularizer significantly improves results

Self-Comparisons

Page 17

Quantitative Results

• Red curve represents our fully-supervised results

• Pink curve denotes our proposed weakly-supervised results

• Other curves are mostly fully-supervised methods

Comparisons with state-of-the-art methods

Page 18

Experiment Results (Weakly-supervised)

[Figure columns: RGB input, estimation, ground truth]

Page 19

Failure Cases (Weakly-supervised)

[Figure columns: RGB input, estimation, ground truth]

Page 20

Experiment Results (Fully-supervised)