Articulated Human Pose Estimation by Deep Learning Wei Yang Supervisor: Xiaogang Wang, Wanli Ouyang [email protected]
Articulated human pose estimation by deep learning

Jan 12, 2017

Wei Yang
Transcript
Page 1: Articulated human pose estimation by deep learning

Articulated Human Pose Estimation by Deep Learning

Wei Yang
Supervisor: Xiaogang Wang, Wanli Ouyang

[email protected]

Page 2: Articulated human pose estimation by deep learning

05/01/2023 2

Outline

• Introduction
• Regression by Convolutional Neural Network
• Deformable Convolutional Neural Networks
• Discussion and Future Work

Page 3: Articulated human pose estimation by deep learning

Introduction

Articulated body pose estimation “recovers the pose of an articulated body, which consists of joints and rigid parts using image-based observations.”

Page 4: Articulated human pose estimation by deep learning

Applications

Action recognition

Human tracking

Clothing Parsing

Gaming

Page 5: Articulated human pose estimation by deep learning

Challenges

Page 6: Articulated human pose estimation by deep learning

Classic Approaches

Pictorial Structure (Fischler & Elschlager 1973; Felzenszwalb & Huttenlocher 2005)
• Unary templates
• Pairwise springs

Mixtures of “mini-parts” (Yang & Ramanan 2011)
• Each part has a mixture of types
• Unary template for each part, given its mixture type
• Pairwise springs between two parts, given the mixture type of each

Page 7: Articulated human pose estimation by deep learning

Deep Learning Methods

Multi-source Deep Learning (Ouyang et al. 2014)
• Candidate estimations
• Deep model uses multiple sources, including appearance score, mixture type, and deformation

DeepPose (Toshev & Szegedy 2014)
• Reasons about pose in a holistic fashion
• Refines the joint predictions by using higher-resolution sub-images

Page 8: Articulated human pose estimation by deep learning

We propose to study pose estimation in two ways:

• Holistic view
  – Regression of joint locations by convolutional neural networks (CNNs)

• Local information
  – Deformable Convolutional Neural Networks

Page 9: Articulated human pose estimation by deep learning

Regression by Convolutional Neural Network

Page 10: Articulated human pose estimation by deep learning

Formulation

• Image: $I$
• Part locations: $\mathbf{p}$

$$\psi(I; \theta) = \mathbf{p}$$

Location of part $i$: $p_i$

$\psi$: learned by a deep CNN

Page 11: Articulated human pose estimation by deep learning

Basic Architecture of the CNN Regressor

• AlexNet – Krizhevsky, Sutskever, and Hinton, NIPS 2012

– The first time a deep model was shown to be effective on a large-scale computer vision task.

Page 12: Articulated human pose estimation by deep learning

Normalize Scale of Human Body

• Size of the CNN input is fixed
• Simple warping changes the aspect ratio of people
• People appear at different scales in an image

1. Original image
2. Human detection [Ouyang et al. CVPR 2014]
3. Crop by bounding box
4. Padding with mean RGB value
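
The crop and padding steps can be sketched as follows. This is a minimal illustration, not the author's code: it assumes the image is a nested list of RGB tuples and the detector's box is given as `(x0, y0, x1, y1)`; the function name `crop_and_pad` and the choice to center the crop are hypothetical. Padding to a square before any resize keeps the person's aspect ratio.

```python
def crop_and_pad(image, bbox, mean_rgb):
    """Crop `image` to `bbox` and pad the crop to a square with `mean_rgb`.

    image    : list of rows, each row a list of (r, g, b) tuples
    bbox     : (x0, y0, x1, y1), with x1/y1 exclusive (assumed convention)
    mean_rgb : (r, g, b) fill value for the padded border
    Returns the square patch and an offset (ox, oy) such that an original
    pixel (x, y) inside the box lands at (x + ox, y + oy) in the patch.
    """
    x0, y0, x1, y1 = bbox
    crop = [row[x0:x1] for row in image[y0:y1]]
    h, w = len(crop), len(crop[0])
    side = max(h, w)
    # Center the crop inside the square canvas.
    dx, dy = (side - w) // 2, (side - h) // 2
    patch = [[mean_rgb] * side for _ in range(side)]
    for r in range(h):
        patch[r + dy][dx:dx + w] = crop[r]
    return patch, (dx - x0, dy - y0)
```

The returned offset is what lets joint annotations be moved into the patch's coordinate frame along with the pixels.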

Page 13: Articulated human pose estimation by deep learning

Architecture 1

• Loss function:
• Evaluation metric: PCP

Method          Torso  Head  U.Leg  L.Leg  U.Arm  L.Arm  Mean PCP
Yang&Ramanan     84.1  77.1   69.5   65.6   52.5   35.9      60.8
Conv5            58.8  24.1   49.6   36.6   25.8    2.8      31.3

Page 14: Articulated human pose estimation by deep learning

Architecture 2

• Loss function:
• Evaluation metric: PCP

Method          Torso  Head  U.Leg  L.Leg  U.Arm  L.Arm  Mean PCP
Yang&Ramanan     84.1  77.1   69.5   65.6   52.5   35.9      60.8
Conv5            58.8  24.1   49.6   36.6   25.8    2.8      31.3
Fc8 (AlexNet)    81.1  63.7   72.8   66.6   50.6   21.9      56.9

Page 15: Articulated human pose estimation by deep learning

Architecture 3

• Loss function:
• Evaluation metric: PCP

Method          Torso  Head  U.Leg  L.Leg  U.Arm  L.Arm  Mean PCP
Yang&Ramanan     84.1  77.1   69.5   65.6   52.5   35.9      60.8
Conv5            58.8  24.1   49.6   36.6   25.8    2.8      31.3
Fc8 (AlexNet)    81.1  63.7   72.8   66.6   50.6   21.9      56.9
Fc10             84.1  68.8   76.8   69.4   54.9   26.8      60.9

Page 16: Articulated human pose estimation by deep learning

PCP and PDJ on LSP

#  Method             Torso  Head  U.Leg  L.Leg  U.Arm  L.Arm  Mean PCP

Ours
1  Conv5               58.8  24.1   49.6   36.6   25.8    2.8      31.3
2  Fc8 (AlexNet)       81.1  63.7   72.8   66.6   50.6   21.9      56.9
3  Fc8 (LSP-extend)    83.1  67.2   75.0   68.7   53.4   25.6      59.6
4  Fc10                84.1  68.8   76.8   69.4   54.9   26.8      60.9
5  Fc10 (Fusion)       84.8  71.8   77.6   71.2   55.9   29.2      62.5

State-of-the-art methods
6  Yang&Ramanan        84.1  77.1   69.5   65.6   52.5   35.9      60.8
7  Ouyang et al.       85.8  83.1   76.5   72.2   63.3   46.6      68.6

Page 17: Articulated human pose estimation by deep learning

Results on LSP dataset

Page 18: Articulated human pose estimation by deep learning

Failure Cases

• Articulation
• Foreshortening
• Occlusions and distractions
• Cluttered background or overlapping people

Page 19: Articulated human pose estimation by deep learning

Deformable Convolutional Neural Networks

Page 20: Articulated human pose estimation by deep learning

Motivation

• Local image patches are able to capture:
  – Part presence
  – Pairwise part spatial relationships

Number of mixture types for each pair: 6
Neighbors: 1, # of relationships:
Neighbors: 2, # of relationships:

[Figure labels: Lower arm, Upper arm]

[Chen & Yuille NIPS 2014]

Page 21: Articulated human pose estimation by deep learning

Tree-structured Relational Graph

– $\mathbf{p}$: positions of body parts
– $\mathbf{t}$: pairwise relationships between parts
– $p_i$: pixel location of part $i$
– $t_{ij}$: pairwise relationship
  – Defined by relative position
  – In experiments: 13 types for each pair

Page 22: Articulated human pose estimation by deep learning

Formulation

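$$F(\mathbf{p}, \mathbf{t} \mid I; \boldsymbol{\omega}, \theta) = \sum_{i \in V} \omega_i \cdot A_i(p_i \mid I; \theta) + \sum_{(i,j) \in E} \left[ \omega_{ij} \cdot R(p_i, p_j, t_{ij}, t_{ji} \mid I; \theta) + \boldsymbol{\omega}_{ij}^{t_{ij}} \cdot (\ldots) \right]$$

• $\omega_i \cdot A_i(p_i \mid I; \theta)$: part presence
• $\omega_{ij} \cdot R(p_i, p_j, t_{ij}, t_{ji} \mid I; \theta)$: pairwise relationship
• $\boldsymbol{\omega}_{ij}^{t_{ij}} \cdot (\ldots)$: pairwise deformation

Inference:
• Tree structure
• Can be solved efficiently by dynamic programming

$\omega_i$, $\omega_{ij}$, $\boldsymbol{\omega}_{ij}^{t_{ij}}$ are currently learned by a latent structural SVM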
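
Because the graph is a tree, the highest-scoring configuration can be found exactly by max-sum dynamic programming, passing scores from the leaves up to the root and then backtracking. The sketch below is a simplified illustration rather than the author's implementation: it ignores the relationship types and uses hypothetical inputs, `unary[i][p]` for the score of part `i` at candidate position `p` and `pairwise[(parent, child)][pp][pc]` for the score of an edge.

```python
def tree_max_sum(children, unary, pairwise, root=0):
    """Exact MAP inference on a tree by max-sum dynamic programming.

    children : dict part -> list of child parts
    unary    : dict part -> list of scores, one per candidate position
    pairwise : dict (parent, child) -> matrix[parent_pos][child_pos]
    Returns (best_score, dict part -> chosen position index).
    """
    best_child_pos = {}  # (child, parent_pos) -> best child position

    def upward(i):
        # score[p] = best total score of the subtree rooted at i, with i at p
        score = list(unary[i])
        for c in children.get(i, []):
            child_score = upward(c)
            for p in range(len(score)):
                options = [pairwise[(i, c)][p][q] + child_score[q]
                           for q in range(len(child_score))]
                q_best = max(range(len(options)), key=options.__getitem__)
                best_child_pos[(c, p)] = q_best
                score[p] += options[q_best]
        return score

    root_score = upward(root)
    assignment = {root: max(range(len(root_score)), key=root_score.__getitem__)}

    # Backtrack from the root to recover each part's position.
    stack = [root]
    while stack:
        i = stack.pop()
        for c in children.get(i, []):
            assignment[c] = best_child_pos[(c, assignment[i])]
            stack.append(c)
    return root_score[assignment[root]], assignment
```

With `K` candidate positions per part, each edge costs `K * K` operations, so the total cost grows linearly with the number of parts instead of exponentially.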

Page 23: Articulated human pose estimation by deep learning

Learning parameters

Derive the type label for each patch
• Use relative position to represent the pairwise relations
• Cluster the relative positions over the whole training set
• Type label: cluster index
• Mean relative position: cluster center
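
The type-label derivation can be sketched with a plain k-means over relative positions. This is an assumption for illustration: the slides do not name the clustering method, and the function `kmeans_types`, its deterministic initialization, and the requirement of at least `k` samples are hypothetical choices.

```python
def kmeans_types(offsets, k, iters=20):
    """Cluster 2-D relative positions; return (type labels, cluster centers).

    offsets : list of (dx, dy) = neighbor position minus part position,
              with at least k entries
    k       : number of relationship types for this pair of parts
    """
    # Deterministic init for the sketch: k evenly spaced samples.
    step = max(1, len(offsets) // k)
    centers = [offsets[i * step] for i in range(k)]
    labels = [0] * len(offsets)
    for _ in range(iters):
        # Assignment step: the type label is the index of the nearest center.
        for n, (dx, dy) in enumerate(offsets):
            labels[n] = min(
                range(k),
                key=lambda c: (dx - centers[c][0]) ** 2 + (dy - centers[c][1]) ** 2,
            )
        # Update step: each center becomes the mean relative position.
        for c in range(k):
            members = [offsets[n] for n in range(len(offsets)) if labels[n] == c]
            if members:
                centers[c] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return labels, centers
```

The returned labels serve as the type annotation for each training patch, and the centers give the mean relative position of each type.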

Page 24: Articulated human pose estimation by deep learning

Casting Full Connections into Convolutions

[Figure: part presence map and pairwise relationship map, illustrated for the elbow]
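
The idea can be illustrated in one dimension. A fully connected layer trained on fixed-size patches has one weight per input position, so sliding it across a larger input is the same computation as a convolution (cross-correlation, as CNN libraries implement it) whose kernel holds those weights. The sketch below uses made-up weights and a hypothetical single-output layer; it only demonstrates the equivalence and is not the network from the slides.

```python
def fc(patch, weights, bias):
    """Fully connected layer with one output unit, applied to one patch."""
    return sum(w * x for w, x in zip(weights, patch)) + bias


def conv1d_valid(signal, kernel, bias):
    """'Valid' cross-correlation of `signal` with `kernel`, plus bias."""
    k = len(kernel)
    return [
        sum(kernel[j] * signal[i + j] for j in range(k)) + bias
        for i in range(len(signal) - k + 1)
    ]


weights, bias = [0.5, -1.0, 2.0], 0.25   # illustrative values
signal = [1.0, 2.0, 3.0, 4.0, 5.0]

# Patch-by-patch evaluation: crop every window and run the FC layer on it.
patchwise = [fc(signal[i:i + 3], weights, bias) for i in range(len(signal) - 2)]

# Convolutional evaluation: one pass produces the whole score map.
score_map = conv1d_valid(signal, weights, bias)
```

Both lists hold the same values, which is why a patch classifier can produce dense score maps over a whole image in a single forward pass.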

Page 25: Articulated human pose estimation by deep learning

PCP and PDJ on LSP dataset and FLIC dataset

Dataset  Method          Torso  Head  U.Leg  L.Leg  U.Arm  L.Arm  Mean PCP
LSP      DCNN             92.5  85.1   82.7   76.3   70.2   55.9      74.8
LSP      Ouyang et al.    85.8  83.1   76.5   72.2   63.3   46.6      68.6
FLIC     DCNN             87.0  98.8      -      -   96.5   84.0      91.1

[Figure panels: LSP, FLIC]

Page 26: Articulated human pose estimation by deep learning

Future work

Page 27: Articulated human pose estimation by deep learning

Future Work

• Build end-to-end system to estimate human pose

• Consider combining local information and holistic view

• Beyond tree structure

Page 28: Articulated human pose estimation by deep learning

Thank you

Articulated Human Pose Estimation by Deep Learning

Page 29: Articulated human pose estimation by deep learning

Appendix

Data Augmentation
Evaluation Metrics

Page 30: Articulated human pose estimation by deep learning

Data Augmentation

• The number of training images in existing datasets is insufficient to train deep CNNs
  – Statistics of existing datasets (table below)
  – Number of parameters of AlexNet: 60 million

• Data augmentation is effective at preventing overfitting

Dataset       # Training images  # Testing images  Type
PARSE                       100               205  Full body
LSP                       1,000             1,000  Full body
LSP extended             10,000                 -  Full body
FLIC                      3,987             1,016  Upper body
MPII                     28,821            11,701  Full body

Page 31: Articulated human pose estimation by deep learning

Data Augmentation (cont.)

• Random padding

• Rotating
  – ±[2.5°, 5°, 7.5°, 10°, 15°, 20°]

• Flipping
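
When an image is rotated or flipped, its joint annotations have to be transformed with it, and a horizontal flip also exchanges left and right joints. The sketch below is a minimal illustration under stated assumptions: joints are `(x, y)` pixel coordinates, `swap_pairs` lists index pairs of left/right joints, and the rotation sign must match whatever convention is used to rotate the image itself. The function names and joint ordering are hypothetical.

```python
import math

ANGLES = [2.5, 5, 7.5, 10, 15, 20]  # degrees, applied with both signs


def rotate_joints(joints, angle_deg, center):
    """Rotate (x, y) joints by `angle_deg` about `center`."""
    a = math.radians(angle_deg)
    cx, cy = center
    out = []
    for x, y in joints:
        dx, dy = x - cx, y - cy
        out.append((cx + dx * math.cos(a) - dy * math.sin(a),
                    cy + dx * math.sin(a) + dy * math.cos(a)))
    return out


def flip_joints(joints, width, swap_pairs):
    """Mirror joints horizontally and swap left/right joint labels."""
    flipped = [(width - 1 - x, y) for x, y in joints]
    for left, right in swap_pairs:
        flipped[left], flipped[right] = flipped[right], flipped[left]
    return flipped
```

Without the label swap, a flipped image would teach the network that the person's left wrist is on the wrong side of the body.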

Page 32: Articulated human pose estimation by deep learning

Evaluation Metrics

• Percentage of Correct Parts (PCP)
  – Measures the percentage of correctly localized body parts.
  – A candidate body part is treated as correct if its segment endpoints lie within 50% of the ground-truth segment length from their annotated locations.

• Percentage of Detected Joints (PDJ)
  – Measures performance as a curve: the percentage of correctly localized joints as the localization precision threshold varies. The threshold is normalized by a scale defined as the distance between the left shoulder and the right hip.
  – Invariant to scale
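
Both metrics can be sketched in a few lines. This is an illustrative reading of the definitions, not a reference implementation: the PCP function assumes the strict variant in which both endpoints must pass the test, and the data layout (parts as endpoint pairs, joints as `(x, y)` tuples) is hypothetical.

```python
import math


def dist(a, b):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def pcp(pred_parts, gt_parts, alpha=0.5):
    """Percentage of Correct Parts.

    Each part is a segment ((x1, y1), (x2, y2)). A part counts as correct
    when both predicted endpoints are within alpha * (ground-truth segment
    length) of the corresponding ground-truth endpoints.
    """
    correct = 0
    for (p1, p2), (g1, g2) in zip(pred_parts, gt_parts):
        limit = alpha * dist(g1, g2)
        if dist(p1, g1) <= limit and dist(p2, g2) <= limit:
            correct += 1
    return 100.0 * correct / len(gt_parts)


def pdj(pred_joints, gt_joints, torso_size, threshold):
    """Percentage of Detected Joints at one normalized threshold.

    torso_size is the ground-truth distance between left shoulder and
    right hip; a joint is detected if its error <= threshold * torso_size.
    """
    hits = sum(dist(p, g) <= threshold * torso_size
               for p, g in zip(pred_joints, gt_joints))
    return 100.0 * hits / len(gt_joints)
```

Evaluating `pdj` over a range of thresholds gives the PDJ curve; dividing by the torso size is what makes the metric invariant to the person's scale in the image.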