Human Pose Estimation and Action Recognition Gang Yu, Megvii (Face++) Junsong Yuan, SUNY Buffalo Zicheng Liu, Microsoft ICIP 2019 Tutorial

Human Pose Estimation and Action Recognition · Cascaded Pyramid Network for Multi-Person Pose Estimation, Yilun Chen, Zhicheng Wang, Yuxiang Peng, Zhiqiang Zhang, Gang Yu, Jian Sun,

Jul 13, 2020

Human Pose Estimation and Action Recognition

Gang Yu, Megvii (Face++)

Junsong Yuan, SUNY Buffalo

Zicheng Liu, Microsoft

ICIP 2019 Tutorial

Overview

• Part 1: Human Pose Estimation
  • 2D Skeleton
    • Top-Down
    • Bottom-Up
  • 3D Skeleton
    • 2D -> 3D Skeleton
    • 2D -> 3D Shape
  • Application

• Part 2: Action Recognition
  – Datasets
    • RGB
    • RGB-D
  – Skeleton based approaches
    • 2D and 3D skeletons
  – Video based approaches
    • 2D/3D CNN features

Gang Yu

yugang@megvii.com

Human Pose Estimation

Algorithm and Application

Outline

• Introduction to Human Pose Estimation
• 2D Skeleton
  • Top-Down
  • Bottom-Up
• 3D Skeleton
  • 2D -> 3D Skeleton
  • 2D -> 3D Shape
• Application
• Conclusion

What is Human Pose Estimation?

Benchmark and Evaluation

• Benchmark
  • Single-person Estimation
    • MPII, FLIC, LSP, LIP
  • Multi-person Keypoint Detection
    • COCO, CrowdPose
  • Video
    • PoseTrack
  • 3D
    • Human3.6M, DensePose
• Evaluation on COCO
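
COCO keypoint evaluation is built on Object Keypoint Similarity (OKS), which plays the role that IoU plays in box detection. The sketch below is a minimal NumPy version, assuming the 17-keypoint COCO layout; the function name `oks` and its argument layout are illustrative, not an official API.

```python
import numpy as np

# Per-keypoint constants used by the COCO keypoint evaluation
# (17 keypoints: nose, eyes, ears, shoulders, elbows, wrists, hips, knees, ankles).
COCO_SIGMAS = np.array([.26, .25, .25, .35, .35, .79, .79, .72, .72,
                        .62, .62, 1.07, 1.07, .87, .87, .89, .89]) / 10.0


def oks(pred_xy, gt_xy, gt_vis, area, sigmas=COCO_SIGMAS):
    """Object Keypoint Similarity between one predicted and one ground-truth pose.

    pred_xy, gt_xy: arrays of shape (K, 2); gt_vis: shape (K,), > 0 means labeled;
    area: ground-truth object area in pixels (hypothetical argument layout).
    """
    labeled = gt_vis > 0
    if not labeled.any():
        return 0.0
    d2 = np.sum((pred_xy - gt_xy) ** 2, axis=1)   # squared pixel distances
    k2 = (2.0 * sigmas) ** 2                      # per-keypoint tolerance
    sim = np.exp(-d2 / (2.0 * area * k2 + np.spacing(1)))
    return float(sim[labeled].mean())             # average over labeled keypoints only
```

Average precision is then reported over a range of OKS thresholds, in the same way box AP is reported over IoU thresholds.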

2D Skeleton: How to Do Pose Estimation

• Top-down Approach vs. Bottom-up Approach

• Top-down
  • Mask R-CNN, CPN, MSPN
  • High performance (good localization ability), high recall

• Bottom-up
  • OpenPose, Associative Embedding
  • Clean framework, potentially fast speed

[Figure: top-down vs. bottom-up pipelines; labels: Human, Head, L-Arm]

Mask R-CNN, Kaiming He, Georgia Gkioxari, Piotr Dollár, Ross Girshick, ICCV 2017

Cascaded Pyramid Network for Multi-Person Pose Estimation, Yilun Chen, Zhicheng Wang, Yuxiang Peng, Zhiqiang Zhang, Gang Yu, Jian Sun, CVPR 2018

Rethinking on Multi-Stage Networks for Human Pose Estimation, Wenbo Li, Zhicheng Wang, Binyi Yin, Qixiang Peng, Yuming Du, Tianzi Xiao, Gang Yu, Hongtao Lu, Yichen Wei, Jian Sun

OpenPose: Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields, Zhe Cao, Gines Hidalgo, Tomas Simon, Shih-En Wei, Yaser Sheikh

Associative Embedding: End-to-End Learning for Joint Detection and Grouping, Alejandro Newell, Zhiao Huang, Jia Deng, NIPS 2017

Challenges

• Ambiguous Appearance
• Crowd Case
• Large Pose
• Inference Speed

Top-Down: Mask R-CNN

Mask R-CNN, Kaiming He, Georgia Gkioxari, Piotr Dollár, Ross Girshick, ICCV 2017

• Motivation:
  • Multi-task learning
  • ROI Pool -> ROI Align
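
The move from ROI Pool to ROI Align replaces rounded bin boundaries with bilinear sampling at fractional positions, which matters for keypoints because small misalignments shift the predicted location. The following is a minimal single-channel NumPy sketch of that idea; the function names, the box layout `(y1, x1, y2, x2)` and the `samples` parameter are assumptions made for illustration, not the Mask R-CNN implementation.

```python
import numpy as np


def bilinear_sample(feat, y, x):
    """Sample a 2D feature map at a fractional (y, x) location."""
    h, w = feat.shape
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    wy, wx = y - y0, x - x0
    top = (1 - wx) * feat[y0, x0] + wx * feat[y0, x1]
    bot = (1 - wx) * feat[y1, x0] + wx * feat[y1, x1]
    return (1 - wy) * top + wy * bot


def roi_align(feat, box, out_size=2, samples=2):
    """Average bilinear samples inside each output bin, with no coordinate rounding.

    box = (y1, x1, y2, x2) in feature-map coordinates (hypothetical layout).
    """
    y1, x1, y2, x2 = box
    bin_h = (y2 - y1) / out_size
    bin_w = (x2 - x1) / out_size
    out = np.zeros((out_size, out_size))
    for i in range(out_size):
        for j in range(out_size):
            vals = []
            for a in range(samples):
                for b in range(samples):
                    yy = y1 + (i + (a + 0.5) / samples) * bin_h
                    xx = x1 + (j + (b + 0.5) / samples) * bin_w
                    vals.append(bilinear_sample(feat, yy, xx))
            out[i, j] = np.mean(vals)
    return out
```

ROI Pool would instead round the box and bin edges to integer cells before pooling, so the same box shifted by half a pixel could produce identical features.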

Top-Down: Mask R-CNN

Mask R-CNN, Kaiming He, Georgia Gkioxari, Piotr Dollár, Ross Girshick, ICCV 2017

• Experiments on COCO Skeleton:

Top-Down: Hourglass

Stacked Hourglass Networks for Human Pose Estimation, Alejandro Newell, Kaiyu Yang, and Jia Deng, ECCV 2016

• Motivation:
  • Crop & Single Person Skeleton
  • Multi-stage context refinement

Top-Down: Hourglass

• Structure of one block

Stacked Hourglass Networks for Human Pose Estimation, Alejandro Newell, Kaiyu Yang, and Jia Deng, ECCV 2016

Top-Down: Hourglass

• Experiments

Stacked Hourglass Networks for Human Pose Estimation, Alejandro Newell, Kaiyu Yang, and Jia Deng, ECCV 2016

Top-Down: Single Person Skeleton: CPM

• Motivation:
  • Multi-stage context refinement
  • Large receptive field -> long-range spatial relationship

Convolutional Pose Machines, Shih-En Wei, Varun Ramakrishna, Takeo Kanade, Yaser Sheikh, CVPR 2016

Top-Down: Cascaded Pyramid Network

• Motivation: How to locate the “hard” joints
  • Human perspective

Cascaded Pyramid Network for Multi-Person Pose Estimation, Yilun Chen, Zhicheng Wang, Yuxiang Peng, Zhiqiang Zhang, Gang Yu, Jian Sun, CVPR 2018

Top-Down: Cascaded Pyramid Network

• Motivation: How to locate the “hard” joints
  • Human perspective

[Figure: example image annotated with “Nose ✓”, “Left elbow”, “Right hand” and “What?”; caption: visible easy keypoints (easy visible parts)]

Top-Down: Cascaded Pyramid Network

• Motivation: How to locate the “hard” joints
  • Human perspective

[Figure labels: easy visible parts (visible easy keypoints): Nose, Left elbow, Right hand; hard visible parts (visible hard keypoints): Left knee, Right knee, Left hip; annotations: “What?”, “enlarge view”, “context”, “enlarge view hard to distinguish?”]

Top-Down: Cascaded Pyramid Network

• Motivation: How to locate the “hard” joints
  • Human perspective

[Figure labels: easy visible parts (visible easy keypoints): Nose, Left elbow, Right hand; hard visible parts (visible hard keypoints): Left knee, Right knee, Left hip; invisible part: Right shoulder; annotations: “What?”, “enlarge view”, “context”, “enlarge view hard to distinguish?”]

Top-Down: Cascaded Pyramid Network

• Motivation: How to locate the “hard” joints
  • Human perspective: Coarse to Fine

[Figure: input image -> output image, from coarse parts to fine parts; receptive view getting larger & more context]

Network Design Principles:

● Inspired by how humans locate keypoints, adapted to a CNN

○ locate easy parts => locate hard parts

● Two stages

○ GlobalNet: locates the easy parts (vanilla L2 loss)

○ RefineNet: locates the hard parts (deep layers) with online hard keypoint mining (hard mining loss)

Network Architecture
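
The two losses named in the design principles can be sketched as follows: a plain L2 loss averaged over all keypoints for GlobalNet, and a hard-mining loss for RefineNet that back-propagates only the largest per-keypoint losses. This is a minimal NumPy sketch under stated assumptions; the function names and the default `top_k` value are illustrative and not taken from the slides.

```python
import numpy as np


def l2_per_keypoint(pred, target):
    """Mean squared error per keypoint heatmap; inputs have shape (K, H, W)."""
    return ((pred - target) ** 2).mean(axis=(1, 2))


def global_loss(pred, target):
    """Vanilla L2 loss: average over every keypoint."""
    return float(l2_per_keypoint(pred, target).mean())


def ohkm_loss(pred, target, top_k=8):
    """Online hard keypoint mining: keep only the top_k largest keypoint losses.

    top_k is a hypothetical default chosen for illustration.
    """
    per_kp = l2_per_keypoint(pred, target)
    hardest = np.sort(per_kp)[-top_k:]   # the top_k hardest keypoints in this sample
    return float(hardest.mean())
```

Because the selection is done per sample at training time, which keypoints count as "hard" changes as the network improves.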

Experiments: Person Detector

Det mAP        36.3   41.1   44.3   49.3   52.1
Keypoint mAP   68.8   69.4   69.7   69.8   69.8

Experiments: Online Hard Keypoints Mining

Experiments: Design Choices of GlobalNet & RefineNet

Experiments

Summary for CPN

• Hard Keypoints with Coarse-to-fine Strategy (context)
• Code: https://github.com/chenyilun95/tf-cpn
• MS COCO 2017 Challenge Winner

Top-Down: A Simple Baseline

Simple Baselines for Human Pose Estimation and Tracking, Bin Xiao, Haiping Wu, Yichen Wei, ECCV 2018

• Motivation
  • Simple Baseline & OKS based tracking
  • Spatial Resolution

Top-Down: A Simple Baseline

• Experiments on COCO and PoseTrack

Simple Baselines for Human Pose Estimation and Tracking, Bin Xiao, Haiping Wu, Yichen Wei, ECCV 2018

Top-Down: HRNet

Deep High-Resolution Representation Learning for Human Pose Estimation, Ke Sun, Bin Xiao, Dong Liu, Jingdong Wang, CVPR2019

• Motivation
  • High Resolution Feature maps

Top-Down: HRNet

Deep High-Resolution Representation Learning for Human Pose Estimation, Ke Sun, Bin Xiao, Dong Liu, Jingdong Wang, CVPR2019

Top-Down: HRNet

Deep High-Resolution Representation Learning for Human Pose Estimation, Ke Sun, Bin Xiao, Dong Liu, Jingdong Wang, CVPR2019

• Experiments

Top-Down: Multi-stage Pose Estimation

• Motivation
  • Upper bound
  • Only two stages available (limited context)

Rethinking on Multi-Stage Networks for Human Pose Estimation, Wenbo Li, Zhicheng Wang, Binyi Yin, Qixiang Peng, Yuming Du, Tianzi Xiao, Gang Yu, Hongtao Lu, Yichen Wei, Jian Sun

Top-Down: Multi-stage Pose Estimation

• Method
  • Coarse-to-fine with better information flow
  • Involve more stages

Top-Down: Multi-stage Pose Estimation

• Cross Stage Feature Aggregation
• Coarse-to-fine Supervision
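
One way to realize coarse-to-fine supervision is to give each stage a Gaussian target heatmap with a progressively smaller kernel, so early stages are rewarded for rough localization and later stages for precise localization. The sketch below assumes this reading; the function names and the specific `sigmas` are hypothetical values picked for illustration.

```python
import numpy as np


def gaussian_heatmap(h, w, cx, cy, sigma):
    """Unnormalized 2D Gaussian target centered on keypoint (cx, cy)."""
    ys, xs = np.mgrid[0:h, 0:w]
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))


def coarse_to_fine_targets(h, w, cx, cy, sigmas=(4.0, 3.0, 2.0, 1.0)):
    """One target per stage, from a wide (coarse) to a narrow (fine) Gaussian.

    The sigma schedule is an assumption, not a value stated in the slides.
    """
    return [gaussian_heatmap(h, w, cx, cy, s) for s in sigmas]
```

Each stage's predicted heatmap would then be compared against its own target with the usual L2 loss.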

Experiments: More Stages

Experiments: CTF & CSFA

Experiments: COCO test-dev

Experiments: COCO test-Challenge

Summary for MSPN

• Refined Coarse-to-fine Strategy
• Code: https://github.com/megvii-detection/MSPN
• MS COCO 2018 Challenge Winner

Bottom-Up: DeepCut

• Motivation
  • Part Detector
  • Assemble (Integer Linear Optimization)

DeepCut: Joint Subset Partition and Labeling for Multi Person Pose Estimation, Leonid Pishchulin, Eldar Insafutdinov, Siyu Tang, Bjoern Andres, Mykhaylo Andriluka, Peter Gehler, Bernt Schiele, CVPR 2016

Bottom-Up: DeeperCut

• Motivation
  • Deeper Part Detector + Assemble (image-conditioned pairwise terms + incremental optimization)

DeeperCut: A Deeper, Stronger, and Faster Multi-Person Pose Estimation Model, Eldar Insafutdinov, Leonid Pishchulin, Bjoern Andres, Mykhaylo Andriluka, Bernt Schiele, ECCV2016

Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields, Zhe Cao, Tomas Simon, Shih-En Wei, Yaser Sheikh, CVPR 2017

Bottom-Up: OpenPose

• Motivation
  • Part Detector (CPM) + Assemble (PAF)

Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields, Zhe Cao, Tomas Simon, Shih-En Wei, Yaser Sheikh, CVPR 2017

Bottom-Up: OpenPose

• Motivation
  • Part Detector (CPM) + Assemble (PAF)
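
The "Assemble (PAF)" step scores a candidate limb by checking how well the Part Affinity Field along the segment between two detected parts points in the direction of that segment. The sketch below approximates that line integral by sampling a few points; the function name, the `(x, y)` point layout and `num_samples` are assumptions for illustration, and the nearest-pixel lookup is a simplification.

```python
import numpy as np


def paf_score(paf_x, paf_y, p1, p2, num_samples=10):
    """Average alignment between a PAF and the segment p1 -> p2.

    paf_x, paf_y: (H, W) arrays holding the field's x and y components;
    p1, p2: candidate part locations as (x, y) pixel coordinates.
    """
    p1, p2 = np.asarray(p1, dtype=float), np.asarray(p2, dtype=float)
    vec = p2 - p1
    norm = np.linalg.norm(vec)
    if norm < 1e-8:
        return 0.0
    unit = vec / norm
    score = 0.0
    for t in np.linspace(0.0, 1.0, num_samples):
        x, y = np.rint(p1 + t * vec).astype(int)   # nearest pixel on the segment
        score += paf_x[y, x] * unit[0] + paf_y[y, x] * unit[1]
    return score / num_samples
```

Candidate pairs with high scores are then matched per limb type, which is what turns independent part detections into per-person skeletons.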

Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields, Zhe Cao, Tomas Simon, Shih-En Wei, Yaser Sheikh, CVPR 2017

Bottom-Up: OpenPose

• Experiments on MPII and COCO

Associative Embedding: End-to-End Learning for Joint Detection and Grouping, Alejandro Newell, Zhiao Huang, Jia Deng, NIPS 2017

Bottom-Up: Associative Embedding

• Motivation
  • Part Detector (Hourglass) + Assemble (AE)

Associative Embedding: End-to-End Learning for Joint Detection and Grouping, Alejandro Newell, Zhiao Huang, Jia Deng, NIPS 2017

Bottom-Up: Associative Embedding

• Motivation
  • Part Detector (Hourglass) + Assemble (AE)
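
With associative embedding, the network predicts a tag value alongside each part detection, and parts whose tags are close are assembled into the same person. Below is a simplified greedy sketch of that grouping step; the function name, the tuple layout and the `threshold` value are hypothetical, and the greedy assignment stands in for the matching used in the paper.

```python
import numpy as np


def group_by_tags(detections, threshold=1.0):
    """Group part detections into people by embedding-tag distance.

    detections: list of (joint_id, x, y, tag) with a scalar tag per detection.
    Returns a list of people, each a dict mapping joint_id -> (x, y).
    """
    people = []   # each entry: {"joints": {...}, "tags": [...]}
    for joint_id, x, y, tag in detections:
        best, best_dist = None, threshold
        for person in people:
            if joint_id in person["joints"]:
                continue   # this person already has that joint
            dist = abs(tag - np.mean(person["tags"]))
            if dist < best_dist:
                best, best_dist = person, dist
        if best is None:   # no existing person is close enough: start a new one
            best = {"joints": {}, "tags": []}
            people.append(best)
        best["joints"][joint_id] = (x, y)
        best["tags"].append(tag)
    return [p["joints"] for p in people]
```

Training pulls tags of the same person together and pushes tags of different people apart, so a simple distance threshold is enough at inference time.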

Associative Embedding: End-to-End Learning for Joint Detection and Grouping, Alejandro Newell, Zhiao Huang, Jia Deng, NIPS 2017

Bottom-Up: Associative Embedding

• Experiments on MPII and COCO

Bottom-Up: Azure Kinect

Azure Kinect DK

Build computer vision and speech models using a developer kit with advanced AI sensors

• Get started with a range of SDKs, including an open-source Sensor SDK.

• Experiment with multiple modes and mounting options.

• Add cognitive services and manage connected PCs with easy Azure integration.

Azure Kinect Body Tracking SDK

• Bottom-up approach
• On IR image
  • Insensitive to environment lighting
• DNN outputs
  • Heat map
  • Part Affinity Field
  • Part Segmentation Map
• SDK outputs
  • 3D skeletons
  • Instance segmentation

Neural Network

Contact: Lijuan Wang

Last Updated: April 20, 2019

Summary for 2D Skeleton

• Top-down vs Bottom-up
  • Top-down: Context & spatial resolution
  • Bottom-up: Assemble
• Remaining issues
  • Crowd
  • Spatial resolution
  • Speed

Benchmark: H3.6M

• Large-scale Constrained 3D Skeleton benchmark

• 3.6M human poses

• Evaluations

• Protocol 1: Six subjects (S1, S5, S6, S7, S8, S9) are used in training. Evaluation is performed on every 64th frame of Subject 11’s videos. Alignment is used.

• Protocol 2: Five subjects (S1, S5, S6, S7, S8) are used for training. Evaluation is performed on every 64th frame of two subjects (S9, S11)
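
Both protocols are usually reported as mean per-joint position error (MPJPE), and the "alignment" in Protocol 1 can be read as a rigid (Procrustes) alignment of the prediction to the ground truth before measuring the error. The sketch below assumes that reading; function names are illustrative and poses are `(J, 3)` arrays in any consistent unit.

```python
import numpy as np


def mpjpe(pred, gt):
    """Mean per-joint position error; pred and gt have shape (J, 3)."""
    return float(np.linalg.norm(pred - gt, axis=1).mean())


def procrustes_align(pred, gt):
    """Align pred to gt with the best similarity transform (scale, rotation, translation)."""
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    p, g = pred - mu_p, gt - mu_g
    u, s, vt = np.linalg.svd(p.T @ g)
    diag = np.ones(3)
    diag[-1] = np.sign(np.linalg.det(u @ vt))   # avoid a reflection
    rot = u @ np.diag(diag) @ vt                # rotation applied as p @ rot
    scale = (s * diag).sum() / (p ** 2).sum()
    return scale * p @ rot + mu_g


def pa_mpjpe(pred, gt):
    """MPJPE after Procrustes alignment."""
    return mpjpe(procrustes_align(pred, gt), gt)
```

The aligned metric ignores global rotation, scale and translation, so it isolates errors in the articulated pose itself.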

http://vision.imar.ro/human3.6m/description.php

3D Skeleton: 3D Human Pose Estimation = 2D Pose Estimation + Matching

• Motivation
  • 3D = 2D CNN + NN Match

https://zpascal.net/cvpr2017/Chen_3D_Human_Pose_CVPR_2017_paper.pdf

3D Human Pose Estimation = 2D Pose Estimation + Matching, Ching-Hang Chen, Deva Ramanan, CVPR 2017

• Split or Joint Training

• 3D structure: 2D Joints
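
The "2D CNN + NN Match" idea can be illustrated as a lookup: given 2D joints from any 2D estimator, find the library 3D pose whose projection is closest and return it. This sketch is a deliberately simplified version that assumes an orthographic projection and a tiny in-memory library; all function names and the normalization are assumptions, not the paper's procedure.

```python
import numpy as np


def project_orthographic(pose3d):
    """Drop depth: a simplifying assumption in place of a real camera model."""
    return pose3d[:, :2]


def normalize_2d(pose2d):
    """Remove translation and scale so poses can be compared."""
    centered = pose2d - pose2d.mean(axis=0)
    return centered / (np.linalg.norm(centered) + 1e-8)


def lift_by_nearest_neighbour(pose2d, library3d):
    """Return the library 3D pose whose 2D projection best matches pose2d, and its index."""
    query = normalize_2d(pose2d)
    dists = [np.linalg.norm(query - normalize_2d(project_orthographic(p)))
             for p in library3d]
    best = int(np.argmin(dists))
    return library3d[best], best
```

The 2D stage and the 3D library are independent here, which is why 2D training data "in the wild" can be used without 3D labels.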

3D Skeleton: 3D Human Pose Estimation = 2D Pose Estimation + Matching

• Experiments

https://zpascal.net/cvpr2017/Chen_3D_Human_Pose_CVPR_2017_paper.pdf

3D Human Pose Estimation = 2D Pose Estimation + Matching, Ching-Hang Chen, Deva Ramanan, CVPR 2017

3D Skeleton: A simple yet effective baseline for 3d human pose estimation

• Motivation
  • 3D = 2D CNN + Mapping

http://openaccess.thecvf.com/content_ICCV_2017/papers/Martinez_A_Simple_yet_ICCV_2017_paper.pdf

A simple yet effective baseline for 3d human pose estimation, Julieta Martinez, Rayat Hossain, Javier Romero, James J. Little, ICCV 2017

• Split or Joint Training

• 3D structure: 2D Joints

3D Skeleton: A simple yet effective baseline for 3d human pose estimation

• Experiments

http://openaccess.thecvf.com/content_ICCV_2017/papers/Martinez_A_Simple_yet_ICCV_2017_paper.pdf

A simple yet effective baseline for 3d human pose estimation, Julieta Martinez, Rayat Hossain, Javier Romero, James J. Little, ICCV 2017

3D Skeleton: Compositional Human Pose Regression

• Motivation
  • Bone Representation + 2D & 3D Joint training

http://openaccess.thecvf.com/content_ICCV_2017/papers/Sun_Compositional_Human_Pose_ICCV_2017_paper.pdf

Compositional Human Pose Regression, Xiao Sun, Jiaxiang Shang, Shuang Liang, Yichen Wei, ICCV2017

• Split or Joint Training

• 3D structure: 2D Joints + bone
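
A bone representation re-parameterizes a pose as offsets between each joint and its parent in the kinematic tree, so a loss on bones directly penalizes structural errors such as wrong limb lengths. The sketch below shows the conversion in both directions on a hypothetical 5-joint tree; the tree, the function names and the child-minus-parent sign convention are assumptions for illustration.

```python
import numpy as np

# Hypothetical 5-joint kinematic tree: parent index per joint, -1 marks the root.
# Parents are listed before their children.
PARENTS = [-1, 0, 1, 0, 3]


def joints_to_bones(joints, parents=PARENTS):
    """Convert absolute joint positions (J, D) to bone vectors (child minus parent)."""
    bones = np.zeros_like(joints)
    for j, p in enumerate(parents):
        bones[j] = joints[j] if p < 0 else joints[j] - joints[p]
    return bones


def bones_to_joints(bones, parents=PARENTS):
    """Recover joint positions by accumulating bones from the root outward."""
    joints = np.zeros_like(bones)
    for j, p in enumerate(parents):
        joints[j] = bones[j] if p < 0 else joints[p] + bones[j]
    return joints
```

Because the conversion works for any dimension D, the same code covers 2D and 3D skeletons, which fits the slide's point about training on both.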

3D Skeleton: Compositional Human Pose Regression

• Experiments

http://openaccess.thecvf.com/content_ICCV_2017/papers/Sun_Compositional_Human_Pose_ICCV_2017_paper.pdf

Compositional Human Pose Regression, Xiao Sun, Jiaxiang Shang, Shuang Liang, Yichen Wei, ICCV2017

3D Skeleton: Integral Human Pose Regression

• Motivation
  • Heatmap vs Regression
    • Heatmap: non-differentiable, quantization error
    • Regression: misses spatial structure
  • Integral loss

http://openaccess.thecvf.com/content_ICCV_2017/papers/Sun_Compositional_Human_Pose_ICCV_2017_paper.pdf

Integral Human Pose Regression, Xiao Sun, Bin Xiao, Fangyin Wei, Shuang Liang, and Yichen Wei, ECCV2018

• Split or Joint Training

• 3D structure: 3D Heatmaps
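
The integral formulation replaces the hard argmax over a heatmap with an expectation over a softmax-normalized heatmap, which is differentiable and yields sub-pixel coordinates. The following is a minimal 2D NumPy sketch of that operation; the function name is illustrative, and the same idea extends to 3D heatmaps by adding a depth axis.

```python
import numpy as np


def soft_argmax_2d(heatmap):
    """Expected (x, y) location under the softmax of a heatmap of shape (H, W)."""
    h, w = heatmap.shape
    flat = heatmap.reshape(-1)
    prob = np.exp(flat - flat.max())        # subtract max for numerical stability
    prob = (prob / prob.sum()).reshape(h, w)
    x = (prob.sum(axis=0) * np.arange(w)).sum()   # expectation along the x axis
    y = (prob.sum(axis=1) * np.arange(h)).sum()   # expectation along the y axis
    return float(x), float(y)
```

When two neighbouring cells are equally likely, the result falls between them, whereas an argmax would have to pick one cell and incur a quantization error.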

3D Skeleton: Integral Human Pose Regression

• Experiments

http://openaccess.thecvf.com/content_ICCV_2017/papers/Sun_Compositional_Human_Pose_ICCV_2017_paper.pdf

Integral Human Pose Regression, Xiao Sun, Bin Xiao, Fangyin Wei, Shuang Liang, and Yichen Wei, ECCV2018

3D Shape: DensePose

• Motivation
  • Dense Correspondence

DensePose: Dense Human Pose Estimation In The Wild, Rıza Alp Güler, Natalia Neverova, Iasonas Kokkinos, CVPR2018

3D Shape: DensePose

• Dataset
  • DensePose-COCO Dataset

DensePose: Dense Human Pose Estimation In The Wild, Rıza Alp Güler, Natalia Neverova, Iasonas Kokkinos, CVPR2018

50K Images, 5M correspondences

24 UV Parts

3D Shape: DensePose

• Method

DensePose: Dense Human Pose Estimation In The Wild, Rıza Alp Güler, Natalia Neverova, Iasonas Kokkinos, CVPR2018

3D Shape: DensePose

• Experiments

DensePose: Dense Human Pose Estimation In The Wild, Rıza Alp Güler, Natalia Neverova, Iasonas Kokkinos, CVPR2018

Summary for 3D Skeleton

• 3D Representation: 3D Skeleton vs 3D Shape
• 2D -> 3D Joint -> 3D Shape
• Remaining issues
  • Unconstrained (in the wild) benchmark
  • Ambiguous poses
  • Joint training of both 2D and 3D skeleton data

Application: Action Recognition

Application: Robotics

Application: Human-Computer Interaction

Application: Mobile Applications

Conclusion

• 2D Skeleton (context, resolution) -> 3D Skeleton (regression) -> 3D shape (Representation)

• A lot of potential applications based on Skeleton

• Action, Interaction, Game

• Any improvement in skeleton estimation is a large step for the industry
