Deep Learning in Object Detection
Wanli Ouyang, Department of Electronic Engineering, The Chinese University of Hong Kong

deep learning icme detection - CUHK Electronic Engineering ...xgwang/detection.pdf · 2 “How toto s”s ... W. Ouyang and Xiaogang Wang, “Joint Deep Learning for Pedestrian Detection,”

May 11, 2020

Deep Learning in Object Detection

Wanli Ouyang
Department of Electronic Engineering, The Chinese University of Hong Kong

[Figure: deep learning applied to face alignment, object detection, human pose estimation, and pedestrian detection]

Outline
• General object detection
• Pedestrian detection
• Human part localization

2 “How to”s
• How to effectively train a deep model
  – Data augmentation
  – Label more data
  – Pre-train on large-scale related data (R-CNN)
  – Layer-wise pre-training + fine-tuning (Multi-stage)
• How to formulate a vision problem with deep learning
  – Tune hyper-parameters, e.g. number of hidden nodes, filter size, number of layers, activation function, dropout
  – Make use of experience and insights obtained in CV research
    Sequential design/learning vs. joint learning
    Contextual information (Multi-stage, face, human pose)
    Background clutter removal (SDN)
    Short- and long-range temporal relationships (action recognition; Li Fei-Fei and Yu Kai’s works)

Object detection

Pascal VOC
• ~20 object classes
• Training: ~5,700 images
• Testing: ~10,000 images

ImageNet ILSVRC
• ~200 object classes
• Training: ~395,000 images
• Testing: ~40,000 images

SIFT, HOG, LBP, DPM …

[Regionlets. Wang et al. ICCV’13] [SegDPM. Fidler et al. CVPR’13]

With CNN features


R-CNN: regions + CNN features

Pipeline: input image → extract region proposals (~2k/image) → compute CNN features → 2-class linear SVM

• Region: Selective Search [van de Sande, Uijlings et al.]; 91.6%/98% recall rate on ImageNet/PASCAL
• CNN feature: Krizhevsky, Sutskever & Hinton, NIPS 2012; also called “AlexNet”
• SVM: Liblinear
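The four R-CNN steps listed on this slide can be sketched as follows. This is a minimal illustration under loud assumptions, not the R-CNN implementation: `propose_regions` draws random boxes in place of Selective Search, `extract_features` flattens a warped crop in place of AlexNet features, and the SVM weights are random. All function names, the 16×16 warp size, and the image size are hypothetical.

```python
import numpy as np

def propose_regions(image, n, rng):
    """Hypothetical stand-in for Selective Search: n random boxes (x0, y0, x1, y1)."""
    h, w = image.shape
    x0 = rng.integers(0, w - 8, size=n)
    y0 = rng.integers(0, h - 8, size=n)
    x1 = x0 + rng.integers(8, w - x0 + 1)   # at least 8 pixels wide, inside the image
    y1 = y0 + rng.integers(8, h - y0 + 1)   # at least 8 pixels tall, inside the image
    return np.stack([x0, y0, x1, y1], axis=1)

def warp(image, box, size=16):
    """Crop the box and warp it to size x size by nearest-neighbour sampling."""
    x0, y0, x1, y1 = box
    ys = np.linspace(y0, y1 - 1, size).round().astype(int)
    xs = np.linspace(x0, x1 - 1, size).round().astype(int)
    return image[np.ix_(ys, xs)]

def extract_features(patch):
    """Hypothetical stand-in for CNN features: the flattened, mean-centred patch."""
    v = patch.astype(np.float64).ravel()
    return v - v.mean()

def rcnn_scores(image, svm_w, svm_b, n_proposals, rng):
    """Regions -> warped crops -> features -> one linear SVM score per region."""
    boxes = propose_regions(image, n_proposals, rng)
    feats = np.stack([extract_features(warp(image, b)) for b in boxes])
    return boxes, feats @ svm_w + svm_b

rng = np.random.default_rng(0)
image = rng.random((120, 160))              # toy grayscale image
svm_w = rng.normal(size=16 * 16)            # random weights, for shape checking only
boxes, scores = rcnn_scores(image, svm_w, 0.0, n_proposals=2000, rng=rng)
print(boxes.shape, scores.shape)
```

The point of the sketch is the data flow: every proposal is warped to a fixed size so that a single feature extractor and a single linear classifier can score all ~2k regions of an image.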

CNN training

• Train a SuperVision CNN* for the 1000-way ILSVRC image classification task (1.2 million images)
• Fine-tune the CNN for detection
  – Transfer the representation learned from ILSVRC classification to PASCAL (or ImageNet) detection

Pre-train: ImageNet Cls → Fine-tune: Det

* Network from Krizhevsky, Sutskever & Hinton, NIPS 2012; also called “AlexNet”
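The pre-train then fine-tune recipe amounts to keeping the learned layers and replacing only the task-specific output layer. A minimal sketch, assuming a network stored as a dictionary of layers; the layer names, the toy layer sizes, and the extra background output are assumptions made for illustration and are not taken from the slides.

```python
import numpy as np

def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases for one fully connected layer."""
    return {"W": rng.normal(0.0, 0.01, size=(n_in, n_out)), "b": np.zeros(n_out)}

def make_detection_net(cls_net, n_det_classes, rng):
    """Copy every pre-trained layer except the output, then attach a new output layer."""
    det_net = {name: {k: v.copy() for k, v in layer.items()}
               for name, layer in cls_net.items() if name != "fc_out"}
    n_in = cls_net["fc_out"]["W"].shape[0]
    # Assumption: one extra output for a background class.
    det_net["fc_out"] = init_layer(n_in, n_det_classes + 1, rng)
    return det_net

rng = np.random.default_rng(0)
# Toy stand-in for a network pre-trained on the 1000-way classification task.
cls_net = {
    "fc6": init_layer(256, 128, rng),
    "fc7": init_layer(128, 128, rng),
    "fc_out": init_layer(128, 1000, rng),
}
det_net = make_detection_net(cls_net, n_det_classes=20, rng=rng)
print({name: layer["W"].shape for name, layer in det_net.items()})
```

Fine-tuning would then continue gradient training of all layers of `det_net` on detection data, starting from the copied weights rather than from random initialization.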

Overfeat [Sermanet et al. 2014]

• More considerations on deep model design
  – Multi-resolution, dense pooling
• Sliding window (not region proposal)
• Does not use ImageNet-Cls for pretraining
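Sliding-window detection enumerates windows on a regular grid at several resolutions instead of asking a proposal algorithm for candidate regions. A minimal sketch of that enumeration; the window size, stride, and scales are illustrative values, not OverFeat's settings, and OverFeat evaluates the windows densely inside the network rather than cropping them one by one.

```python
def sliding_windows(img_h, img_w, win=32, stride=16, scales=(1.0, 0.5)):
    """Yield (x0, y0, x1, y1) windows in original-image coordinates at each scale."""
    for s in scales:
        h, w = int(img_h * s), int(img_w * s)      # size of the rescaled image
        for y in range(0, h - win + 1, stride):
            for x in range(0, w - win + 1, stride):
                # Map the window from the rescaled image back to the original image.
                yield (x / s, y / s, (x + win) / s, (y + win) / s)

windows = list(sliding_windows(128, 128))
print(len(windows), windows[0], windows[-1])
```

At the smaller scale the same 32-pixel window covers a 64-pixel region of the original image, which is how a fixed-size classifier detects objects of different sizes.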

Experimental results on ILSVRC 2013


Pedestrian Detection

Improve the state-of-the-art average miss detection rate on the largest Caltech dataset from 63% to 38%

CVPR’12 CVPR’13 ICCV’13 CVPR’14

Joint Deep Learning

What if we treat an existing deep model as a black box in pedestrian detection?

ConvNet-U-MS

– P. Sermanet, K. Kavukcuoglu, S. Chintala, and Y. LeCun, “Pedestrian Detection with Unsupervised Multi-Stage Feature Learning,” CVPR 2013.

[Figure panels: Results on Caltech Test; Results on ETHZ]


• N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. CVPR, 2005. (6000 citations)

• P. Felzenszwalb, D. McAllester, and D. Ramanan. A Discriminatively Trained, Multiscale, Deformable Part Model. CVPR, 2008. (2000 citations)

• W. Ouyang and X. Wang. A Discriminative Deep Model for Pedestrian Detection with Occlusion Handling. CVPR, 2012.

Our Joint Deep Learning Model

Modeling Part Detectors

• Design the filters in the second convolutional layer with variable sizes

Part models learned from HOG

Learned filters at the second convolutional layer

Deformation Layer

Visibility Reasoning with Deep Belief Net

Experimental Results
• Caltech – Test dataset (largest, most widely used)

[Figure: average miss rate (%) on Caltech Test by year, 2000 to 2014; labeled values: 95%, 68%, 63% (state-of-the-art), 53%, 39% (best performing); improve by ~20%]

W. Ouyang and X. Wang, “A Discriminative Deep Model for Pedestrian Detection with Occlusion Handling,” CVPR 2012.
W. Ouyang, X. Zeng and X. Wang, “Modeling Mutual Visibility Relationship in Pedestrian Detection,” CVPR 2013.
W. Ouyang and X. Wang, “Single-Pedestrian Detection aided by Multi-pedestrian Detection,” CVPR 2013.
X. Zeng, W. Ouyang and X. Wang, “A Cascaded Deep Learning Architecture for Pedestrian Detection,” ICCV 2013.
W. Ouyang and X. Wang, “Joint Deep Learning for Pedestrian Detection,” IEEE ICCV 2013.

[Figure panels: Results on Caltech Test; Results on ETHZ]

[Figure legend: UDN-HOG, UDN-HOGCSS, UDN-CNNFeat, UDN-DefLayer]


Multi‐Stage Contextual Deep Learning

Motivated by Cascaded Classifiers and Contextual Boost

• The classifier of each stage deals with a specific set of samples
• The score map output by one classifier can serve as contextual information for the next classifier

Conventional cascaded classifiers for detection
– Only pass one detection score to the next stage
– Classifiers are trained sequentially
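The behaviour of a conventional cascade can be sketched as below. This is a minimal illustration assuming linear stage classifiers with an accumulated score and a per-stage rejection threshold; the weights and thresholds are made-up values, and the function name is hypothetical.

```python
import numpy as np

def run_cascade(x, stages):
    """stages: list of (weights, bias, threshold). Returns (accepted, final_score)."""
    score = 0.0
    for w, b, thresh in stages:
        # The single accumulated score is all that one stage hands to the next.
        score += float(np.dot(w, x) + b)
        if score < thresh:
            return False, score      # rejected here; later stages never see x
    return True, score

stages = [
    (np.array([1.0, 0.0]), 0.0, 0.0),   # stage 1
    (np.array([0.0, 1.0]), 0.0, 0.5),   # stage 2
]
for sample in ([1.0, 1.0], [-1.0, 1.0], [0.2, 0.1]):
    print(sample, run_cascade(np.array(sample), stages))
```

Each stage only sees the samples that survived the earlier stages, which is why the stage classifiers end up specialised on different sets of samples.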


• Our deep model keeps the score map output by the current classifier, and it serves as contextual information to support the decision at the next stage
• Cascaded classifiers are jointly optimized instead of being trained sequentially
• To avoid overfitting, a stage-wise pre-training scheme is proposed to regularize optimization
• Simulate the cascaded classifiers by mining hard samples to train the network stage-by-stage

Training Strategies
• Unsupervised pre-train Wh,i+1 layer-by-layer, setting Ws,i+1 = 0, Fi+1 = 0
• Fine-tune all the Wh,i+1 with supervised BP
• Train Fi+1 and Ws,i+1 with BP stage-by-stage
  – A correctly classified sample at the previous stage does not influence the update of parameters
  – Stage-by-stage training can be considered as adding regularization constraints to parameters, i.e. some parameters are constrained to be zeros in the early training stages

Log error function: 

Gradients for updating parameters: 
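A minimal sketch of the two training properties named on this slide, under stated assumptions. It assumes the log error is the standard binary cross-entropy E = -l log y - (1 - l) log(1 - y) on a sigmoid output y; that form is an assumption, not a formula taken from the slides. The helper `masked_update` and the parameter names `W_h` and `W_s` are hypothetical.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def log_error(a, label):
    """Assumed log error: binary cross-entropy on the sigmoid output y = sigmoid(a)."""
    y = sigmoid(a)
    return -(label * np.log(y) + (1 - label) * np.log(1 - y))

def grad_wrt_activation(a, label):
    """dE/da for the log error above, which simplifies to y - label."""
    return sigmoid(a) - label

def masked_update(params, grads, trainable, lr=0.1):
    """Gradient step on the unlocked parameters only; locked ones keep their value."""
    return {k: (p - lr * grads[k]) if k in trainable else p
            for k, p in params.items()}

# A positive sample scored confidently right vs. confidently wrong.
confident_correct = grad_wrt_activation(6.0, 1)
confident_wrong = grad_wrt_activation(-6.0, 1)
print(confident_correct, confident_wrong)

# Early stage: only W_h is unlocked, so W_s stays at its initial value of zero.
params = {"W_h": np.array([1.0]), "W_s": np.array([0.0])}
grads = {"W_h": np.array([0.5]), "W_s": np.array([0.5])}
early_stage = masked_update(params, grads, trainable={"W_h"})
print(early_stage)
```

Under this assumed error, a sample that is already classified correctly with high confidence yields a gradient close to zero, so it barely moves the parameters, while keeping some parameters locked at zero in early stages acts as the regularization constraint described above.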

Experimental Results

[Figure panels: Caltech; ETHZ]


DeepNetNoneFilter

Comparison of Different Training Strategies

• Network-BP: use back propagation to update all the parameters without pre-training
• PretrainTransferMatrix-BP: the transfer matrices are pre-trained without supervision, and then all the parameters are fine-tuned
• Multi-stage: our multi-stage training strategy

Switchable Deep Network for Pedestrian Detection

Switchable Deep Network for Pedestrian Detection

• Background clutter and large variations of pedestrian appearance.

• Proposed solution: a Switchable Deep Network (SDN) for learning the foreground map.



Switchable Deep Network for Pedestrian Detection

• Switchable Restricted Boltzmann Machine

[Figure labels: Foreground; Background]


Switchable Deep Network for Pedestrian Detection

(a) Performance on Caltech Test (b) Performance on ETH

Human part localization

• Facial keypoint detection (CVPR’13)
• Human pose estimation (CVPR’14)


Facial Keypoint Detection

• Y. Sun, X. Wang and X. Tang, “Deep Convolutional Network Cascade for Facial Point Detection,” CVPR 2013


Comparison with Belhumeur et al. [4], Cao et al. [5] on LFPW test images.

1. http://www.luxand.com/facesdk/
2. http://research.microsoft.com/en-us/projects/facesdk/
3. O. Jesorsky, K. J. Kirchberg, and R. Frischholz. Robust face detection using the Hausdorff distance. In Proc. AVBPA, 2001.
4. P. N. Belhumeur, D. W. Jacobs, D. J. Kriegman, and N. Kumar. Localizing parts of faces using a consensus of exemplars. In Proc. CVPR, 2011.
5. X. Cao, Y. Wei, F. Wen, and J. Sun. Face alignment by explicit shape regression. In Proc. CVPR, 2012.
6. L. Liang, R. Xiao, F. Wen, and J. Sun. Face alignment via component-based discriminative search. In Proc. ECCV, 2008.
7. M. Valstar, B. Martinez, X. Binefa, and M. Pantic. Facial point detection using boosted regression and graph models. In Proc. CVPR, 2010.

Validation.

BioID.

LFPW.

Benefits of Using Deep Model

• Take the full face as input to make full use of texture context information over the entire face to locate each keypoint
• The first network that takes the whole face as input needs deep structures to extract high-level features
• Since the networks are trained to predict all the keypoints simultaneously, the geometric constraints among keypoints are implicitly encoded
• Global geometric constraints among keypoints can also be explicitly encoded by deep model.

Human pose estimation
• W. Ouyang, X. Chu, and X. Wang, “Multi-source Deep Learning for Human Pose Estimation,” CVPR 2014.

Multiple information sources

• Appearance
• Appearance mixture type
• Deformation

Multi-source deep model

Experimental results

PARSE
Method              Torso  U.leg  L.leg  U.arm  L.arm  Head  Total
Yang&Ramanan [58]   82.9   68.8   60.5   63.4   42.4   82.4  63.6
Ours                89.3   78.0   72.0   67.8   47.8   89.3  71.0

UIUC People
Method              Torso  U.leg  L.leg  U.arm  L.arm  Head  Total
Yang&Ramanan [58]   81.8   65.0   55.1   46.8   37.7   79.8  57.0
Ours                89.1   72.9   62.4   56.3   47.6   89.1  65.6

LSP
Method              Torso  U.leg  L.leg  U.arm  L.arm  Head  Total
Yang&Ramanan [58]   82.9   70.3   67.0   56.0   39.8   79.3  62.8
Ours                85.8   76.5   72.2   63.3   46.6   83.1  68.6

Up to 8.6 percent accuracy improvement with global geometric constraints
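The quoted improvement can be checked against the Total column of the three tables. The short calculation below uses only the numbers reported on this slide; reading "up to 8.6 percent" as the largest gap in the Total column is an interpretation, and the variable names are illustrative.

```python
# Total accuracy per dataset, copied from the tables on this slide.
totals = {
    "PARSE": {"Yang&Ramanan": 63.6, "Ours": 71.0},
    "UIUC People": {"Yang&Ramanan": 57.0, "Ours": 65.6},
    "LSP": {"Yang&Ramanan": 62.8, "Ours": 68.6},
}

# Absolute gain in percentage points, rounded to the tables' precision.
gains = {name: round(t["Ours"] - t["Yang&Ramanan"], 1) for name, t in totals.items()}
best = max(gains, key=gains.get)
print(gains, best)
```

The gains are 7.4 points on PARSE, 8.6 on UIUC People, and 5.8 on LSP, so the largest gap matches the 8.6 figure quoted above.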

Experimental results

[Figure labels: Left; Right; Ours; Yang&Ramanan]

Conclusion: 2 “How to”s
• How to effectively train a deep model
  – Data augmentation
  – Label more data
  – Pre-train on large-scale related data (R-CNN)
  – Layer-wise pre-training + fine-tuning (Multi-stage)
• How to formulate a vision problem with deep learning?
  – Tune hyper-parameters, e.g. number of hidden nodes, number of layers, activation function, dropout
  – Make use of experience and insights obtained in CV research
    Sequential design/learning vs. joint learning
    Contextual information (Multi-stage, face, human pose)
    Background clutter removal (SDN)
    Short- and long-range temporal relationships (action recognition)

Reference
• R. Girshick, J. Donahue, T. Darrell, J. Malik, “Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation,” CVPR, 2014.
• Sermanet, Pierre, et al. “Overfeat: Integrated recognition, localization and detection using convolutional networks.” arXiv preprint arXiv:1312.6229 (2013).
• W. Ouyang and X. Wang, “Joint Deep Learning for Pedestrian Detection,” in Proceedings of IEEE International Conference on Computer Vision (ICCV), 2013
• X. Zeng, W. Ouyang and X. Wang, “Multi-Stage Contextual Deep Learning for Pedestrian Detection,” in Proceedings of IEEE International Conference on Computer Vision (ICCV), 2013
• P. Luo, Y. Tian, X. Wang, and X. Tang, “Switchable Deep Network for Pedestrian Detection,” IEEE Conf. on Computer Vision and Pattern Recognition, June 2014

Reference
• W. Ouyang, X. Zeng and X. Wang, “Modeling Mutual Visibility Relationship with a Deep Model in Pedestrian Detection,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3222-3229, 2013
• W. Ouyang and X. Wang, “A Discriminative Deep Model for Pedestrian Detection with Occlusion Handling,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3258-3265, 2012
• Y. Sun, X. Wang and X. Tang, “Deep Convolutional Network Cascade for Facial Point Detection,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3476-3483, 2013
• W. Ouyang, X. Chu, and X. Wang, “Multi-source Deep Learning for Human Pose Estimation,” IEEE Conf. on Computer Vision and Pattern Recognition, June 2014

Q&A

mmlab.ie.cuhk.edu.hk/ www.ee.cuhk.edu.hk/~xgwang/ www.ee.cuhk.edu.hk/~wlouyang/