Deformable Part Models are Convolutional Neural Networks. Ross Girshick, Forrest Iandola, Trevor Darrell, Jitendra Malik. Presenter: YANG Wei. January 25, 2016

Deformable Part Models are Convolutional Neural Networks

Jan 23, 2017

Page 1: Deformable Part Models are Convolutional Neural Networks

1/26

Deformable Part Models are Convolutional Neural Networks

Ross Girshick, Forrest Iandola, Trevor Darrell, Jitendra Malik

Presenter: YANG Wei

January 25, 2016

Page 2: Deformable Part Models are Convolutional Neural Networks

2/26

Outline

1 Introduction

2 DeepPyramid DPMs
    Feature pyramid front-end CNN
    Constructing an equivalent CNN from a DPM

3 Implementation details

4 Experiments

Page 4: Deformable Part Models are Convolutional Neural Networks

4/26

Deformable Part Models vs. Convolutional Neural Networks

Deformable part models

Convolutional neural networks

Page 6: Deformable Part Models are Convolutional Neural Networks

5/26

Are DPMs and CNNs actually distinct?

DPMs: graphical models
CNNs: “black-box” non-linear classifiers

This paper shows that any DPM can be formulated as an equivalent CNN, i.e., deformable part models are convolutional neural networks.

Page 8: Deformable Part Models are Convolutional Neural Networks

7/26

DeepPyramid DPMs

Schematic model overview: “front-end CNN” + DPM-CNN

input: image pyramid
output: object detection scores

Page 9: Deformable Part Models are Convolutional Neural Networks

8/26

Feature pyramid front-end CNN

front-end CNN: AlexNet (conv1-conv5).

A CNN that maps an image pyramid to a feature pyramid
AlexNet
single-scale architecture

Page 10: Deformable Part Models are Convolutional Neural Networks

9/26

Constructing an equivalent CNN from a DPM

A single-component DPM.

mixture of components
component = root filter + part filters

Page 11: Deformable Part Models are Convolutional Neural Networks

10/26

Inference with DPMs

The matching process at one scale.

Page 12: Deformable Part Models are Convolutional Neural Networks

11/26

Architecture of DPM-CNN

The unrolled detection algorithm of DPM generates a specific network with fixed length:

1 Input: the conv5 feature pyramid of the front-end CNN.
2 Generate P+1 feature maps: one from the root filter and P from the part filters.
3 The P part feature maps are fed into a distance transform layer.
4 The root feature map is stacked (channel-wise concatenated) with the transformed part feature maps.
5 The resulting (P+1)-channel feature map is convolved with an object geometry filter, which produces the output DPM score map for the input pyramid level.
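The five steps can be traced in a small sketch. This is a simplified 1-D, single-channel illustration in plain Python, not the released implementation: the function names (`correlate_same`, `dt_pool`, `dpm_cnn_1d`), the zero-padded borders, and the toy filters and anchors are all assumptions made for the example. Parts are also kept at the same resolution as the root here.

```python
def correlate_same(signal, filt):
    """Zero-padded cross-correlation that keeps the input length."""
    n, m = len(signal), len(filt)
    half = m // 2
    out = []
    for x in range(n):
        s = 0.0
        for j in range(m):
            i = x + j - half
            if 0 <= i < n:
                s += signal[i] * filt[j]
        out.append(s)
    return out


def dt_pool(resp, w_lin, w_quad):
    """Distance transform of a part response with cost w_lin*d + w_quad*d^2."""
    n = len(resp)
    return [
        max(resp[q] - (w_lin * (q - p) + w_quad * (q - p) ** 2) for q in range(n))
        for p in range(n)
    ]


def dpm_cnn_1d(features, root_filter, part_filters, deform, anchors, bias=0.0):
    """Score map of a single-component DPM on one pyramid level (1-D toy)."""
    # Step 2: one root response map and P part response maps.
    root_map = correlate_same(features, root_filter)
    part_maps = [correlate_same(features, f) for f in part_filters]
    # Step 3: distance transform of each part map.
    pooled = [dt_pool(m, wl, wq) for m, (wl, wq) in zip(part_maps, deform)]
    # Step 4: stack root and transformed part maps (P+1 channels).
    stacked = [root_map] + pooled
    offsets = [0] + list(anchors)
    # Step 5: sparse object geometry filter, one tap per channel at its anchor.
    n = len(features)
    scores = []
    for x in range(n):
        s = bias
        for channel, off in zip(stacked, offsets):
            i = x + off
            if 0 <= i < n:
                s += channel[i]
        scores.append(s)
    return scores


# Step 1: a hypothetical single-channel "conv5" signal for one pyramid level.
features = [0.0, 1.0, 0.0, 0.0, 2.0, 0.0]
scores = dpm_cnn_1d(
    features,
    root_filter=[1.0],
    part_filters=[[1.0]],
    deform=[(0.0, 1.0)],   # (linear, quadratic) deformation weights
    anchors=[1],           # the part is anchored one cell right of the root
)
print(scores)  # [1.0, 1.0, 1.0, 2.0, 3.0, 0.0]
```

The highest score is at position 4, where the root response is strongest and the part can reach its anchor at a deformation cost of 1.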

Page 17: Deformable Part Models are Convolutional Neural Networks

12/26

Architecture of DPM-CNN

CNN equivalent to a single-component DPM.

Page 18: Deformable Part Models are Convolutional Neural Networks

13/26

Traditional distance transform

Traditional distance transforms are defined for sets of points on a grid [FH05].

$G$: grid
$d(p - q)$: measure of distance between points $p, q \in G$
$B \subseteq G$

Then the distance transform of $B$ on $G$ is

$$D_B(p) = \min_{q \in B} d(p - q)$$

Figure: Distance transform (Euclidean distance)

Page 19: Deformable Part Models are Convolutional Neural Networks

14/26

Traditional distance transform

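The distance transform can also be formulated as

$$D_B(p) = \min_{q \in G} \bigl( d(p - q) + 1_B(q) \bigr)$$

where

$$1_B(q) = \begin{cases} 0, & \text{if } q \in B, \\ \infty, & \text{if } q \notin B. \end{cases} \tag{1}$$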
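The two formulations can be checked against each other on a small example. The sketch below is a brute-force illustration on a 1-D grid; the grid size, the point set `B`, and the function names are assumptions chosen for the example.

```python
import math


def dt_over_set(grid, B, d):
    """D_B(p) = min over q in B of d(p - q)."""
    return [min(d(p - q) for q in B) for p in grid]


def dt_with_indicator(grid, B, d):
    """D_B(p) = min over q in G of d(p - q) + 1_B(q)."""
    def indicator(q):
        return 0.0 if q in B else math.inf
    return [min(d(p - q) + indicator(q) for q in grid) for p in grid]


grid = list(range(8))   # a 1-D grid G = {0, ..., 7}
B = {1, 6}              # hypothetical point set
d = abs                 # Euclidean distance in one dimension

print(dt_over_set(grid, B, d))  # [1, 0, 1, 2, 2, 1, 0, 1]
assert dt_with_indicator(grid, B, d) == dt_over_set(grid, B, d)
```

Points outside `B` receive an infinite penalty, so they never win the minimum and both forms give the same result.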

Page 20: Deformable Part Models are Convolutional Neural Networks

15/26

Generalized distance transform

A generalization of distance transforms can be obtained by replacing the indicator function with some arbitrary function $f'$ over the grid $G$:

$$D_{f'}(p) = \min_{q \in G} \bigl( d(p - q) + f'(q) \bigr)$$

We can also define the generalized DT as a maximization by letting $f(q) = -f'(q)$:

$$D_f(p) = \max_{q \in G} \bigl( f(q) - d(p - q) \bigr)$$
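A small numerical sketch of the two forms, assuming a 1-D grid and a quadratic cost (both choices, and the function names, are illustrative). Since $f = -f'$, the maximization form is the negation of the minimization form, which the final assertion checks.

```python
def gdt_min(f_prime, d):
    """D_{f'}(p) = min_q (d(p - q) + f'(q)) on the grid {0, ..., n-1}."""
    n = len(f_prime)
    return [min(d(p - q) + f_prime[q] for q in range(n)) for p in range(n)]


def gdt_max(f, d):
    """D_f(p) = max_q (f(q) - d(p - q)) on the grid {0, ..., n-1}."""
    n = len(f)
    return [max(f[q] - d(p - q) for q in range(n)) for p in range(n)]


def quadratic_cost(delta):
    """Assumed distance measure for this example."""
    return 0.5 * delta * delta


f = [0.0, 3.0, 1.0, 0.0, 5.0, 0.0]   # hypothetical score function
f_prime = [-v for v in f]

max_form = gdt_max(f, quadratic_cost)
min_form = gdt_min(f_prime, quadratic_cost)

# The max form equals the negated min form at every grid point.
assert max_form == [-v for v in min_form]
```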

Page 21: Deformable Part Models are Convolutional Neural Networks

16/26

Distance transform in DPM

In DPM, after computing filter responses, we transform the responses of the part filters to allow spatial uncertainty:

$$D_i(x, y) = \max_{dx, dy} \bigl( R_i(x + dx, y + dy) - w_i \cdot \phi_d(dx, dy) \bigr)$$

where

$$\phi_d(dx, dy) = [dx, dy, dx^2, dy^2]$$

The value $D_i(x, y)$ is the maximum contribution of the part to the score of a root location that places the anchor of this part at position $(x, y)$.

By letting $p = (x, y)$, $q = (x + dx, y + dy)$, $f = R_i$ and $d(p - q) = w_i \cdot \phi_d(q - p)$, we can see that it is exactly in the form of a (generalized) distance transform.
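The transform can be written out directly as a brute-force search over displacements. This is only a sketch to make the formula concrete: the function name `dt_pool_2d`, the `R[y][x]` indexing, the restriction of displacements to the map, and the toy weights are assumptions, and practical implementations use much faster distance transform algorithms.

```python
def dt_pool_2d(R, w):
    """D(x, y) = max_{dx,dy} R(x+dx, y+dy) - w . [dx, dy, dx^2, dy^2].

    R is a 2-D response map indexed as R[y][x]; displacements that would
    leave the map are skipped (an assumption of this sketch).
    """
    H, W = len(R), len(R[0])
    D = [[0.0] * W for _ in range(H)]
    for y in range(H):
        for x in range(W):
            best = float("-inf")
            for qy in range(H):
                for qx in range(W):
                    dx, dy = qx - x, qy - y
                    cost = w[0] * dx + w[1] * dy + w[2] * dx * dx + w[3] * dy * dy
                    best = max(best, R[qy][qx] - cost)
            D[y][x] = best
    return D


# Hypothetical part response with a single strong peak at (x=2, y=1).
R = [[0.0, 0.0, 0.0],
     [0.0, 0.0, 4.0],
     [0.0, 0.0, 0.0]]
w = [0.0, 0.0, 1.0, 1.0]   # purely quadratic deformation cost

D = dt_pool_2d(R, w)
for row in D:
    print(row)
# [0.0, 2.0, 3.0]
# [0.0, 3.0, 4.0]
# [0.0, 2.0, 3.0]
```

The peak spreads to neighbouring anchor positions, discounted by the deformation cost of reaching it.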

Page 23: Deformable Part Models are Convolutional Neural Networks

17/26

Max pooling as distance transform

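Consider max pooling of a function $f : G \to \mathbb{R}$ on a regular grid $G$. Let the window half-length be $k$; then max pooling can be defined as

$$M_f(p) = \max_{\Delta p \in \{-k, \dots, k\}} f(p + \Delta p)$$

Max pooling can be expressed equivalently as a distance transform:

$$M_f(p) = \max_{q \in G} \bigl( f(q) - d_{\max}(p - q) \bigr)$$

where

$$d_{\max}(p - q) = \begin{cases} 0, & \text{if } (p - q) \in \{-k, \dots, k\}, \\ \infty, & \text{otherwise}. \end{cases} \tag{2}$$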
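The equivalence is easy to verify numerically. The following sketch assumes a 1-D grid, stride-1 pooling, and windows clipped at the borders; the function names and the sample values are illustrative.

```python
import math


def max_pool(f, k):
    """Stride-1 max pooling with window half-length k, clipped at the borders."""
    n = len(f)
    return [max(f[q] for q in range(max(0, p - k), min(n, p + k + 1)))
            for p in range(n)]


def d_max(delta, k):
    """Zero inside the pooling window, infinite outside it."""
    return 0.0 if -k <= delta <= k else math.inf


def max_pool_as_dt(f, k):
    """M_f(p) = max over all q in G of f(q) - d_max(p - q)."""
    n = len(f)
    return [max(f[q] - d_max(p - q, k) for q in range(n)) for p in range(n)]


f = [1.0, 5.0, 2.0, 0.0, 3.0, 4.0]   # hypothetical feature values
print(max_pool(f, 1))                # [5.0, 5.0, 5.0, 3.0, 4.0, 4.0]
assert max_pool(f, 1) == max_pool_as_dt(f, 1)
```

The infinite cost outside the window plays the same role as the indicator function in the traditional distance transform.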

Page 24: Deformable Part Models are Convolutional Neural Networks

18/26

Generalize max pooling to distance transform pooling

We can generalize max pooling to distance transform pooling:
    unlike max pooling, the distance transform of f at p is taken over the entire domain G
    rather than specifying a fixed pooling window a priori, the shape of the pooling region can be learned from the data

The released code does not include the DT pooling layer. Please refer to [OW13] for more details.

Page 26: Deformable Part Models are Convolutional Neural Networks

19/26

Object geometry filters

The root convolution map and the DT pooled part convolution maps are stacked into a single feature map with P+1 channels and then convolved with a sparse object geometry filter.

Page 27: Deformable Part Models are Convolutional Neural Networks

20/26

Combining mixture components with maxout

CNN equivalent to a multi-component DPM. A multi-component DPM-CNN is composed of one DPM-CNN per component and a maxout [GWFM+13] layer that takes a max over component DPM-CNN outputs at each location.
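The maxout step itself is an element-wise max across the per-component score maps. A minimal sketch, assuming the component maps have already been computed and share the same size (the function name and the score values are made up for illustration):

```python
def maxout(score_maps):
    """Element-wise max over same-sized 2-D score maps, one per component."""
    H, W = len(score_maps[0]), len(score_maps[0][0])
    return [[max(m[y][x] for m in score_maps) for x in range(W)]
            for y in range(H)]


# Hypothetical DPM-CNN outputs for a two-component model.
component_a = [[0.1, 0.9],
               [0.3, 0.2]]
component_b = [[0.5, 0.4],
               [0.1, 0.8]]

combined = maxout([component_a, component_b])
print(combined)  # [[0.5, 0.9], [0.3, 0.8]]
```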

Page 29: Deformable Part Models are Convolutional Neural Networks

22/26

Feature pyramid front-end CNN

Implementation details
    pretrain on ILSVRC 2012 classification using Caffe
    use conv5 as output layer
    “same” convolution
        zero-pad each conv/pooling layer’s input with ⌊k/2⌋ zeros on all sides (top, bottom, left and right)
        (x,y) in the conv5 feature map has a receptive field centered on pixel (16x,16y) in the input image
    conv5 feature maps: stride 16; receptive field 163×163
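The stride and receptive field figures can be reproduced with the usual layer-by-layer arithmetic. The kernel sizes and strides below are the standard AlexNet values up to conv5, which is an assumption of this sketch since the slides do not list them; the function name is hypothetical.

```python
# (name, kernel size, stride) for each spatial layer up to conv5.
# Assumed standard AlexNet values; normalization layers do not change geometry.
LAYERS = [
    ("conv1", 11, 4), ("pool1", 3, 2),
    ("conv2", 5, 1), ("pool2", 3, 2),
    ("conv3", 3, 1), ("conv4", 3, 1), ("conv5", 3, 1),
]


def stride_and_receptive_field(layers):
    """Accumulate the effective stride and receptive field in input pixels."""
    stride, rf = 1, 1
    for _name, kernel, layer_stride in layers:
        rf += (kernel - 1) * stride   # each extra tap spans `stride` input pixels
        stride *= layer_stride
    return stride, rf


print(stride_and_receptive_field(LAYERS))  # (16, 163)
```

With ⌊k/2⌋ zero padding on every layer, each window stays centered on its output cell, which is why cell (x,y) in conv5 is centered on pixel (16x,16y).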

Page 31: Deformable Part Models are Convolutional Neural Networks

24/26

Experiments

Detection average precision (%) on VOC 2007 test. Column C shows the number of components and column P shows the number of parts per component.

Page 32: Deformable Part Models are Convolutional Neural Networks

25/26

Experiments

HOG versus conv5 feature pyramids. In contrast to HOG features, conv5 features are more part-like and scale selective. Each conv5 pyramid shows 1 of 256 feature channels. The top two rows show a HOG feature pyramid and the face channel of a conv5 pyramid on the same input image.

Page 33: Deformable Part Models are Convolutional Neural Networks

26/26

References

[FH05] Pedro F. Felzenszwalb and Daniel P. Huttenlocher, Pictorial structures for object recognition, International Journal of Computer Vision 61 (2005), no. 1, 55–79.

[GWFM+13] Ian J. Goodfellow, David Warde-Farley, Mehdi Mirza, Aaron Courville, and Yoshua Bengio, Maxout networks, arXiv preprint arXiv:1302.4389 (2013).

[OW13] Wanli Ouyang and Xiaogang Wang, Joint deep learning for pedestrian detection, ICCV, IEEE, 2013, pp. 2056–2063.