cv:hci - Visual Perception for Human-Computer Interaction · 2008-12-27 · Computer Vision for Human-Computer Interaction Research Group, Universität Karlsruhe (TH) cv:hci Requirements
Edgar Seemann, 19.12.08 1
Visual Perception for Human-Computer Interaction
WS 2008/09
Dr. Rainer Stiefelhagen, Dr. Edgar Seemann
Interactive Systems Laboratories, Universität Karlsruhe (TH)
http://isl.ira.uka.de/VisionHCICourse
stiefel@ira.uka.de
seemann@pedestrian-detection.com
Schedule (1)
Date Topic
27.10.2008 Introduction, Applications
31.10.2008 Basics: Image Processing
03.11.2008 Basics: Image Transformations, 2D Structure
07.11.2008 Basics: Pattern recognition
10.11.2008 Computer Vision: Tasks, Challenges, Learning, Performance measures
14.11.2008 Face Detection I: Color, Edges (Birchfield)
17.11.2008 Project 1: Intro + Programming tips
21.11.2008 Face Detection II: ANNs, SVM, Viola & Jones
24.11.2008 Project 1: Questions
28.11.2008 Face Recognition I: Traditional Approaches, Eigenfaces, Fisherfaces, EBGM
01.12.2008 Face Recognition II
05.12.2008 Head Pose Estimation: Model-based, NN, Texture Mapping, Focus of Attention
08.12.2008 Project 1: Student Presentations, Project 2: Intro
12.12.2008 People Detection I
15.12.2008 People Detection II
19.12.2008 People Detection III
22.12.2008 Scene Context and Geometry I
Organizational Matters
• Assignment deadlines have been changed
  – Assignment 2 is due: January 16th, 2009
  – Assignment 3 is due: February 6th, 2009
  -> you have one more week
• Please don’t forget to send me your files from assignment 1
Edgar Seemann, 15.12.08 4
Computer Vision for Human-Computer Interaction Research Group, Universität Karlsruhe (TH), cv:hci
Today
• Local Features [credits: K. Mikolajczyk, B. Leibe, B. Schiele]
  – Interest Point Detectors
  – Scale Selection
• Implicit Shape Model
  – Codebook Representation
  – Detection Loop
  – Interleaved detection and segmentation
• Pose recovery
• Combination of generative and discriminative approaches
Local Features
So far
• Parts were defined manually
• Parts represented the semantic structure
  – i.e. face, leg etc.
• Questions:
  – Do these parts decompose the variability in an optimal way?
  – Must the parts have a semantic meaning?
  – Should we use smaller/larger parts?
  – Can we find parts automatically?
Requirements for part decomposition
• Repeatable
  – i.e. we should be able to find the part despite articulation or image transformations (e.g. rotation, perspective, lighting)
• Distinctive
  – A part should not be confused with other parts
  – The regions should contain an “interesting” structure
• Compact
  – Typically no lengthy or strangely shaped parts
• Efficient
  – It should be computationally inexpensive to detect or represent a part
• Cover
  – Parts need to sufficiently cover the object
Going local
• Local Feature Approaches
  – Use a large number of parts (typically 100-10000 parts)
  – Parts mostly have no direct semantic meaning
  – Parts are generated automatically
  – Let the algorithm find its own parts
  – Typically smaller parts
Keypoints and descriptors
• We distinguish
  – Key or interest points
  – Local (key point) descriptors
• Interest Points
  – Specify repeatable points on the object
  – x-, y-position and scale
• Local Descriptors
  – Define the feature representation around an interest point
Approach
[Figure: regions A1, A2, A3 in one image and B1, B2, B3 in another; each region is normalized to N × N pixels and described by a feature vector f_A or f_B (e.g. color)]
1. Find a set of distinctive keypoints
2. Define a region around each keypoint
3. Extract and normalize the region content
4. Compute a local descriptor from the normalized region
5. Match local descriptors with a similarity measure: $d(f_A, f_B) < T$
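The matching step (5) can be sketched in a few lines. This is only an illustration under stated assumptions: Euclidean distance stands in for the measure d, each descriptor in A is compared against its nearest neighbour in B, and the function names and toy descriptors are hypothetical.

```python
import math

def euclidean(f_a, f_b):
    """Euclidean distance between two descriptor vectors (assumed choice of d)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f_a, f_b)))

def match_descriptors(desc_a, desc_b, threshold):
    """For each descriptor in A, keep its nearest neighbour in B if d(f_A, f_B) < T."""
    matches = []
    for i, fa in enumerate(desc_a):
        j, dist = min(((j, euclidean(fa, fb)) for j, fb in enumerate(desc_b)),
                      key=lambda pair: pair[1])
        if dist < threshold:
            matches.append((i, j, dist))
    return matches

# Toy descriptors (hypothetical values)
desc_a = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.5, 0.5, 0.5]]
desc_b = [[0.9, 0.1, 0.0], [0.0, 0.0, 1.0]]
matches = match_descriptors(desc_a, desc_b, threshold=0.3)
print(matches)  # only A[0] has a neighbour in B that is closer than T
```

The threshold T trades off recall against false matches and has to be tuned per descriptor type.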
Key Point Detectors
Key Point Detectors
• Many Existing Detectors Available
  – Hessian & Harris [Beaudet ‘78], [Harris ‘88]
  – Laplacian, DoG [Lindeberg ‘98], [Lowe 1999]
  – Harris-/Hessian-Laplace [Mikolajczyk & Schmid ‘01]
  – Harris-/Hessian-Affine [Mikolajczyk & Schmid ‘04]
  – EBR and IBR [Tuytelaars & Van Gool ‘04]
  – MSER [Matas ‘02]
  – Salient Regions [Kadir & Brady ‘01]
  – Others…
• Reference site:
  – http://www.robots.ox.ac.uk/~vgg/research/affine/index.html
Keypoint Localization
• Goals:
  – Repeatable detection
  – Precise localization
  – Interesting content
⇒ Look for two-dimensional signal changes
Hessian Detector [Beaudet78]
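• Hessian determinant
$\mathrm{Hessian}(I) = \begin{bmatrix} I_{xx} & I_{xy} \\ I_{xy} & I_{yy} \end{bmatrix}$
[Figure: second-derivative images $I_{xx}$, $I_{yy}$, $I_{xy}$]
Intuition: Search for strong derivatives in two orthogonal directions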
Hessian Detector [Beaudet78]
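• Hessian determinant
[Figure: second-derivative images $I_{xx}$, $I_{yy}$, $I_{xy}$]
$\mathrm{Hessian}(I) = \begin{bmatrix} I_{xx} & I_{xy} \\ I_{xy} & I_{yy} \end{bmatrix}$
$\det(\mathrm{Hessian}(I)) = I_{xx} I_{yy} - I_{xy}^2$
In Matlab: Ixx.*Iyy - (Ixy).^2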
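As a hedged illustration of the determinant formula, the sketch below computes the response on a tiny image held in nested lists. Central finite differences stand in for the second derivatives, which is an assumption made here for brevity; in practice the derivatives would be taken on a Gaussian-smoothed image. The function name and the test images are hypothetical.

```python
def hessian_determinant(img):
    """Return det(Hessian) = Ixx*Iyy - Ixy^2 per pixel (border pixels stay 0)."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Second derivatives by central finite differences (assumed discretisation)
            ixx = img[y][x + 1] - 2 * img[y][x] + img[y][x - 1]
            iyy = img[y + 1][x] - 2 * img[y][x] + img[y - 1][x]
            ixy = (img[y + 1][x + 1] - img[y + 1][x - 1]
                   - img[y - 1][x + 1] + img[y - 1][x - 1]) / 4.0
            out[y][x] = ixx * iyy - ixy * ixy
    return out

# A single bright pixel: strong curvature in both directions at the centre
blob = [[0.0] * 7 for _ in range(7)]
blob[3][3] = 1.0
blob_response = hessian_determinant(blob)

# A vertical step edge: curvature in one direction only, so the determinant vanishes
edge = [[1.0 if x >= 3 else 0.0 for x in range(7)] for _ in range(7)]
edge_response = hessian_determinant(edge)
```

The edge example shows why the detector ignores straight edges: one of the two second derivatives is zero there, so the product term disappears.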
Hessian Detector – Responses [Beaudet78]
Effect: Responses mainly on corners and strongly textured areas.
Hessian Detector – Responses [Beaudet78]
Harris Detector [Harris88]
• Second moment matrix (autocorrelation matrix)
$\mu(\sigma_I, \sigma_D) = g(\sigma_I) * \begin{bmatrix} I_x^2(\sigma_D) & I_x I_y(\sigma_D) \\ I_x I_y(\sigma_D) & I_y^2(\sigma_D) \end{bmatrix}$
Intuition: Search for local neighborhoods where the image content has two main directions (eigenvectors).
Slide credit: K. Grauman, B. Leibe
Harris Detector [Harris88]
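• Second moment matrix (autocorrelation matrix)
$\mu(\sigma_I, \sigma_D) = g(\sigma_I) * \begin{bmatrix} I_x^2(\sigma_D) & I_x I_y(\sigma_D) \\ I_x I_y(\sigma_D) & I_y^2(\sigma_D) \end{bmatrix}$
1. Image derivatives: $I_x(\sigma_D)$, $I_y(\sigma_D)$, computed with Gaussian derivative filters $g_x(\sigma_D)$, $g_y(\sigma_D)$
2. Square of derivatives: $I_x^2$, $I_y^2$, $I_x I_y$
3. Gaussian filter $g(\sigma_I)$: $g(I_x^2)$, $g(I_y^2)$, $g(I_x I_y)$
4. Cornerness function (both eigenvalues are strong):
$har = \det[\mu(\sigma_I, \sigma_D)] - \alpha\,[\mathrm{trace}(\mu(\sigma_I, \sigma_D))]^2 = g(I_x^2)\,g(I_y^2) - [g(I_x I_y)]^2 - \alpha\,[g(I_x^2) + g(I_y^2)]^2$
5. Non-maxima suppression on $har$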
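Steps 1 to 4 can be sketched as below. This is a simplified illustration, not the lecture's implementation: central differences replace the Gaussian derivative filters at $\sigma_D$, a 3×3 box window replaces the Gaussian $g(\sigma_I)$, and $\alpha = 0.04$ is an assumed value. Non-maxima suppression (step 5) is left out.

```python
def harris_response(img, alpha=0.04):
    """Harris cornerness det(mu) - alpha * trace(mu)^2 on a nested-list image."""
    h, w = len(img), len(img[0])
    # 1. Image derivatives (central differences; assumption replacing Gaussian derivatives)
    ix = [[0.0] * w for _ in range(h)]
    iy = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ix[y][x] = (img[y][x + 1] - img[y][x - 1]) / 2.0
            iy[y][x] = (img[y + 1][x] - img[y - 1][x]) / 2.0
    har = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # 2. + 3. Products of derivatives, accumulated over a 3x3 box window
            #         (assumption replacing the Gaussian window g(sigma_I))
            sxx = syy = sxy = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    gx, gy = ix[y + dy][x + dx], iy[y + dy][x + dx]
                    sxx += gx * gx
                    syy += gy * gy
                    sxy += gx * gy
            # 4. Cornerness: determinant minus alpha times squared trace
            har[y][x] = sxx * syy - sxy * sxy - alpha * (sxx + syy) ** 2
    return har

# Bright quadrant with its corner at (6, 6) in a 12x12 image (hypothetical test image)
img = [[1.0 if (y >= 6 and x >= 6) else 0.0 for x in range(12)] for y in range(12)]
har = harris_response(img)
corner, edge, flat = har[6][6], har[9][6], har[2][2]
```

With this test image the response is positive at the corner, negative along the straight edge, and zero in the flat region, which matches the intuition that both eigenvalues must be large.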
Harris Detector – Responses [Harris88]
Effect: A very precise corner detector.
Harris Detector – Responses [Harris88]
Scale Space
• So far, we can detect repeatable points in the image
• Now what about the image scale?
• Can we detect not only a distinctive position, but also a characteristic scale around an interest point?
Automatic Scale Selection
$f(I_{i_1 \dots i_m}(x, \sigma)) = f(I_{i_1 \dots i_m}(x', \sigma'))$
Same operator responses if the patch contains the same image up to a scale factor
How to find corresponding patch sizes?
Automatic Scale Selection
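• Function responses for increasing scale (scale signature)
[Figure: scale signatures $f(I_{i_1 \dots i_m}(x, \sigma))$ and $f(I_{i_1 \dots i_m}(x', \sigma))$ for the two images, plotted over scale]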
Automatic Scale Selection
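• Function responses for increasing scale (scale signature)
[Figure: scale signatures $f(I_{i_1 \dots i_m}(x, \sigma))$ and $f(I_{i_1 \dots i_m}(x', \sigma'))$ with the selected scales $\sigma$ and $\sigma'$ marked]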
What Is A Useful Signature Function?
• Laplacian-of-Gaussian = “blob” detector
Laplacian-of-Gaussian (LoG)
• Local maxima in scale space of Laplacian-of-Gaussian
$L_{xx}(\sigma) + L_{yy}(\sigma)$
[Figure: filter responses at scales $\sigma, \sigma^2, \sigma^3, \sigma^4, \sigma^5$]
⇒ List of (x, y, s)
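Scale selection with a Laplacian-type operator can be illustrated in one dimension. The sketch below is an assumption-laden toy, not the lecture's method: it uses the 1-D second derivative of a Gaussian as an analogue of the LoG, multiplies the response by $\sigma^2$ (the usual scale normalization, which the slide formula does not show), and evaluates the scale signature at the centre of a box-shaped "blob". All names are hypothetical.

```python
import math

def normalized_log_response(signal, center, sigma):
    """Scale-normalised second-derivative-of-Gaussian response at one position."""
    radius = int(4 * sigma)  # truncate the kernel at 4 sigma
    total = 0.0
    for k in range(-radius, radius + 1):
        i = center + k
        if 0 <= i < len(signal):
            g = math.exp(-k * k / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))
            d2g = (k * k / sigma ** 4 - 1 / sigma ** 2) * g  # second derivative of g
            total += signal[i] * d2g
    return sigma ** 2 * total

# Box "blob" with half-width 8 samples, centred at index 100
signal = [1.0 if abs(i - 100) <= 8 else 0.0 for i in range(201)]
sigmas = list(range(2, 21))
signature = [abs(normalized_log_response(signal, 100, s)) for s in sigmas]
best_sigma = sigmas[signature.index(max(signature))]
print(best_sigma)  # lies close to the blob's half-width
```

The maximum of the signature moves proportionally with the blob size, which is what makes it usable as a characteristic scale.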
Results: Laplacian-of-Gaussian
Maximally Stable Extremal Regions [Matas ‘02]
• Based on Watershed segmentation algorithm
• Select regions that stay stable over a large parameter range
Example Results: MSER
Local Descriptors
Local Descriptors
• Most available descriptors focus on edge/gradient information
  – Capture boundary and texture information
  – Color is still used relatively seldom (more suitable for homogeneous regions)
Local Descriptors: SIFT Descriptor
[Lowe, ICCV 1999]
Histogram of oriented gradients
• Captures important texture information
• Robust to small translations / affine deformations
Orientation Normalization
• Compute orientation histogram
• Select dominant orientation
• Normalize: rotate to fixed orientation
[Figure: orientation histogram over 0 to 2π]
[Lowe, SIFT, 1999]
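The first two steps can be sketched as follows. This is a minimal illustration under assumptions: gradients are given as (gx, gy) pairs, votes are weighted by gradient magnitude, and 36 bins are used; the function name and sample gradients are hypothetical. The rotation itself is not shown.

```python
import math

def dominant_orientation(gradients, num_bins=36):
    """Return the centre angle (radians, in [0, 2*pi)) of the strongest histogram bin."""
    hist = [0.0] * num_bins
    for gx, gy in gradients:
        magnitude = math.hypot(gx, gy)
        angle = math.atan2(gy, gx) % (2 * math.pi)
        bin_index = int(angle / (2 * math.pi) * num_bins) % num_bins
        hist[bin_index] += magnitude  # magnitude-weighted vote (assumed weighting)
    best = max(range(num_bins), key=hist.__getitem__)
    return (best + 0.5) * 2 * math.pi / num_bins

# Five gradients pointing roughly "up", plus two weaker distractors
gradients = [(-0.2, 1.0)] * 5 + [(1.0, 0.0), (0.0, -1.0)]
theta = dominant_orientation(gradients)
```

Rotating the patch by -theta before computing the descriptor would then bring the dominant orientation to a fixed reference direction.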
Local Descriptors: SURF
• Fast approximation of SIFT idea
  – Efficient computation by 2D box filters & integral images
    ⇒ 6 times faster than SIFT
  – Equivalent quality for object identification
• GPU implementation available
  – Feature extraction @ 100 Hz (detector + descriptor, 640×480 img)
  – http://www.vision.ee.ethz.ch/~surf
[Bay, ECCV’06], [Cornelis, CVGPU’08]
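The speed-up from integral images comes from evaluating any box sum with four table lookups. The sketch below shows only that building block, with hypothetical function names; it is not the SURF detector or descriptor itself.

```python
def integral_image(img):
    """ii[y][x] holds the sum of img over rows < y and columns < x (one row/column of padding)."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def box_sum(ii, top, left, bottom, right):
    """Sum of img over rows top..bottom-1 and columns left..right-1, in four lookups."""
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]

img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
ii = integral_image(img)
whole = box_sum(ii, 0, 0, 3, 3)          # all nine pixels
lower_right = box_sum(ii, 1, 1, 3, 3)    # 5 + 6 + 8 + 9
```

Because the cost of `box_sum` does not depend on the box size, box filters at large scales cost the same as at small scales.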
Local Descriptors: Shape Context
Count the number of points inside each bin, e.g.:
Count = 4, ..., Count = 10
Log-polar binning: more precision for nearby points, more flexibility for farther points.
Belongie & Malik, ICCV 2001
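Log-polar binning can be sketched as below. The bin counts (5 radial, 12 angular) and the radius range are assumptions chosen for illustration, and the function name is hypothetical; points outside the radius range are simply ignored.

```python
import math

def shape_context(center, points, num_r=5, num_theta=12, r_min=1.0, r_max=32.0):
    """Count points per log-polar bin around `center`; returns hist[r_bin][theta_bin]."""
    hist = [[0] * num_theta for _ in range(num_r)]
    log_span = math.log(r_max / r_min)
    for px, py in points:
        dx, dy = px - center[0], py - center[1]
        r = math.hypot(dx, dy)
        if r < r_min or r >= r_max:
            continue  # outside the descriptor's support
        # Logarithmic radial bins: narrow near the centre, wide far away
        r_bin = min(num_r - 1, int(num_r * math.log(r / r_min) / log_span))
        angle = math.atan2(dy, dx) % (2 * math.pi)
        t_bin = int(num_theta * angle / (2 * math.pi)) % num_theta
        hist[r_bin][t_bin] += 1
    return hist

points = [(3, 0), (1, 10), (-5, -5), (100, 0)]  # the last point is out of range
hist = shape_context((0, 0), points)
```

The logarithmic radial spacing is what gives more precision to nearby points and more tolerance to distant ones.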
Local Descriptors: Geometric Blur
[Figure: example descriptor computed on an idealized signal]
• Compute edges at four orientations
• Extract a patch in each channel
• Apply spatially varying blur and sub-sample
Berg & Malik, CVPR 2001
So, What Local Features Should I Use?
• There have been extensive evaluations/comparisons
  – [Mikolajczyk et al., IJCV’05, PAMI’05]
  – All detectors/descriptors shown there work well
• Best choice often application dependent
  – MSER works well for buildings and printed things
  – Harris-/Hessian-Laplace/DoG work well for many natural categories
• More features are better
  – Combining several detectors often helps
Implicit Shape Model
Spatial Models Considered
Fully connected shape model [Figure: parts x1 … x6, all pairs connected]
• e.g. Constellation Model
• Parts fully connected
• Recognition complexity: $O(n^p)$
• Method: Exhaustive search
• Complexity restricts method to a small number of parts
“Star” shape model [Figure: parts x2 … x6 connected only to the center x1]
• e.g. ISM
• Parts mutually independent
• Recognition complexity: $O(np)$
• Method: Gen. Hough Transform
• Suited for many local parts
Slide credit: Rob Fergus
Implicit Shape Model (ISM)
• Basic ideas
  1. Automatically learn a large number of local parts that occur on the object
     – Also referred to as visual vocabulary or appearance codebook
  2. Learn a star-topology structural model
     – Features are considered independent given the object center
[Figure: star model with parts x1 … x6]
Slide credit: K. Grauman, B. Leibe
Visual Vocabulary / Appearance Codebook
Visual Vocabulary
• Detect keypoints on all training examples
• Extract feature descriptions around keypoints
• Result: A large set of local image descriptors occurring on people
Visual Vocabulary
• Group visually similar local descriptors
  – i.e. parts that are reoccurring
• Parts that occur only once are discarded (they could result from noise or unusual structures)
Side Note: Grouping Algorithms
• Partitional Clustering
  – K-Means
  – Gaussian Mixture Clustering (EM)
• Hierarchical or Agglomerative Clustering
  – Single-Link
  – Group Average
  – Ward’s method (minimum variance)
Complexity
• Standard Approach:
  – Time complexity: $O(n^2 \log n)$
    – Compute distance matrix
    – Consecutively merge the two most similar clusters
  – Space complexity: $O(n^2)$
Reciprocal Nearest Neighbor (RNN)
• RNN Algorithm [de Rham’80, Benzecri’82]
  – Time complexity: $O(n^2)$
  – Space complexity: $O(n)$
• Requirement: “reducibility property” [Bruynooghe’77]
Space Complexity
• Note that space complexity is quite important for clustering large data sets
  – Example: 100 000 data points
  – Standard distance matrix contains: $10^5 \cdot 10^5 = 10^{10}$ entries
  -> ~40 GB if one entry has 32 bits
  -> Does your PC have enough RAM?
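The arithmetic behind the 40 GB figure is easy to check. The helper below is a hypothetical illustration that assumes a full (non-symmetric) matrix with 4 bytes per entry and decimal gigabytes.

```python
def distance_matrix_bytes(n_points, bytes_per_entry=4):
    """Memory for a full n x n distance matrix with fixed-size entries."""
    return n_points * n_points * bytes_per_entry

full_matrix = distance_matrix_bytes(100_000)
print(full_matrix / 1e9, "GB")  # 10**10 entries at 4 bytes each
```

Storing only the upper triangle would halve this, which still does not fit into typical main memory; this is the motivation for algorithms with O(n) space.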
Clustering Hierarchy
• Agglomerative clustering produces a hierarchy
• Difficult question: where to stop?
  – Ideally, clusters should be visually compact.
• But
  – Distance value depends on feature dimensionality.
  – Appropriate ratio #features/#clusters depends on data set and interest point detector.
⇒ Needs to be selected for each detector/descriptor combination!
Visual Vocabulary
• Vocabulary size ~10000 clusters
  – Probabilistic votes decide whether a part is important or not
Learning Spatial Structure: “Star”-Model
Implicit Shape Model - Representation
1. Learn appearance codebook
  – Extract local features at interest points
  – Agglomerative clustering ⇒ codebook
2. Learn spatial distributions
  – Match codebook to training images
  – Record matching positions on object
• Sparse representation of the object appearance
Training: Spatial Occurrence (Star-Model)
1. Record spatial occurrence
  – Match codebook to training images
  – Record occurrence distributions with respect to object center
  – Location (x, y) and scale
[Figure: star model and spatial occurrence distributions over (x, y, s)]
Occurrence Distribution
• For each codebook entry, we obtain a non-parametric probability distribution of its position relative to the object center
• With
  – $c_i$ a codebook entry
  – $\lambda = (\lambda_x, \lambda_y, \lambda_s)$ the relative position and scale
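One way to picture the non-parametric distribution is as a plain list of recorded occurrences per codebook entry. The sketch below is an assumption about a possible storage layout, not the lecture's data structure: each occurrence keeps the offset from the feature to the object center, divided by the feature scale, together with that scale. All names and the input tuple format are hypothetical.

```python
from collections import defaultdict

def record_occurrences(training_features):
    """Collect (lambda_x, lambda_y, lambda_s) samples per codebook entry.

    training_features: iterable of (codebook_id, (x, y, s), (center_x, center_y)),
    where (x, y, s) is the matched feature and the last pair is the annotated
    object center in the same training image.
    """
    occurrences = defaultdict(list)
    for codebook_id, (x, y, s), (cx, cy) in training_features:
        # Offset to the object center, expressed in units of the feature scale
        occurrences[codebook_id].append(((cx - x) / s, (cy - y) / s, s))
    return occurrences

training_features = [
    (3, (10.0, 20.0, 2.0), (30.0, 20.0)),
    (3, (50.0, 60.0, 1.0), (55.0, 40.0)),
    (7, (12.0, 8.0, 4.0), (20.0, 20.0)),
]
occ = record_occurrences(training_features)
```

The list of samples for one entry is itself the distribution: no parametric form is fitted, and each sample can later cast one vote.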
Remember: Generalized Hough Transform [Ballard81]
• Choose reference point for the contour (e.g. center)
• For each point on the contour remember where it is located w.r.t. the reference point
• Remember radius r and angle φ relative to the contour tangent
• Recognition: whenever you find a contour point, calculate the tangent angle and ‘vote’ for all possible reference points
• Instead of reference point, can also vote for transformation
⇒ The same idea can be used with local features!
Slide credit: Bernt Schiele
Generalized Hough Transform
• For every feature, store possible “occurrences”
  – Object identity
  – Pose
  – Relative position
• For new image, let the matched features vote for possible object positions
Probabilistic Gen. Hough Transform
• Exact correspondences → Prob. match to object part
• NN matching → Soft matching
• Feature location on obj. → Part location distribution
• Uniform votes → Probabilistic vote weighting
• Quantized Hough array → Continuous Hough space
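A toy version of weighted voting is sketched below. It is a simplification under several assumptions: each matched feature has exactly one codebook entry (no soft matching), its vote mass of 1 is split evenly over that entry's stored offsets, scale is ignored, and the votes are collected in a coarse binned accumulator rather than a continuous space. Names and data are hypothetical.

```python
from collections import defaultdict

def vote_for_centers(matches, occurrences, bin_size=10):
    """Accumulate weighted votes for the object center.

    matches: list of (codebook_id, x, y) for features found in the test image.
    occurrences: codebook_id -> list of (dx, dy) offsets to the object center.
    """
    accumulator = defaultdict(float)
    for codebook_id, x, y in matches:
        offsets = occurrences.get(codebook_id, [])
        for dx, dy in offsets:
            cell = (int((x + dx) // bin_size), int((y + dy) // bin_size))
            # Each feature distributes a total weight of 1 over its occurrences
            accumulator[cell] += 1.0 / len(offsets)
    return accumulator

occurrences = {0: [(10, 20)], 1: [(-10, 20), (30, 0)]}
matches = [(0, 40, 30), (1, 60, 30)]
acc = vote_for_centers(matches, occurrences)
best_cell = max(acc, key=acc.get)
```

Ambiguous codebook entries spread their weight over many positions, so consistent evidence from several features is needed to form a strong peak.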
Detection Procedure
Recognition: ISM Detection Procedure
[Figure: ISM detection pipeline. Panel labels: Image; Probabilistic Voting; 3D Voting Space (x, y, s); Back-Projection; Segmentation; Detection Confidences (0.7, 0.5)]
Probabilistic Formulation
• Descriptor contribution:
• With
  – e an extracted image descriptor
  – l the position of the descriptor in the image
• Marginalization over all found descriptors:
Scale Voting: Efficient Computation
• Mean-Shift formulation for refinement
  – Scale-adaptive balloon density estimator
[Figure: four panels over (x, y, s): Scale votes; Binned accum. array; Candidate maxima; Refinement (MSME)]
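The refinement step can be illustrated with a plain mean-shift iteration over weighted votes in (x, y, s). This sketch deliberately simplifies: it uses a Gaussian kernel with one fixed bandwidth for all three dimensions, whereas the slide refers to a scale-adaptive balloon density estimator. Names, bandwidth and votes are hypothetical.

```python
import math

def mean_shift_mode(votes, start, bandwidth=5.0, max_iters=50, tol=1e-6):
    """Move `start` to a nearby density maximum of the weighted votes.

    votes: list of ((x, y, s), weight); start: initial (x, y, s), e.g. a
    candidate maximum taken from a binned accumulator.
    """
    pos = list(start)
    for _ in range(max_iters):
        numerator = [0.0, 0.0, 0.0]
        denominator = 0.0
        for point, weight in votes:
            dist2 = sum((a - b) ** 2 for a, b in zip(point, pos))
            k = weight * math.exp(-dist2 / (2 * bandwidth ** 2))
            denominator += k
            for i in range(3):
                numerator[i] += k * point[i]
        if denominator == 0.0:
            break  # no votes within numerical reach of the kernel
        new_pos = [n / denominator for n in numerator]
        shift = max(abs(a - b) for a, b in zip(new_pos, pos))
        pos = new_pos
        if shift < tol:
            break
    return tuple(pos)

votes = [((49.0, 50.0, 1.0), 1.0), ((51.0, 50.0, 1.0), 1.0),
         ((50.0, 51.0, 1.0), 1.0), ((50.0, 49.0, 1.0), 1.0),
         ((120.0, 10.0, 2.0), 1.0)]  # the last vote is a distant outlier
mode = mean_shift_mode(votes, start=(48.0, 52.0, 1.0))
```

Starting the iteration only from candidate maxima of the binned array keeps the number of mean-shift runs small.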
Figure-Ground Segmentation
Occurrence distributions
• Adding local segmentation masks
[Figure: spatial occurrence distributions over (x, y, s) + local figure-ground labels]
Figure-Ground Segmentation
• Influence of a descriptor on an object hypothesis:
• Figure probability for a hypothesis:
[Equation annotations: segmentation information; influence on object hypothesis]
Figure-Ground Segmentation
• Final segmentation value:
Overlapping hypotheses
Minimum Description Length (MDL) Reasoning
• Savings term:
  – $S_{area}$: #pixels N in segmentation
  – $S_{model}$: model cost, assumed constant
  – $S_{error}$: estimate of error
• Error term:
• Overlapping hypotheses:
MDL based Verification
• Secondary hypotheses
  – Desired property of algorithm! ⇒ robustness to occlusion
• Standard solution: reject based on bounding box
⇒ Problematic - may lead to missing detections!
⇒ Use segmentations to resolve ambiguities instead
Leibe, Leonardis, Schiele, ‘04
Extensions and Evaluation
Outline
1. Image Descriptors and Interest Points
  – Measure the influence of interest region extraction
  – Evaluate the robustness of local image descriptions
2. Body Articulations
3. Cross-Articulation Learning
4. Discriminative Hypothesis Verification
5. Instance-Specific Models
Interest Point Detectors
• Interest point detectors sample different image regions
• Unclear which sampling is most informative for pedestrian detection
[Figure: detections of Harris, Harris-Laplace (scale-invariant), DoG (scale-invariant), Hessian-Laplace (scale-invariant)]
Shape-based Image Descriptors
• Object shape more important than actual pixel values
  – Shape generalizes better
• Image Patches
  – Representation: image patch (25x25 px)
  – Distance measure: Correlation
• Local Chamfer
  – Representation: edge patch (25x25 px)
  – Distance measure: Chamfer distance
• Shape Context [Belongie’00, Mikolajczyk et al. ’05]
  – Representation: Log-polar histogram of edge orientations (9 location bins, 4 edge orientations per bin)
  – Distance measure: Euclidean distance
[Figure: Codebook (image patches); Codebook (Local Chamfer)]
Training Procedures
• Pedestrian shape can be learned from
  – Silhouettes (from segmentation)
  – Real edge images (Canny edge detector)
• Does “clean” model generalize to realistic images?
• Does background noise deteriorate the model?
Results – ISM with Shape Descriptors
• Learning on real edges leads to better performance
• Shape Context + Hessian-Laplace work best
• Up to 23% improvement
Advantages and Disadvantages – ISM & Shape
+ Large performance increase when using shape-based descriptors
+ Detection algorithm is essentially unchanged
– No notion of pedestrian articulations
End of Lecture