cv:hci - Visual Perception for Human-Computer Interaction · 2008-12-27 · Computer Vision for Human-Computer Interaction Research Group, Universität Karlsruhe (TH) cv:hci Requirements

Edgar Seemann, 19.12.08

Visual Perception for Human-Computer Interaction

WS 2008/09

Dr. Rainer Stiefelhagen, Dr. Edgar Seemann

Interactive Systems Laboratories, Universität Karlsruhe (TH)

http://isl.ira.uka.de/


Schedule (1)
Date Topic

27.10.2008 Introduction, Applications

31.10.2008 Basics: Image Processing

03.11.2008 Basics: Image Transformations, 2D Structure

07.11.2008 Basics: Pattern recognition

10.11.2008 Computer Vision: Tasks, Challenges, Learning, Performance measures

14.11.2008 Face Detection I: Color, Edges (Birchfield)

17.11.2008 Project 1: Intro + Programming tips

21.11.2008 Face Detection II: ANNs, SVM, Viola & Jones

24.11.2008 Project 1: Questions

28.11.2008 Face Recognition I: Traditional Approaches, Eigenfaces, Fisherfaces, EBGM

01.12.2008 Face Recognition II

05.12.2008 Head Pose Estimation: Model-based, NN, Texture Mapping, Focus of Attention

08.12.2008 Project 1: Student Presentations, Project 2: Intro

12.12.2008 People Detection I

15.12.2008 People Detection II

19.12.2008 People Detection III

22.12.2008 Scene Context and Geometry I


Organizational Matters

• Assignment deadlines have been changed

– Assignment 2 is due: January 16th, 2009

– Assignment 3 is due: February 6th, 2009

-> you have one more week

• Please don’t forget to send me your files from assignment 1



Today

• Local Features [credits: K. Mikolajczyk, B. Leibe, B. Schiele]

– Interest Point Detectors

– Scale Selection

• Implicit Shape Model

– Codebook Representation

– Detection Loop

– Interleaved detection and segmentation

– Pose recovery

– Combination of generative and discriminative approaches



Local Features



So far

• Parts were defined manually

• Parts represented the semantic structure, e.g. face, leg, etc.

• Questions:

– Do these parts decompose the variability in an optimal way?

– Must the parts have a semantic meaning?

– Should we use smaller/larger parts?

– Can we find parts automatically?



Requirements for part decomposition

• Repeatable: we should be able to find the part despite articulation or image transformations (e.g. rotation, perspective, lighting)

• Distinctive: the part should not be confused with other parts; the region should contain an “interesting” structure

• Compact: typically no lengthy or strangely shaped parts

• Efficient: it should be computationally inexpensive to detect or represent the part

• Cover: the parts need to sufficiently cover the object



Going local

• Local feature approaches

– Use a large number of parts (typically 100-10000 parts)

– Parts mostly have no direct semantic meaning

– Parts are generated automatically: let the algorithm find its own parts

– Typically smaller parts



Keypoints and descriptors

• We distinguish

– Key or interest points

– Local (keypoint) descriptors

• Interest points

– Specify repeatable points on the object

– Given by x-, y-position and scale

• Local descriptors

– Define the feature representation around an interest point



Approach

[Figure: two images with regions A1–A3 and B1–B3 of N × N pixels; descriptors f_A and f_B (e.g. color) are compared with a similarity measure and matched if $d(f_A, f_B) < T$]

1. Find a set of distinctive keypoints

2. Define a region around each keypoint

3. Extract and normalize the region content

4. Compute a local descriptor from the normalized region

5. Match local descriptors
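Steps 4 and 5 can be sketched in NumPy, assuming each region has already been turned into a fixed-length descriptor vector (e.g. a color histogram) and that $d$ is the Euclidean distance. The function name `match_descriptors` and the toy descriptors are hypothetical; each descriptor of image A is paired with its nearest neighbor in image B and kept only if $d(f_A, f_B) < T$.

```python
import numpy as np

def match_descriptors(f_a, f_b, T):
    """Nearest-neighbor matching with a distance threshold T.

    f_a: (n, d) descriptors from image A, f_b: (m, d) descriptors from image B.
    Returns a list of (i, j) pairs with d(f_a[i], f_b[j]) < T.
    """
    # Pairwise Euclidean distances, shape (n, m)
    dists = np.linalg.norm(f_a[:, None, :] - f_b[None, :, :], axis=2)
    matches = []
    for i in range(dists.shape[0]):
        j = int(np.argmin(dists[i]))      # closest descriptor in B
        if dists[i, j] < T:               # accept only sufficiently similar pairs
            matches.append((i, j))
    return matches

# Toy 3-D "color" descriptors for regions A1..A3 and B1..B3
f_a = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
f_b = np.array([[0.0, 0.9, 0.1], [5.0, 5.0, 5.0], [1.0, 0.1, 0.0]])
print(match_descriptors(f_a, f_b, T=0.5))   # A3 has no partner closer than T
```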



Key Point Detectors



Key Point Detectors

• Many existing detectors available

– Hessian & Harris [Beaudet ‘78], [Harris ‘88]

– Laplacian, DoG [Lindeberg ‘98], [Lowe 1999]

– Harris-/Hessian-Laplace [Mikolajczyk & Schmid ‘01]

– Harris-/Hessian-Affine [Mikolajczyk & Schmid ‘04]

– EBR and IBR [Tuytelaars & Van Gool ‘04]

– MSER [Matas ‘02]

– Salient Regions [Kadir & Brady ‘01]

– Others…

• Reference site:

– http://www.robots.ox.ac.uk/~vgg/research/affine/index.html



Keypoint Localization

• Goals:

– Repeatable detection

– Precise localization

– Interesting content

⇒ Look for two-dimensional signal changes



Hessian Detector [Beaudet78]

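• Hessian determinant, computed from the Hessian matrix

$$\text{Hessian}(I) = \begin{bmatrix} I_{xx} & I_{xy} \\ I_{xy} & I_{yy} \end{bmatrix}$$

[Figure: second-derivative responses $I_{xx}$, $I_{yy}$, $I_{xy}$]

Intuition: Search for strong derivatives in two orthogonal directions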



Hessian Detector [Beaudet78]

• Hessian determinant

$$\text{Hessian}(I) = \begin{bmatrix} I_{xx} & I_{xy} \\ I_{xy} & I_{yy} \end{bmatrix}, \qquad \det(\text{Hessian}(I)) = I_{xx} I_{yy} - I_{xy}^2$$

In Matlab: `Ixx.*Iyy - Ixy.^2`
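The Matlab one-liner translates directly to NumPy. This is a sketch assuming a grayscale float image and plain finite differences from `np.gradient`; a practical detector would first smooth with a Gaussian at the detection scale and then threshold and apply non-maximum suppression. The function name `hessian_determinant` is illustrative.

```python
import numpy as np

def hessian_determinant(img):
    """det(Hessian) = Ixx*Iyy - Ixy^2, using central finite differences.

    Rows are y (axis 0), columns are x (axis 1). No Gaussian pre-smoothing
    is applied here (an assumption made to keep the sketch short).
    """
    img = np.asarray(img, dtype=float)
    Iy, Ix = np.gradient(img)          # first derivatives along axis 0 (y) and axis 1 (x)
    Ixx = np.gradient(Ix, axis=1)      # second derivative in x
    Iyy = np.gradient(Iy, axis=0)      # second derivative in y
    Ixy = np.gradient(Ix, axis=0)      # mixed derivative
    return Ixx * Iyy - Ixy ** 2        # same as Ixx.*Iyy - Ixy.^2 in Matlab

# A paraboloid I = x^2 + y^2 has Ixx = Iyy = 2 and Ixy = 0, so det = 4
y, x = np.mgrid[0:20, 0:20].astype(float)
det = hessian_determinant(x ** 2 + y ** 2)
print(det[5, 5])   # 4.0
```

Border pixels use one-sided differences, so only interior values are exact in this example.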



Hessian Detector – Responses [Beaudet78]

Effect: Responses mainly on corners and strongly textured areas.



Hessian Detector – Responses [Beaudet78]



Harris Detector [Harris88]

• Second moment matrix (autocorrelation matrix)

$$\mu(\sigma_I, \sigma_D) = g(\sigma_I) * \begin{bmatrix} I_x^2(\sigma_D) & I_x I_y(\sigma_D) \\ I_x I_y(\sigma_D) & I_y^2(\sigma_D) \end{bmatrix}$$

Intuition: Search for local neighborhoods where the image content has two main directions (eigenvectors).

Slide credit: K. Grauman, B. Leibe





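1. Image derivatives $I_x(\sigma_D)$, $I_y(\sigma_D)$ (Gaussian derivative filters $g_x(\sigma_D)$, $g_y(\sigma_D)$)

2. Square of derivatives: $I_x^2$, $I_y^2$, $I_x I_y$

3. Gaussian filter $g(\sigma_I)$: $g(I_x^2)$, $g(I_y^2)$, $g(I_x I_y)$

$$\mu(\sigma_I, \sigma_D) = g(\sigma_I) * \begin{bmatrix} I_x^2(\sigma_D) & I_x I_y(\sigma_D) \\ I_x I_y(\sigma_D) & I_y^2(\sigma_D) \end{bmatrix}$$

4. Cornerness function (both eigenvalues are strong):

$$har = \det[\mu(\sigma_I, \sigma_D)] - \alpha \, [\operatorname{trace}(\mu(\sigma_I, \sigma_D))]^2 = g(I_x^2)\, g(I_y^2) - [g(I_x I_y)]^2 - \alpha \, [g(I_x^2) + g(I_y^2)]^2$$

5. Non-maxima suppression of $har$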
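Steps 1 to 5 can be condensed into a NumPy sketch. It assumes Gaussian-derivative filters for step 1, zero padding at the image border, and α = 0.05 (the slides leave α open; values around 0.04 to 0.06 are common). The helper names are illustrative, and the non-maximum suppression is a plain 3×3 local-maximum test combined with a response threshold.

```python
import numpy as np

def gaussian_1d(sigma):
    """Sampled Gaussian g (normalized to sum 1) and its derivative g' on [-3*sigma, 3*sigma]."""
    radius = int(np.ceil(3 * sigma))
    x = np.arange(-radius, radius + 1, dtype=float)
    g = np.exp(-x ** 2 / (2 * sigma ** 2))
    g /= g.sum()
    return g, -x / sigma ** 2 * g

def sep_filter(img, k_x, k_y):
    """Separable filtering: convolve rows with k_x, then columns with k_y (zero padding).
    Assumes the image is at least as large as the kernels."""
    tmp = np.apply_along_axis(np.convolve, 1, img, k_x, mode="same")
    return np.apply_along_axis(np.convolve, 0, tmp, k_y, mode="same")

def harris_response(img, sigma_d=1.0, sigma_i=2.0, alpha=0.05):
    img = np.asarray(img, dtype=float)
    g_d, dg_d = gaussian_1d(sigma_d)
    g_i, _ = gaussian_1d(sigma_i)
    Ix = sep_filter(img, dg_d, g_d)               # 1. derivatives at scale sigma_D
    Iy = sep_filter(img, g_d, dg_d)
    Sxx = sep_filter(Ix * Ix, g_i, g_i)           # 2.+3. squared derivatives, smoothed by g(sigma_I)
    Syy = sep_filter(Iy * Iy, g_i, g_i)
    Sxy = sep_filter(Ix * Iy, g_i, g_i)
    return Sxx * Syy - Sxy ** 2 - alpha * (Sxx + Syy) ** 2   # 4. det(mu) - alpha * trace(mu)^2

def harris_corners(img, threshold, **kwargs):
    """5. Non-maxima suppression: keep 3x3 local maxima whose response exceeds threshold."""
    har = harris_response(img, **kwargs)
    h, w = har.shape
    padded = np.pad(har, 1, mode="constant", constant_values=-np.inf)
    neighbors = np.stack([padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)])
    is_peak = (har >= neighbors.max(axis=0)) & (har > threshold)
    return [(int(y), int(x)) for y, x in zip(*np.nonzero(is_peak))]

img = np.zeros((40, 40))
img[10:30, 10:30] = 1.0                            # bright square with four corners
har = harris_response(img)
print(harris_corners(img, threshold=0.5 * har.max()))
```

Along the straight edges of the square only one eigenvalue is large, so the response there is negative; positive peaks appear only near the corners.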



Harris Detector – Responses [Harris88]

Effect: A very precise corner detector.


Harris Detector – Responses [Harris88]



Scale Space

• So far, we can detect repeatable points in the image

• Now what about the image scale?

• Can we detect not only a distinctive position, but also a characteristic scale around an interest point?



Automatic Scale Selection

$$f\big(I_{i_1 \dots i_m}(x, \sigma)\big) = f\big(I'_{i_1 \dots i_m}(x', \sigma')\big)$$

Same operator responses if the patch contains the same image up to a scale factor.

How to find corresponding patch sizes?




• Function responses for increasing scale (scale signature)

$f\big(I_{i_1 \dots i_m}(x, \sigma)\big)$ and $f\big(I'_{i_1 \dots i_m}(x', \sigma)\big)$





At corresponding scales: $f\big(I_{i_1 \dots i_m}(x, \sigma)\big)$ and $f\big(I'_{i_1 \dots i_m}(x', \sigma')\big)$


What Is A Useful Signature Function?

• Laplacian-of-Gaussian = “blob” detector



Laplacian-of-Gaussian (LoG)

• Local maxima in scale space of Laplacian-of-Gaussian

$$L_{xx}(\sigma) + L_{yy}(\sigma)$$

[Figure: LoG responses at scales $\sigma, \sigma^2, \sigma^3, \sigma^4, \sigma^5$]

⇒ List of (x, y, s)
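The scale signature at a single point can be sketched with an analytically sampled LoG kernel. Two assumptions: the response is scale-normalized as $\sigma^2 (L_{xx} + L_{yy})$ (Lindeberg's normalization; without the $\sigma^2$ factor the response decays with scale), and only one point is evaluated. A full detector would evaluate every pixel and keep local maxima in (x, y, σ) to obtain the list of (x, y, s). Function names are illustrative.

```python
import numpy as np

def scale_normalized_log(img, x, y, sigma):
    """sigma^2 * (Lxx + Lyy) at pixel (x, y), computed by correlating the image
    with a sampled Laplacian-of-Gaussian kernel centered at (x, y)."""
    h, w = img.shape
    yy, xx = np.mgrid[0:h, 0:w].astype(float)
    r2 = (xx - x) ** 2 + (yy - y) ** 2
    g = np.exp(-r2 / (2 * sigma ** 2)) / (2 * np.pi * sigma ** 2)
    log_kernel = (r2 - 2 * sigma ** 2) / sigma ** 4 * g     # Laplacian of the Gaussian
    return sigma ** 2 * np.sum(img * log_kernel)           # scale normalization by sigma^2

def characteristic_scale(img, x, y, sigmas):
    """Scale signature |sigma^2 LoG| over sigmas; its maximum is the characteristic scale."""
    signature = np.array([abs(scale_normalized_log(img, x, y, s)) for s in sigmas])
    return sigmas[int(np.argmax(signature))], signature

# A bright disk of radius 8; for an ideal disk the maximum lies at r / sqrt(2)
h = w = 101
yy, xx = np.mgrid[0:h, 0:w]
disk = (((xx - 50) ** 2 + (yy - 50) ** 2) <= 8 ** 2).astype(float)
sigmas = np.arange(1.0, 12.01, 0.25)
best, signature = characteristic_scale(disk, 50, 50, sigmas)
print(best)
```

For a disk of radius r the continuous signature is $2U e^{-U}$ with $U = r^2 / (2\sigma^2)$, which peaks at $\sigma = r/\sqrt{2}$; the discrete estimate should land close to that value.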



Results: Laplacian-of-Gaussian



Maximally Stable Extremal Regions [Matas ‘02]

• Based on the watershed segmentation algorithm

• Select regions that stay stable over a large parameter range
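The stability criterion can be illustrated with a deliberately naive sketch: it follows only the dark region {I ≤ t} that contains one seed pixel across a sweep of intensity thresholds t, and picks the threshold where the region's relative area change is smallest. This is not the efficient algorithm of Matas et al., which enumerates all extremal regions at once; the seed, the step `delta` and the function names are assumptions for illustration.

```python
import numpy as np
from collections import deque

def component_area(mask, seed):
    """Area of the 4-connected component of `mask` that contains `seed` (0 if seed is off)."""
    if not mask[seed]:
        return 0
    h, w = mask.shape
    seen = np.zeros_like(mask, dtype=bool)
    seen[seed] = True
    queue, area = deque([seed]), 0
    while queue:
        y, x = queue.popleft()
        area += 1
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                seen[ny, nx] = True
                queue.append((ny, nx))
    return area

def most_stable_threshold(img, seed, thresholds, delta=1):
    """Dark extremal region {img <= t} grown from `seed`; returns the threshold (and area)
    minimizing the relative area change (a[t+delta] - a[t-delta]) / a[t]."""
    areas = [component_area(img <= t, seed) for t in thresholds]
    best, best_q = None, np.inf
    for k in range(delta, len(thresholds) - delta):
        if areas[k] == 0:
            continue
        q = (areas[k + delta] - areas[k - delta]) / areas[k]
        if q < best_q:
            best, best_q = k, q
    return thresholds[best], areas[best]

img = np.full((30, 30), 200)
img[10:20, 10:20] = 50            # dark square on a bright background
t, area = most_stable_threshold(img, (15, 15), list(range(0, 256, 5)))
print(t, area)
```

The square's area stays at 100 pixels for every threshold between its intensity and the background intensity, which is exactly the kind of stability the detector looks for.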



Example Results: MSER



Local Descriptors



Local Descriptors

• Most available descriptors focus on edge/gradient information

– Capture boundary and texture information

• Color is still used relatively seldom (more suitable for homogeneous regions)



Local Descriptors: SIFT Descriptor

[Lowe, ICCV 1999]

Histogram of oriented gradients

• Captures important texture information

• Robust to small translations / affine deformations
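A simplified SIFT-like descriptor can be sketched as a histogram of oriented gradients over a grid of cells. The layout used here (4×4 cells × 8 orientation bins = 128 values, clipping at 0.2 and renormalizing) follows Lowe's later journal description rather than the 1999 paper cited above, so treat these numbers as assumptions. Gaussian weighting, interpolation between bins and rotation to the dominant orientation are omitted; the function name is illustrative.

```python
import numpy as np

def sift_like_descriptor(patch, n_cells=4, n_bins=8):
    """Simplified SIFT-style descriptor: n_cells x n_cells spatial cells x n_bins orientations."""
    patch = np.asarray(patch, dtype=float)
    gy, gx = np.gradient(patch)
    mag = np.hypot(gx, gy)
    ori = np.mod(np.arctan2(gy, gx), 2 * np.pi)                  # orientation in [0, 2*pi)
    obin = np.minimum((ori / (2 * np.pi) * n_bins).astype(int), n_bins - 1)
    h, w = patch.shape
    cy = np.minimum(np.arange(h) * n_cells // h, n_cells - 1)   # cell index of each row
    cx = np.minimum(np.arange(w) * n_cells // w, n_cells - 1)   # cell index of each column
    hist = np.zeros((n_cells, n_cells, n_bins))
    for y in range(h):
        for x in range(w):
            hist[cy[y], cx[x], obin[y, x]] += mag[y, x]           # magnitude-weighted vote
    desc = hist.ravel()
    norm = np.linalg.norm(desc)
    if norm > 0:
        desc = np.minimum(desc / norm, 0.2)                       # damp very large gradients
        desc /= np.linalg.norm(desc)                              # renormalize to unit length
    return desc

patch = np.add.outer(np.arange(16.0), np.arange(16.0))   # 16x16 intensity ramp along the diagonal
d = sift_like_descriptor(patch)
print(d.shape)
```

Because the histogram pools gradients over cells, a shift of a pixel or two mostly moves votes between neighboring bins, which is the robustness to small translations mentioned above.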



Orientation Normalization

• Compute orientation histogram

• Select dominant orientation

• Normalize: rotate to fixed orientation

[Figure: orientation histogram over the range 0 to 2π]

[Lowe, SIFT, 1999]
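The three steps can be sketched as follows, assuming 36 histogram bins over [0, 2π) (Lowe also applies Gaussian weighting and peak interpolation, omitted here) and nearest-neighbor resampling for the rotation. Angles are measured in array coordinates (x to the right, y downwards); function names are illustrative.

```python
import numpy as np

def dominant_orientation(patch, n_bins=36):
    """Center of the highest bin of a magnitude-weighted orientation histogram over [0, 2*pi)."""
    gy, gx = np.gradient(np.asarray(patch, dtype=float))
    ori = np.mod(np.arctan2(gy, gx), 2 * np.pi)
    hist, edges = np.histogram(ori, bins=n_bins, range=(0.0, 2 * np.pi), weights=np.hypot(gx, gy))
    k = int(np.argmax(hist))
    return 0.5 * (edges[k] + edges[k + 1])

def rotate_patch(patch, theta):
    """Rotate the patch about its center (nearest-neighbor sampling, zeros outside) so that
    a structure with gradient orientation theta ends up with orientation 0."""
    patch = np.asarray(patch, dtype=float)
    h, w = patch.shape
    yy, xx = np.mgrid[0:h, 0:w].astype(float)
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    c, s = np.cos(theta), np.sin(theta)
    # each output pixel reads the source pixel obtained by rotating it by +theta
    src_x = np.rint(c * (xx - cx) - s * (yy - cy) + cx).astype(int)
    src_y = np.rint(s * (xx - cx) + c * (yy - cy) + cy).astype(int)
    valid = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
    out = np.zeros_like(patch)
    out[valid] = patch[src_y[valid], src_x[valid]]
    return out

ramp_y = np.tile(np.arange(31.0)[:, None], (1, 31))   # intensity grows downwards: angle pi/2
theta = dominant_orientation(ramp_y)
upright = rotate_patch(ramp_y, np.pi / 2)              # becomes a horizontal ramp: angle 0
print(theta, dominant_orientation(upright))
```

The returned angle is only accurate to one bin width (10° here); this quantization is why peak interpolation is used in practice.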



Local Descriptors: SURF

• Fast approximation of SIFT idea

– Efficient computation by 2D box filters & integral images
⇒ 6 times faster than SIFT

– Equivalent quality for object identification

• GPU implementation available

– Feature extraction @ 100 Hz (detector + descriptor, 640×480 img)

– http://www.vision.ee.ethz.ch/~surf

[Bay, ECCV’06], [Cornelis, CVGPU’08]
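The speed of box filters comes from the integral image: after one cumulative-sum pass, the sum over any axis-aligned box costs four lookups regardless of the box size. The sketch below also builds a box approximation of $\partial^2/\partial y^2$ from three stacked lobes weighted +1, −2, +1; the lobe sizes (3 × 5 pixels) are an illustrative assumption loosely modelled on a 9 × 9 filter, not SURF's exact specification.

```python
import numpy as np

def integral_image(img):
    """ii[y, x] = sum of img[:y, :x]; an extra zero row/column avoids boundary checks."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    ii[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return ii

def box_sum(ii, y0, x0, y1, x1):
    """Sum of img[y0:y1, x0:x1] with four lookups, independent of the box size."""
    return ii[y1, x1] - ii[y0, x1] - ii[y1, x0] + ii[y0, x0]

def box_dyy(ii, y, x, lobe_h=3, lobe_w=5):
    """Box approximation of d^2/dy^2 at (y, x): three vertically stacked lobes (+1, -2, +1).
    Lobe sizes are illustrative assumptions."""
    x0 = x - lobe_w // 2
    x1 = x0 + lobe_w
    m0 = y - lobe_h // 2                      # first row of the middle lobe
    top = box_sum(ii, m0 - lobe_h, x0, m0, x1)
    mid = box_sum(ii, m0, x0, m0 + lobe_h, x1)
    bottom = box_sum(ii, m0 + lobe_h, x0, m0 + 2 * lobe_h, x1)
    return top - 2 * mid + bottom

img = np.arange(30.0).reshape(5, 6)
ii = integral_image(img)
print(box_sum(ii, 1, 2, 4, 5), img[1:4, 2:5].sum())        # identical results
quad = np.tile((np.arange(20.0) ** 2)[:, None], (1, 20))   # I(y, x) = y^2
print(box_dyy(integral_image(quad), 10, 10))               # 270.0 = 5 columns * 54 per column
```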



Local Descriptors: Shape Context

Count the number of points inside each bin, e.g.:

Count = 4

Count = 10...

Log-polar binning: more precision for nearby points, more flexibility for farther points.

Belongie & Malik, ICCV 2001
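The log-polar counting can be sketched for one reference point. The bin layout (5 log-spaced rings between 1/8 and 2 times the mean distance, 12 angular sectors) follows the commonly cited setup of Belongie et al. but should be treated as an assumption; normalizing by the mean distance to the reference point, rather than over all point pairs, is a simplification of this sketch.

```python
import numpy as np

def shape_context(points, i, n_r=5, n_theta=12, r_inner=0.125, r_outer=2.0):
    """Log-polar histogram of all other points relative to points[i].

    Distances are normalized by their mean; points closer than r_inner go to the
    innermost ring, points at or beyond r_outer are ignored.
    """
    pts = np.asarray(points, dtype=float)
    d = np.delete(pts, i, axis=0) - pts[i]
    r = np.hypot(d[:, 0], d[:, 1])
    r = r / r.mean()
    theta = np.mod(np.arctan2(d[:, 1], d[:, 0]), 2 * np.pi)
    r_edges = np.logspace(np.log10(r_inner), np.log10(r_outer), n_r + 1)   # log-polar rings
    r_bin = np.clip(np.searchsorted(r_edges, r, side="right") - 1, 0, n_r - 1)
    t_bin = np.minimum((theta / (2 * np.pi) * n_theta).astype(int), n_theta - 1)
    keep = r < r_outer
    hist = np.zeros((n_r, n_theta), dtype=int)
    np.add.at(hist, (r_bin[keep], t_bin[keep]), 1)   # count points per (ring, sector)
    return hist

angles = np.arange(8) * np.pi / 4
pts = np.vstack([[0.0, 0.0], np.column_stack([np.cos(angles), np.sin(angles)])])
h = shape_context(pts, 0)
print(h.sum(), h[3].sum())   # all 8 neighbors fall into ring 3 (normalized radius 1)
```

Because the rings grow logarithmically, nearby points are localized precisely while distant points may move further without changing bins, matching the precision/flexibility trade-off described above.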



Local Descriptors: Geometric Blur

Example descriptor (idealized signal):

1. Compute edges at four orientations

2. Extract a patch in each channel

3. Apply spatially varying blur and sub-sample

Berg & Malik, CVPR 2001
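Step 3 can be sketched as follows, assuming the four oriented edge channels have already been computed (step 1). Each sample point at offset $(\Delta y, \Delta x)$ from the center is read out as a Gaussian-weighted average whose width grows with the distance from the center, $\sigma = \alpha \lVert (\Delta y, \Delta x) \rVert + \beta$; this linear rule and the values of α and β are assumptions of the sketch, as are the function names.

```python
import numpy as np

def geometric_blur(channel, center, offsets, alpha=0.5, beta=1.0):
    """Sample one edge channel at center + offset, blurring more the farther the sample
    lies from the center: sigma = alpha * |offset| + beta (assumed linear rule)."""
    h, w = channel.shape
    yy, xx = np.mgrid[0:h, 0:w].astype(float)
    values = []
    for dy, dx in offsets:
        sy, sx = center[0] + dy, center[1] + dx
        sigma = alpha * np.hypot(dy, dx) + beta
        weights = np.exp(-((yy - sy) ** 2 + (xx - sx) ** 2) / (2 * sigma ** 2))
        values.append(np.sum(weights * channel) / np.sum(weights))   # normalized Gaussian average
    return np.array(values)

def geometric_blur_descriptor(channels, center, offsets, **kwargs):
    """Concatenate the blurred samples of all orientation channels."""
    return np.concatenate([geometric_blur(c, center, offsets, **kwargs) for c in channels])

channels = [np.full((21, 21), v) for v in (0.0, 1.0, 2.0, 3.0)]   # stand-ins for 4 edge channels
offsets = [(0, 0)] + [(r * np.sin(a), r * np.cos(a)) for r in (3, 6) for a in np.arange(8) * np.pi / 4]
desc = geometric_blur_descriptor(channels, (10, 10), offsets)
print(desc.shape)   # 4 channels x 17 samples
```

Blurring more at larger distances tolerates larger displacements of edges far from the center, which is what makes the descriptor robust to moderate geometric distortion.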


So, What Local Features Should I Use?

• There have been extensive evaluations/comparisons

– [Mikolajczyk et al., IJCV’05, PAMI’05]

– All detectors/descriptors shown there work well

• Best choice often application dependent

– MSER works well for buildings and printed things

– Harris-/Hessian-Laplace/DoG work well for many natural categories

• More features are better

– Combining several detectors often helps



Implicit Shape Model



Spatial Models Considered

[Figure: parts x1–x6 connected as a “star” shape model and as a fully connected shape model]

Fully connected shape model

• e.g. Constellation Model

– Parts fully connected

– Recognition complexity: O(n^p) (n image features, p parts)

– Method: exhaustive search

– Complexity restricts the method to a small number of parts

“Star” shape model

• e.g. ISM

– Parts mutually independent

– Recognition complexity: O(np)

– Method: Generalized Hough Transform

– Suited for many local parts

Slide credit: Rob Fergus



Implicit Shape Model (ISM)

• Basic ideas

1. Automatically learn a large number of local parts that occur on the object

– Also referred to as visual vocabulary or appearance codebook

2. Learn a star-topology structural model

– Features are considered independent given the object center

[Figure: star model with parts x1–x6]

Slide credit: K. Grauman, B. Leibe



Visual Vocabulary / Appearance Codebook



Visual Vocabulary

• Detect keypoints on all training examples

• Extract feature descriptions around keypoints

• Result: a large set of local image descriptors occurring on people



Visual Vocabulary

• Group visually similar local descriptors

– i.e. parts that are recurring

• Parts that occur only once are discarded (they could result from noise or unusual structures)



Side Note: Grouping Algorithms

• Partitional clustering

– K-Means

– Gaussian Mixture Clustering (EM)

• Hierarchical or agglomerative clustering

– Single-Link

– Group Average

– Ward’s method (minimum variance)



Complexity

• Standard approach:

– Time complexity: O(n² log n)

– Compute distance matrix

– Consecutively merge the two most similar clusters

– Space complexity: O(n²)


Reciprocal Nearest Neighbor (RNN)

• RNN algorithm [de Rham’80, Benzecri’82]

– Time complexity: O(n²)

– Space complexity: O(n)

– Requirement: “reducibility property” [Bruynooghe’77]
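A sketch of the RNN (nearest-neighbor chain) idea for group-average clustering with a stopping threshold t on the average pairwise squared Euclidean distance. Each cluster stores only its mean, variance and member list, which keeps memory at O(n): the group-average distance between clusters A and B equals $\lVert \mu_A - \mu_B \rVert^2 + \sigma_A^2 + \sigma_B^2$. The threshold handling (finalizing the whole chain once its last link exceeds t) is an assumption of this sketch. It relies on the reducibility property (merging never brings clusters closer), so a chain whose links already exceed t can never merge again. Class and function names are illustrative.

```python
import numpy as np
from dataclasses import dataclass, field

@dataclass
class Cluster:
    mean: np.ndarray                               # centroid of the members
    var: float                                     # mean squared distance of members to centroid
    members: list = field(default_factory=list)    # indices of the original points

def group_avg_sqdist(a, b):
    """Average pairwise squared Euclidean distance between the members of a and b."""
    return float(np.sum((a.mean - b.mean) ** 2) + a.var + b.var)

def merge(a, b):
    na, nb = len(a.members), len(b.members)
    mean = (na * a.mean + nb * b.mean) / (na + nb)
    var = (na * (a.var + np.sum((a.mean - mean) ** 2))
           + nb * (b.var + np.sum((b.mean - mean) ** 2))) / (na + nb)
    return Cluster(mean, float(var), a.members + b.members)

def rnn_cluster(points, t):
    """Group-average agglomerative clustering via nearest-neighbor chains, stopping at t."""
    clusters = [Cluster(np.asarray(p, dtype=float), 0.0, [i]) for i, p in enumerate(points)]
    active, chain, final = set(range(len(clusters))), [], []
    while active:
        if not chain:
            start = min(active)
            if len(active) == 1:                   # last cluster left over
                final.append(clusters[start])
                break
            chain.append(start)
        last = chain[-1]
        cand = {j: group_avg_sqdist(clusters[last], clusters[j]) for j in active if j != last}
        nn = min(cand, key=cand.get)
        if len(chain) > 1 and cand[chain[-2]] <= cand[nn]:
            nn = chain[-2]                          # prefer the predecessor on ties
        if cand[nn] > t:
            # links along a chain never increase, so no chain member can merge any more
            for c in chain:
                final.append(clusters[c])
                active.remove(c)
            chain = []
        elif len(chain) > 1 and nn == chain[-2]:
            a, b = chain.pop(), chain.pop()         # reciprocal nearest neighbors: merge them
            active -= {a, b}
            clusters.append(merge(clusters[a], clusters[b]))
            active.add(len(clusters) - 1)
        else:
            chain.append(nn)
    return final

pts = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]])
groups = rnn_cluster(pts, t=4.0)
print(sorted(sorted(c.members) for c in groups))
```

No n × n distance matrix is ever stored; distances are recomputed from the cluster statistics, trading the O(n²) memory of the standard approach for repeated O(n) nearest-neighbor searches.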



Space Complexity

• Note that space complexity is quite important for clustering large data sets

– Example: 100 000 data points

– The standard distance matrix contains $10^5 \cdot 10^5 = 10^{10}$ entries

-> ~40 GB if one entry has 32 bits ($10^{10} \times 4$ bytes $= 4 \times 10^{10}$ bytes)

-> Does your PC have enough RAM?



Clustering Hierarchy

• Agglomerative clustering produces a hierarchy

• Difficult question: where to stop?

– Ideally, clusters should be visually compact.

• But

– Distance value depends on feature dimensionality.

– Appropriate ratio #features/#clusters depends on data set and interest point detector.

⇒ Needs to be selected for each detector/descriptor combination!



Visual Vocabulary

• Vocabulary size ~10000 clusters

– Probabilistic votes decide whether a part is important or not



Learning Spatial Structure: “Star”-Model



Implicit Shape Model - Representation

1. Learn appearance codebook

– Extract local features at interest points

– Agglomerative clustering ⇒ codebook

2. Learn spatial distributions

– Match codebook to training images

– Record matching positions on object

• Sparse representation of the object appearance



Training: Spatial Occurrence (Star-Model)

1. Record spatial occurrence

– Match codebook to training images

– Record occurrence distributions with respect to the object center

– Location (x, y) and scale

[Figure: star model; spatial occurrence distributions over (x, y, s) for several codebook entries]



Occurrence Distribution

• For each codebook entry, we obtain a non-parametric probability distribution of its position relative to the object center

• With

– $c_i$ a codebook entry

– $\lambda = (\lambda_x, \lambda_y, \lambda_s)$ the relative position and scale



Remember: Generalized Hough Transform [Ballard81]

• Choose a reference point for the contour (e.g. center)

– For each point on the contour, remember where it is located w.r.t. the reference point

– Remember radius r and angle φ relative to the contour tangent

– Recognition: whenever you find a contour point, calculate the tangent angle and ‘vote’ for all possible reference points

• Instead of a reference point, one can also vote for a transformation

⇒ The same idea can be used with local features!

Slide credit: Bernt Schiele



Generalized Hough Transform

• For every feature, store possible “occurrences”

– Object identity

– Pose

– Relative position

• For a new image, let the matched features vote for possible object positions


Probabilistic Gen. Hough Transform

• Exact correspondences → Probabilistic match to object part

• NN matching → Soft matching

• Feature location on object → Part location distribution

• Uniform votes → Probabilistic vote weighting

• Quantized Hough array → Continuous Hough space
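The training and voting stages can be sketched in a few lines. Assumptions: soft matching means "all codebook entries within a distance threshold", each vote is weighted by $p(c_i \mid e)$ times the probability of the stored occurrence, and both are taken as uniform (1/#matches and 1/#occurrences). Scale is omitted, and votes go into a binned 2D array rather than a continuous Hough space. All names are hypothetical.

```python
import numpy as np

def match(codebook, desc, thresh):
    """Indices of all codebook entries within `thresh` of `desc` (soft matching)."""
    return np.flatnonzero(np.linalg.norm(codebook - desc, axis=1) <= thresh)

def learn_occurrences(codebook, train_feats, thresh):
    """train_feats: (descriptor, dx, dy) with (dx, dy) = object center - feature position."""
    occ = {i: [] for i in range(len(codebook))}
    for desc, dx, dy in train_feats:
        for i in match(codebook, desc, thresh):
            occ[i].append((dx, dy))
    return occ

def hough_vote(codebook, occ, test_feats, thresh, shape):
    """test_feats: (descriptor, x, y). Returns a binned 2D accumulator of center votes."""
    acc = np.zeros(shape)
    for desc, x, y in test_feats:
        entries = match(codebook, desc, thresh)
        for i in entries:
            if not occ[i]:
                continue
            # vote weight = p(c_i | e) * p(occurrence | c_i), both assumed uniform
            w = 1.0 / (len(entries) * len(occ[i]))
            for dx, dy in occ[i]:
                cx, cy = int(round(x + dx)), int(round(y + dy))
                if 0 <= cy < shape[0] and 0 <= cx < shape[1]:
                    acc[cy, cx] += w
    return acc

codebook = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
train = [(np.array([1.0, 0.0]), 5, 0), (np.array([0.0, 1.0]), -5, 0), (np.array([1.0, 1.0]), 0, 5)]
occ = learn_occurrences(codebook, train, thresh=0.1)
test = [(np.array([1.0, 0.0]), 15, 20), (np.array([0.0, 1.0]), 25, 20), (np.array([1.0, 1.0]), 20, 15)]
acc = hough_vote(codebook, occ, test, thresh=0.1, shape=(40, 40))
print(np.unravel_index(np.argmax(acc), acc.shape))
```

All three test features agree on the same object center, so their votes pile up in a single accumulator cell.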



Detection Procedure



Recognition: ISM Detection Procedure

[Figure: image → probabilistic voting into a 3D voting space (x, y, s) → detection confidences (e.g. 0.7, 0.5) → back-projection → segmentation]



Probabilistic Formulation

• Descriptor contribution:

• With

– e an extracted image descriptor

– l the position of the descriptor in the image

• Marginalization over all found descriptors:



Scale Voting: Efficient Computation

• Mean-Shift formulation for refinement

– Scale-adaptive balloon density estimator

[Figure: scale votes → binned accumulator array over (x, y, s) → candidate maxima → refinement (MSME)]
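The refinement step can be sketched as weighted mean shift in (x, y, s) starting from a candidate maximum of the binned array. The balloon form assumed here lets the spatial kernel width grow with the scale of the current estimate, $h = (h_{xy} s, h_{xy} s, h_s)$, with a Gaussian kernel; the bandwidth values and the function name are assumptions, not the exact estimator used in ISM.

```python
import numpy as np

def balloon_mean_shift(votes, weights, start, h_xy=2.0, h_s=0.3, n_iter=200, tol=1e-6):
    """Refine a candidate maximum (x, y, s) by weighted mean shift.

    votes: (n, 3) array of (x, y, s) votes, weights: (n,) vote weights.
    The spatial bandwidth scales with the current scale estimate (balloon estimator).
    """
    c = np.asarray(start, dtype=float)
    for _ in range(n_iter):
        h = np.array([h_xy * c[2], h_xy * c[2], h_s])
        k = np.exp(-0.5 * np.sum(((votes - c) / h) ** 2, axis=1))   # Gaussian kernel values
        wk = weights * k
        if wk.sum() == 0:                                           # no votes in the window
            break
        new_c = (wk[:, None] * votes).sum(axis=0) / wk.sum()        # weighted mean = shifted center
        if np.linalg.norm(new_c - c) < tol:
            return new_c
        c = new_c
    return c

votes = np.array([[10, 10, 1.0], [11, 10, 1.0], [9, 10, 1.0], [10, 11, 1.0],
                  [10, 9, 1.0], [10, 10, 1.2], [10, 10, 0.8], [50, 50, 1.0]])
weights = np.ones(len(votes))
mode = balloon_mean_shift(votes, weights, start=(9.0, 9.0, 0.9))
print(mode)
```

The distant vote at (50, 50) receives practically zero kernel weight, so the estimate settles on the center of the dense cluster.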



Figure-Ground Segmentation



Occurrence distributions

• Adding local segmentation masks

[Figure: spatial occurrence distributions over (x, y, s) + local figure-ground labels]




Influence on object hypothesis

• Influence of a descriptor on an object hypothesis:

• Figure probability for a hypothesis:

[Figure: segmentation information]




• Final segmentation value:
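The formulas for this step did not survive in these notes, so the following is only a sketch under an assumed ISM-style reading. Every vote supporting a hypothesis back-projects the local figure-ground mask of its codebook entry, weighted by the vote's contribution, and a pixel is labeled figure where the accumulated figure evidence exceeds the ground evidence. Patches are assumed to lie fully inside the image; names are hypothetical.

```python
import numpy as np

def backproject_segmentation(shape, contributions):
    """Accumulate figure/ground evidence from the votes that support one hypothesis.

    contributions: list of (weight, y0, x0, mask) where mask holds the local figure
    probability (0..1) of the matched codebook patch placed at (y0, x0).
    Returns p(figure) per pixel and a binary segmentation (figure > ground).
    """
    fig = np.zeros(shape)
    gnd = np.zeros(shape)
    for w, y0, x0, mask in contributions:
        ph, pw = mask.shape
        fig[y0:y0 + ph, x0:x0 + pw] += w * mask
        gnd[y0:y0 + ph, x0:x0 + pw] += w * (1.0 - mask)
    total = fig + gnd
    p_fig = np.divide(fig, total, out=np.zeros(shape), where=total > 0)   # 0 where no vote
    return p_fig, fig > gnd

m_fig = np.ones((3, 3))        # codebook patch that is entirely figure
m_gnd = np.zeros((3, 3))       # codebook patch that is entirely ground
p, seg = backproject_segmentation((5, 5), [(1.0, 0, 0, m_fig), (0.5, 1, 1, m_gnd)])
print(np.round(p, 2))
```

Where the two patches overlap, the stronger figure vote (weight 1.0) outweighs the weaker ground vote (weight 0.5), giving p(figure) = 2/3.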



Overlapping hypotheses



Minimum Description Length (MDL) Reasoning

• Savings term:

– $S_{area}$: #pixels N in segmentation

– $S_{model}$: model cost, assumed constant

– $S_{error}$: estimate of error

• Error term:

• Overlapping hypotheses:



MDL based Verification

• Secondary hypotheses

– Desired property of algorithm! ⇒ robustness to occlusion

• Standard solution: reject based on bounding box

⇒ Problematic: may lead to missing detections!

⇒ Use segmentations to resolve ambiguities instead

Leibe, Leonardis, Schiele, ‘04
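A sketch of segmentation-based verification, assuming the savings of a hypothesis take the form $S = S_{area} - \kappa_1 S_{model} - \kappa_2 S_{error}$, with $S_{area}$ the number of figure pixels and $S_{error} = \sum (1 - p(\text{figure}))$ over those pixels. The weights κ₁, κ₂, the constant model cost and the greedy selection loop are assumptions; the greedy loop simplifies the joint optimization over all hypotheses. A secondary hypothesis whose pixels are already explained by an accepted one gains nothing and is rejected, whereas a separate object is kept.

```python
import numpy as np

def savings(p_fig, claimed, s_model, k1=1.0, k2=1.0):
    """MDL savings of one hypothesis on the pixels not yet explained (assumed form):
    S = S_area - k1 * S_model - k2 * S_error."""
    seg = (p_fig > 0.5) & ~claimed
    s_area = seg.sum()
    s_error = (1.0 - p_fig[seg]).sum()
    return s_area - k1 * s_model - k2 * s_error

def select_hypotheses(p_figs, s_model, **kwargs):
    """Greedy selection: repeatedly accept the hypothesis with the largest positive savings;
    pixels already explained by accepted hypotheses no longer count."""
    claimed = np.zeros(p_figs[0].shape, dtype=bool)
    remaining, selected = list(range(len(p_figs))), []
    while remaining:
        gains = [savings(p_figs[i], claimed, s_model, **kwargs) for i in remaining]
        best = int(np.argmax(gains))
        if gains[best] <= 0:
            break
        i = remaining.pop(best)
        selected.append(i)
        claimed |= p_figs[i] > 0.5
    return selected

p1 = np.zeros((30, 30))
p1[0:10, 0:10] = 0.9       # hypothesis 1
p2 = np.zeros((30, 30))
p2[2:10, 2:10] = 0.8       # secondary hypothesis lying inside hypothesis 1
p3 = np.zeros((30, 30))
p3[20:30, 20:30] = 0.9     # a separate object
print(select_hypotheses([p1, p2, p3], s_model=20.0))
```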



Extensions and Evaluation



Outline

1. Image Descriptors and Interest Points

– Measure the influence of interest region extraction

– Evaluate the robustness of local image descriptions

2. Body Articulations

3. Cross-Articulation Learning

4. Discriminative Hypothesis Verification

5. Instance-Specific Models



Interest Point Detectors

• Interest point detectors sample different image regions

• It is unclear which sampling is most informative for pedestrian detection

[Figure: regions sampled by Harris, Harris-Laplace (scale-invariant), DoG (scale-invariant) and Hessian-Laplace (scale-invariant)]



Shape-based Image Descriptors

• Object shape is more important than actual pixel values

– Shape generalizes better

• Image Patches

– Representation: image patch (25x25 px)

– Distance measure: correlation

• Local Chamfer

– Representation: edge patch (25x25 px)

– Distance measure: Chamfer distance

• Shape Context [Belongie’00, Mikolajczyk et al. ’05]

– Representation: log-polar histogram of edge orientations (9 location bins, 4 edge orientations per bin)

– Distance measure: Euclidean distance

[Figure: example codebooks built from image patches and from Local Chamfer descriptors]
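The Chamfer distance between two edge patches can be sketched by brute force: for every edge pixel of one patch, take the distance to the nearest edge pixel of the other and average. In practice a distance transform replaces the brute-force search. The notes do not say whether the local Chamfer measure is symmetric, so the symmetric average below is an assumption; function names are illustrative.

```python
import numpy as np

def chamfer_distance(edges_a, edges_b):
    """Average distance from each edge pixel of A to its nearest edge pixel in B (brute force)."""
    pa = np.argwhere(edges_a)
    pb = np.argwhere(edges_b)
    if len(pa) == 0 or len(pb) == 0:
        return np.inf
    d = np.linalg.norm(pa[:, None, :] - pb[None, :, :], axis=2)   # all pairwise distances
    return d.min(axis=1).mean()

def symmetric_chamfer(edges_a, edges_b):
    """Average of both directed Chamfer distances (assumed symmetric variant)."""
    return 0.5 * (chamfer_distance(edges_a, edges_b) + chamfer_distance(edges_b, edges_a))

a = np.zeros((25, 25), dtype=bool)
a[:, 12] = True                 # vertical edge in a 25x25 patch
b = np.zeros((25, 25), dtype=bool)
b[:, 13] = True                 # the same edge shifted by one pixel
print(symmetric_chamfer(a, a), symmetric_chamfer(a, b))
```

Unlike pixel correlation, the distance grows smoothly with the edge displacement, which is one reason edge-based descriptors generalize better across appearance changes.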



Training Procedures

• Pedestrian shape can be learned from

– Silhouettes (from segmentation)

– Real edge images (Canny edge detector)

• Does a “clean” model generalize to realistic images?

• Does background noise deteriorate the model?


Results – ISM with Shape Descriptors

• Learning on real edges leads to better performance

• Shape Context + Hessian-Laplace work best

• Up to 23% improvement



Advantages and Disadvantages – ISM & Shape

+ Large performance increase when using shape-based descriptors

+ Detection algorithm is essentially unchanged

– No notion of pedestrian articulations



End of Lecture
