cv:hci - Visual Perception for Human-Computer Interaction · 2008-12-27 · Computer Vision for Human-Computer Interaction Research Group, Universität Karlsruhe (TH) cv:hci Requirements

Edgar Seemann, 19.12.08

Visual Perception for Human-Computer Interaction

WS 2008/09

Dr. Rainer Stiefelhagen, Dr. Edgar Seemann

Interactive Systems Laboratories, Universität Karlsruhe (TH)

http://isl.ira.uka.de/VisionHCICourse

stiefel@ira.uka.de

seemann@pedestrian-detection.com

Schedule (1)

Date Topic

27.10.2008 Introduction, Applications

31.10.2008 Basics: Image Processing

03.11.2008 Basics: Image Transformations, 2D Structure

07.11.2008 Basics: Pattern recognition

10.11.2008 Computer Vision: Tasks, Challenges, Learning, Performance measures

14.11.2008 Face Detection I: Color, Edges (Birchfield)

17.11.2008 Project 1: Intro + Programming tips

21.11.2008 Face Detection II: ANNs, SVM, Viola & Jones

24.11.2008 Project 1: Questions

28.11.2008 Face Recognition I: Traditional Approaches, Eigenfaces, Fisherfaces, EBGM

01.12.2008 Face Recognition II

05.12.2008 Head Pose Estimation: Model-based, NN, Texture Mapping, Focus of Attention

08.12.2008 Project 1: Student Presentations, Project 2: Intro

12.12.2008 People Detection I

15.12.2008 People Detection II

19.12.2008 People Detection III

22.12.2008 Scene Context and Geometry I

Organizational Matters

• Assignment deadlines have been changed
  – Assignment 2 is due: January 16th, 2009
  – Assignment 3 is due: February 6th, 2009
  ⇒ You have one more week
• Please don’t forget to send me your files from assignment 1

Outline

• Local Features [credits: K. Mikolajczyk, B. Leibe, B. Schiele]
  – Interest Point Detectors
  – Scale Selection
• Implicit Shape Model
  – Codebook Representation
  – Detection Loop
  – Interleaved detection and segmentation
• Pose recovery
• Combination of generative and discriminative approaches

Local Features


So far

• Parts were defined manually
• Parts represented the semantic structure
  – i.e. face, leg etc.
• Questions:
  – Do these parts decompose the variability in an optimal way?
  – Must the parts have a semantic meaning?
  – Should we use smaller/larger parts?
  – Can we find parts automatically?

Requirements for part decomposition

• Repeatable
  – i.e. we should be able to find the part despite articulation or image transformations (e.g. rotation, perspective, lighting)
• Distinctive
  – A part should not be confused with other parts
  – The regions should contain an “interesting” structure
• Compact
  – Typically no lengthy or strangely shaped parts
• Efficient
  – It should be computationally inexpensive to detect or represent a part
• Cover
  – Parts need to sufficiently cover the object

Going local

• Local Feature Approaches
  – Use a large number of parts (typically 100-10000 parts)
  – Parts mostly have no direct semantic meaning
  – Parts are generated automatically
  – Let the algorithm find its own parts
  – Typically smaller parts

Keypoints and descriptors

• We distinguish
  – Key or interest points
  – Local (key point) descriptors
• Interest Points
  – Specify repeatable points on the object
  – x-, y-position and scale
• Local Descriptors
  – Define the feature representation around an interest point

Approach

1. Find a set of distinctive key-points
2. Define a region around each keypoint
3. Extract and normalize the region content
4. Compute a local descriptor from the normalized region
5. Match local descriptors

[Figure: two regions of N × N pixels are normalized and described by descriptors f_A and f_B (e.g. color); they are matched if the similarity measure satisfies \( d(f_A, f_B) < T \)]
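Step 5 can be sketched as nearest-neighbour matching with the threshold test d(f_A, f_B) < T from the figure. This is only a minimal illustration: the function names, the choice of Euclidean distance for d, and the toy descriptors are assumptions, not part of the lecture material.

```python
import math


def euclidean(f_a, f_b):
    """Euclidean distance between two descriptor vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f_a, f_b)))


def match_descriptors(descs_a, descs_b, threshold):
    """For each descriptor in A, keep its nearest neighbour in B if d(f_A, f_B) < T."""
    matches = []
    if not descs_b:
        return matches
    for i, f_a in enumerate(descs_a):
        # Nearest neighbour of f_a among all descriptors of image B
        j, dist = min(
            ((j, euclidean(f_a, f_b)) for j, f_b in enumerate(descs_b)),
            key=lambda pair: pair[1],
        )
        if dist < threshold:
            matches.append((i, j, dist))
    return matches


# Toy descriptors (hypothetical values)
descs_a = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [5.0, 5.0, 5.0]]
descs_b = [[0.0, 0.9, 0.1], [1.0, 0.1, 0.0]]
matches = match_descriptors(descs_a, descs_b, threshold=0.5)
matched_pairs = [(i, j) for i, j, _ in matches]
```

The third descriptor of A has no neighbour closer than T and therefore stays unmatched, which is the intended effect of the threshold.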


Key Point Detectors

• Many Existing Detectors Available
  – Hessian & Harris [Beaudet ‘78], [Harris ‘88]
  – Laplacian, DoG [Lindeberg ‘98], [Lowe 1999]
  – Harris-/Hessian-Laplace [Mikolajczyk & Schmid ‘01]
  – Harris-/Hessian-Affine [Mikolajczyk & Schmid ‘04]
  – EBR and IBR [Tuytelaars & Van Gool ‘04]
  – MSER [Matas ‘02]
  – Salient Regions [Kadir & Brady ‘01]
  – Others…
• Reference site:
  – http://www.robots.ox.ac.uk/~vgg/research/affine/index.

Keypoint Localization

• Goals:
  – Repeatable detection
  – Precise localization
  – Interesting content

⇒ Look for two-dimensional signal changes

Hessian Detector [Beaudet78]

• Hessian determinant

\[ \mathrm{Hessian}(I) = \begin{bmatrix} I_{xx} & I_{xy} \\ I_{xy} & I_{yy} \end{bmatrix} \]

\[ \det(\mathrm{Hessian}(I)) = I_{xx} I_{yy} - I_{xy}^2 \]

In Matlab: Ixx .* Iyy - (Ixy).^2

Intuition: Search for strong derivatives in two orthogonal directions

Hessian Detector – Responses [Beaudet78]

Effect: Responses mainly on corners and strongly textured areas.
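The determinant response can be tried on toy images with a few lines of plain Python. This is a sketch under stated assumptions: second derivatives are approximated by simple finite differences without any smoothing, and the function name and test images are made up for illustration.

```python
def hessian_det(img):
    """det(Hessian) = Ixx * Iyy - Ixy^2 at every interior pixel (border stays 0)."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Second derivatives by finite differences
            ixx = img[y][x + 1] - 2 * img[y][x] + img[y][x - 1]
            iyy = img[y + 1][x] - 2 * img[y][x] + img[y - 1][x]
            ixy = (img[y + 1][x + 1] - img[y + 1][x - 1]
                   - img[y - 1][x + 1] + img[y - 1][x - 1]) / 4.0
            out[y][x] = ixx * iyy - ixy ** 2
    return out


# A single bright pixel: strong curvature in both x and y
blob = [[1.0 if (x, y) == (3, 3) else 0.0 for x in range(7)] for y in range(7)]
# A straight vertical edge: curvature in x only
edge = [[1.0 if x >= 3 else 0.0 for x in range(7)] for y in range(7)]

blob_resp = hessian_det(blob)
edge_resp = hessian_det(edge)
```

The blob produces a clear response at its center, while the straight edge yields zero everywhere because one of the two second derivatives vanishes.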


Harris Detector [Harris88]

• Second moment matrix (autocorrelation matrix)

\[ \mu(\sigma_I, \sigma_D) = g(\sigma_I) * \begin{bmatrix} I_x^2(\sigma_D) & I_x I_y(\sigma_D) \\ I_x I_y(\sigma_D) & I_y^2(\sigma_D) \end{bmatrix} \]

Intuition: Search for local neighborhoods where the image content has two main directions (eigenvectors).

Slide credit: K. Grauman, B. Leibe

1. Image derivatives: \( I_x(\sigma_D) \), \( I_y(\sigma_D) \), computed with Gaussian derivative filters \( g_x(\sigma_D) \), \( g_y(\sigma_D) \)
2. Square of derivatives: \( I_x^2 \), \( I_y^2 \), \( I_x I_y \)
3. Gaussian filter \( g(\sigma_I) \): \( g(I_x^2) \), \( g(I_y^2) \), \( g(I_x I_y) \)
4. Cornerness function – both eigenvalues are strong

\[ har = \det[\mu(\sigma_I, \sigma_D)] - \alpha \, [\mathrm{trace}(\mu(\sigma_I, \sigma_D))]^2 = g(I_x^2) \, g(I_y^2) - [g(I_x I_y)]^2 - \alpha \, [g(I_x^2) + g(I_y^2)]^2 \]

5. Non-maxima suppression on har

Harris Detector – Responses [Harris88]

Effect: A very precise corner detector.
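Steps 1 to 4 of the Harris detector can be sketched in plain Python. Several simplifications are assumptions of this sketch and not part of the slides: central differences replace the Gaussian derivative filters, an unweighted box window replaces the Gaussian g(σ_I), α = 0.06 is just a commonly used value, and non-maxima suppression (step 5) is left out.

```python
def harris_response(img, alpha=0.06, radius=1):
    """Cornerness det(mu) - alpha * trace(mu)^2 for every pixel with a full window."""
    h, w = len(img), len(img[0])
    # 1. Image derivatives (central differences)
    ix = [[0.0] * w for _ in range(h)]
    iy = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ix[y][x] = (img[y][x + 1] - img[y][x - 1]) / 2.0
            iy[y][x] = (img[y + 1][x] - img[y - 1][x]) / 2.0
    har = [[0.0] * w for _ in range(h)]
    for y in range(radius, h - radius):
        for x in range(radius, w - radius):
            # 2. + 3. Products of derivatives, summed over a box window
            sxx = syy = sxy = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    gx, gy = ix[y + dy][x + dx], iy[y + dy][x + dx]
                    sxx += gx * gx
                    syy += gy * gy
                    sxy += gx * gy
            # 4. Cornerness function
            har[y][x] = sxx * syy - sxy * sxy - alpha * (sxx + syy) ** 2
    return har


# Toy image: bright square in the lower-right part, with its corner at (x=5, y=5)
img = [[1.0 if x >= 5 and y >= 5 else 0.0 for x in range(11)] for y in range(11)]
har = harris_response(img)
corner, edge, flat = har[5][5], har[8][5], har[2][2]
```

On this toy image the corner gets a positive score, the point on the straight edge a negative one, and the flat region zero, which matches the intuition that both eigenvalues must be strong.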


Scale Space

• So far, we can detect repeatable points in the image
• Now what about the image scale?
• Can we detect not only a distinctive position, but also a characteristic scale around an interest point?

Automatic Scale Selection

\[ f(I_{i_1 \dots i_m}(x, \sigma)) = f(I'_{i_1 \dots i_m}(x', \sigma')) \]

Same operator responses if the patch contains the same image up to a scale factor.

How to find corresponding patch sizes?

• Function responses for increasing scale (scale signature)

[Figure: scale signatures \( f(I_{i_1 \dots i_m}(x, \sigma)) \) and \( f(I'_{i_1 \dots i_m}(x', \sigma')) \) plotted over increasing scale for corresponding points in the two images]

What Is A Useful Signature Function?

• Laplacian-of-Gaussian = “blob” detector

Laplacian-of-Gaussian (LoG)

• Local maxima in scale space of Laplacian-of-Gaussian

\[ L_{xx}(\sigma) + L_{yy}(\sigma) \]

[Figure: LoG responses at five successive scale levels]

⇒ List of (x, y, s)
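To see why a maximum over scale picks out a characteristic size, consider an idealized Gaussian blob, for which the LoG response at the blob center has a closed form. The sketch below rests on assumptions not stated on the slide: the blob model, the σ² scale normalization (as in Lindeberg's work), and all names are illustrative only.

```python
def normalized_log_at_blob_center(sigma, blob_sigma):
    """|sigma^2 * (Lxx + Lyy)| at the center of a unit-height Gaussian blob.

    Smoothing the blob with a Gaussian of width sigma gives another Gaussian with
    variance blob_sigma^2 + sigma^2 and peak blob_sigma^2 / (blob_sigma^2 + sigma^2);
    its Laplacian at the center is -2 * peak / variance.
    """
    s2, b2 = sigma ** 2, blob_sigma ** 2
    return 2.0 * s2 * b2 / (s2 + b2) ** 2


# Scale signature of a blob with (hypothetical) width 3 over a set of test scales
scales = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
signature = [normalized_log_at_blob_center(s, blob_sigma=3.0) for s in scales]
best_scale = scales[signature.index(max(signature))]
```

The signature peaks at the scale that equals the blob width, so the selected s in (x, y, s) follows the size of the structure.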

Results: Laplacian-of-Gaussian

Maximally Stable Extremal Regions [Matas ‘02]

• Based on Watershed segmentation algorithm
• Select regions that stay stable over a large parameter range

Example Results: MSER

Local Descriptors

• Most available descriptors focus on edge/gradient information
  – Capture boundary and texture information
  – Color is still used relatively rarely (more suitable for homogeneous regions)

Local Descriptors: SIFT Descriptor

[Lowe, ICCV 1999]

Histogram of oriented gradients

• Captures important texture information
• Robust to small translations / affine deformations

Orientation Normalization

• Compute orientation histogram
• Select dominant orientation
• Normalize: rotate to fixed orientation

[Lowe, SIFT, 1999]
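The first two steps of orientation normalization can be sketched as follows. This is a simplified illustration with assumptions: gradients come from central differences, the histogram is weighted by gradient magnitude without Gaussian weighting or peak interpolation, and the function name and the 36-bin default are choices of this sketch. The final rotation of the patch by the negative angle is omitted.

```python
import math


def dominant_orientation(patch, n_bins=36):
    """Angle (radians) of the strongest bin in a gradient-orientation histogram."""
    h, w = len(patch), len(patch[0])
    hist = [0.0] * n_bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = patch[y][x + 1] - patch[y][x - 1]
            gy = patch[y + 1][x] - patch[y - 1][x]
            mag = math.hypot(gx, gy)
            if mag == 0.0:
                continue
            theta = math.atan2(gy, gx) % (2 * math.pi)
            # Bins are centered on multiples of 2*pi / n_bins
            b = int(round(n_bins * theta / (2 * math.pi))) % n_bins
            hist[b] += mag  # vote weighted by gradient magnitude
    best = max(range(n_bins), key=lambda b: hist[b])
    return best * 2 * math.pi / n_bins


# Toy patch: intensity increases with y, so the gradient points along +y (90 degrees)
patch = [[float(y) for x in range(5)] for y in range(5)]
angle = dominant_orientation(patch)
```

Rotating the patch by the negative of this angle before computing the descriptor would make the description independent of in-plane rotation.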

Local Descriptors: SURF

• Fast approximation of SIFT idea
  – Efficient computation by 2D box filters & integral images
    ⇒ 6 times faster than SIFT
  – Equivalent quality for object identification
• GPU implementation available
  – Feature extraction @ 100 Hz (detector + descriptor, 640×480 img)
  – http://www.vision.ee.ethz.ch/~surf

[Bay, ECCV’06], [Cornelis, CVGPU’08]
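The speed-up from integral images comes from evaluating any box filter with four table lookups, independent of the box size. A minimal sketch of that idea (function names and the toy image are illustrative; this is not the SURF implementation itself):

```python
def integral_image(img):
    """ii[y][x] holds the sum of img over rows < y and columns < x (zero-padded)."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def box_sum(ii, x0, y0, x1, y1):
    """Sum of img over rows y0..y1-1 and columns x0..x1-1, using four lookups."""
    return ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0]


img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
ii = integral_image(img)
lower_right = box_sum(ii, 1, 1, 3, 3)  # 5 + 6 + 8 + 9
whole = box_sum(ii, 0, 0, 3, 3)
```

Once the table is built in a single pass, the cost of a box filter no longer depends on its size, which is what makes large filters at coarse scales cheap.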


Local Descriptors: Shape Context

Count the number of points inside each bin, e.g.:

Count = 4

Count = 10...

Log-polar binning: more precision for nearby points, more flexibility for farther points.

Belongie & Malik, ICCV 2001
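The log-polar counting can be sketched in a few lines. The bin counts, the radii and the function name below are assumptions chosen for a small example, not the parameters of the original method.

```python
import math


def shape_context(center, points, n_radial=3, n_angular=4, r_min=1.0, r_max=8.0):
    """Count points in log-polar bins around `center`.

    Returns an n_radial x n_angular table of counts. Radial bin edges grow
    geometrically (here 1, 2, 4, 8), so nearby points are binned more finely.
    """
    hist = [[0] * n_angular for _ in range(n_radial)]
    log_span = math.log(r_max / r_min)
    for px, py in points:
        dx, dy = px - center[0], py - center[1]
        r = math.hypot(dx, dy)
        if r < r_min or r >= r_max:
            continue  # outside the support of the descriptor
        r_bin = min(int(n_radial * math.log(r / r_min) / log_span), n_radial - 1)
        theta = math.atan2(dy, dx) % (2 * math.pi)
        a_bin = int(n_angular * theta / (2 * math.pi)) % n_angular
        hist[r_bin][a_bin] += 1
    return hist


# Toy edge points around the reference point (0, 0)
points = [(1.5, 0.5), (-1.0, 3.0), (-3.0, 1.0), (-5.0, -2.0), (0.2, 0.1), (20.0, 0.0)]
hist = shape_context((0.0, 0.0), points)
```

The last two points fall outside the radial range and are ignored; the remaining four land in three different bins.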

Local Descriptors: Geometric Blur

Example descriptor:
• Compute edges at four orientations
• Extract a patch in each channel
• Apply spatially varying blur and sub-sample

[Figure: geometric blur illustrated on an idealized signal]

Berg & Malik, CVPR 2001

So, What Local Features Should I Use?

• There have been extensive evaluations/comparisons
  – [Mikolajczyk et al., IJCV’05, PAMI’05]
  – All detectors/descriptors shown there work well
• Best choice often application dependent
  – MSER works well for buildings and printed things
  – Harris-/Hessian-Laplace/DoG work well for many natural categories
• More features are better
  – Combining several detectors often helps

Implicit Shape Model

Spatial Models Considered

Fully connected shape model
• e.g. Constellation Model
• Parts fully connected
• Recognition complexity: O(n^p)
• Method: Exhaustive search
• Complexity restricts method to a small number of parts

“Star” shape model
• e.g. ISM
• Parts mutually independent
• Recognition complexity: O(np)
• Method: Gen. Hough Transform
• Suited for many local parts

Slide credit: Rob Fergus

Implicit Shape Model (ISM)

• Basic ideas
  1. Automatically learn a large number of local parts that occur on the object
     – Also referred to as visual vocabulary or appearance codebook
  2. Learn a star-topology structural model
     – Features are considered independent given the object center

Slide credit: K. Grauman, B. Leibe

Visual Vocabulary / Appearance Codebook

Visual Vocabulary

• Detect keypoints on all training examples
• Extract feature descriptions around keypoints
• Result: A large set of local image descriptors occurring on people

Visual Vocabulary

• Group visually similar local descriptors
  – i.e. parts that are reoccurring
• Parts that occur only once are discarded (they could result from noise or unusual structures)

Side Note: Grouping Algorithms

• Partitional Clustering
  – K-Means
  – Gaussian Mixture Clustering (EM)
• Hierarchical or Agglomerative Clustering
  – Single-Link
  – Group Average
  – Ward’s method (minimum variance)

Complexity

• Standard Approach:
  – Compute distance matrix
  – Consecutively merge the two most similar clusters
  – Time complexity: O(n² log n)
  – Space complexity: O(n²)

Reciprocal Nearest Neighbor (RNN)

• RNN Algorithm [de Rham’80, Benzecri’82]
  – Time complexity: O(n²)
  – Space complexity: O(n)
  – Requirement: “reducibility property” [Bruynooghe’77]

Space Complexity

• Note that space complexity is quite important for clustering large data sets
  – Example: 100 000 data points
  – Standard distance matrix contains: 10^5 × 10^5 = 10^10 entries
    ⇒ ~40 GB if one entry has 32 bits
    ⇒ Does your PC have enough RAM?
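The 40 GB figure follows directly from the numbers on the slide. The short calculation below reproduces it; the second figure, for storing only one triangle of the symmetric matrix, is an additional illustration and not taken from the slide.

```python
n_points = 100_000
bytes_per_entry = 4  # one 32-bit value per distance

# Full n x n distance matrix
full_matrix_entries = n_points * n_points
full_matrix_gb = full_matrix_entries * bytes_per_entry / 1e9

# Only the upper triangle of the symmetric matrix (no diagonal)
triangle_entries = n_points * (n_points - 1) // 2
triangle_gb = triangle_entries * bytes_per_entry / 1e9
```

Even exploiting symmetry only halves the requirement to roughly 20 GB, which is why the O(n) space bound of the RNN approach matters for large descriptor sets.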

Clustering Hierarchy

• Agglomerative clustering produces a hierarchy
• Difficult question: where to stop?
  – Ideally, clusters should be visually compact.
• But
  – Distance value depends on feature dimensionality.
  – Appropriate ratio #features/#clusters depends on data set and interest point detector.

⇒ Needs to be selected for each detector/descriptor combination!

Visual Vocabulary

• Vocabulary size ~10000 clusters
  – Probabilistic votes decide whether a part is important or not

Learning Spatial Structure: “Star”-Model

Implicit Shape Model - Representation

1. Learn appearance codebook
   – Extract local features at interest points
   – Agglomerative clustering ⇒ codebook
2. Learn spatial distributions
   – Match codebook to training images
   – Record matching positions on object

• Sparse representation of the object appearance

Training: Spatial Occurrence (Star-Model)

1. Record spatial occurrence
   – Match codebook to training images
   – Record occurrence distributions with respect to object center
   – Location (x, y) and scale

[Figure: spatial occurrence distributions forming the star model]

Occurrence Distribution

• For each codebook entry, we obtain a non-parametric probability distribution of its position relative to the object center
• With
  – \( c_i \) a codebook entry
  – \( \lambda = (\lambda_x, \lambda_y, \lambda_s) \) the relative position and scale

Remember: Generalized Hough Transform [Ballard81]

• Choose reference point for the contour (e.g. center)
• For each point on the contour, remember where it is located w.r.t. the reference point
  – Remember radius r and angle φ relative to the contour tangent
• Recognition: whenever you find a contour point, calculate the tangent angle and ‘vote’ for all possible reference points
• Instead of reference point, can also vote for transformation

⇒ The same idea can be used with local features!

Slide credit: Bernt Schiele

Generalized Hough Transform

• For every feature, store possible “occurrences”
  – Object identity
  – Pose
  – Relative position
• For new image, let the matched features vote for possible object positions

Probabilistic Gen. Hough Transform

• Exact correspondences → Prob. match to object part
• NN matching → Soft matching
• Feature location on obj. → Part location distribution
• Uniform votes → Probabilistic vote weighting
• Quantized Hough array → Continuous Hough space

Detection Procedure

Recognition: ISM Detection Procedure

[Figure: ISM detection pipeline with the stages Probabilistic Voting, 3D Voting Space, Back-Projection and Segmentation; example detection confidences 0.7 and 0.5]

Probabilistic Formulation

• Descriptor contribution:
• With
  – e an extracted image descriptor
  – l the position of the descriptor in the image
• Marginalization over all found descriptors:

Scale Voting: Efficient Computation

• Mean-Shift formulation for refinement
  – Scale-adaptive balloon density estimator

[Figure: scale votes → binned accumulator array → candidate maxima → refinement (MSME)]
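The voting step can be illustrated with a small sketch in which every matched codebook entry casts weighted votes for the object center. The sketch makes several assumptions that go beyond the slides: match and occurrence probabilities are taken as uniform, the scale dimension of the 3D voting space is dropped, the votes go into a coarse binned array without mean-shift refinement, and all names and toy values are hypothetical.

```python
from collections import defaultdict


def ism_vote(features, occurrences, bin_size=10.0):
    """Accumulate probabilistic votes for the object center.

    features:    list of (x, y, matched_codebook_ids) for each image descriptor
    occurrences: codebook id -> list of (dx, dy) offsets from part to object center
    Returns a dict mapping a 2D Hough bin to its accumulated vote weight.
    """
    votes = defaultdict(float)
    for x, y, matched in features:
        if not matched:
            continue
        p_match = 1.0 / len(matched)      # weight of each matched codebook entry
        for entry in matched:
            occs = occurrences.get(entry, [])
            for dx, dy in occs:
                p_occ = 1.0 / len(occs)   # weight of each stored occurrence
                cx, cy = x + dx, y + dy   # hypothesized object center
                votes[(int(cx // bin_size), int(cy // bin_size))] += p_match * p_occ
    return votes


# Toy model: entry 0 was seen 20 px above the center, entry 1 below it
occurrences = {0: [(0.0, 20.0)], 1: [(0.0, -20.0), (15.0, -20.0)]}
features = [(50.0, 30.0, [0]), (50.0, 70.0, [1]), (52.0, 31.0, [0, 1])]
votes = ism_vote(features, occurrences)
best_bin = max(votes, key=votes.get)
```

Consistent votes pile up in the bin around (50, 50), while ambiguous matches spread small weights over other bins; each descriptor contributes a total weight of one.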

Figure-Ground Segmentation

Occurrence distributions

• Adding local segmentation masks

[Figure: spatial occurrence distributions + local figure-ground labels]

• Influence of descriptor on an object hypothesis:
• Figure probability for a hypothesis:
  (annotated terms: segmentation information, influence on object hypothesis)
• Final segmentation value:

Overlapping hypotheses

Minimum Description Length (MDL) Reasoning

• Savings term:
  – S_area: #pixels N in segmentation
  – S_model: model cost, assumed constant
  – S_error: estimate of error
• Error term:
• Overlapping hypotheses:

MDL based Verification

• Secondary hypotheses
  – Desired property of algorithm! ⇒ robustness to occlusion
• Standard solution: reject based on bounding box
  ⇒ Problematic - may lead to missing detections!
  ⇒ Use segmentations to resolve ambiguities instead

Leibe, Leonardis, Schiele, ‘04

Extensions and Evaluation

Outline

1. Image Descriptors and Interest Points
   – Measure the influence of interest region extraction
   – Evaluate the robustness of local image descriptions
2. Body Articulations
3. Cross-Articulation Learning
4. Discriminative Hypothesis Verification
5. Instance-Specific Models

Interest Point Detectors

• Interest point detectors sample different image regions
• Unclear which sampling is most informative for pedestrian detection

[Figure: regions sampled by Harris, Harris-Laplace (scale-invariant) and Hessian-Laplace (scale-invariant) detectors]

Shape-based Image Descriptors

• Object shape more important than actual pixel values
  – Shape generalizes better
• Patches
  – Representation: image patch (25x25 px)
  – Distance measure: Correlation
• Local Chamfer
  – Representation: edge patch (25x25 px)
  – Distance measure: Chamfer distance
• Shape Context [Belongie’00, Mikolajczyk et al. ’05]
  – Representation: Log-polar histogram of edge orientations (9 location bins, 4 edge orientations per bin)
  – Distance measure: Euclidean distance

[Figure: Codebook (image patches) and Codebook (Local Chamfer)]
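As an illustration of the Chamfer distance used for edge patches, the sketch below computes the directed form: the mean distance from each template edge point to its nearest edge point in the other patch. Treating edge patches as point lists, the brute-force nearest-neighbour search, and the function name are assumptions of this sketch; practical implementations usually use a distance transform instead.

```python
import math


def chamfer_distance(template_pts, image_pts):
    """Mean distance from each template edge point to its nearest image edge point."""
    if not template_pts or not image_pts:
        raise ValueError("both point sets must be non-empty")
    total = 0.0
    for tx, ty in template_pts:
        total += min(math.hypot(tx - ix, ty - iy) for ix, iy in image_pts)
    return total / len(template_pts)


# Toy edge patches: a vertical line and the same line shifted by one pixel
line = [(2, y) for y in range(5)]
shifted = [(3, y) for y in range(5)]
d_same = chamfer_distance(line, line)
d_shift = chamfer_distance(line, shifted)
```

A one-pixel shift changes the distance by only one, which reflects the tolerance to small shape deformations that motivates shape-based descriptors.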

Training Procedures

• Pedestrian shape can be learned from
  – Silhouettes (from segmentation)
  – Real edge images (Canny edge detector)
• Does a “clean” model generalize to realistic images?
• Does background noise deteriorate the model?

Results – ISM with Shape Descriptors

• Learning on real edges leads to better performance
• Shape Context + Hessian-Laplace work best
  – Up to 23% improvement

Advantages and Disadvantages – ISM & Shape

+ Large performance increase when using shape-based descriptors
+ Detection algorithm is essentially unchanged
− No notion of pedestrian articulations

End of Lecture
