Visuelle Perzeption für Mensch-Maschine Schnittstellen
(Visual Perception for Human-Machine Interfaces)

Lecture, WS 2008

Dr. Rainer Stiefelhagen, Dr. Edgar Seemann
Interactive Systems Laboratories, Universität Karlsruhe (TH)
http://isl.ira.uka.de/msmmi/teaching/visionhci
- Learn common patterns based on either
  - a priori knowledge, or
  - statistical information
- Why statistics?
  - Manual definition of patterns is often tedious or not obvious
  - Important for the adaptability to different tasks/domains
- Finally: try to mimic human learning / better understand human learning
Pattern Recognition
- Supervised: data samples with associated class labels
- Unsupervised: data samples WITHOUT any labels
- Semi-supervised / weakly-supervised learning: not a topic in this lecture

[Diagram: pattern → Sensor → representation → Feature Selection/Extraction → feature pattern → Classifier → Decision]
Pattern Recognition Stages
1. Feature extraction: which part of the data is most important?
2. Classification/learning: how can we map the extracted features to our desired output (supervised learning)?

[Diagram: feature vector x is mapped by the classifier w to a class label y in {a, b, c, …}]
Examples
- Speech recognition
- Computer vision: you will see many examples in the course of this semester …
What’s in the box?
- Parametric/non-parametric distributions
- Support Vector Machines
- Neural Networks
- Decision Trees
- …
Problems and Considerations
- Features
  - How do I encode domain knowledge?
  - Allow invariance (e.g. rotation in the case of images), see e.g. the previous lecture
  - Which part of the data can be discarded as it represents redundant information?
  - How can we reduce the dimensionality, i.e. how can we make the problem as simple as possible, but as complex as necessary?
Curse of Dimensionality
- Generally: the more dimensions, the more difficult it is for a learning algorithm to extract patterns
- More dimensions = more degrees of freedom

[Figure: linear decision boundary separating Class A and Class B]

Defining a linear decision boundary:
- 2-dim: 3 degrees of freedom
- 3-dim: 5 degrees of freedom
- 4-dim: 7 degrees of freedom
- …
Dimensionality Reduction
- Reduce the dimensionality of the data, while retaining relevant information
- Popular techniques:
  - Principal Component Analysis (PCA)
  - Linear Discriminant Analysis (LDA)
  - Multidimensional Scaling (MDS)
  - …
- Many of these techniques are readily available in MATLAB or Octave
PCA
First Steps: Sample Variance
- Given a set of samples (a_1, …, a_n) with mean a'
- An unbiased variance estimator is defined as

  var(a) = 1/(n-1) · Σ_i (a_i - a')²

- With the zero-mean sequence (z_1, …, z_n), z_i = a_i - a', this is

  var(a) = 1/(n-1) · Σ_i z_i²

- Written in vectors: var(a) = 1/(n-1) · z zᵀ
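A minimal numpy sketch of this estimator (the sample values are made up for illustration; numpy is not prescribed by the lecture, which mentions MATLAB/Octave elsewhere):

```python
import numpy as np

a = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])   # samples a_1 .. a_n
n = len(a)

z = a - a.mean()                      # zero-mean sequence z_i = a_i - a'
var_sum = (z ** 2).sum() / (n - 1)    # 1/(n-1) * sum_i z_i^2
var_vec = (z @ z) / (n - 1)           # vector form: 1/(n-1) * z z^T

assert np.isclose(var_sum, var_vec)
assert np.isclose(var_sum, np.var(a, ddof=1))   # numpy's unbiased estimator
```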
First Steps: Covariance
- Given two sets of samples (a_1, …, a_n) and (b_1, …, b_n) with means a', b'
- A covariance estimator is defined as

  cov(a, b) = 1/(n-1) · Σ_i (a_i - a')(b_i - b')

- With the zero-mean sequences (c_1, …, c_n) and (d_1, …, d_n), this is

  cov(a, b) = 1/(n-1) · Σ_i c_i d_i

- Written in vectors: cov(a, b) = 1/(n-1) · c dᵀ
First Steps: Covariance Matrix
- Given n samples v_1, …, v_n with mean vector v', where d is the dimensionality, i.e. v_i = (a_1, …, a_d)
- A multi-dimensional covariance estimator is defined as

  cov(V) = 1/(n-1) · Σ_i (v_i - v')(v_i - v')ᵀ

- Written in vectors, with the zero-mean samples collected in a matrix V (one sample per column):

  cov(V) = 1/(n-1) · V Vᵀ
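As a sanity check, a small numpy sketch (with made-up data, not from the slides) that builds this matrix from the zero-mean data and compares it with numpy's built-in estimator:

```python
import numpy as np

rng = np.random.default_rng(0)
V = rng.normal(size=(3, 100))             # 3-dimensional data, 100 samples (one per column)
n = V.shape[1]

Z = V - V.mean(axis=1, keepdims=True)     # subtract the mean of every dimension
C = (Z @ Z.T) / (n - 1)                   # 1/(n-1) * Z Z^T, a 3x3 matrix

assert np.allclose(C, np.cov(V))          # np.cov also uses the unbiased 1/(n-1) estimator
assert np.allclose(C, C.T)                # square-symmetric, variances on the diagonal
```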
Variance Properties
- Properties:
  - var(a) = cov(a, a)
  - cov(a, b) = 0 iff a, b are completely uncorrelated
  - cov(V) is a square-symmetric matrix (containing variances on the diagonal and covariances on the off-diagonals):

    cov(V) =
    | var(v_1)       …  cov(v_1, v_n) |
    | …              …  …             |
    | cov(v_n, v_1)  …  var(v_n)      |
PCA: Toy example
- Toy example: try to understand the motion of a spring
- Data from three camera sensors (x,y-position for each camera, i.e. 6-dim. data)
Noise and Redundancy
- Data contains noise
- Data is redundant: 1-dimensional motion vs. 6-dimensional sample data
- Sensor data from the three cameras are highly correlated
Change of basis
- By changing the basis, we may represent the data almost perfectly with a lower number of dimensions
- Remember that the data has a mean of zero

[Figure: basis change to new axes r1', r2']
Change of Basis
- A basis change can be written as a matrix multiplication
- Given the new basis vectors p_1, …, p_n, we can transform data samples x_i in the following manner:

  y_i = P x_i   with   P = (p_1, …, p_n)ᵀ

- i.e. we are projecting x_i onto the new basis vectors: the j-th component of y_i is p_jᵀ x_i
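A small numpy sketch of such a basis change (the particular basis vectors, a 45-degree rotation, are only an example):

```python
import numpy as np

# rows of P are the new orthonormal basis vectors p_1, p_2 (here: a 45-degree rotation)
c, s = np.cos(np.pi / 4), np.sin(np.pi / 4)
P = np.array([[ c, s],
              [-s, c]])

x = np.array([2.0, 1.0])      # a data sample
y = P @ x                     # y_j = p_j^T x: coordinates of x in the new basis
print(y)
```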
Reducing the Covariances

- Remember the covariance matrix:

  cov(V) =
  | var(v_1)       …  cov(v_1, v_n) |
  | …              …  …             |
  | cov(v_n, v_1)  …  var(v_n)      |

- We have seen that a basis change may reduce the correlation between the different dimensions
- Goal of PCA: make the covariance matrix as diagonal as possible
PCA Assumptions
- Basis (principal components) is orthogonal
- Linearity
  - The change of basis is a linear operation
  - For non-linear problems: kernel PCA
- Mean and variance are sufficient statistics
  - This holds if the data is distributed according to a Gaussian distribution
- Large variances have important dynamics
Finally solving it
- Theorem: the covariance matrix is diagonalized by an orthogonal matrix of its eigenvectors
- That is, the principal components of the underlying data are the eigenvectors of the covariance matrix
- The higher the eigenvalue, the more variance is captured along the corresponding dimension
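The following numpy sketch puts the previous slides together (center the data, build the covariance matrix, take its eigenvectors, project). It is only an illustration of the idea, not a reference implementation from the course:

```python
import numpy as np

def pca(X, k):
    """X: d x n data matrix, one sample per column. Returns the top-k principal
    components (as columns) and the data projected onto them."""
    Z = X - X.mean(axis=1, keepdims=True)        # zero-mean data
    C = (Z @ Z.T) / (X.shape[1] - 1)             # d x d covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)         # eigh: C is symmetric
    order = np.argsort(eigvals)[::-1]            # sort by decreasing eigenvalue
    P = eigvecs[:, order[:k]]                    # principal components
    return P, P.T @ Z                            # k x n projected data

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 200))                    # e.g. the 6-dim camera data of the toy example
components, Y = pca(X, k=1)
```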
Eigenspectrum and Energy
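The figure on this slide is not reproduced in the transcript. As a rough sketch of the idea, the eigenvalues are sorted and one keeps enough components to retain a chosen fraction of the total variance ("energy"); the 90% threshold and the numbers below are arbitrary examples:

```python
import numpy as np

eigvals = np.array([5.2, 2.1, 0.4, 0.2, 0.05, 0.03])   # example eigenspectrum, sorted descending
energy = np.cumsum(eigvals) / eigvals.sum()             # cumulative fraction of variance captured
k = int(np.searchsorted(energy, 0.90)) + 1              # smallest k that retains >= 90% energy
print(k, energy)
```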
PCA in practice
- Eigenvectors can be computed by solving the linear equation system A x = λ x
- In practice, we most often use a singular value decomposition (SVD)
- If the data samples are high-dimensional, the covariance matrix can get extremely large
  - If the samples x_i are images with a resolution of 1600x1200, the covariance matrix would be a 1,920,000 x 1,920,000 matrix
- Computing the eigenvectors of such a matrix would be computationally very costly
Algebra to the rescue

- Fortunately, it is not that bad
  - If we have a maximum of n data samples, we can also obtain only a maximum of n eigenvectors (one dimension for each sample point)
  - i.e. the number of eigenvectors is restricted by both the sample dimension and the sample number
- Let's have a look at the following equations:

  AᵀA x = λ x
  A AᵀA x = λ A x
  (A Aᵀ)(A x) = λ (A x)

- λ is an eigenvalue of AᵀA (with eigenvector x)
- λ is an eigenvalue of A Aᵀ (with eigenvector A x)
- We can therefore compute the eigenvectors of the small matrix AᵀA and multiply them by A
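A hedged numpy sketch of this trick for the case of few samples in a very high-dimensional space (the sizes are illustrative, not the 1600x1200 images from the slide); the same subspace can also be obtained directly from an SVD of the data matrix:

```python
import numpy as np

rng = np.random.default_rng(2)
d, n = 10000, 50                          # dimensionality >> number of samples
A = rng.normal(size=(d, n))               # data samples as columns
A -= A.mean(axis=1, keepdims=True)        # make the data zero-mean

small = A.T @ A                           # n x n instead of d x d
eigvals, eigvecs = np.linalg.eigh(small)  # eigenvectors x of A^T A
U = A @ eigvecs                           # A x are eigenvectors of A A^T
U /= np.linalg.norm(U, axis=0)            # normalize the columns

# the same principal subspace from an SVD of A; the eigenvalues of A^T A are the squared singular values
U_svd, S, _ = np.linalg.svd(A, full_matrices=False)
```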
Summary PCA
- Can be used to reduce the number of dimensions
- Find the principal dimensions
  - Principal dimensions are assumed to have a high variance
- Discard non-informative dimensions (redundancy and noise)
- Extensions:
  - Robust PCA (deal with occlusion)
  - Incremental PCA
  - Kernel PCA (non-linear)
Pattern Classification
Bayes Decision Theory
- Example: character recognition
  - Classify characters into the classes C1 = "a" and C2 = "b", minimizing the probability of an incorrect classification
Priors
- Prior probabilities represent the probability that a class occurs
- They are often used to encode domain knowledge
Conditional Probability
- Given a feature vector x
  - x measuring e.g. the width or the height of the character
- The conditional probability p(x|C_k) measures the probability of observing x when the character is of class C_k
Example
- Say x = 15
- What decision should be made?

[Figure: observation at x = 15]
Example
- Say x = 25
- What decision should be made?

[Figure: observation at x = 25]
Example
- Say x = 20
- What decision should be made?

[Figure: observation at x = 20]
Bayes’ Theorem
- The a-posteriori probability p(C_k|x) defines the probability of a class given a specific feature vector x
- By Bayes' theorem it can be computed from the prior and the class-conditional probability:

  p(C_k|x) = p(x|C_k) · p(C_k) / p(x)
Decision boundary

[Figure: decision boundary]
Decision Rule
- Choose C1 iff

  p(C1|x) > p(C2|x)

- This is equivalent to

  p(x|C1) p(C1) > p(x|C2) p(C2)

- This is equivalent to

  p(x|C1) / p(x|C2) > p(C2) / p(C1)

- Special cases: p(C1) = p(C2) and p(x|C1) = p(x|C2)
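A tiny sketch of this decision rule with one-dimensional Gaussian class-conditional densities; the means, variances and priors below are invented for illustration and are not the densities shown in the lecture's figures:

```python
import numpy as np

def gauss(x, mu, sigma):
    """1-D Gaussian density."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

p_C1, p_C2 = 0.6, 0.4                          # priors p(C1), p(C2)
p_x_C1 = lambda x: gauss(x, mu=14, sigma=3)    # class-conditional p(x|C1)
p_x_C2 = lambda x: gauss(x, mu=23, sigma=4)    # class-conditional p(x|C2)

def decide(x):
    # choose C1 iff p(x|C1) p(C1) > p(x|C2) p(C2)
    return "C1" if p_x_C1(x) * p_C1 > p_x_C2(x) * p_C2 else "C2"

for x in (15, 20, 25):
    print(x, decide(x))
```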
More Classes
- We can generalize this rule to multi-class problems (e.g. all letters of the alphabet): choose the class with the highest a-posteriori probability
Example
Generalization with a Loss Function

- In some applications we may consider the loss of a misclassification, e.g. in medical applications
  - Loss if classified as healthy despite a disease: λ(healthy|ill)
  - Loss if classified as ill despite being healthy: λ(ill|healthy)
  - We assume: λ(healthy|ill) >> λ(ill|healthy)
Expected Loss
- We have an observation x, but we do not know the loss λ(α_i|x) for every x; what we can specify is the loss per class: λ(α_i|C_k) = λ_ik
  - with α_i the possible decisions and C_k the possible classes
- The expected loss can then be computed in the following manner:

  R(α_i|x) = Σ_k λ(α_i|C_k) · p(C_k|x)
Example
- Let's look at a two-class example:
- We want to minimize the expected loss, i.e. we choose α_1 when

  R(α_2|x) > R(α_1|x)
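A minimal sketch of this minimum-risk decision for the two-class medical example; the loss values and the posterior are invented numbers, chosen only to show that a large loss can override the posterior:

```python
import numpy as np

# loss matrix lambda(alpha_i | C_k): rows = decisions, columns = true classes (ill, healthy)
loss = np.array([[  0.0, 1.0],    # decide "ill"
                 [100.0, 0.0]])   # decide "healthy": missing a disease is assumed far worse

posterior = np.array([0.3, 0.7])  # p(C_ill|x), p(C_healthy|x) for some observation x

risk = loss @ posterior           # R(alpha_i|x) = sum_k lambda(alpha_i|C_k) p(C_k|x)
decision = ["ill", "healthy"][int(np.argmin(risk))]
print(risk, decision)             # the high loss makes us decide "ill" despite the posterior
```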
Some simple math
Linear Discriminant Functions
Perceptron Algorithm
Linear Discriminant Functions
- Separate two classes with a linear hyperplane

  y(x) = wᵀx + w_0

- with w the normal vector of the hyperplane
- Examples:
  - Perceptron
  - Linear SVM
Perceptron Algorithm [Rosenblatt '58]

- Preconditions: the data set is linearly separable
- Goal: find a separating hyperplane
- Idea:
  - Iteratively improve the solution
  - Only update the solution if the data sample of interest is classified incorrectly
Perceptron Algorithm
1. Initialize w = 0
2. Classify a new data sample with the inner product y(x) = sign(wᵀx)
3. If the classification is correct, then go to step 2; else set w = w - y(x)·x
4. If no errors are left, then done; else go to step 2
Algorithm Visualized
Why does this work?
- Given a misclassified sample x with prediction y(x) = 1
- Then the update is: w = w - x
- If we now classify the sample again:

  y(x) = (w - x)ᵀx = wᵀx - xᵀx

- xᵀx is positive, therefore the next prediction for this data sample will be closer to the correct value
- Convergence: the algorithm does not converge if the data is not separable
Wait
- We did not consider w_0: what happens if the separating hyperplane does not pass through the origin (0,0)?
- Small trick: choose x to be (n+1)-dimensional, with the (n+1)-th dimension always being 1; then

  y(x) = wᵀx + w_0   becomes   y(x) = wᵀx
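A compact sketch of the perceptron as described on the last few slides, including the bias trick of appending a constant 1 to each sample; the toy data and all names are my own, not the lecture's:

```python
import numpy as np

def perceptron(X, labels, max_epochs=100):
    """X: n x d samples, labels in {-1, +1}. Returns the weight vector (last entry = w0)."""
    Xa = np.hstack([X, np.ones((X.shape[0], 1))])   # append a constant 1: bias absorbed into w
    w = np.zeros(Xa.shape[1])
    for _ in range(max_epochs):
        errors = 0
        for x, t in zip(Xa, labels):
            y = 1.0 if w @ x >= 0 else -1.0          # prediction sign(w^T x)
            if y != t:                               # update only on mistakes
                w = w - y * x                        # w <- w - y(x) x
                errors += 1
        if errors == 0:                              # all samples classified correctly
            break
    return w

# toy linearly separable data
rng = np.random.default_rng(3)
X = np.vstack([rng.normal(loc=[2.0, 2.0], size=(20, 2)),
               rng.normal(loc=[-2.0, -2.0], size=(20, 2))])
labels = np.array([1] * 20 + [-1] * 20)
w = perceptron(X, labels)
```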
Instance-Based Learning
Instance-based Learning
- Learning = storing all training instances
- Classification = assigning a target function value to a new instance
- Referred to as "lazy" learning
- Examples:
  - Template matching
  - K-Nearest Neighbor
1-Nearest-Neighbor
- Features
  - All instances correspond to points in an n-dimensional Euclidean space
- Classification is done by comparing the feature vectors of the different points
3-Nearest-Neighbor
- Search for the 3 nearest neighbors
- Typically majority vote, when not all neighbors are from the same class
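A brute-force 3-nearest-neighbor classifier with a majority vote, as a small sketch (Euclidean distance, toy data; this is not the lecture's code):

```python
import numpy as np
from collections import Counter

def knn_classify(x, X_train, y_train, k=3):
    """Classify x by a majority vote among its k nearest training samples."""
    dists = np.linalg.norm(X_train - x, axis=1)     # Euclidean distance to every training point
    nearest = np.argsort(dists)[:k]                 # indices of the k closest points
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

X_train = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [1.1, 0.9], [0.9, 1.2]])
y_train = np.array(["A", "A", "B", "B", "B"])
print(knn_classify(np.array([0.8, 0.8]), X_train, y_train, k=3))   # -> "B"
```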
3-Nearest-Neighbor
- Decision surface
  - Described by the Voronoi diagram
  - The Voronoi diagram is the dual of the Delaunay triangulation (computable in O(n log n))
When to Consider Nearest Neighbor?
- Lots of training data
- Less than 20 attributes (dimensions) per example
- Advantages:
  - Training is very fast
  - Learn complex target functions
  - Don't lose information
- Disadvantages:
  - Slow at query time
  - Easily fooled by irrelevant attributes
K-Nearest Neighbors
- Typically the decision surface is not computed explicitly; instead, for each new test sample we compute the nearest neighbors on the fly
- Probabilistic interpretation: estimate the density in a local neighborhood
- There are efficient memory indexing techniques for retrieving the stored training examples:
  - kd-tree [Friedman et al. 1977]
  - Ball tree
KD Tree for NN Search
- Each node contains:
  - Children information
  - The tightest box that bounds all the data points within the node
NN Search by KD Tree
[Figure sequence: step-by-step nearest-neighbor search in a kd-tree, shown over several slides]
Standard Kd-tree construction
- Choose splitting planes by cycling through the dimensions. Example:
  - Root node: split in x-direction
  - Next level: split in y-direction
  - Next level: split in z-direction
- The position of the splitting plane is chosen to be the median of the points (with respect to their coordinates along the axis being used)
- This typically generates a quite balanced tree; a small construction sketch follows below
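A minimal recursive construction following exactly this recipe (cycle through the dimensions, split at the median); the node layout and the example points are my own choices, not the lecture's:

```python
import numpy as np

def build_kdtree(points, depth=0):
    """points: list of equally-sized numpy arrays. Returns a nested dict or None."""
    if not points:
        return None
    axis = depth % len(points[0])                       # cycle through the dimensions
    points = sorted(points, key=lambda p: p[axis])
    median = len(points) // 2                           # split at the median point
    return {
        "point": points[median],
        "axis": axis,
        "left": build_kdtree(points[:median], depth + 1),
        "right": build_kdtree(points[median + 1:], depth + 1),
    }

pts = [np.array(p) for p in [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]]
tree = build_kdtree(pts)
```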
Kd-tree construction
- There exist many (sometimes application-specific) heuristics that try to speed up kd-tree creation
  - E.g. split in the dimension of the highest variance
- Kd-trees can be built incrementally
  - Useful for incremental learning (for example when exploring a new territory with a robot)
- Existing libraries:
  - C++: libkdtree++
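For a quick experiment in Python, an off-the-shelf kd-tree can be used instead; the sketch below uses scipy's cKDTree, which is my choice of example and is not mentioned on the slides:

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(4)
X = rng.normal(size=(1000, 3))            # stored training points
tree = cKDTree(X)                         # build the kd-tree (O(n log n))

query = rng.normal(size=(1, 3))
dists, idx = tree.query(query, k=3)       # distances and indices of the 3 nearest neighbors
```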
A kd-tree: level 1
A kd-tree: level 2
A kd-tree: level 3
A kd-tree: level 4
A kd-tree: level 5
A kd-tree: level 6
Complexity
- Building a static kd-tree from n points: O(n log n)
  - O(log n) tree levels, O(n) median search
- Insertion into a balanced kd-tree: O(log n)
- Removal from a balanced kd-tree: O(log n)
- Query of an axis-parallel range in a balanced kd-tree: O(n^(1-1/d) + k)
  - with k the number of reported points and d the dimension of the kd-tree
Problem of Dimensionality
- Imagine instances described by 20 attributes, of which only 2 are relevant to the target function
- Curse of dimensionality: the nearest neighbor is easily misled when X is high-dimensional
Ball Trees
- Can be constructed in a similar fashion to kd-trees
- Ball trees have been shown to be superior to kd-trees in many applications (though there is high variance and dataset dependence)
- The Proximity Project [Gray, Lee, Rotella, Moore 2005]
A ball-tree: level 1
A ball-tree: level 2
A ball-tree: level 3
A ball-tree: level 4
A ball-tree: level 5
kNN Discussion
- Highly effective inductive inference method for noisy training data and complex target functions
- The target function for the whole space may be described as a combination of less complex local approximations
- Learning is very simple
- Classification can be time-consuming if the training data set is large (O(n) per query with brute force, typically O(log n) with a kd-tree)