Visuelle Perzeption für Mensch-Maschine Schnittstellen
(Visual Perception for Human-Machine Interfaces)

Lecture, WS 2008

Dr. Rainer Stiefelhagen, Dr. Edgar Seemann
Interactive Systems Laboratories, Universität Karlsruhe (TH)
http://isl.ira.uka.de/msmmi/teaching/visionhci
- Learn common patterns based on either
  - a priori knowledge, or
  - statistical information
- Why statistics?
  - Manual definition of patterns is often tedious or not obvious
  - Important for the adaptability to different tasks/domains
- Finally: try to mimic human learning / better understand human learning
Pattern Recognition
- Supervised: data samples with associated class labels
- Unsupervised: data samples WITHOUT any labels
- Semi-supervised / weakly-supervised learning: not a topic in this lecture

[Diagram: pattern → Sensor → representation → Feature Selection/Extraction → feature pattern → Classifier → Decision]
Pattern Recognition Stages
1. Feature extraction: which part of the data is most important?
2. Classification/learning: how can we map the extracted features to our desired output (supervised learning)?

[Diagram: feature vector x is mapped by the classifier w to a class label y in {a, b, c, …}]
Examples
- Speech recognition
- Computer vision: you will see many examples in the course of this semester …
What’s in the box?
- Parametric/non-parametric distributions
- Support Vector Machines
- Neural Networks
- Decision Trees
- …
Problems and Considerations
- Features
  - How do I encode domain knowledge?
  - Allow invariance (e.g. rotation in the case of images), see e.g. the previous lecture
  - Which part of the data can be discarded as it represents redundant information?
  - How can we reduce the dimensionality, i.e. how can we make the problem as simple as possible, but as complex as necessary?
Curse of Dimensionality
- Generally: the more dimensions, the more difficult it is for a learning algorithm to extract patterns
- More dimensions = more degrees of freedom

[Figure: linear decision boundary separating Class A and Class B]

Defining a linear decision boundary:
- 2-dim: 3 degrees of freedom
- 3-dim: 5 degrees of freedom
- 4-dim: 7 degrees of freedom
- …
Dimensionality Reduction
- Reduce the dimensionality of the data, while retaining relevant information
- Popular techniques:
  - Principal Component Analysis (PCA)
  - Linear Discriminant Analysis (LDA)
  - Multidimensional Scaling (MDS)
  - …
- Many of these techniques are readily available in MATLAB or Octave
PCA
First Steps: Sample Variance
- Given a set of samples (a_1, …, a_n) with mean a'
- An unbiased variance estimator is defined as

  var(a) = 1/(n-1) · Σ_i (a_i - a')²

- With the zero-mean sequence (z_1, …, z_n), z_i = a_i - a', this is

  var(a) = 1/(n-1) · Σ_i z_i²

- Written in vectors: var(a) = 1/(n-1) · z zᵀ
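A minimal numpy sketch of this estimator (the sample values are made up for illustration; numpy is not prescribed by the lecture, which mentions MATLAB/Octave elsewhere):

```python
import numpy as np

a = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])   # samples a_1 .. a_n
n = len(a)

z = a - a.mean()                      # zero-mean sequence z_i = a_i - a'
var_sum = (z ** 2).sum() / (n - 1)    # 1/(n-1) * sum_i z_i^2
var_vec = (z @ z) / (n - 1)           # vector form: 1/(n-1) * z z^T

assert np.isclose(var_sum, var_vec)
assert np.isclose(var_sum, np.var(a, ddof=1))   # numpy's unbiased estimator
```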
First Steps: Covariance
- Given two sets of samples (a_1, …, a_n) and (b_1, …, b_n) with means a', b'
- A covariance estimator is defined as

  cov(a, b) = 1/(n-1) · Σ_i (a_i - a')(b_i - b')

- With the zero-mean sequences (c_1, …, c_n) and (d_1, …, d_n), this is

  cov(a, b) = 1/(n-1) · Σ_i c_i d_i

- Written in vectors: cov(a, b) = 1/(n-1) · c dᵀ
First Steps: Covariance Matrix
- Given n samples v_1, …, v_n with mean vector v', where d is the dimensionality, i.e. v_i = (a_1, …, a_d)
- A multi-dimensional covariance estimator is defined as

  cov(V) = 1/(n-1) · Σ_i (v_i - v')(v_i - v')ᵀ

- Written in vectors, with the zero-mean samples collected in a matrix V (one sample per column):

  cov(V) = 1/(n-1) · V Vᵀ
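As a sanity check, a small numpy sketch (with made-up data, not from the slides) that builds this matrix from the zero-mean data and compares it with numpy's built-in estimator:

```python
import numpy as np

rng = np.random.default_rng(0)
V = rng.normal(size=(3, 100))             # 3-dimensional data, 100 samples (one per column)
n = V.shape[1]

Z = V - V.mean(axis=1, keepdims=True)     # subtract the mean of every dimension
C = (Z @ Z.T) / (n - 1)                   # 1/(n-1) * Z Z^T, a 3x3 matrix

assert np.allclose(C, np.cov(V))          # np.cov also uses the unbiased 1/(n-1) estimator
assert np.allclose(C, C.T)                # square-symmetric, variances on the diagonal
```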
Variance Properties
- Properties:
  - var(a) = cov(a, a)
  - cov(a, b) = 0 iff a, b are completely uncorrelated
  - cov(V) is a square-symmetric matrix (containing variances on the diagonal and covariances on the off-diagonals):

    cov(V) =
    | var(v_1)       …  cov(v_1, v_n) |
    | …              …  …             |
    | cov(v_n, v_1)  …  var(v_n)      |
PCA: Toy example
- Toy example: try to understand the motion of a spring
- Data from three camera sensors (x,y-position for each camera, i.e. 6-dim. data)
Noise and Redundancy
- Data contains noise
- Data is redundant: 1-dimensional motion vs. 6-dimensional sample data
- Sensor data from the three cameras are highly correlated
Change of basis
- By changing the basis, we may represent the data almost perfectly with a lower number of dimensions
- Remember that the data has a mean of zero

[Figure: basis change to new axes r1', r2']
Change of Basis
- A basis change can be written as a matrix multiplication
- Given the new basis vectors p_1, …, p_n, we can transform data samples x_i in the following manner:

  y_i = P x_i   with   P = (p_1, …, p_n)ᵀ

- i.e. we are projecting x_i onto the new basis vectors: the j-th component of y_i is p_jᵀ x_i
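A small numpy sketch of such a basis change (the particular basis vectors, a 45-degree rotation, are only an example):

```python
import numpy as np

# rows of P are the new orthonormal basis vectors p_1, p_2 (here: a 45-degree rotation)
c, s = np.cos(np.pi / 4), np.sin(np.pi / 4)
P = np.array([[ c, s],
              [-s, c]])

x = np.array([2.0, 1.0])      # a data sample
y = P @ x                     # y_j = p_j^T x: coordinates of x in the new basis
print(y)
```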
Reducing the Covariances

- Remember the covariance matrix:

  cov(V) =
  | var(v_1)       …  cov(v_1, v_n) |
  | …              …  …             |
  | cov(v_n, v_1)  …  var(v_n)      |

- We have seen that a basis change may reduce the correlation between the different dimensions
- Goal of PCA: make the covariance matrix as diagonal as possible
PCA Assumptions
- Basis (principal components) is orthogonal
- Linearity
  - The change of basis is a linear operation
  - For non-linear problems: kernel PCA
- Mean and variance are sufficient statistics
  - This holds if the data is distributed according to a Gaussian distribution
- Large variances have important dynamics
Finally solving it
- Theorem: the covariance matrix is diagonalized by an orthogonal matrix of its eigenvectors
- That is, the principal components of the underlying data are the eigenvectors of the covariance matrix
- The higher the eigenvalue, the more variance is captured along the corresponding dimension
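The following numpy sketch puts the previous slides together (center the data, build the covariance matrix, take its eigenvectors, project). It is only an illustration of the idea, not a reference implementation from the course:

```python
import numpy as np

def pca(X, k):
    """X: d x n data matrix, one sample per column. Returns the top-k principal
    components (as columns) and the data projected onto them."""
    Z = X - X.mean(axis=1, keepdims=True)        # zero-mean data
    C = (Z @ Z.T) / (X.shape[1] - 1)             # d x d covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)         # eigh: C is symmetric
    order = np.argsort(eigvals)[::-1]            # sort by decreasing eigenvalue
    P = eigvecs[:, order[:k]]                    # principal components
    return P, P.T @ Z                            # k x n projected data

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 200))                    # e.g. the 6-dim camera data of the toy example
components, Y = pca(X, k=1)
```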
Eigenspectrum and Energy
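The figure on this slide is not reproduced in the transcript. As a rough sketch of the idea, the eigenvalues are sorted and one keeps enough components to retain a chosen fraction of the total variance ("energy"); the 90% threshold and the numbers below are arbitrary examples:

```python
import numpy as np

eigvals = np.array([5.2, 2.1, 0.4, 0.2, 0.05, 0.03])   # example eigenspectrum, sorted descending
energy = np.cumsum(eigvals) / eigvals.sum()             # cumulative fraction of variance captured
k = int(np.searchsorted(energy, 0.90)) + 1              # smallest k that retains >= 90% energy
print(k, energy)
```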
PCA in practice
- Eigenvectors can be computed by solving the linear equation system A x = λ x
- In practice, we most often use a singular value decomposition (SVD)
- If the data samples are high-dimensional, the covariance matrix can get extremely large
  - If the samples x_i are images with a resolution of 1600x1200, the covariance matrix would be a 1,920,000 x 1,920,000 matrix
- Computing the eigenvectors of such a matrix would be computationally very costly
Algebra to the rescue

- Fortunately, it is not that bad
  - If we have a maximum of n data samples, we can also obtain only a maximum of n eigenvectors (one dimension for each sample point)
  - i.e. the number of eigenvectors is restricted by both the sample dimension and the sample number
- Let's have a look at the following equations:

  AᵀA x = λ x
  A AᵀA x = λ A x
  (A Aᵀ)(A x) = λ (A x)

- λ is an eigenvalue of AᵀA (with eigenvector x)
- λ is an eigenvalue of A Aᵀ (with eigenvector A x)
- We can therefore compute the eigenvectors of the small matrix AᵀA and multiply them by A
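A hedged numpy sketch of this trick for the case of few samples in a very high-dimensional space (the sizes are illustrative, not the 1600x1200 images from the slide); the same subspace can also be obtained directly from an SVD of the data matrix:

```python
import numpy as np

rng = np.random.default_rng(2)
d, n = 10000, 50                          # dimensionality >> number of samples
A = rng.normal(size=(d, n))               # data samples as columns
A -= A.mean(axis=1, keepdims=True)        # make the data zero-mean

small = A.T @ A                           # n x n instead of d x d
eigvals, eigvecs = np.linalg.eigh(small)  # eigenvectors x of A^T A
U = A @ eigvecs                           # A x are eigenvectors of A A^T
U /= np.linalg.norm(U, axis=0)            # normalize the columns

# the same principal subspace from an SVD of A; the eigenvalues of A^T A are the squared singular values
U_svd, S, _ = np.linalg.svd(A, full_matrices=False)
```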
Summary PCA
- Can be used to reduce the number of dimensions
- Find the principal dimensions
  - Principal dimensions are assumed to have a high variance
- Discard non-informative dimensions (redundancy and noise)
- Extensions:
  - Robust PCA (deal with occlusion)
  - Incremental PCA
  - Kernel PCA (non-linear)
Pattern Classification
Bayes Decision Theory
- Example: character recognition
  - Classify characters into the classes C1 = "a" and C2 = "b", minimizing the probability of an incorrect classification
Priors
- Prior probabilities represent the probability that a class occurs
- They are often used to encode domain knowledge
Conditional Probability
- Given a feature vector x
  - x measuring e.g. the width or the height of the character
- The conditional probability p(x|C_k) measures the probability of observing x when the character is of class C_k
Example
- Say x = 15
- What decision should be made?

[Figure: observation at x = 15]
Example
- Say x = 25
- What decision should be made?

[Figure: observation at x = 25]
Example
- Say x = 20
- What decision should be made?

[Figure: observation at x = 20]
Bayes’ Theorem
- The a-posteriori probability p(C_k|x) defines the probability of a class given a specific feature vector x
- By Bayes' theorem it can be computed from the prior and the class-conditional probability:

  p(C_k|x) = p(x|C_k) · p(C_k) / p(x)
Decision boundary

[Figure: decision boundary]
Decision Rule
- Choose C1 iff

  p(C1|x) > p(C2|x)

- This is equivalent to

  p(x|C1) p(C1) > p(x|C2) p(C2)

- This is equivalent to

  p(x|C1) / p(x|C2) > p(C2) / p(C1)

- Special cases: p(C1) = p(C2) and p(x|C1) = p(x|C2)
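A tiny sketch of this decision rule with one-dimensional Gaussian class-conditional densities; the means, variances and priors below are invented for illustration and are not the densities shown in the lecture's figures:

```python
import numpy as np

def gauss(x, mu, sigma):
    """1-D Gaussian density."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

p_C1, p_C2 = 0.6, 0.4                          # priors p(C1), p(C2)
p_x_C1 = lambda x: gauss(x, mu=14, sigma=3)    # class-conditional p(x|C1)
p_x_C2 = lambda x: gauss(x, mu=23, sigma=4)    # class-conditional p(x|C2)

def decide(x):
    # choose C1 iff p(x|C1) p(C1) > p(x|C2) p(C2)
    return "C1" if p_x_C1(x) * p_C1 > p_x_C2(x) * p_C2 else "C2"

for x in (15, 20, 25):
    print(x, decide(x))
```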
More Classes
- We can generalize this rule to multi-class problems (e.g. all letters of the alphabet): choose the class with the highest a-posteriori probability
Example
Generalization with a Loss Function

- In some applications we may consider the loss of a misclassification, e.g. in medical applications
  - Loss if classified as healthy despite a disease: λ(healthy|ill)
  - Loss if classified as ill despite being healthy: λ(ill|healthy)
  - We assume: λ(healthy|ill) >> λ(ill|healthy)
Expected Loss
- We have an observation x, but we do not know the loss λ(α_i|x) for every x; what we can specify is the loss per class: λ(α_i|C_k) = λ_ik
  - with α_i the possible decisions and C_k the possible classes
- The expected loss can then be computed in the following manner:

  R(α_i|x) = Σ_k λ(α_i|C_k) · p(C_k|x)
Example
- Let's look at a two-class example:
- We want to minimize the expected loss, i.e. we choose α_1 when

  R(α_2|x) > R(α_1|x)
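A minimal sketch of this minimum-risk decision for the two-class medical example; the loss values and the posterior are invented numbers, chosen only to show that a large loss can override the posterior:

```python
import numpy as np

# loss matrix lambda(alpha_i | C_k): rows = decisions, columns = true classes (ill, healthy)
loss = np.array([[  0.0, 1.0],    # decide "ill"
                 [100.0, 0.0]])   # decide "healthy": missing a disease is assumed far worse

posterior = np.array([0.3, 0.7])  # p(C_ill|x), p(C_healthy|x) for some observation x

risk = loss @ posterior           # R(alpha_i|x) = sum_k lambda(alpha_i|C_k) p(C_k|x)
decision = ["ill", "healthy"][int(np.argmin(risk))]
print(risk, decision)             # the high loss makes us decide "ill" despite the posterior
```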
Some simple math
Linear Discriminant Functions
Perceptron Algorithm
Linear Discriminant Functions
- Separate two classes with a linear hyperplane

  y(x) = wᵀx + w_0

- with w the normal vector of the hyperplane
- Examples:
  - Perceptron
  - Linear SVM
Perceptron Algorithm [Rosenblatt '58]

- Preconditions: the data set is linearly separable
- Goal: find a separating hyperplane
- Idea:
  - Iteratively improve the solution
  - Only update the solution if the data sample of interest is classified incorrectly
Perceptron Algorithm
1. Initialize w = 0
2. Classify a new data sample with the inner product y(x) = sign(wᵀx)
3. If the classification is correct, then go to step 2; else set w = w - y(x)·x
4. If no errors are left, then done; else go to step 2
Algorithm Visualized
Why does this work?
- Given a misclassified sample x with prediction y(x) = 1
- Then the update is: w = w - x
- If we now classify the sample again:

  y(x) = (w - x)ᵀx = wᵀx - xᵀx

- xᵀx is positive, therefore the next prediction for this data sample will be closer to the correct value
- Convergence: the algorithm does not converge if the data is not separable
Wait
- We did not consider w_0: what happens if the separating hyperplane does not pass through the origin (0,0)?
- Small trick: choose x to be (n+1)-dimensional, with the (n+1)-th dimension always being 1; then

  y(x) = wᵀx + w_0   becomes   y(x) = wᵀx
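A compact sketch of the perceptron as described on the last few slides, including the bias trick of appending a constant 1 to each sample; the toy data and all names are my own, not the lecture's:

```python
import numpy as np

def perceptron(X, labels, max_epochs=100):
    """X: n x d samples, labels in {-1, +1}. Returns the weight vector (last entry = w0)."""
    Xa = np.hstack([X, np.ones((X.shape[0], 1))])   # append a constant 1: bias absorbed into w
    w = np.zeros(Xa.shape[1])
    for _ in range(max_epochs):
        errors = 0
        for x, t in zip(Xa, labels):
            y = 1.0 if w @ x >= 0 else -1.0          # prediction sign(w^T x)
            if y != t:                               # update only on mistakes
                w = w - y * x                        # w <- w - y(x) x
                errors += 1
        if errors == 0:                              # all samples classified correctly
            break
    return w

# toy linearly separable data
rng = np.random.default_rng(3)
X = np.vstack([rng.normal(loc=[2.0, 2.0], size=(20, 2)),
               rng.normal(loc=[-2.0, -2.0], size=(20, 2))])
labels = np.array([1] * 20 + [-1] * 20)
w = perceptron(X, labels)
```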
Instance-Based Learning
Instance-based Learning
- Learning = storing all training instances
- Classification = assigning a target function value to a new instance
- Referred to as "lazy" learning
- Examples:
  - Template matching
  - K-Nearest Neighbor
1-Nearest-Neighbor
- Features
  - All instances correspond to points in an n-dimensional Euclidean space
- Classification is done by comparing the feature vectors of the different points
3-Nearest-Neighbor
- Search for the 3 nearest neighbors
- Typically majority vote, when not all neighbors are from the same class
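A brute-force 3-nearest-neighbor classifier with a majority vote, as a small sketch (Euclidean distance, toy data; this is not the lecture's code):

```python
import numpy as np
from collections import Counter

def knn_classify(x, X_train, y_train, k=3):
    """Classify x by a majority vote among its k nearest training samples."""
    dists = np.linalg.norm(X_train - x, axis=1)     # Euclidean distance to every training point
    nearest = np.argsort(dists)[:k]                 # indices of the k closest points
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

X_train = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [1.1, 0.9], [0.9, 1.2]])
y_train = np.array(["A", "A", "B", "B", "B"])
print(knn_classify(np.array([0.8, 0.8]), X_train, y_train, k=3))   # -> "B"
```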
3-Nearest-Neighbor
- Decision surface
  - Described by the Voronoi diagram
  - The Voronoi diagram is the dual of the Delaunay triangulation (computable in O(n log n))
When to Consider Nearest Neighbor?
- Lots of training data
- Less than 20 attributes (dimensions) per example
- Advantages:
  - Training is very fast
  - Learn complex target functions
  - Don't lose information
- Disadvantages:
  - Slow at query time
  - Easily fooled by irrelevant attributes
K-Nearest Neighbors
- Typically the decision surface is not computed explicitly; instead, for each new test sample we compute the nearest neighbors on the fly
- Probabilistic interpretation: estimate the density in a local neighborhood
- There are efficient memory indexing techniques for retrieving the stored training examples:
  - kd-tree [Friedman et al. 1977]
  - Ball tree
KD Tree for NN Search
- Each node contains:
  - Children information
  - The tightest box that bounds all the data points within the node
NN Search by KD Tree
[Figure sequence: step-by-step nearest-neighbor search in a kd-tree, shown over several slides]
Standard Kd-tree construction
- Choose splitting planes by cycling through the dimensions. Example:
  - Root node: split in x-direction
  - Next level: split in y-direction
  - Next level: split in z-direction
- The position of the splitting plane is chosen to be the median of the points (with respect to their coordinates along the axis being used)
- This typically generates a quite balanced tree; a small construction sketch follows below
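A minimal recursive construction following exactly this recipe (cycle through the dimensions, split at the median); the node layout and the example points are my own choices, not the lecture's:

```python
import numpy as np

def build_kdtree(points, depth=0):
    """points: list of equally-sized numpy arrays. Returns a nested dict or None."""
    if not points:
        return None
    axis = depth % len(points[0])                       # cycle through the dimensions
    points = sorted(points, key=lambda p: p[axis])
    median = len(points) // 2                           # split at the median point
    return {
        "point": points[median],
        "axis": axis,
        "left": build_kdtree(points[:median], depth + 1),
        "right": build_kdtree(points[median + 1:], depth + 1),
    }

pts = [np.array(p) for p in [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]]
tree = build_kdtree(pts)
```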
Kd-tree construction
- There exist many (sometimes application-specific) heuristics that try to speed up kd-tree creation
  - E.g. split in the dimension of the highest variance
- Kd-trees can be built incrementally
  - Useful for incremental learning (for example when exploring a new territory with a robot)
- Existing libraries:
  - C++: libkdtree++
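For a quick experiment in Python, an off-the-shelf kd-tree can be used instead; the sketch below uses scipy's cKDTree, which is my choice of example and is not mentioned on the slides:

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(4)
X = rng.normal(size=(1000, 3))            # stored training points
tree = cKDTree(X)                         # build the kd-tree (O(n log n))

query = rng.normal(size=(1, 3))
dists, idx = tree.query(query, k=3)       # distances and indices of the 3 nearest neighbors
```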
A kd-tree: level 1
A kd-tree: level 2
A kd-tree: level 3
A kd-tree: level 4
A kd-tree: level 5
A kd-tree: level 6
Complexity
- Building a static kd-tree from n points: O(n log n)
  - O(log n) tree levels, O(n) median search
- Insertion into a balanced kd-tree: O(log n)
- Removal from a balanced kd-tree: O(log n)
- Query of an axis-parallel range in a balanced kd-tree: O(n^(1-1/d) + k)
  - with k the number of reported points and d the dimension of the kd-tree
Problem of Dimensionality
- Imagine instances described by 20 attributes, of which only 2 are relevant to the target function
- Curse of dimensionality: the nearest neighbor is easily misled when X is high-dimensional
Ball Trees
- Can be constructed in a similar fashion to kd-trees
- Ball trees have been shown to be superior to kd-trees in many applications (though there is high variance and dataset dependence)
- The Proximity Project [Gray, Lee, Rotella, Moore 2005]
A ball-tree: level 1
A ball-tree: level 2
A ball-tree: level 3
A ball-tree: level 4
A ball-tree: level 5
kNN Discussion
- Highly effective inductive inference method for noisy training data and complex target functions
- The target function for the whole space may be described as a combination of less complex local approximations
- Learning is very simple
- Classification can be time-consuming if the training data set is large (O(n) per query with brute force, typically O(log n) with a kd-tree)