7/31/2019 Classification for Computer Vision
1/51
Zürich Autonomous Systems Lab
From eigenfaces to AdaBoost
Cedric Pradalier
An introduction to the processing of high-dimensional datasets
Input:
  A database of normalised face photos
  A normalised face photo
Output:
  Face identification: whose photo is that?
  Face representation in minimal dimension
  Face comparison
[Figure: the database and the image to identify]
Identification
Why?
  Different centering
  Different mouth shape
  Different eye opening
Solution:
  Extract the most important features
  Discard the details
  PCA is one solution
Each image is an n x m matrix of pixels.
Convert it into a vector of length nm by stacking the columns.
A small image is 100x100 -> a 10000-element vector, i.e. a point in a 10000-dimensional space.
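The column stacking can be sketched as follows. This is a minimal illustration assuming NumPy; `image_to_vector` is a hypothetical helper name, not something from the slides.

```python
import numpy as np

def image_to_vector(image):
    """Stack the columns of an n x m image into one vector of length n*m."""
    image = np.asarray(image, dtype=float)
    # order="F" reads the array column by column, i.e. it stacks the columns
    return image.flatten(order="F")

# Toy 2 x 3 "image": [[0, 1, 2], [3, 4, 5]]
toy_image = np.arange(6).reshape(2, 3)
toy_vector = image_to_vector(toy_image)

# A 100 x 100 image becomes a point in a 10000-dimensional space
face_vector = image_to_vector(np.zeros((100, 100)))
```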
M = compute average vector
Subtract M from each vector -> zero-centered distribution.
C = compute covariance matrix (10000x10000)
$(\lambda_1, v_1), \ldots, (\lambda_p, v_p)$ = compute the 10000 eigenvalues and eigenvectors of C
Change all points into the eigenvector frame
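The steps so far (average vector M, covariance matrix C, eigen-decomposition, change of frame) can be sketched as below. This is an illustrative sketch on small random data, assuming NumPy; the function names `fit_pca` and `to_eigen_frame` are hypothetical, and for real 10000-dimensional images the full covariance matrix would be expensive to build.

```python
import numpy as np

def fit_pca(X):
    """X holds one sample per row. Returns M, eigenvalues (descending), eigenvectors (columns)."""
    M = X.mean(axis=0)                      # average vector
    Xc = X - M                              # zero-centred distribution
    C = Xc.T @ Xc / X.shape[0]              # covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)    # eigh: for symmetric matrices, ascending order
    order = np.argsort(eigvals)[::-1]       # strongest eigenvalue first
    return M, eigvals[order], eigvecs[:, order]

def to_eigen_frame(X, M, eigvecs):
    """Express the points in the frame spanned by the eigenvectors."""
    return (X - M) @ eigvecs

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) * np.array([5.0, 3.0, 1.0, 0.5, 0.1])
M, eigvals, eigvecs = fit_pca(X)
Y = to_eigen_frame(X, M, eigvecs)
```

In the eigenvector frame the coordinates are uncorrelated, and the variance along each axis equals the corresponding eigenvalue.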
Select just enough dimensions, according to the strength of their eigenvalues.
A typical value of 30-100 dimensions seems enough for faces.
Discard all the remaining dimensions.
Prepare the image:
  Start with a face to identify.
  Convert the image to a vector
  Subtract M
  Change to the eigenvector frame
  Keep only the required dimensions
Find the closest point in the remaining dimensions.
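The identification procedure above can be sketched as a nearest-neighbour search in the reduced space. The data here is synthetic (three hypothetical "persons" with noisy copies), the helper names are illustrative, and the eigenvectors are obtained through an SVD of the centred data rather than an explicit covariance matrix, which is an implementation choice of this sketch.

```python
import numpy as np

def fit_eigenspace(train, k):
    """Return the mean M and the k strongest eigenvectors (as columns)."""
    M = train.mean(axis=0)
    # Rows of Vt are the eigenvectors of the covariance of the centred data
    _, _, Vt = np.linalg.svd(train - M, full_matrices=False)
    return M, Vt[:k].T

def project(x, M, basis):
    """Subtract M, change frame, keep only the retained dimensions."""
    return (x - M) @ basis

def identify(query, train_coords, labels, M, basis):
    """Label of the closest database point in the reduced space."""
    q = project(query, M, basis)
    distances = np.linalg.norm(train_coords - q, axis=1)
    return labels[int(np.argmin(distances))]

rng = np.random.default_rng(1)
prototypes = rng.normal(size=(3, 400))          # 3 hypothetical persons, 20x20 "images"
labels = np.repeat(np.arange(3), 5)             # 5 photos per person
train = prototypes[labels] + 0.1 * rng.normal(size=(15, 400))

M, basis = fit_eigenspace(train, k=2)
train_coords = project(train, M, basis)

query = prototypes[2] + 0.1 * rng.normal(size=400)   # new photo of person 2
predicted = identify(query, train_coords, labels, M, basis)
```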
Database from Yale: http://cvc.yale.edu/projects/yalefaces/yalefaces.html
165 faces: 15 persons, 11 images each, with varying lighting, expression and glasses
Results and algorithms from: http://www.cs.princeton.edu/~cdecoro/eigenfaces/
30% of faces used for testing, 70% for learning.
Variance: normalised cumulative sum of the eigenvalues.
About 55 eigenfaces are required to represent 80% of the information.
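The normalised cumulative sum and the resulting choice of dimension can be sketched as follows. The eigenvalues below are made-up toy values, not those of the Yale experiment, and `components_for_variance` is a hypothetical helper name.

```python
import numpy as np

def components_for_variance(eigvals, target=0.80):
    """Smallest number of components whose eigenvalues reach `target` of the total."""
    eigvals = np.sort(np.asarray(eigvals, dtype=float))[::-1]
    cumulative = np.cumsum(eigvals) / eigvals.sum()     # normalised cumulative sum
    k = int(np.searchsorted(cumulative, target)) + 1    # first index reaching the target
    return min(k, len(eigvals)), cumulative

toy_eigvals = [4.0, 3.0, 2.0, 1.0]
k, cumulative = components_for_variance(toy_eigvals, target=0.80)
```

With these toy values the cumulative fractions are 0.4, 0.7, 0.9 and 1.0, so three components are needed to reach 80%.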
Adding eigenfaces one at a time
Adding eigenfaces eight at a time
Reconstructing a perfect image requires many eigenfaces, but far fewer than the number of pixels.
All faces with glasses have been ignored: not a huge difference.
Most recognitions are correct, even with a wide range of expression variation: PCA has relatively low sensitivity to local changes.
9/23 recognitions are wrong. PCA is sensitive to global changes
Normalisation:
  Normalise the range and centre of each dimension
Computational tools:
  Eigenvalue/eigenvector decomposition
  SVD (singular value decomposition)
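The two computational routes give the same result, which the sketch below checks on random data. It assumes NumPy; `normalise` is a hypothetical helper that centres each dimension and scales it to unit standard deviation, which is one possible reading of "normalising the range".

```python
import numpy as np

def normalise(X):
    """Centre each dimension and scale it to unit standard deviation."""
    std = X.std(axis=0)
    std[std == 0] = 1.0                      # leave constant dimensions unscaled
    return (X - X.mean(axis=0)) / std

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 8)) * np.arange(1, 9)
Xn = normalise(X)
N = Xn.shape[0]

# Route 1: eigenvalues of the covariance matrix
C = Xn.T @ Xn / N
eig_route = np.linalg.eigvalsh(C)[::-1]      # eigvalsh returns ascending order

# Route 2: SVD of the data matrix; squared singular values / N are the eigenvalues
U, s, Vt = np.linalg.svd(Xn, full_matrices=False)
svd_route = s ** 2 / N
```

The SVD route never forms the covariance matrix, which matters when the number of dimensions is much larger than the number of samples, as with images.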
Principal Component Analysis is a good tool to identify the main characteristics of a dataset.
It is computationally efficient for recognition and dimensionality reduction.
The construction of the eigenvectors can be very expensive (esp. for images).
  Online PCA techniques have been researched.
For image recognition, images must be pre-cut very accurately, with consistent lighting, for the technique to work.
$\{x_n\}$: a set of $N$ points in a $D$-dimensional space $X$
$\{u_i\}$: an orthonormal basis of $X$
Then:
$$x_n = \sum_{i=1}^{D} \alpha_{ni}\, u_i = \sum_{i=1}^{D} (x_n^T u_i)\, u_i$$
Approximating on a sub-basis:
$$\tilde{x}_n = \sum_{i=1}^{M} z_{ni}\, u_i + \sum_{i=M+1}^{D} b_i\, u_i$$
(the $b_i$ are independent of $n$)
Approximation error:
$$J = \frac{1}{N} \sum_{n=1}^{N} \| x_n - \tilde{x}_n \|^2$$
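Minimising J:
W.r.t. $z$:
$$\frac{\partial J}{\partial z_{nj}} = 0 \;\Rightarrow\; z_{nj} = x_n^T u_j$$
W.r.t. $b$:
$$\frac{\partial J}{\partial b_j} = 0 \;\Rightarrow\; b_j = \bar{x}^T u_j, \qquad \bar{x} = \frac{1}{N} \sum_{n=1}^{N} x_n$$
Substituting:
$$x_n - \tilde{x}_n = \sum_{i=M+1}^{D} \{ (x_n - \bar{x})^T u_i \}\, u_i$$
Leads to:
$$J = \frac{1}{N} \sum_{n=1}^{N} \sum_{i=M+1}^{D} (x_n^T u_i - \bar{x}^T u_i)^2 = \sum_{i=M+1}^{D} u_i^T S u_i$$
Data covariance matrix:
$$S = \frac{1}{N} \sum_{n=1}^{N} (x_n - \bar{x})(x_n - \bar{x})^T$$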
Minimising J:
$$J = \sum_{i=M+1}^{D} u_i^T S u_i$$
Finding the optimal $u_i$ requires a minimisation under the constraint $\|u_i\| = 1$: introduce Lagrange multipliers.
The optimum is found when each $u_i$ is an eigenvector of $S$:
$$S u_i = \lambda_i u_i \;\Rightarrow\; J = \sum_{i=M+1}^{D} \lambda_i$$
Eigenvalues are non-negative, so J is minimal if the discarded $u_i$ ($i > M$) are the eigenvectors with the smallest eigenvalues.
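The result can be checked numerically: with the $M$ strongest eigenvectors kept and the remaining coefficients replaced by those of the data mean, the mean squared approximation error equals the sum of the discarded eigenvalues. The sketch below assumes NumPy and uses random data; the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(500, 6)) @ rng.normal(size=(6, 6))   # correlated toy data
N, D = X.shape
M = 2                                                      # dimensions kept

x_bar = X.mean(axis=0)
S = (X - x_bar).T @ (X - x_bar) / N                        # data covariance matrix
eigvals, U = np.linalg.eigh(S)                             # ascending order
eigvals, U = eigvals[::-1], U[:, ::-1]                     # strongest first

kept, dropped = U[:, :M], U[:, M:]
# x~_n = sum_{i<=M} (x_n^T u_i) u_i + sum_{i>M} (x_bar^T u_i) u_i
X_tilde = X @ kept @ kept.T + x_bar @ dropped @ dropped.T

J = np.mean(np.sum((X - X_tilde) ** 2, axis=1))
discarded_sum = eigvals[M:].sum()
```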
PCA is the orthogonal projection of the data onto a lower-dimensional subspace such that the variance of the projected data is maximised.
  Informally: more variance means more information
Probabilistic formulation:
  Latent variable $z$: projection on the subspace
  $$p(z) = \mathcal{N}(z \mid 0, I), \qquad p(x \mid z) = \mathcal{N}(x \mid W z + \mu, \sigma^2 I)$$
  EM algorithm: maximise the log-likelihood of $p(x)$
  Find the optimal $W$, $\mu$ and $\sigma$: they correspond to the data mean and the principal components of the data.
  -> Can deal with missing data (among other advantages)
ICA: Independent Component Analysis
  Similar to the probabilistic formulation, except that the latent variables have a non-linear, non-Gaussian distribution:
  $$p(z) = \prod_{j=1}^{M} p(z_j)$$
  Used in signal processing. A typical example is blind source separation in audio signal analysis.
CCA: Canonical Correlation Analysis
  Creates a model that maximally correlates two sets of variables
  Used in data analysis/statistics to find what is common between two sets of observations.
A good way to build a classifier
What is classification (in layman's terms)?
Computational learning theory distinguishes between:
  Strong learning algorithm: finds, with high probability, an arbitrarily accurate classifier
  Weak learning algorithm: only finds a classifier with bounded accuracy
For example: Support Vector Machines with a linear kernel only achieve bounded accuracy.
But: they are at least better than random guessing! (i.e. the classification error is lower than 0.5)
SVM: Support vector machines for joint multivariable optimization [Spinello08]
Slide from Prof. Buhmann: Machine Learning
Decision stumps are a class of very simple weak classifiers.
  Goal: find an axis-aligned hyperplane that minimizes the classification error.
  This can be done for each feature (i.e. for each dimension in feature space).
It can be shown that the classification error is always better than 0.5 (random guessing).
Idea: apply many weak classifiers, where each is trained on the misclassified examples of the previous one.
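Weak classifiers (in AdaBoost) are binary classifiers.
Stump: the simplest non-trivial type of decision tree (equivalent to a linear classifier defined by an affine hyperplane):
$$c(x \mid j, \theta, m) = \begin{cases} +m & \text{if } x_j > \theta \\ -m & \text{otherwise} \end{cases} \qquad m \in \{-1, +1\}$$
The hyperplane is orthogonal to the $j$-th axis, which it intersects at $\theta$ (it ignores all entries of $x$ except $x_j$).
[Figure: a stump boundary in the ($x_1$, $x_2$) plane]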
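A decision stump and an exhaustive search for its best parameters can be sketched as follows. The threshold symbol `theta`, the polarity `m` and the function names are assumptions of this sketch; candidate thresholds are taken halfway between consecutive feature values, which is one common choice.

```python
import numpy as np

def stump_predict(X, j, theta, m):
    """+m where feature j exceeds theta, -m elsewhere (m is +1 or -1)."""
    return np.where(X[:, j] > theta, m, -m)

def fit_stump(X, y, w=None):
    """Search all features, thresholds and polarities for the lowest weighted error."""
    n, d = X.shape
    w = np.full(n, 1.0 / n) if w is None else w
    best = (np.inf, 0, 0.0, 1)                       # (error, j, theta, m)
    for j in range(d):
        values = np.unique(X[:, j])
        # One threshold below all values, then midpoints between neighbours
        thresholds = np.concatenate(([values[0] - 1.0],
                                     (values[:-1] + values[1:]) / 2.0))
        for theta in thresholds:
            for m in (1, -1):
                err = np.sum(w[stump_predict(X, j, theta, m) != y])
                if err < best[0]:
                    best = (err, j, theta, m)
    return best

X = np.array([[1.0, 5.0], [2.0, 1.0], [3.0, 4.0], [4.0, 2.0]])
y = np.array([-1, -1, 1, 1])
err, j, theta, m = fit_stump(X, y)
```

On this toy set the first feature separates the classes, so the search returns a stump on feature 0 with threshold 2.5 and zero error.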
Boosting is a technique to build a strong learning algorithm from a given weak learning algorithm.
The most popular boosting algorithm is AdaBoost (adaptive boosting).
  It assigns a weight to each training data point.
  In the beginning, all weights are equal.
  In each round, AdaBoost finds a weak classifier and re-weights the misclassified points.
  Correctly classified points are weighted less, misclassified points are weighted higher.
Algorithm TrainAdaBoost:
1. for do
2. for do
3. Find a classifier that minimizes
4. compute
5. return
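The formulas of the algorithm did not survive in the text above, so the sketch below follows the standard discrete AdaBoost formulation with decision stumps as weak classifiers; it is not necessarily identical to the version on the original slide. All function names are hypothetical, and the final weighted vote is included only so that the trained ensemble can be tried out.

```python
import numpy as np

def fit_stump(X, y, w):
    """Weighted decision stump: returns (error, feature j, threshold, polarity m)."""
    best = (np.inf, 0, 0.0, 1)
    for j in range(X.shape[1]):
        values = np.unique(X[:, j])
        thresholds = np.concatenate(([values[0] - 1.0],
                                     (values[:-1] + values[1:]) / 2.0))
        for theta in thresholds:
            for m in (1, -1):
                pred = np.where(X[:, j] > theta, m, -m)
                err = w[pred != y].sum()
                if err < best[0]:
                    best = (err, j, theta, m)
    return best

def train_adaboost(X, y, rounds):
    n = X.shape[0]
    w = np.full(n, 1.0 / n)                       # all weights equal at the start
    ensemble = []
    for _ in range(rounds):
        err, j, theta, m = fit_stump(X, y, w)     # weak classifier with lowest weighted error
        err = np.clip(err, 1e-12, 1 - 1e-12)      # avoid division by zero
        alpha = 0.5 * np.log((1 - err) / err)     # vote of this weak classifier
        pred = np.where(X[:, j] > theta, m, -m)
        w = w * np.exp(-alpha * y * pred)         # misclassified up, correct down
        w /= w.sum()
        ensemble.append((alpha, j, theta, m))
    return ensemble

def classify_adaboost(X, ensemble):
    """Sign of the weighted vote of all weak classifiers."""
    score = sum(a * np.where(X[:, j] > t, m, -m) for a, j, t, m in ensemble)
    return np.where(score >= 0, 1, -1)

# 1-D toy problem that no single stump can solve
X = np.arange(10, dtype=float).reshape(-1, 1)
y = np.array([1, 1, 1, -1, -1, -1, 1, 1, 1, -1])
one_round = classify_adaboost(X, train_adaboost(X, y, rounds=1))
three_rounds = classify_adaboost(X, train_adaboost(X, y, rounds=3))
```

On this toy problem a single stump misclassifies three points, while the vote of three stumps classifies all ten correctly.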
Algorithm ClassifyAdaBoost:
1. return
Major features:
  Accuracy of the classifier increases with the number M of weak classifiers, i.e. the algorithm is arbitrarily accurate.
  Classification can be done very fast (in contrast to training).
Slide from Prof. Buhmann: Machine Learning
The state of the art:
Robust Real-time Object Detection,
Paul Viola and Michael Jones, IWSCTV, 2001
Features for face detection
  Quick evaluation through the integral image approach
Classifier selection
  How to select a minimal set of features/weak classifiers to detect a face
Classifier cascade
  How to efficiently assemble classifiers
Defined as a difference of rectangular integral areas:
  The sum of the pixels which lie within the white rectangles is subtracted from the sum of the pixels in the grey rectangles:
  $$\iint_{\text{Grey}} I(x, y)\, dx\, dy - \iint_{\text{White}} I(x, y)\, dx\, dy$$
One feature is defined by:
  Feature type: A, B, C or D
  Feature position and size
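The integral image is defined as:
$$I_{int}(X, Y) = \int_{x \le X} \int_{y \le Y} I(x, y)\, dy\, dx$$
The integral over rectangle D can be computed with 4 accesses to $I_{int}$:
$$\iint_{D} I(x, y)\, dx\, dy = I_{int}(4) + I_{int}(1) - I_{int}(2) - I_{int}(3)$$
where 1 and 4 are the top-left and bottom-right corners of D, and 2 and 3 are the other two corners.
Very efficient way to compute features.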
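A discrete version of the integral image and the four-access rectangle sum can be sketched as follows, assuming NumPy. The function names are hypothetical, and the layout of the two-rectangle feature (white left half, grey right half) is an assumption for illustration rather than the exact definition of feature type A.

```python
import numpy as np

def integral_image(image):
    """ii[r, c] = sum of image[:r, :c]; the extra zero row/column simplifies look-ups."""
    ii = np.zeros((image.shape[0] + 1, image.shape[1] + 1))
    ii[1:, 1:] = image.cumsum(axis=0).cumsum(axis=1)
    return ii

def rect_sum(ii, top, left, height, width):
    """Sum over a rectangle with four look-ups: I(4) + I(1) - I(2) - I(3)."""
    bottom, right = top + height, left + width
    return ii[bottom, right] + ii[top, left] - ii[top, right] - ii[bottom, left]

def two_rect_feature(ii, top, left, height, width):
    """Grey (right half) minus white (left half) of the given window."""
    half = width // 2
    white = rect_sum(ii, top, left, height, half)
    grey = rect_sum(ii, top, left + half, height, half)
    return grey - white

image = np.arange(16, dtype=float).reshape(4, 4)
ii = integral_image(image)
inner = rect_sum(ii, 1, 1, 2, 2)             # same as image[1:3, 1:3].sum()
feature = two_rect_feature(ii, 0, 0, 4, 4)   # right two columns minus left two
```

Once the integral image is built, each rectangle costs four look-ups regardless of its size, which is what makes evaluating many features per window affordable.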
A classifier with only these two features can be trained to recognise 100% of the faces, with 40% false positives.
scale = 24x24
Do {
    For each position in the image {
        Try classifying the part of the image starting at this
        position, with the current scale, using the classifier
        selected by AdaBoost
    }
    scale = scale x 1.5
} until maximum scale
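The scanning loop can be sketched in Python as below. The window classifier is passed in as a function; here a trivial brightness test stands in for the classifier selected by AdaBoost. The step of 4 pixels between positions and the names `scan_image` and `bright_window` are assumptions of this sketch.

```python
import numpy as np

def scan_image(image, classify_window, base_size=24, scale_factor=1.5, step=4):
    """Return (row, col, size) for every window the classifier accepts."""
    detections = []
    size = float(base_size)
    max_size = min(image.shape[:2])
    while size <= max_size:                       # until maximum scale
        s = int(round(size))
        for row in range(0, image.shape[0] - s + 1, step):
            for col in range(0, image.shape[1] - s + 1, step):
                if classify_window(image[row:row + s, col:col + s]):
                    detections.append((row, col, s))
        size *= scale_factor                      # scale = scale x 1.5
    return detections

def bright_window(window):
    """Hypothetical stand-in classifier: accepts almost entirely bright windows."""
    return window.mean() > 0.9

image = np.zeros((48, 48))
image[8:32, 12:36] = 1.0                          # one bright 24x24 square
detections = scan_image(image, bright_window)
```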
Basic idea:
  It is easy to detect that something is not a face.
  Tune (boost) each classifier to be very reliable at saying NO (i.e. very low false negative rate).
  Stop evaluating the cascade of classifiers as soon as one classifier says NO.
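The early-exit behaviour of the cascade can be sketched as follows. The three stages are hypothetical toy tests on a list of numbers, chosen only to show that a rejected window never reaches the later, more expensive stages.

```python
def cascade_classify(window, stages):
    """Accept only if every stage says YES; stop at the first NO."""
    for stage in stages:
        if not stage(window):
            return False
    return True

calls = []                                   # records which stages were evaluated

def make_stage(name, test):
    def stage(window):
        calls.append(name)
        return test(window)
    return stage

stages = [
    make_stage("cheap", lambda w: sum(w) > 0),
    make_stage("medium", lambda w: max(w) > 5),
    make_stage("expensive", lambda w: len(set(w)) > 2),
]

rejected = cascade_classify([0, 0, 0], stages)
calls_for_reject = list(calls)               # only the first stage ran
calls.clear()
accepted = cascade_classify([1, 6, 3], stages)
calls_for_accept = list(calls)               # all stages ran
```

Since most windows in an image are not faces, most of them are discarded after the first cheap stage, and the average cost per window stays low.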
Face detection is solved
  Algorithms such as Viola-Jones AdaBoost are very efficient and easily implemented in hardware
  Found in digital cameras and camcorders
The approach used in the Viola-Jones algorithm is generic enough to be used for other detection tasks
PCA can still be useful, but only in very controlled settings