Recognition 3
Introduction to Computer Vision, CSE 152, Winter 2013
Lecture 19
• HW4 due on Friday
• Note the Fisherfaces paper on the web page.
Three Levels of Recognition
• Category recognition – near the top of the tree (e.g., vehicles); lots of within-class variability
• Fine-grain recognition – within a category (e.g., species of birds); moderate within-class variation
• Instance recognition (e.g., person identification) – within-class variation is mostly shape articulation, bending, etc.
Linear Subspaces & Linear Projection
• A d-pixel image x∈Rd can be projected to a low-dimensional feature space y∈Rk by
y = Wx where W is a k by d matrix.
• Each training image is projected to the subspace
• Recognition is performed in Rk using, for example, nearest neighbor.
• How do we choose a good W?
Example: Projecting from R3 to R2
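For instance, here is a tiny Matlab illustration of such a projection (this particular W is made up for illustration, not a learned subspace):

W = [1 0 0;
     0 1 0];       % a 2 x 3 projection matrix: keeps the first two coordinates
x = [3; 4; 5];     % a point in R^3
y = W * x;         % y = [3; 4], its projection in R^2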
Eigenfaces: Principal Component Analysis (PCA)
PCA Example
[Figure: a 2D point cloud with mean µ; v1 is the first principal component, the direction of maximum variance, and v2 is the second.]
Eigenfaces Modeling
1. Given a collection of n training images xi, represent each one as a d-dimensional column vector.
2. Compute the mean image and covariance matrix.
3. Compute the k Eigenvectors of the covariance matrix corresponding to the k largest Eigenvalues, and form the matrix WT = [v1, v2, …, vk] (or obtain them directly using the SVD!!). Note that the Eigenvectors are themselves images.
4. Project the training images to the k-dimensional Eigenspace: yi = Wxi.

Recognition
1. Given a test image x, project the vectorized image to the Eigenspace by y = Wx.
2. Classify y against the projected training images (e.g., nearest neighbor).
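A minimal Matlab sketch of these steps (X, xtest, and k are assumed variable names; note the images are mean-centered before projection, a detail the steps above leave implicit):

% X: d x n matrix whose columns are the vectorized training images.
mu = mean(X, 2);                 % mean image (d x 1)
A  = X - mu;                     % mean-centered data
[U, ~, ~] = svd(A, 'econ');      % columns of U: Eigenvectors of the covariance
W  = U(:, 1:k)';                 % k x d projection matrix
Y  = W * A;                      % projected training images (k x n)

% Recognition: project the test image, then nearest neighbor.
y = W * (xtest - mu);
[~, idx] = min(sum((Y - y).^2, 1));   % index of the closest training image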
Why is W a good projection?
• The linear subspace spanned by W maximizes the variance (i.e., the spread) of the projected data.
• W spans a subspace that is the best approximation to the data in a least squares sense; i.e., W is the subspace that minimizes the sum of the squared distances from each datapoint to the subspace.
Eigenfaces: Training Images
[Turk & Pentland '91]
Eigenfaces
[Figure: the mean image and the basis images (Eigenfaces).]
An important footnote: We don't really implement PCA by constructing a covariance matrix!
Why?
1. How big is Σ? It is d by d, where d is the number of pixels in an image!!
2. You only need the first k Eigenvectors.
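Concretely, the fix is to run the SVD on the mean-centered data matrix itself and never form Σ (a sketch; A, n, and k as above):

% A: d x n mean-centered data matrix, with n << d.
% Forming the covariance (1/n)*A*A' would be d x d -- far too big.
[U, S, ~] = svd(A, 'econ');      % 'econ': only the first n columns of U
Wt = U(:, 1:k);                  % the k leading Eigenvectors, as columns
lambda = diag(S).^2 / n;         % the corresponding covariance Eigenvalues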
Singular Value Decomposition
• Any m by n matrix A may be factored such that A = UΣVT
  [m x n] = [m x m][m x n][n x n]
• U: m by m orthogonal matrix – the columns of U are the Eigenvectors of AAT
• V: n by n orthogonal matrix – the columns of V are the Eigenvectors of ATA
• Σ: m by n, diagonal with non-negative entries (σ1, σ2, …, σs), s = min(m, n), called the singular values. The SVD algorithm produces them sorted: σ1 ≥ σ2 ≥ … ≥ σs
• Important property – the singular values are the square roots of the Eigenvalues of both AAT and ATA, and the columns of U and V are the corresponding Eigenvectors!!
SVD Properties
• In Matlab, [u s v] = svd(A), and you can verify that A = u*s*v'.
• r = Rank(A) = # of non-zero singular values.
• U, V give orthonormal bases for the subspaces of A:
  – 1st r columns of U: column space of A
  – Last m − r columns of U: left nullspace of A
  – 1st r columns of V: row space of A
  – Last n − r columns of V: nullspace of A
• For any d ≤ r, the first d columns of U provide the best d-dimensional basis for the columns of A in a least squares sense.
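These properties can be checked numerically in Matlab (a sketch; the rank tolerance 1e-10 is an arbitrary choice):

A = randn(5, 3) * randn(3, 4);            % a 5 x 4 matrix of rank 3
[u, s, v] = svd(A);
norm(A - u*s*v')                          % ~0: the factorization is exact
r = nnz(diag(s) > 1e-10)                  % rank = # of non-zero singular values
% Singular values are the square roots of the Eigenvalues of A'*A
% (abs() guards against tiny negative round-off):
sqrt(sort(abs(eig(A'*A)), 'descend'))     % matches diag(s)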
Distance to Linear Subspace
• A d-pixel image x∈Rd can be projected to a low-dimensional feature space y∈Rk by y = Wx.
• From y∈Rk, the reconstruction of the point in Rd is WTy = WTWx.
• The error of the reconstruction, i.e., the distance from x to the subspace spanned by W, is ||x − WTWx||.
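In Matlab, assuming the rows of W are orthonormal (which holds when W is built from Eigenvectors, since then WWT = I):

y    = W * x;              % project x into the subspace
xhat = W' * y;             % reconstruct the closest point in the subspace
err  = norm(x - xhat);     % distance from x to the subspace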
Fisherfaces: Class specific linear projection P. Belhumeur, J. Hespanha, D. Kriegman, Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection, PAMI, July 1997, pp. 711--720.
• A d-pixel image x∈Rd can be projected to a low-dimensional feature space y∈Rk by y = Wx, where W is a k by d matrix.
• Recognition is performed using nearest neighbor in Rk.
• How do we choose a good W?
• Eigenfaces (PCA) – maximizes the scatter (spread) of the projected data.
• Fisher's linear discriminant – maximizes a different criterion; we'll see this shortly.
• Note: Let Σ be a scatter matrix.
  – The determinant |Σ| of Σ is an indication of the spread of the data (it is the product of the Eigenvalues of Σ).
  – Let W be a projection matrix; then the scatter of the projected data is WTΣW, and a measure of its spread is |WTΣW|.
PCA & Fisher's Linear Discriminant
• Let χi be the set of images of class i, where
  – c is the number of classes
  – µi is the mean of class χi
  – |χi| is the number of samples in χi
• Between-class scatter: SB = Σi=1..c |χi| (µi − µ)(µi − µ)T
• Within-class scatter: SW = Σi=1..c Σxk∈χi (xk − µi)(xk − µi)T
• Total scatter: ST = Σi=1..c Σxk∈χi (xk − µ)(xk − µ)T = SB + SW
[Figure: two classes χ1 and χ2 with class means µ1, µ2 and overall mean µ.]
If the data points xk are projected by yk = WTxk and the scatter of the xk is S, then the scatter of the projected points yk is WTSW.
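A direct Matlab construction of these scatter matrices (a sketch; X, labels, and c are assumed names, and in pixel space d is usually too large to form these explicitly, which the Fisherfaces slide below works around):

% X: d x N matrix of training images, labels: 1 x N class labels.
d  = size(X, 1);
mu = mean(X, 2);                       % overall mean
SB = zeros(d);  SW = zeros(d);
for i = 1:c
    Xi  = X(:, labels == i);           % samples of class i
    mui = mean(Xi, 2);                 % class mean
    SB  = SB + size(Xi, 2) * ((mui - mu) * (mui - mu)');
    SW  = SW + (Xi - mui) * (Xi - mui)';
end
ST = SB + SW;                          % total scatter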
PCA & Fisher's Linear Discriminant
• PCA (Eigenfaces): maximizes the projected total scatter
  WPCA = argmaxW |WT ST W|
• Fisher's Linear Discriminant: maximizes the ratio of projected between-class scatter to projected within-class scatter
  Wfld = argmaxW |WT SB W| / |WT SW W|
[Figure: two classes χ1 and χ2 with the PCA and FLD projection directions; FLD separates the classes, PCA need not.]
Computing the Fisher Projection Matrix
• The columns wi of Wfld are the generalized Eigenvectors satisfying SB wi = λi SW wi
• The wi are orthonormal
• There are at most c − 1 non-zero generalized Eigenvalues, so m ≤ c − 1
• Can be computed with eig in Matlab
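For example (a sketch, assuming SB, SW, and the number of classes c are already in the workspace):

% Generalized Eigenvalue problem: SB * w = lambda * SW * w.
[Vg, D] = eig(SB, SW);                     % generalized Eigenvectors/values
[~, order] = sort(diag(D), 'descend');     % sort by Eigenvalue
Wfld = Vg(:, order(1:c-1));                % keep at most c-1 directions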
Fisherfaces
WoptT = WfldT WPCAT, where
  WPCA = argmaxW |WT ST W|
  Wfld = argmaxW |WT WPCAT SB WPCA W| / |WT WPCAT SW WPCA W|
• Since SW has rank at most N − c, first project the training set to the subspace spanned by the first N − c principal components of the training set.
• Then apply FLD to the N − c dimensional subspace, yielding a c − 1 dimensional feature space: Rd → RN−c → Rc−1 (see the sketch below).
• Fisher's Linear Discriminant projects away the within-class variation (lighting, expressions) found in the training set.
• Fisher's Linear Discriminant preserves the separability of the classes.
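Putting the pieces together, a Matlab sketch of the two-stage pipeline (X, labels, N, and c are assumed names; SB and SW are built from the PCA-projected data Z exactly as in the scatter-matrix sketch above):

% Fisherfaces: R^d -> R^(N-c) via PCA, then -> R^(c-1) via FLD.
A = X - mean(X, 2);                 % mean-centered training data (d x N)
[U, ~, ~] = svd(A, 'econ');
Wpca = U(:, 1:N-c);                 % first N-c principal components
Z = Wpca' * A;                      % training data in R^(N-c)
% ... build SB and SW from Z and the class labels, as before ...
[Vg, D] = eig(SB, SW);
[~, order] = sort(diag(D), 'descend');
Wfld = Vg(:, order(1:c-1));
Wopt = Wpca * Wfld;                 % d x (c-1) combined projection
Y = Wopt' * A;                      % final c-1 dimensional features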
PCA vs. FLD
Harvard Face Database
• 10 individuals
• 66 images per person, with lighting directions at 15°, 30°, 45°, and 60°
• Train on 6 images at 15°
• Within-class variability: camera position, illumination, internal parameters
Appearance manifold approach (Nayar et al. '96)
For every object:
1. Sample the set of viewing conditions
2. Crop and scale the images to a standard size
3. Use each image as a feature vector
• Apply PCA over all the images and keep the dominant PCs
• The set of views for one object is represented as a manifold in the projected space
• Recognition: what is the nearest manifold for a given test image? (See the sketch below.)
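A sketch of nearest-manifold recognition, approximating each object's manifold by its densely sampled projected views (all variable names here are assumptions):

% Ymanifolds{j}: k x nViews projected views of object j
% (built with the PCA projection W and mean mu from training).
y = W * (xtest - mu);                        % project the test image
best = inf; bestObj = 0;
for j = 1:numObjects
    d2 = sum((Ymanifolds{j} - y).^2, 1);     % distance to each sampled view
    if min(d2) < best
        best = min(d2); bestObj = j;         % nearest manifold so far
    end
end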
Parameterized Eigenspace
Recognition
Bag-of-features models
[Figure: an object image and its corresponding bag of visual 'words'.]
Slides from Svetlana Lazebnik, who borrowed from others
Origin 1: Texture recognition
• Texture is characterized by the repetition of basic elements, or textons
• For stochastic textures, it is the identity of the textons, not their spatial arrangement, that matters