PCA vs ICA vs LDA

PCA vs ICA vs LDA. How should we represent images, and why are representation methods needed? Curse of dimensionality (width × height × channels); noise reduction.

Jan 19, 2016

Transcript
Page 1:

PCA vs ICA vs LDA

Page 2:

How to represent images?

• Why are representation methods needed?
  – Curse of dimensionality: width × height × channels
  – Noise reduction
  – Signal analysis & visualization

• Representation methods
  – Representation in the frequency domain: linear transforms
    • DFT, DCT, DST, DWT, ...
    • Used as compression methods
  – Subspace derivation
    • PCA, ICA, LDA
    • Linear transforms derived from training data
  – Feature extraction methods
    • Edge (line) detection
    • Feature maps obtained by filtering
    • Gabor transform
    • Active contours (snakes)
    • ...

Page 3:

What is subspace? (1/2)

• Find a basis in a low-dimensional sub-space:
  − Approximate vectors by projecting them into a low-dimensional sub-space:

(1) Original space representation:

  $\mathbf{x} = a_1\mathbf{v}_1 + a_2\mathbf{v}_2 + \dots + a_N\mathbf{v}_N$

  where $\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_N$ is a basis of the original N-dimensional space.

(2) Lower-dimensional sub-space representation:

  $\hat{\mathbf{x}} = b_1\mathbf{u}_1 + b_2\mathbf{u}_2 + \dots + b_K\mathbf{u}_K$

  where $\mathbf{u}_1, \mathbf{u}_2, \dots, \mathbf{u}_K$ is a basis of the K-dimensional sub-space (K < N).

• Note: if K = N, then $\hat{\mathbf{x}} = \mathbf{x}$.

Page 4:

What is subspace? (2/2)

• Example (K=N):
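The example itself appears only as a figure in the original slide. As a stand-in, here is a minimal NumPy sketch (my own, with an arbitrary orthonormal basis) of the K = N case, where representing x in a full basis reconstructs it exactly:

import numpy as np

# A full orthonormal basis of R^3 (K = N = 3), e.g. obtained from a QR decomposition
V, _ = np.linalg.qr(np.random.randn(3, 3))   # columns are v1, v2, v3

x = np.array([2.0, -1.0, 0.5])
a = V.T @ x        # coefficients a_i = v_i^T x
x_hat = V @ a      # x_hat = a_1 v_1 + a_2 v_2 + a_3 v_3
print(np.allclose(x, x_hat))   # True: with K = N the representation is exact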

Page 5:

PRINCIPAL COMPONENT ANALYSIS (PCA)

Page 6:

Why Principal Component Analysis?

• Motive
  – Find bases along which the data have high variance
  – Encode the data with a small number of bases at low MSE

Page 7:

Derivation of PCs

Assume that $E[\mathbf{x}] = \mathbf{0}$ and let $a = \mathbf{q}^T\mathbf{x} = \mathbf{x}^T\mathbf{q}$.

The variance of the projection is

  $\sigma^2 = E[a^2] = E[(\mathbf{q}^T\mathbf{x})(\mathbf{x}^T\mathbf{q})] = \mathbf{q}^T E[\mathbf{x}\mathbf{x}^T]\,\mathbf{q} = \mathbf{q}^T\mathbf{R}\mathbf{q}$

Find the q's maximizing this, subject to $\|\mathbf{q}\| = (\mathbf{q}^T\mathbf{q})^{1/2} = 1$.

The maximizers satisfy the eigenvalue problem $\mathbf{R}\mathbf{q} = \lambda\mathbf{q}$:

  $\mathbf{R} = \mathbf{Q}\boldsymbol{\Lambda}\mathbf{Q}^T,\quad \mathbf{Q} = [\mathbf{q}_1, \mathbf{q}_2, \dots, \mathbf{q}_j, \dots, \mathbf{q}_m],\quad \boldsymbol{\Lambda} = \mathrm{diag}[\lambda_1, \lambda_2, \dots, \lambda_j, \dots, \lambda_m]$

  $\mathbf{R}\mathbf{q}_j = \lambda_j\mathbf{q}_j,\quad j = 1, 2, \dots, m$

The principal components q can be obtained by eigenvector decomposition, e.g. via SVD.
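As a hedged illustration of this derivation (not part of the original slides; the data and variable names below are made up), the principal directions can be computed in NumPy by an eigendecomposition of the sample estimate of R = E[x xᵀ]:

import numpy as np

X = np.random.randn(500, 10)          # toy data: 500 samples, 10 dimensions
X = X - X.mean(axis=0)                # enforce E[x] = 0
R = (X.T @ X) / X.shape[0]            # sample estimate of R = E[x x^T]

eigvals, Q = np.linalg.eigh(R)        # R = Q Lambda Q^T (eigh, since R is symmetric)
order = np.argsort(eigvals)[::-1]     # sort eigenvalues in decreasing order
eigvals, Q = eigvals[order], Q[:, order]

q1 = Q[:, 0]                          # first principal direction q_1
print(q1 @ R @ q1, eigvals[0])        # variance along q_1 equals the largest eigenvalue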

Page 8:

Dimensionality Reduction (1/2)

[Figure: bar chart of the variance (%) explained by each principal component, PC1–PC10]

We can ignore the components of lesser significance. You do lose some information, but if the eigenvalues are small, you don't lose much:
  – n dimensions in the original data
  – calculate n eigenvectors and eigenvalues
  – choose only the first p eigenvectors, based on their eigenvalues
  – the final data set has only p dimensions
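A small sketch of this selection step (my own illustration; the 95% threshold is an arbitrary choice, not from the slides):

import numpy as np

X = np.random.randn(500, 10)                            # toy data: n = 10 dimensions
X = X - X.mean(axis=0)
eigvals = np.linalg.eigvalsh(X.T @ X / len(X))[::-1]    # eigenvalues, largest first

explained = np.cumsum(eigvals) / eigvals.sum()          # cumulative fraction of variance
p = int(np.searchsorted(explained, 0.95) + 1)           # smallest p explaining 95% of variance
print(p, explained[p - 1])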

Page 9:

Dimensionality Reduction (2/2)

[Figure: plot of variance against dimensionality]

Page 10:

Reconstruction from PCs

[Figure: reconstructions from q = 1, 2, 4, 8, 16, 32, 64, and 100 principal components, alongside the original image]
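The images above are not reproduced here; the following sketch (my own, using random vectors as a stand-in for training images) shows how such reconstructions from the top q principal components are typically computed:

import numpy as np

X = np.random.rand(200, 64 * 64)            # stand-in for 200 vectorized images
mean = X.mean(axis=0)
Xc = X - mean

# principal directions = eigenvectors of the covariance (via SVD of the centered data)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

def reconstruct(x, q):
    """Project x onto the top-q principal directions and map back to pixel space."""
    Uq = Vt[:q].T                            # (pixels, q) basis
    return mean + Uq @ (Uq.T @ (x - mean))

for q in (1, 2, 4, 8, 16, 32, 64, 100):
    err = np.linalg.norm(X[0] - reconstruct(X[0], q))
    print(q, err)                            # reconstruction error shrinks as q grows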

Page 11:

LINEAR DISCRIMINANT ANALYSIS (LDA)

Page 12:

Limitations of PCA

Are the maximal-variance dimensions the relevant dimensions for preservation?

Page 13:

Linear Discriminant Analysis (1/6)

• What is the goal of LDA?

− Perform dimensionality reduction “while preserving as much of the class discriminatory information as possible”.

− Seeks to find directions along which the classes are best separated.
− Takes into consideration not only the scatter within classes but also the scatter between classes.
− In face recognition, for example, it is more capable of distinguishing image variation due to identity from variation due to other sources such as illumination and expression.

Page 14:

Linear Discriminant Analysis (2/6)

− Projection: $\mathbf{y} = \mathbf{U}^T\mathbf{x}$, where U is the projection matrix.

− Within-class scatter matrix:

  $\mathbf{S}_w = \sum_{i=1}^{c}\sum_{j=1}^{n_i} (\mathbf{Y}_j - \mathbf{M}_i)(\mathbf{Y}_j - \mathbf{M}_i)^T$

− Between-class scatter matrix:

  $\mathbf{S}_b = \sum_{i=1}^{c} (\mathbf{M}_i - \mathbf{M})(\mathbf{M}_i - \mathbf{M})^T$

− LDA computes a transformation that maximizes the between-class scatter while minimizing the within-class scatter:

  $\max \dfrac{|\tilde{\mathbf{S}}_b|}{|\tilde{\mathbf{S}}_w|} = \max \dfrac{|\mathbf{U}^T\mathbf{S}_b\mathbf{U}|}{|\mathbf{U}^T\mathbf{S}_w\mathbf{U}|}$   (products of eigenvalues!)

  where $\tilde{\mathbf{S}}_b, \tilde{\mathbf{S}}_w$ are the scatter matrices of the projected data y.

− The optimal U satisfies $\mathbf{S}_w^{-1}\mathbf{S}_b\mathbf{U} = \mathbf{U}\boldsymbol{\Lambda}$.
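A SciPy sketch of this criterion (my own illustration, following the scatter-matrix definitions above; the toy data and function name are hypothetical): the optimal directions come from the generalized eigenproblem S_b u = λ S_w u.

import numpy as np
from scipy.linalg import eigh

def lda_directions(X, y):
    """Return the LDA projection matrix U (columns sorted by decreasing eigenvalue)."""
    classes = np.unique(y)
    overall_mean = X.mean(axis=0)
    d = X.shape[1]
    Sw = np.zeros((d, d))
    Sb = np.zeros((d, d))
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)                      # within-class scatter
        diff = (mc - overall_mean)[:, None]
        Sb += diff @ diff.T                                # between-class scatter
    # generalized eigenproblem  Sb u = lambda Sw u  (assumes Sw is non-singular)
    eigvals, U = eigh(Sb, Sw)
    order = np.argsort(eigvals)[::-1][: len(classes) - 1]  # keep at most C-1 directions
    return U[:, order]

# toy usage: 3 classes of 30 samples in 5 dimensions
X = np.random.randn(90, 5) + np.repeat(np.eye(5)[:3] * 3, 30, axis=0)
y = np.repeat([0, 1, 2], 30)
U = lda_directions(X, y)
print(U.shape)    # (5, 2): C-1 = 2 discriminant directions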

Page 15:

Linear Discriminant Analysis (3/6)

− c.f. Since $\mathbf{S}_b$ has at most rank C−1, the maximum number of eigenvectors with non-zero eigenvalues is C−1 (i.e., the maximum dimensionality of the sub-space is C−1).

• Does $\mathbf{S}_w^{-1}$ always exist?

  − If $\mathbf{S}_w$ is non-singular, we can obtain a conventional eigenvalue problem by writing:

    $\mathbf{S}_w^{-1}\mathbf{S}_b\mathbf{U} = \mathbf{U}\boldsymbol{\Lambda}$

  − In practice, $\mathbf{S}_w$ is often singular, since the data are image vectors with large dimensionality while the size of the data set is much smaller (M << N).

Page 16:

Linear Discriminant Analysis (4/6)

• Does $\mathbf{S}_w^{-1}$ always exist? (cont.)

− To alleviate this problem, we can use PCA first:

1) PCA is first applied to the data set to reduce its dimensionality.

2) LDA is then applied to find the most discriminative directions (see the sketch below).
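A hedged sketch of this two-stage recipe using scikit-learn (the slides do not prescribe a particular library; the dimensions and class counts below are arbitrary):

import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.pipeline import make_pipeline

# toy stand-in for M high-dimensional image vectors with C classes (M << N)
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 1024))
y = np.repeat(np.arange(3), 20)

# 1) PCA reduces dimensionality so that S_w becomes (closer to) non-singular,
# 2) LDA then finds the most discriminative directions (at most C-1 of them).
model = make_pipeline(PCA(n_components=40),
                      LinearDiscriminantAnalysis(n_components=2))
Z = model.fit_transform(X, y)
print(Z.shape)    # (60, 2)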

Page 17:

Linear Discriminant Analysis (5/6)

D. Swets, J. Weng, "Using Discriminant Eigenfeatures for Image Retrieval", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 18, no. 8, pp. 831-836, 1996

[Figure: PCA projection vs. LDA projection]

Page 18:

Linear Discriminant Analysis (6/6)

• Factors unrelated to classification
  − MEF (most expressive feature) vectors show the tendency of PCA to capture major variations in the training set, such as lighting direction.
  − MDF (most discriminating feature) vectors discount those factors unrelated to classification.

D. Swets, J. Weng, "Using Discriminant Eigenfeatures for Image Retrieval", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 18, no. 8, pp. 831-836, 1996

Page 19:

INDEPENDENT COMPONENT ANALYSIS (ICA)

Page 20:

PCA vs ICA

• PCA
  – Focuses on uncorrelated, Gaussian components
  – Second-order statistics
  – Orthogonal transformation

• ICA
  – Focuses on independent, non-Gaussian components
  – Higher-order statistics
  – Non-orthogonal transformation

Page 21:
Page 22:

Independent Component Analysis (1/5)

• Concept of ICA
  – A given signal x is generated by linear mixing (A) of independent components s: $\mathbf{x} = \mathbf{A}\mathbf{s}$
  – ICA is a statistical analysis method that estimates those independent components z and the unmixing rule W: $\mathbf{z} = \mathbf{W}\mathbf{x} = \mathbf{W}\mathbf{A}\mathbf{s}$

[Figure: sources $s_1, \dots, s_M$ → mixing matrix A (elements $A_{ij}$) → observations $x_1, \dots, x_M$ → unmixing matrix W (elements $W_{ij}$) → estimates $z_1, \dots, z_M$]

  – We know neither A nor s (both are unknown), so some optimization function is required!
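As an illustration of this mixing model (the sources, the mixing matrix, and the use of scikit-learn's FastICA are my own choices, not the slides'), the independent components can be recovered up to permutation and scaling:

import numpy as np
from sklearn.decomposition import FastICA

t = np.linspace(0, 8, 2000)
S = np.c_[np.sin(2 * t), np.sign(np.cos(3 * t))]     # two independent sources s
A = np.array([[1.0, 0.5],
              [0.7, 1.2]])                            # mixing matrix
X = S @ A.T                                           # observed mixtures x = A s

ica = FastICA(n_components=2, random_state=0)
Z = ica.fit_transform(X)                              # estimated components z
W = ica.components_                                   # estimated unmixing matrix W
print(W @ A)   # close to a scaled permutation matrix if separation succeeded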

Page 23:

Independent Component Analysis (2/5)

− If we can find $\mathbf{W} = \mathbf{A}^{-1}$, then $\mathbf{z} = \mathbf{W}\mathbf{x} = \mathbf{A}^{-1}\mathbf{x}$ recovers the independent components.

Page 24:

Independent Component Analysis (3/5)

• What is an independent component?
  – If one variable cannot be estimated from the other variables, it is independent.
  – By the Central Limit Theorem, a sum of two independent random variables is more Gaussian than the original variables, so the distributions of independent components are non-Gaussian.
  – To estimate the ICs, z should have a non-Gaussian distribution, i.e. we should maximize non-Gaussianity (see the numerical check below).
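A quick numerical check of this Central Limit Theorem argument (my own sketch): the excess kurtosis of a sum of independent uniform variables is much closer to zero (i.e. more Gaussian) than that of a single uniform variable.

import numpy as np

def excess_kurtosis(z):
    z = z - z.mean()
    return np.mean(z**4) / np.mean(z**2)**2 - 3.0

rng = np.random.default_rng(0)
u = rng.uniform(-1, 1, size=(100_000, 8))
print(excess_kurtosis(u[:, 0]))        # about -1.2 for one uniform variable (sub-Gaussian)
print(excess_kurtosis(u.sum(axis=1)))  # much closer to 0: the sum is more Gaussian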

Page 25:

Independent Component Analysis (4/5)

• What is non-Gaussianity?
  – Super-Gaussian
  – Sub-Gaussian
  – Low entropy

[Figure: example densities of Gaussian, super-Gaussian, and sub-Gaussian distributions]

Page 26:

Independent Component Analysis (5/5)

• Measuring non-Gaussianity by kurtosis
  – Kurtosis: the 4th-order cumulant of a random variable

  $\mathrm{kurt}(z) = E\{z^4\} - 3\left(E\{z^2\}\right)^2$

  • If kurt(z) is zero: Gaussian
  • If kurt(z) is positive: super-Gaussian
  • If kurt(z) is negative: sub-Gaussian

• Maximization of |kurt(z)| by a gradient method (for centered, whitened x):

  $\dfrac{\partial\,|\mathrm{kurt}(\mathbf{w}^T\mathbf{x})|}{\partial\mathbf{w}} = 4\,\mathrm{sign}\!\left(\mathrm{kurt}(\mathbf{w}^T\mathbf{x})\right)\left[E\{\mathbf{x}(\mathbf{w}^T\mathbf{x})^3\} - 3\mathbf{w}\|\mathbf{w}\|^2\right]$

  – Gradient update: $\Delta\mathbf{w} \propto \mathrm{sign}\!\left(\mathrm{kurt}(\mathbf{w}^T\mathbf{x})\right) E\{\mathbf{x}(\mathbf{w}^T\mathbf{x})^3\}$, followed by normalization $\mathbf{w} \leftarrow \mathbf{w}/\|\mathbf{w}\|$ (the $-3\mathbf{w}\|\mathbf{w}\|^2$ term simply changes the norm of w).

• Fast fixed-point algorithm:

  $\mathbf{w} \leftarrow E\{\mathbf{x}(\mathbf{w}^T\mathbf{x})^3\} - 3\mathbf{w}$
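A minimal NumPy sketch of this kurtosis-based fixed-point update (my own illustration, assuming the observations have been centered and whitened as the update requires; it extracts a single independent direction):

import numpy as np

rng = np.random.default_rng(0)
# two independent non-Gaussian sources, linearly mixed
S = np.c_[rng.uniform(-1, 1, 5000), rng.laplace(size=5000)]
X = S @ np.array([[1.0, 0.4], [0.3, 1.0]]).T         # x = A s

# centering and whitening (the kurtosis update assumes E[x x^T] = I)
X = X - X.mean(axis=0)
d, E = np.linalg.eigh(np.cov(X, rowvar=False))
X = (X @ E) / np.sqrt(d)

w = rng.normal(size=2)
w /= np.linalg.norm(w)
for _ in range(100):
    # fixed-point step  w <- E{x (w^T x)^3} - 3w,  then renormalize
    w_new = (X * (X @ w)[:, None] ** 3).mean(axis=0) - 3 * w
    w_new /= np.linalg.norm(w_new)
    if abs(abs(w_new @ w) - 1) < 1e-10:               # converged up to sign
        w = w_new
        break
    w = w_new

print(w)   # one estimated independent direction in the whitened space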

Page 27:

PCA vs LDA vs ICA

• PCA: well suited to dimensionality reduction

• LDA: well suited to pattern classification if the number of training samples per class is large

• ICA: well suited to blind source separation, or to classification using ICs when the class labels of the training data are not available

Page 28:

References

• Simon Haykin, "Neural Networks – A Comprehensive Foundation," 2nd Edition, Prentice Hall.

• Marian Stewart Bartlett, "Face Image Analysis by Unsupervised Learning," Kluwer Academic Publishers.

• A. Hyvärinen, J. Karhunen, and E. Oja, "Independent Component Analysis," John Wiley & Sons, Inc.

• D. L. Swets and J. Weng, "Using Discriminant Eigenfeatures for Image Retrieval," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 18, no. 8, pp. 831–836, August 1996.