Apr 10, 2018

8/8/2019 3 - Feature Extraction


FEATURE EXTRACTION AND SELECTION METHODS


The task of the feature extraction and selection methods is to obtain the most relevant information from the original data and represent that information in a lower-dimensionality space.

Feature extraction and selection methods


When the cost of the acquisition and manipulation of all the measurements is high, we must make a selection of features.

The goal is to select, among all the available features, those that will perform better.

Example: which features should be used for classifying a student as a good or bad one:

Available features: marks, height, sex, weight, IQ. Feature selection would choose marks and IQ and would discard height, weight and sex.

We have to choose P variables from a set of M variables so that the separability is maximal.

Selection methods
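The choice of the best P out of M features can be sketched as an exhaustive search that scores every subset with a separability criterion. A minimal Python sketch of this idea; the student data, the feature values and the normalized mean-distance criterion below are all invented for illustration:

```python
from itertools import combinations

# Hypothetical student data: one row per student, columns are the
# features (marks, height, sex, weight, IQ) plus the label
# (1 = good student, 0 = bad student).
data = [
    [9.0, 170, 0, 65, 120, 1],
    [8.5, 182, 1, 80, 115, 1],
    [4.0, 175, 1, 78,  95, 0],
    [5.0, 160, 0, 55, 100, 0],
]
features = ["marks", "height", "sex", "weight", "IQ"]

def separability(cols):
    """Toy criterion: normalized squared distance between the class
    means, summed over the selected features (a stand-in for a real
    separability measure)."""
    good = [[r[c] for c in cols] for r in data if r[-1] == 1]
    bad = [[r[c] for c in cols] for r in data if r[-1] == 0]
    mg = [sum(col) / len(good) for col in zip(*good)]
    mb = [sum(col) / len(bad) for col in zip(*bad)]
    score = 0.0
    for g, b in zip(mg, mb):
        # normalize each difference so large-valued features
        # (e.g. height) do not dominate
        scale = (abs(g) + abs(b)) / 2 or 1.0
        score += ((g - b) / scale) ** 2
    return score

P = 2  # select the best P of the M = 5 features
best = max(combinations(range(len(features)), P), key=separability)
print([features[c] for c in best])
```

On these invented numbers the search keeps marks and IQ and discards height, weight and sex, matching the example above. For large M the exhaustive search becomes infeasible and greedy strategies (sequential forward/backward selection) are used instead.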


The goal is to build, using the available features, new features that will perform better.

Example: which features should be used for classifying a student as a good or bad one:

Available features: marks, height, sex, weight, IQ. Feature extraction may choose marks + IQ² as the best feature (in fact, it is a combination of two features).

The goal is to transform the original space X into a new space Y to obtain new features that work better. This way, we can compress the information.

Extraction methods


PCA = Karhunen–Loève transform = Hotelling transform

PCA is the most popular feature extraction method

PCA is a linear transformation

PCA is used in face recognition systems based on appearance

Principal Component Analysis


PCA has been successfully applied to human face recognition.

PCA consists of a transformation from a space of high dimension to another of reduced dimension.

If the data are highly correlated, there is redundant information.

PCA decreases the amount of redundant information by decorrelating the input vectors.

The input vectors, with high dimension and correlated, can be represented in a lower-dimension space and decorrelated.

PCA is a powerful tool to compress data.

Principal Component Analysis


PCA by Maximizing Variance (I)

We will derive PCA by maximizing the variance in the direction of the principal vectors.

Let us suppose that we have N M-dimensional vectors x_j aligned in the data matrix X.

Let u be a direction (a vector of length 1). The projection of the j-th vector x_j onto the vector u can be calculated in the following way:

(X is M×N: M = dimension, N = number of examples)

$$p_j = \mathbf{u}^T \mathbf{x}_j = \sum_{i=1}^{M} u_i\, x_{ij}$$
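The projection formula can be checked numerically. A small NumPy sketch with made-up data, computing p_j both as a matrix product and as the explicit sum:

```python
import numpy as np

# X holds N = 4 examples of dimension M = 3, one per column
X = np.array([[1.0, 2.0, 0.0, -1.0],
              [0.5, 1.0, 1.0,  0.0],
              [2.0, 0.0, 1.0,  1.0]])

u = np.array([1.0, 2.0, 2.0])
u = u / np.linalg.norm(u)  # make u a direction of length 1

# p_j = u^T x_j for every column x_j at once
p = u @ X

# same thing written as the explicit sum over i
p_check = np.array([sum(u[i] * X[i, j] for i in range(3))
                    for j in range(4)])
print(np.allclose(p, p_check))  # True
```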


PCA by Maximizing Variance (II)

We want to find a direction u that maximizes the variance of the projections of all input vectors x_j, j = 1, …, N.

The function to maximize is:

$$J_{PCA}(\mathbf{u}) = \frac{1}{N}\sum_{j=1}^{N} p_j^2 = \frac{1}{N}\sum_{j=1}^{N}\left(\mathbf{u}^T\mathbf{x}_j\right)^2 = \mathbf{u}^T C\,\mathbf{u}$$

Using the technique of Lagrange multipliers, the solution to this maximization problem is to compute the eigenvectors and the eigenvalues of the covariance matrix C.

where C is the covariance matrix of the data matrix X.
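Written out, the Lagrange-multiplier step is the following (a standard derivation, spelled out here for completeness):

```latex
\mathcal{L}(\mathbf{u},\lambda)
  = \mathbf{u}^T C\,\mathbf{u} - \lambda\left(\mathbf{u}^T\mathbf{u} - 1\right),
\qquad
\frac{\partial \mathcal{L}}{\partial \mathbf{u}}
  = 2C\mathbf{u} - 2\lambda\mathbf{u} = 0
\;\Rightarrow\;
C\mathbf{u} = \lambda\mathbf{u}
```

so the stationary directions u are eigenvectors of C, and the attained variance J_PCA(u) = uᵀCu = λ is the corresponding eigenvalue.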

MORE INFO in PCA.pdf

$$C = \frac{1}{N}\,(X-\bar{X})(X-\bar{X})^T, \qquad \bar{X} = [\mathbf{m},\ldots,\mathbf{m}], \qquad \mathbf{m} = \frac{1}{N}\sum_{j=1}^{N}\mathbf{x}_j$$


PCA by Maximizing Variance (III)

The largest eigenvalue equals the maximal variance, while the corresponding eigenvector determines the direction with the maximal variance.

By performing singular value decomposition (SVD) of the covariance matrix C we can diagonalize C:

$$C = U \Lambda U^T$$

in such a way that the orthonormal matrix U contains the eigenvectors u1, u2, …, uN in its columns and the diagonal matrix Λ contains the eigenvalues λ1, λ2, …, λN on its diagonal. The eigenvalues and the eigenvectors are arranged in descending order of the eigenvalues, so that λ1 ≥ λ2 ≥ … ≥ λN. Therefore, most of the variability of the input random vectors is contained in the first eigenvectors. Hence, the eigenvectors are called principal vectors.
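The claim that the top eigenvalue equals the maximal projection variance can be verified numerically. A quick NumPy check on invented correlated data:

```python
import numpy as np

rng = np.random.default_rng(0)
# 500 correlated 2-D points, one per column
X = rng.normal(size=(2, 500))
X[1] += 0.8 * X[0]                    # introduce correlation
X -= X.mean(axis=1, keepdims=True)    # center the data

C = (X @ X.T) / X.shape[1]            # covariance matrix
vals, U = np.linalg.eigh(C)           # eigh returns ascending eigenvalues
vals, U = vals[::-1], U[:, ::-1]      # reorder so lambda_1 >= lambda_2

# variance of the projections onto the first principal vector
p = U[:, 0] @ X
print(np.isclose(p.var(), vals[0]))   # the top eigenvalue IS the variance
```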


Computing PCA

Steps to compute the PCA transformation of a data matrix X:

1. Center the data.
2. Compute the covariance matrix.
3. Obtain the eigenvectors and eigenvalues of the covariance matrix.
4. Project the original data into the eigenspace.

Matlab code:

% number of examples
N = size(X,2);
% dimension of each example
M = size(X,1);
% mean
meanX = mean(X,2);
% centering the data
Xm = X - meanX*ones(1,N);
% covariance matrix
C = (Xm*Xm')/N;
% computing the eigenspace
[U,D] = eig(C);
% projecting the centered data over the eigenspace
P = U'*Xm;

$$P = U^T X$$

U can be used as a linear transformation to project the original data of high dimension into a space of lower dimension.
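For readers without Matlab, the same pipeline can be mirrored in NumPy. An equivalent sketch, not part of the original slides; the small data matrix is invented:

```python
import numpy as np

def pca(X):
    """PCA of a data matrix X with one example per column (M x N)."""
    N = X.shape[1]                        # number of examples
    meanX = X.mean(axis=1, keepdims=True)
    Xm = X - meanX                        # centering the data
    C = (Xm @ Xm.T) / N                   # covariance matrix
    vals, U = np.linalg.eigh(C)           # eigenvectors/eigenvalues of C
    order = np.argsort(vals)[::-1]        # descending eigenvalue order
    vals, U = vals[order], U[:, order]
    P = U.T @ Xm                          # projection onto the eigenspace
    return P, U, vals

X = np.array([[2.5, 0.5, 2.2, 1.9, 3.1],
              [2.4, 0.7, 2.9, 2.2, 3.0]])
P, U, vals = pca(X)
print(vals)  # variances along the principal directions
```

Note that the covariance of the projected data P is diagonal: PCA has decorrelated the inputs, as stated above.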


PCA of a bidimensional dataset

[Figure: scatter plots of a bidimensional dataset in three versions: original, centered, and uncorrelated.]


Computing PCA of a set of images

This approach to the calculation of the principal vectors is very clear and widely used. However, if the size M of the data vectors is very large, which is often the case in the field of computer vision, the covariance matrix C becomes very large and the eigenvalue decomposition of C becomes unfeasible.

But, if the number of input vectors is smaller than the size of these vectors (N < M), the eigenvectors of C can be obtained from those of the much smaller N×N matrix XᵀX.
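The usual workaround, popularized by the eigenfaces approach, obtains the eigenvectors of C from the small N×N matrix XmᵀXm: if XmᵀXm v = μv, then C(Xm v) = (1/N)Xm XmᵀXm v = (μ/N)(Xm v), so Xm v is an eigenvector of C. A NumPy sketch of this trick with made-up high-dimensional data:

```python
import numpy as np

rng = np.random.default_rng(1)
M, N = 1000, 10                      # dimension M much larger than N
X = rng.normal(size=(M, N))
Xm = X - X.mean(axis=1, keepdims=True)

# small N x N problem instead of the huge M x M covariance matrix
S = Xm.T @ Xm
mu, V = np.linalg.eigh(S)
mu, V = mu[::-1], V[:, ::-1]         # descending order

# each small eigenvector v yields an eigenvector u = Xm v of C
U = Xm @ V
U /= np.linalg.norm(U, axis=0)       # normalize the columns

# check the eigenvector relation on the first principal vector
C = (Xm @ Xm.T) / N
print(np.allclose(C @ U[:, 0], (mu[0] / N) * U[:, 0]))
```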


Face recognition using PCA (I)

Turk, M. & Pentland, A., "Eigenfaces for Recognition", Journal of Cognitive Neuroscience, 3, 71–86, 1991.


LDA = Fisher analysis

LDA is a linear transformation

LDA is also used in face recognition

LDA seeks directions that are efficient for discrimination between classes

In PCA, the subspace defined by the vectors is the one that best describes the data set.

LDA tries to discriminate between the different classes of data.

Linear Discriminant Analysis (I)


We have a set of N vectors of dimension M in the M×N data matrix.

We have C classes and k vectors per class.

We want to find the transformation matrix W that best describes the subspace that discriminates between classes, after projecting the data into the new space.

The objective is to maximize the between-class scatter Sb while minimizing the within-class scatter Sw.

Linear Discriminant Analysis (I)

$$P = W X$$

where the rows of W are the eigenvectors that maximize the separation between the C classes (obtained from Sb and Sw).
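For the two-class case the discriminating direction has the well-known closed form w = Sw⁻¹(m₁ − m₂) (Fisher's linear discriminant). A minimal sketch with invented Gaussian classes:

```python
import numpy as np

rng = np.random.default_rng(2)
# two Gaussian classes, one example per row
X1 = rng.normal([0, 0], 0.5, size=(50, 2))
X2 = rng.normal([3, 2], 0.5, size=(50, 2))

m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
# within-class scatter Sw: sum of the scatter of each class
Sw = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
# Fisher direction: maximizes between-class over within-class scatter
w = np.linalg.solve(Sw, m1 - m2)

# projecting both classes onto w separates them along a single axis
p1, p2 = X1 @ w, X2 @ w
print(abs(p1.mean() - p2.mean()) / max(p1.std(), p2.std()))
```

The printed ratio (projected class-mean gap over projected spread) is large, i.e. the 1-D projection keeps the classes apart, which is exactly what LDA is after.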


Linear Discriminant Analysis (II)

[Figure: a dataset with two classes (class 1, class 2), shown before and after projection.]

The figure shows the effect of the LDA transform on a set of data composed of 2 classes.


Linear Discriminant Analysis (III)

Limitations of LDA

LDA works better than PCA when the training data are well representative of the data in the system.

If the data are not representative enough, PCA performs better.


Independent Component Analysis (I)


ICA is a statistical technique that represents a multidimensional random vector as a linear combination of non-Gaussian random variables ('independent components') that are as independent as possible.

ICA is somewhat similar to PCA.

ICA has many applications in data analysis, source separation, and feature extraction.


ICA cocktail party problem

ICA is a statistical technique for decomposing a complex dataset into independent sub-parts. Here we show how it can be applied to the problem of blind source separation.

[Figure: four sources s1(t), …, s4(t) picked up by four microphones as mixed signals x1(t), …, x4(t).]


ICA cocktail party problem


Estimate the sources s_i(t) from the mixed signals x_i(t).


ICA cocktail party problem

Linear model:

$$\begin{aligned}
x_1(t) &= a_{11}s_1(t) + a_{12}s_2(t) + a_{13}s_3(t) + a_{14}s_4(t)\\
x_2(t) &= a_{21}s_1(t) + a_{22}s_2(t) + a_{23}s_3(t) + a_{24}s_4(t)\\
x_3(t) &= a_{31}s_1(t) + a_{32}s_2(t) + a_{33}s_3(t) + a_{34}s_4(t)\\
x_4(t) &= a_{41}s_1(t) + a_{42}s_2(t) + a_{43}s_3(t) + a_{44}s_4(t)
\end{aligned}$$

We can model the problem as X = AS:

S = 4-D vector containing the independent source signals.

A = mixing matrix.

X = observed signals.
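The linear model X = AS is easy to simulate. In the sketch below, the four non-Gaussian sources and the mixing matrix are invented for illustration:

```python
import numpy as np

t = np.linspace(0, 1, 200)
# four independent (non-Gaussian) source signals s_i(t), one per row
S = np.vstack([
    np.sign(np.sin(2 * np.pi * 5 * t)),             # square wave
    np.mod(8 * t, 1.0) - 0.5,                       # sawtooth
    np.sin(2 * np.pi * 3 * t) ** 3,                 # distorted sine
    np.random.default_rng(3).laplace(size=t.size),  # impulsive noise
])

# mixing matrix A: microphone i records x_i(t) = sum_j a_ij * s_j(t)
A = np.array([[1.0, 0.5, 0.2, 0.1],
              [0.3, 1.0, 0.4, 0.2],
              [0.2, 0.3, 1.0, 0.5],
              [0.1, 0.2, 0.4, 1.0]])

X = A @ S  # observed (mixed) signals, one per row
print(X.shape)
```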


ICA cocktail party problem

[Figure: the four mixed signals x_i(t).]


ICA cocktail party problem

[Figure: the four source signals s_i(t).]


ICA cocktail party problem

ICA: one possible solution is to assume that the sources are independent.

$$p(s_1, s_2, \ldots, s_n) = p(s_1)\,p(s_2)\cdots p(s_n)$$

Estimate the sources s_i(t) from the mixed signals x_i(t).


ICA cocktail party problem

ICA MODEL:

$$X = A S$$

(X: mixed signals; A: mixing matrix; S: sources)


ICA cocktail party problem

ESTIMATING THE SOURCES:

$$S = W X$$

(S: sources; W: separation matrix; X: mixed signals)

$$S = W X = W A S \;\Rightarrow\; W A = I$$
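As a sanity check of the algebra (not an ICA algorithm): when A is known, W = A⁻¹ recovers the sources exactly; ICA's actual task is to estimate W from X alone, using only the independence assumption. The sources and mixing matrix below are invented:

```python
import numpy as np

rng = np.random.default_rng(4)
S = rng.laplace(size=(4, 100))   # independent non-Gaussian sources
A = rng.normal(size=(4, 4))      # mixing matrix (assumed invertible)
X = A @ S                        # observed mixtures

W = np.linalg.inv(A)             # ideal separation matrix: W A = I
S_hat = W @ X                    # recovered sources
print(np.allclose(S_hat, S))
```

A real ICA algorithm (e.g. FastICA) has no access to A, so it can recover the sources only up to permutation and scaling of the rows of S.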
