Dimensionality Reduction: Principal Components Analysis

Optional Reading: Smith, A Tutorial on Principal Components Analysis (linked to class webpage)
Page 1

Dimensionality Reduction:

Principal Components Analysis

Optional Reading:

Smith, A Tutorial on Principal Components Analysis (linked to class webpage)

Page 3

Data

[Figure: two-dimensional data plotted against attributes x1 and x2.]

Page 4

Data

[Figure: the same data in the (x1, x2) plane with the first principal component drawn in.]

First principal component: gives the direction of largest variation of the data.

Page 5

Data

[Figure: the data with both principal components drawn in.]

First principal component: gives the direction of largest variation of the data.

Second principal component: gives the direction of second-largest variation.

Page 6

Rotation of Axes

[Figure: the data re-plotted in rotated (x1, x2) axes.]

Page 7

Dimensionality reduction

[Figure: the data projected onto a reduced set of axes.]

Page 8

Classification (on reduced dimensionality space)

[Figure: data points labeled + and − plotted in the (x1, x2) plane.]

Page 9

Classification (on reduced dimensionality space)

[Figure: data points labeled + and − plotted in the reduced space.]

Note: Can be used for labeled or unlabeled data.

Page 10

Principal Components Analysis (PCA)

• Summary: PCA finds new orthogonal axes in the directions of largest variation in the data.

• PCA is used to create high-level features in order to improve classification and to reduce the dimensionality of the data without much loss of information.

• Used in machine learning, signal processing, and image compression (among other things).

Page 11

Background for PCA

• Suppose attributes are A1 and A2, and we have n training examples. x's denote values of A1 and y's denote values of A2 over the training examples.

• Variance of an attribute:

$\operatorname{var}(A_1) = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}$
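A quick numerical check of this formula (a minimal sketch; the attribute values below are made up for illustration):

```python
import numpy as np

# Hypothetical values of attribute A1 over n = 5 training examples.
x = np.array([2.5, 0.5, 2.2, 1.9, 3.1])

# Sample variance: sum of squared deviations from the mean, divided by (n - 1).
var_manual = ((x - x.mean()) ** 2).sum() / (len(x) - 1)

# NumPy computes the same quantity when ddof=1 (divide by n - 1 rather than n).
var_numpy = np.var(x, ddof=1)

print(var_manual, var_numpy)  # the two values agree
```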

Page 12

• Covariance of two attributes:

$\operatorname{cov}(A_1, A_2) = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{n-1}$

• If covariance is positive, both dimensions increase together. If negative, as one increases, the other decreases. If zero, the two dimensions are uncorrelated (no linear relationship between them).
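The covariance formula can be checked the same way (a sketch; the paired values below are illustrative):

```python
import numpy as np

# Hypothetical paired values of attributes A1 (x) and A2 (y).
x = np.array([2.5, 0.5, 2.2, 1.9, 3.1])
y = np.array([2.4, 0.7, 2.9, 2.2, 3.0])

# Sample covariance: sum of products of deviations, divided by (n - 1).
cov_manual = ((x - x.mean()) * (y - y.mean())).sum() / (len(x) - 1)

# np.cov returns the full 2x2 covariance matrix; the off-diagonal entry is cov(A1, A2).
cov_numpy = np.cov(x, y)[0, 1]

print(cov_manual, cov_numpy)  # the values agree; they are positive, so x and y increase together
```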

Page 13

• Covariance matrix

– Suppose we have n attributes, A1, ..., An.

– Covariance matrix:

$C_{n \times n} = (c_{i,j}), \text{ where } c_{i,j} = \operatorname{cov}(A_i, A_j)$

Page 14

Covariance matrix

For two attributes H and M:

$C = \begin{pmatrix} \operatorname{cov}(H,H) & \operatorname{cov}(H,M) \\ \operatorname{cov}(M,H) & \operatorname{cov}(M,M) \end{pmatrix} = \begin{pmatrix} \operatorname{var}(H) & 104.5 \\ 104.5 & \operatorname{var}(M) \end{pmatrix} = \begin{pmatrix} 47.7 & 104.5 \\ 104.5 & 370 \end{pmatrix}$
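For n attributes, the whole matrix can be computed at once. A minimal sketch (the data matrix below is hypothetical; rows are training examples, columns are attributes):

```python
import numpy as np

# Hypothetical data: 6 training examples, 3 attributes (one column per attribute).
data = np.array([[ 9.0, 39.0, 1.2],
                 [15.0, 56.0, 1.9],
                 [25.0, 93.0, 2.2],
                 [14.0, 61.0, 1.4],
                 [10.0, 50.0, 1.1],
                 [18.0, 75.0, 2.0]])

# rowvar=False tells np.cov that the columns (not the rows) are the attributes.
# The result is the n x n matrix C with C[i, j] = cov(A_i, A_j); it is symmetric,
# and its diagonal entries are the attribute variances.
C = np.cov(data, rowvar=False)
print(C)
```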

Page 15

Review of Matrix Algebra

• Eigenvectors:

– Let M be an n × n matrix.

• v is an eigenvector of M if $Mv = \lambda v$

• $\lambda$ is called the eigenvalue associated with v

– For any eigenvector v of M and scalar a, $M(av) = \lambda(av)$.

– Thus you can always choose eigenvectors of length 1: $\sqrt{v_1^2 + \cdots + v_n^2} = 1$

– If M is symmetric with real entries, it has n eigenvectors, and they can be chosen orthogonal to one another.

– Thus eigenvectors can be used as a new basis for an n-dimensional vector space.
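A sketch of this check in NumPy; for a symmetric matrix, np.linalg.eigh returns unit-length eigenvectors that are orthogonal to one another (the example matrix reuses the covariance values from a few slides back):

```python
import numpy as np

# A symmetric matrix with real entries (e.g., a covariance matrix).
M = np.array([[ 47.7, 104.5],
              [104.5, 370.0]])

# eigh is for symmetric (Hermitian) matrices; eigenvalues come back in ascending order,
# and the corresponding eigenvectors are the columns of V.
eigenvalues, V = np.linalg.eigh(M)

# Check the defining property M v = lambda v for each eigenvector.
for lam, v in zip(eigenvalues, V.T):
    assert np.allclose(M @ v, lam * v)

# The eigenvectors are unit length and mutually orthogonal: V^T V = I.
assert np.allclose(V.T @ V, np.eye(2))
```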

Page 16

Principal Components Analysis (PCA)

1. Given the original data set S = {x1, ..., xk}, produce a new data set by subtracting the mean of attribute Ai from each xi.

Example: the original attribute means are (1.81, 1.91); after subtracting them, the means are (0, 0).
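Step 1 in code (a sketch; the ten (x, y) points below are reconstructed from the mean-adjusted values and the means shown on these slides, so any other array with one row per example works just as well):

```python
import numpy as np

# Two-attribute data set; its per-attribute means are (1.81, 1.91).
S = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
              [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9]])

mean = S.mean(axis=0)            # per-attribute means: [1.81, 1.91]
data_adjust = S - mean           # mean-subtracted data: each column now has mean 0

print(mean)                      # [1.81 1.91]
print(data_adjust.mean(axis=0))  # [0. 0.] (up to floating-point error)
```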

Page 17

Page 18

2. Calculate the covariance matrix:

3. Calculate the (unit) eigenvectors and eigenvalues of the covariance matrix:

[Covariance matrix of attributes x and y shown on slide.]

Page 19

The eigenvector with the largest eigenvalue traces the linear pattern in the data.

Page 20

4. Order eigenvectors by eigenvalue, highest to lowest.

In general, you get n components. To reduce dimensionality to p, ignore the n − p components at the bottom of the list.

$\lambda_1 = 1.28402771, \quad v_1 = \begin{pmatrix} -.677873399 \\ -.735178956 \end{pmatrix} \qquad\qquad \lambda_2 = 0.0490833989, \quad v_2 = \begin{pmatrix} -.735178956 \\ .677873399 \end{pmatrix}$
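Steps 2–4 as a sketch on the mean-adjusted example data (the signs of the computed eigenvectors may differ from the slide; an eigenvector's sign is arbitrary):

```python
import numpy as np

# Mean-adjusted data from step 1 (rows = examples, columns = attributes x and y).
data_adjust = np.array([
    [ .69,  .49], [-1.31, -1.21], [ .39,  .99], [ .09,  .29], [1.29, 1.09],
    [ .49,  .79], [ .19, -.31], [-.81, -.81], [-.31, -.31], [-.71, -1.01]])

# Step 2: covariance matrix; step 3: its eigenvalues and unit eigenvectors.
cov_matrix = np.cov(data_adjust, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(cov_matrix)  # ascending eigenvalue order

# Step 4: reorder from highest eigenvalue to lowest.
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]   # column i pairs with eigenvalues[i]

# To reduce dimensionality to p, keep only the first p eigenvectors.
p = 1
feature_vector = eigenvectors[:, :p]

print(eigenvalues)   # roughly (1.284, 0.049), matching the values on the slide
```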

Page 21

Construct a new "feature vector" (assuming vi is a column vector):

FeatureVector = (v1, v2, ..., vp)

Page 22

5. Derive the new data set.

TransformedData = RowFeatureVector × RowDataAdjust

where RowDataAdjust = transpose of the mean-adjusted data

This gives the original data in terms of the chosen components (eigenvectors), that is, along these axes.

$\text{RowFeatureVector}_1 = \begin{pmatrix} -.677873399 & -.735178956 \\ -.735178956 & .677873399 \end{pmatrix} \qquad \text{RowFeatureVector}_2 = \begin{pmatrix} -.677873399 & -.735178956 \end{pmatrix}$

$\text{RowDataAdjust} = \begin{pmatrix} .69 & -1.31 & .39 & .09 & 1.29 & .49 & .19 & -.81 & -.31 & -.71 \\ .49 & -1.21 & .99 & .29 & 1.09 & .79 & -.31 & -.81 & -.31 & -1.01 \end{pmatrix}$

(RowFeatureVector1 keeps both eigenvectors; RowFeatureVector2 keeps only the first.)
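Step 5 as a sketch in NumPy, using the matrices above (with only the first eigenvector kept, this yields the one-dimensional transformed data, up to the arbitrary sign of the eigenvector):

```python
import numpy as np

# Rows of RowFeatureVector are the chosen (unit) eigenvectors, as on the slide.
row_feature_vector = np.array([[-.677873399, -.735178956],
                               [-.735178956,  .677873399]])

# RowDataAdjust: transpose of the mean-adjusted data (one column per example).
row_data_adjust = np.array([
    [ .69, -1.31,  .39,  .09, 1.29,  .49,  .19, -.81, -.31,  -.71],
    [ .49, -1.21,  .99,  .29, 1.09,  .79, -.31, -.81, -.31, -1.01]])

# TransformedData = RowFeatureVector x RowDataAdjust.
# Row i of the result is the data expressed along eigenvector i.
transformed_data = row_feature_vector @ row_data_adjust

# Keeping only the first principal component (RowFeatureVector2 on the slide)
# just means taking the first row of RowFeatureVector before multiplying.
transformed_1d = row_feature_vector[:1] @ row_data_adjust
```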

Page 23

Page 24

Intuition: We projected the data onto new axes that capture the strongest linear trends in the data set. Each transformed data point tells us how far it is above or below those trend lines.

Page 25

Page 26

Reconstructing the original data

We did:

TransformedData = RowFeatureVector × RowDataAdjust

so we can do

RowDataAdjust = RowFeatureVector^-1 × TransformedData

= RowFeatureVector^T × TransformedData

(The inverse equals the transpose because the rows of RowFeatureVector are orthonormal eigenvectors; if only p < n components were kept, this recovers an approximation of the original data.)

and

RowDataOriginal = RowDataAdjust + OriginalMean
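A sketch of the reconstruction on the same worked example (when all components were kept, the original data comes back exactly; keeping only the first component gives an approximation):

```python
import numpy as np

row_feature_vector = np.array([[-.677873399, -.735178956],
                               [-.735178956,  .677873399]])
row_data_adjust = np.array([
    [ .69, -1.31,  .39,  .09, 1.29,  .49,  .19, -.81, -.31,  -.71],
    [ .49, -1.21,  .99,  .29, 1.09,  .79, -.31, -.81, -.31, -1.01]])
original_mean = np.array([[1.81], [1.91]])   # per-attribute means, as a column

transformed_data = row_feature_vector @ row_data_adjust

# Because the rows of RowFeatureVector are orthonormal, its inverse is its transpose.
row_data_adjust_back = row_feature_vector.T @ transformed_data
row_data_original = row_data_adjust_back + original_mean

# Exact recovery when all components were kept.
assert np.allclose(row_data_original, row_data_adjust + original_mean)

# With only the first component, the same formulas give an approximation.
rfv2 = row_feature_vector[:1]                # RowFeatureVector2
approx = rfv2.T @ (rfv2 @ row_data_adjust) + original_mean
```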

Page 27

Page 28

Textbook's notation

• We have original data X and mean-subtracted data B, and covariance matrix C = cov(B), where C is an N×N matrix.

• We find matrix V such that the columns of V are the N eigenvectors of C and

$V^{-1} C V = D$

where D is the diagonal matrix whose entry $D_{ii} = \lambda_i$ is the ith eigenvalue of C.

• Each eigenvalue in D corresponds to an eigenvector in V. The eigenvectors, sorted in order of decreasing eigenvalue, become the "feature vector" for PCA.
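In code, V and D can be obtained directly (a sketch; the mean-subtracted data B here is randomly generated for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((100, 3))     # hypothetical data, N = 3 attributes
B -= B.mean(axis=0)                   # make each column zero-mean (mean-subtracted data)

C = np.cov(B, rowvar=False)           # N x N covariance matrix

eigenvalues, V = np.linalg.eigh(C)    # columns of V are the eigenvectors of C
D = np.diag(eigenvalues)              # diagonal matrix of eigenvalues

# The defining relation: C V = V D  (equivalently, V^{-1} C V = D).
assert np.allclose(C @ V, V @ D)

# Sorting the columns of V by decreasing eigenvalue gives the PCA "feature vector".
order = np.argsort(eigenvalues)[::-1]
feature_vector = V[:, order]
```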

Page 29

• With new data, compute TransformedData = RowFeatureVector × RowDataAdjust

where RowDataAdjust = transpose of mean-adjusted data

Page 30

What you need to remember

• General idea of what PCA does

– Finds new, rotated set of orthogonal axes that capture directions of largest variation

– Allows some axes to be dropped, so data can be represented in lower-dimensional space.

– This can improve classification performance and avoid overfitting due to a large number of dimensions.

Page 31

Example: Linear discrimination using PCA for face recognition (“Eigenfaces”)

1. Preprocessing: “Normalize” faces

• Make images the same size

• Line up with respect to eyes

• Normalize intensities
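A minimal preprocessing sketch (assumes the Pillow library and a hypothetical list of image paths; eye alignment is application-specific and only indicated by a comment):

```python
import numpy as np
from PIL import Image

IMG_SIZE = (48, 48)   # hypothetical common size for all face images

def preprocess_face(path):
    """Load a face image, convert to grayscale, resize, and normalize intensities."""
    img = Image.open(path).convert("L")    # grayscale
    # (Eye alignment would go here: crop/rotate so the eyes land at fixed positions.)
    img = img.resize(IMG_SIZE)
    pixels = np.asarray(img, dtype=np.float64)
    # Normalize intensities to zero mean and unit standard deviation per image.
    return (pixels - pixels.mean()) / (pixels.std() + 1e-8)

# faces = np.stack([preprocess_face(p).ravel() for p in image_paths])  # one row per face
```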

Page 32

Page 33

2. Raw features are pixel intensity values (2061 features)

3. Each image is encoded as a vector Γi of these features

4. Compute the "mean" face in the training set:

$\bar{\Gamma} = \frac{1}{M} \sum_{i=1}^{M} \Gamma_i$

where M is the number of training images.

Page 34

• Subtract the mean face from each face vector

• Compute the covariance matrix C

• Compute the (unit) eigenvectors vi of C

• Keep only the first K principal components (eigenvectors)


From W. Zhao et al., Discriminant analysis of principal components for face recognition.
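The whole eigenfaces computation as a sketch (faces is a hypothetical array with one preprocessed, flattened face image per row; this follows the steps listed above rather than any particular paper's code):

```python
import numpy as np

def compute_eigenfaces(faces, K):
    """faces: (M, d) array, one flattened face per row. Returns mean face and top-K eigenfaces."""
    mean_face = faces.mean(axis=0)                  # the "mean" face
    centered = faces - mean_face                    # subtract mean face from each face vector

    C = np.cov(centered, rowvar=False)              # d x d covariance matrix
    eigenvalues, eigenvectors = np.linalg.eigh(C)   # unit eigenvectors in the columns

    order = np.argsort(eigenvalues)[::-1][:K]       # keep the K largest-eigenvalue components
    eigenfaces = eigenvectors[:, order]             # each column is one "eigenface"
    return mean_face, eigenfaces

# Projecting a face onto the eigenfaces gives its K-dimensional representation:
# weights = (face - mean_face) @ eigenfaces
# (For very high-dimensional images, an SVD of `centered` is cheaper than forming C,
#  but the version above matches the steps described on the slides.)
```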

Page 35

Page 36

Interpreting and Using Eigenfaces

The eigenfaces encode the principal sources of variation in the dataset (e.g., absence/presence of facial hair, skin tone, glasses, etc.).

We can represent any face as a linear combination of these "basis" faces.

Use this representation for:

• Face recognition (e.g., Euclidean distance from known faces)

• Linear discrimination (e.g., "glasses" versus "no glasses", or "male" versus "female")
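A sketch of distance-based recognition with this representation (all inputs are hypothetical: known_weights holds the eigenface weights of previously seen faces, computed by the same projection as above, and known_labels their identities):

```python
import numpy as np

def recognize(new_face, mean_face, eigenfaces, known_weights, known_labels):
    """Project a new face into eigenface space and return the label of the nearest known face."""
    w = (new_face - mean_face) @ eigenfaces                  # eigenface weights of the new face
    distances = np.linalg.norm(known_weights - w, axis=1)    # Euclidean distance to each known face
    return known_labels[np.argmin(distances)]
```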

Page 37

Eigenfaces Demo

• http://demonstrations.wolfram.com/FaceRecognitionUsingTheEigenfaceAlgorithm/

Page 38

Kernel PCA

• PCA: Assumes the directions of variation are all straight lines

• Kernel PCA: Maps the data to a higher-dimensional feature space and performs PCA there (implicitly, via the kernel trick)

Page 39

[Figure (from Wikipedia): the original data and the data after kernel PCA.]

Page 40

Kernel PCA

• Use Φ(x) and the kernel matrix Kij = Φ(xi) · Φ(xj) to compute the PCA transform.

(Optional: See 10.2.2 in the textbook, though it might be a bit confusing. Also see "Kernel Principal Components Analysis" by Scholkopf et al., linked to the class website.)
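A minimal kernel PCA sketch written directly from the kernel-matrix description above (the RBF kernel and the gamma value are illustrative assumptions, not from the slides):

```python
import numpy as np

def kernel_pca(X, n_components, gamma=1.0):
    """Kernel PCA with an RBF kernel K_ij = exp(-gamma * ||x_i - x_j||^2)."""
    # Kernel matrix: inner products Phi(x_i) . Phi(x_j), computed implicitly.
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-gamma * sq_dists)

    # Center the (implicit) feature vectors by centering the kernel matrix.
    n = K.shape[0]
    one_n = np.ones((n, n)) / n
    K_centered = K - one_n @ K - K @ one_n + one_n @ K @ one_n

    # Eigendecomposition of the centered kernel matrix; keep the leading components
    # (only components with clearly positive eigenvalues should be used).
    eigenvalues, eigenvectors = np.linalg.eigh(K_centered)
    order = np.argsort(eigenvalues)[::-1][:n_components]
    alphas = eigenvectors[:, order] / np.sqrt(eigenvalues[order])  # unit-norm components in feature space

    # Projections of the training points onto the kernel principal components.
    return K_centered @ alphas
```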

Page 41

Kernel Eigenfaces

(Yang et al., Face Recognition Using Kernel Eigenfaces, 2000)

Training data: ~ 400 images, 40 subjects

Original features: 644 pixel gray-scale values.

Transform the data using kernel PCA; reduce dimensionality to the number of components giving the lowest error.

Test: new photo of one of the subjects

Recognition done using nearest neighbor classification