Ch 12. Continuous Latent Variables
Pattern Recognition and Machine Learning, C. M. Bishop, 2006.
Summarized by Soo-Jin Kim
Biointelligence Laboratory, Seoul National University
http://bi.snu.ac.kr/
Contents
12.1 Principal Component Analysis
12.1.1 Maximum variance formulation
12.1.2 Minimum-error formulation
12.1.3 Applications of PCA
12.1.4 PCA for high-dimensional data
12.1 Principal Component Analysis
Principal Component Analysis (PCA)
PCA is used for applications such as dimensionality reduction, lossy data compression, feature extraction, and data visualization. It is also known as the Karhunen-Loève transform.
PCA can be defined as the orthogonal projection of the data onto a lower-dimensional linear space, known as the principal subspace, such that the variance of the projected data is maximized.
Principal subspace
Orthogonal projection of the data points onto the subspace
The subspace maximizes the variance of the projected points
Equivalently, it minimizes the sum-of-squares of the projection errors (a code sketch follows this list)
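The following is a minimal NumPy sketch of this idea (an illustration, not code from the book): it estimates the covariance matrix and projects the data onto the subspace spanned by the top-M eigenvectors. The function name pca_project and the synthetic data are choices made here for the example.

```python
import numpy as np

# Minimal PCA sketch (illustrative, not Bishop's code): project the data onto
# the M-dimensional principal subspace spanned by the top-M eigenvectors of
# the sample covariance matrix.
def pca_project(X, M):
    """X: (N, D) data matrix; returns the (N, M) projected coordinates."""
    x_bar = X.mean(axis=0)                     # sample mean
    S = (X - x_bar).T @ (X - x_bar) / len(X)   # data covariance matrix S
    eigvals, eigvecs = np.linalg.eigh(S)       # eigenvalues in ascending order
    U = eigvecs[:, ::-1][:, :M]                # top-M eigenvectors as columns
    return (X - x_bar) @ U                     # coordinates in the subspace

# Example: correlated 2-D data projected onto its first principal component.
rng = np.random.default_rng(0)
X = rng.multivariate_normal([0.0, 0.0], [[3.0, 1.5], [1.5, 1.0]], size=500)
Z = pca_project(X, M=1)
print(Z.var())   # maximal variance over all unit projection directions
```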
12.1.1 Maximum variance formulation (1/2)
Consider a data set of observations {x_n}, where n = 1,…,N, and x_n is a Euclidean variable with dimensionality D. The goal is to project the data onto a space having dimensionality M < D while maximizing the variance of the projected data.
One-dimensional space (M = 1):
Define the direction of this space using a D-dimensional unit vector $\mathbf{u}_1$, chosen so that $\mathbf{u}_1^{\mathrm{T}}\mathbf{u}_1 = 1$
The mean of the projected data is $\mathbf{u}_1^{\mathrm{T}}\bar{\mathbf{x}}$, where $\bar{\mathbf{x}}$ is the sample set mean given by

$$\bar{\mathbf{x}} = \frac{1}{N}\sum_{n=1}^{N}\mathbf{x}_n$$

The variance of the projected data, the quantity to be maximized with respect to $\mathbf{u}_1$, is given by

$$\frac{1}{N}\sum_{n=1}^{N}\left\{\mathbf{u}_1^{\mathrm{T}}\mathbf{x}_n - \mathbf{u}_1^{\mathrm{T}}\bar{\mathbf{x}}\right\}^2 = \mathbf{u}_1^{\mathrm{T}}S\mathbf{u}_1$$

where $S$ is the data covariance matrix

$$S = \frac{1}{N}\sum_{n=1}^{N}(\mathbf{x}_n - \bar{\mathbf{x}})(\mathbf{x}_n - \bar{\mathbf{x}})^{\mathrm{T}}$$
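As a quick numerical check (our own illustration, not from the text), the variance of the one-dimensional projection onto a unit vector u can be compared against $\mathbf{u}^{\mathrm{T}}S\mathbf{u}$:

```python
import numpy as np

# Numerical check (our own illustration): the variance of the projection onto
# a unit vector u equals u^T S u, matching the formula above.
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 3))
x_bar = X.mean(axis=0)
S = (X - x_bar).T @ (X - x_bar) / len(X)   # covariance with the 1/N convention

u = np.array([1.0, 2.0, -1.0])
u /= np.linalg.norm(u)                     # enforce u^T u = 1

proj = X @ u                               # u^T x_n for each data point
print(np.allclose(proj.var(), u @ S @ u))  # True
```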
12.1.1 Maximum variance formulation (2/2)
Lagrange multiplier
Enforce the constraint $\mathbf{u}_1^{\mathrm{T}}\mathbf{u}_1 = 1$ with a Lagrange multiplier $\lambda_1$ and make an unconstrained maximization of

$$\mathbf{u}_1^{\mathrm{T}}S\mathbf{u}_1 + \lambda_1\left(1 - \mathbf{u}_1^{\mathrm{T}}\mathbf{u}_1\right)$$

Setting the derivative with respect to $\mathbf{u}_1$ to zero gives

$$S\mathbf{u}_1 = \lambda_1\mathbf{u}_1$$

so $\mathbf{u}_1$ must be an eigenvector of $S$. Left-multiplying by $\mathbf{u}_1^{\mathrm{T}}$ and using $\mathbf{u}_1^{\mathrm{T}}\mathbf{u}_1 = 1$, the projected variance is

$$\mathbf{u}_1^{\mathrm{T}}S\mathbf{u}_1 = \lambda_1$$

The variance will be maximal when we set $\mathbf{u}_1$ equal to the eigenvector having the largest eigenvalue $\lambda_1$; this eigenvector is known as the first principal component.
In general, PCA involves evaluating the mean $\bar{\mathbf{x}}$ and the covariance matrix $S$ of the data set and then finding the M eigenvectors of $S$ corresponding to the M largest eigenvalues. The cost of computing the full eigenvector decomposition for a matrix of size D × D is O(D³).
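A short sketch of this result (synthetic Gaussian data with a covariance chosen here for illustration): the top eigenvector of S satisfies the eigenvalue equation and attains the maximal projected variance λ1. Note that np.linalg.eigh performs the full O(D³) eigendecomposition mentioned above.

```python
import numpy as np

# Sketch of the eigenvector result (synthetic data, parameters chosen here):
# the top eigenvector u1 of S satisfies S u1 = lambda_1 u1, and the projected
# variance it attains equals lambda_1.
rng = np.random.default_rng(2)
X = rng.multivariate_normal([0.0, 0.0, 0.0], np.diag([4.0, 1.0, 0.25]), size=2000)
S = np.cov(X, rowvar=False, bias=True)     # covariance with the 1/N convention

eigvals, eigvecs = np.linalg.eigh(S)       # full O(D^3) eigendecomposition
lam1, u1 = eigvals[-1], eigvecs[:, -1]     # largest eigenvalue and eigenvector

print(np.allclose(S @ u1, lam1 * u1))      # S u1 = lambda_1 u1 -> True
print(np.isclose(u1 @ S @ u1, lam1))       # projected variance = lambda_1 -> True
```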
12.1.2 Minimum-error formulation
This formulation is based on projection-error minimization. Introduce a complete orthonormal set of D-dimensional basis vectors $\{\mathbf{u}_i\}$, where i = 1,…,D, that satisfy

$$\mathbf{u}_i^{\mathrm{T}}\mathbf{u}_j = \delta_{ij}$$

Each data point can then be represented exactly by a linear combination of the basis vectors:

$$\mathbf{x}_n = \sum_{i=1}^{D}\left(\mathbf{x}_n^{\mathrm{T}}\mathbf{u}_i\right)\mathbf{u}_i$$

The goal is to approximate each data point using a representation involving a restricted number M < D of variables, corresponding to a projection onto a lower-dimensional subspace.
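A minimal sketch of this expansion (a random orthonormal basis built with a QR decomposition stands in for an arbitrary basis; choosing the optimal subspace comes next): reconstruction from all D coefficients is exact, while keeping only M coefficients leaves a projection error.

```python
import numpy as np

# Sketch of the basis expansion (a random orthonormal basis stands in for an
# arbitrary one): with all D coefficients alpha_i = x^T u_i the reconstruction
# is exact; truncating to M < D coefficients leaves a projection error.
rng = np.random.default_rng(3)
D, M = 4, 2
x = rng.normal(size=D)

U, _ = np.linalg.qr(rng.normal(size=(D, D)))  # columns u_i are orthonormal

coeffs = U.T @ x                              # alpha_i = x^T u_i
print(np.allclose(U @ coeffs, x))             # exact reconstruction -> True

x_approx = U[:, :M] @ coeffs[:M]              # keep only M basis directions
print(np.linalg.norm(x - x_approx))           # projection error of the M-term form
```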
Consider the case of a two-dimensional data space (D = 2) and a one-dimensional principal subspace (M = 1). We choose the discarded direction $\mathbf{u}_2$ so as to minimize the distortion $J = \mathbf{u}_2^{\mathrm{T}}S\mathbf{u}_2$, using a Lagrange multiplier $\lambda_2$ to enforce the constraint $\mathbf{u}_2^{\mathrm{T}}\mathbf{u}_2 = 1$; this again yields the eigenvector equation $S\mathbf{u}_2 = \lambda_2\mathbf{u}_2$.
The minimum value of J is obtained by choosing $\mathbf{u}_2$ to be the eigenvector corresponding to the smaller of the two eigenvalues.
Hence we choose the principal subspace to be aligned with the eigenvector having the larger eigenvalue, in order to minimize the average squared projection distance.
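A numerical sketch of this D = 2 case (synthetic data; the covariance values are arbitrary choices for the example): the discarded direction u2 attains J = λ2, the smaller eigenvalue, while the retained direction captures the larger eigenvalue λ1.

```python
import numpy as np

# Sketch of the D = 2, M = 1 case (synthetic data, covariance chosen here):
# J = u2^T S u2 is minimized by the eigenvector with the smaller eigenvalue,
# so the principal subspace aligns with the larger-eigenvalue eigenvector.
rng = np.random.default_rng(4)
X = rng.multivariate_normal([0.0, 0.0], [[3.0, 1.0], [1.0, 0.5]], size=2000)
S = np.cov(X, rowvar=False, bias=True)

eigvals, eigvecs = np.linalg.eigh(S)   # eigenvalues in ascending order
u2 = eigvecs[:, 0]                     # smaller eigenvalue: discarded direction
u1 = eigvecs[:, -1]                    # larger eigenvalue: principal subspace

print(np.isclose(u2 @ S @ u2, eigvals[0]))   # minimum J equals lambda_2 -> True
print(np.isclose(u1 @ S @ u1, eigvals[-1]))  # retained variance = lambda_1 -> True
```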