• Reduces time complexity: less computation
• Reduces space complexity: fewer parameters
• Simpler models are more robust on small datasets
• More interpretable; simpler explanation
• Data visualization (beyond 2 attributes, it gets difficult)
Diagonal elements are the variances σ_i^2 of the individual attributes. Off-diagonal elements describe how fluctuations in one attribute affect fluctuations in another.
\[
S \equiv E\!\left[(\mathbf{x}-\boldsymbol{\mu})(\mathbf{x}-\boldsymbol{\mu})^{T}\right]
= \begin{pmatrix}
\sigma_1^2 & \sigma_{12} & \cdots & \sigma_{1d} \\
\sigma_{21} & \sigma_2^2 & \cdots & \sigma_{2d} \\
\vdots & \vdots & \ddots & \vdots \\
\sigma_{d1} & \sigma_{d2} & \cdots & \sigma_d^2
\end{pmatrix}
\]
where (x − μ) is d×1, (x − μ)^T is 1×d, and S is d×d.
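As a quick illustration (a sketch, not from the slides; the data and variable names are made up), the sample covariance matrix can be estimated directly from an N-by-d data matrix:

% Estimate the covariance matrix S from synthetic data
N = 500; d = 3;
X = randn(N, d) * [1 0 0; 0.8 1 0; 0 0.5 1];   % correlated attributes
mu = mean(X, 1);                 % 1-by-d sample mean
Xc = X - mu;                     % center each attribute
S  = (Xc' * Xc) / (N - 1);       % d-by-d sample covariance
% diag(S) holds the attribute variances; S matches cov(X).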
Dividing the off-diagonal elements by the product of the corresponding standard deviations gives the “correlation coefficients”.
Correlation among attributes makes it difficult to say how any one attribute contributes to an effect.
\[
\mathrm{Corr}(x_i, x_j) \equiv \rho_{ij} = \frac{\sigma_{ij}}{\sigma_i \sigma_j}, \qquad -1 \le \rho_{ij} \le 1
\]
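Continuing the sketch above (same assumed variable names), the correlation matrix follows from the covariance matrix in one line; base MATLAB's corrcoef(X) computes the same thing directly from the data:

% Convert covariance S to correlation coefficients
sd = sqrt(diag(S));        % d-by-1 standard deviations
R  = S ./ (sd * sd');      % rho_ij = sigma_ij / (sigma_i * sigma_j)
% diag(R) is all ones; R matches corrcoef(X) up to rounding.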
Consider a linear transformation of the attributes, z = Mx, where M is a d×d matrix. The d features z will also be normally distributed (proof later).
A choice of M that results in a diagonal covariance matrix in feature space has the following advantages:
1. Interpretation of uncorrelated features is easier
2. The total variance of the features is the sum of the diagonal elements
Diagonalization of the covariance matrix:
The transformation z = Mx that leads to a diagonal feature-space covariance has M = W^T, where the columns of W are the eigenvectors of the covariance matrix S.
The collection of eigenvalue equations S w_k = λ_k w_k can be written as SW = WD, where D = diag(λ_1, …, λ_d) and W is formed from the column vectors [w_1 … w_d].
Because S is symmetric, W is orthogonal, so W^T = W^{-1} and W^T S W = W^{-1} W D = D.
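A short sketch (assuming the S computed above) confirming that the eigenvector matrix diagonalizes the covariance:

% Diagonalize S with its eigenvectors
[W, D] = eig(S);        % S*W = W*D; columns of W are eigenvectors
Sz = W' * S * W;        % covariance in feature space z = W'*x
% Sz equals D up to rounding, and W'*W is the identity.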
If we arrange the eigenvectors so that the eigenvalues λ_1 … λ_d are in decreasing order of magnitude, then z_i = w_i^T x, i = 1…k < d, are the “principal components”.
Proportion of Variance (PoV) explained by k principal components (λ_i sorted in descending order) is
\[
\mathrm{PoV} = \frac{\lambda_1 + \cdots + \lambda_k}{\lambda_1 + \cdots + \lambda_d}
\]
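In code (a sketch continuing the example above, with D from the eigendecomposition), the PoV curve and the smallest k capturing, say, 90% of the variance are:

% Proportion of variance explained by the first k components
[lam, idx] = sort(diag(D), 'descend');   % eigenvalues, largest first
PoV = cumsum(lam) / sum(lam);            % PoV for k = 1..d
k   = find(PoV >= 0.90, 1);              % smallest k with PoV >= 90%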
If S has at least 2 distinct eigenvalues, define the 2nd principal component by max Var(z_2), such that ||w_2|| = 1 and w_2 is orthogonal to w_1.
Introduce Lagrange multipliers α and β:
\[
L = \mathbf{w}_2^T S \mathbf{w}_2 - \alpha\!\left(\mathbf{w}_2^T \mathbf{w}_2 - 1\right) - \beta\!\left(\mathbf{w}_2^T \mathbf{w}_1 - 0\right)
\]
Set the gradient of L with respect to w_2 to zero:
\[
2 S \mathbf{w}_2 - 2\alpha \mathbf{w}_2 - \beta \mathbf{w}_1 = 0
\]
Choosing β = 0 (justified below) and α = λ_2 gives S w_2 = λ_2 w_2.
To maximize Var(z_2), choose λ_2 as the second largest eigenvalue.
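The choice β = 0 is in fact forced (one step added here for completeness): premultiplying the gradient equation by w_1^T gives
\[
2\,\mathbf{w}_1^T S \mathbf{w}_2 - 2\alpha\,\mathbf{w}_1^T \mathbf{w}_2 - \beta\,\mathbf{w}_1^T \mathbf{w}_1 = 0 .
\]
Since w_1^T S w_2 = λ_1 w_1^T w_2 = 0 (by symmetry of S and orthogonality of w_1 and w_2) and w_1^T w_1 = 1, it follows that β = 0, leaving the eigenvalue equation S w_2 = α w_2.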
For any d×d matrix M, z = M^T x is a linear transformation of the attributes x that defines the features z.
If the attributes x are normally distributed with mean μ and covariance S, then z is normally distributed with mean M^T μ and covariance M^T S M (proof slide 8).
If M = W, a matrix with columns that are the normalized eigenvectors of S, then the covariance of z is diagonal with elements equal to the eigenvalues of S (proof slide 6)
Arrange the eigenvalues in decreasing order of magnitude and find λ_1 … λ_k that account for most (e.g. 90%) of the total variance; then z_i = w_i^T x are the “principal components”.
Review
MATLAB’s [V,D] = eig(A) returns both the eigenvectors (columns of V) and the eigenvalues (diagonal elements of D), with the eigenvalues in increasing order.
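Since the slides want the eigenvalues in decreasing order, the eig output has to be flipped before projecting. A minimal sketch (the names S, Xc, and the choice k = 2 are assumptions carried over from the examples above):

% Reorder eigenvalues/eigenvectors and project onto the first k PCs
[V, D] = eig(S);
[lam, idx] = sort(diag(D), 'descend');  % largest eigenvalue first
W = V(:, idx);                          % reorder eigenvector columns to match
k = 2;                                  % e.g. keep two components
Z = Xc * W(:, 1:k);                     % N-by-k principal-component scores
% scatter(Z(:,1), Z(:,2)) gives a PC1-vs-PC2 plot like the one below.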
[Figure: scatter plot of PCs 1 and 2; samples 1–34 are cancer, >35 are control. Samples from cancer patients cluster.]
Assignment 5 due 10-30-15
Find the accuracy of a model that classifies all 6 types of beer bottles in glassdata.csv by multivariate linear regression. Find the eigenvalues and eigenvectors of the covariance matrix for the full beer-bottle data set. How many eigenvalues are required to capture more than 90% of the variance? Transform the attribute data by the eigenvectors of the 3 largest eigenvalues. What is the accuracy of a linear model that uses these features?
Plot the accuracy when you successively extend the linear model by including z_1, then z_2, then z_3.