Environmental Data Analysis with MatLab Lecture 15: Factor Analysis

SYLLABUS

Lecture 01 Using MatLab
Lecture 02 Looking At Data
Lecture 03 Probability and Measurement Error
Lecture 04 Multivariate Distributions
Lecture 05 Linear Models
Lecture 06 The Principle of Least Squares
Lecture 07 Prior Information
Lecture 08 Solving Generalized Least Squares Problems
Lecture 09 Fourier Series
Lecture 10 Complex Fourier Series
Lecture 11 Lessons Learned from the Fourier Transform
Lecture 12 Power Spectral Density
Lecture 13 Filter Theory
Lecture 14 Applications of Filters
Lecture 15 Factor Analysis
Lecture 16 Orthogonal functions
Lecture 17 Covariance and Autocorrelation
Lecture 18 Cross-correlation
Lecture 19 Smoothing, Correlation and Spectra
Lecture 20 Coherence; Tapering and Spectral Analysis
Lecture 21 Interpolation
Lecture 22 Hypothesis testing
Lecture 23 Hypothesis Testing continued; F-Tests
Lecture 24 Confidence Limits of Spectra, Bootstraps

purpose of the lecture

introduce Factor Analysis, a method of detecting patterns in data

example: sediment samples are a mix of several sources

[Figure: sources A and B supply sediment to the ocean, where sediment samples s1 through s5 are collected.]

[Figure: ocean sediment samples s1 and s2, each with a composition given in terms of elements e1 through e5.]

what does the composition of the samples tell you about the composition of the sources?

another example: the Atlantic Rock dataset, the chemical composition of several thousand rocks

Rocks are a mix of minerals, and minerals have a well-defined composition.

[Figure: rocks 1 through 7, each a different mix of minerals 1, 2 and 3.]

which is simpler?

rocks have a chemical composition

or

rocks contain minerals, and minerals have chemical compositions

the answer will depend on how many minerals are involved and how many elements are in each mineral

representing mixing with matrices

the sample matrix, S: N samples by M elements, with $S_{ij}$ the amount of element j in sample i

e.g. sediment samples or rock samples

the word "element" is used in the abstract sense and may not refer to actual chemical elements

the factor matrix, F: P factors by M elements, with $F_{kj}$ the amount of element j in factor k

e.g. sediment sources or minerals

note that there are P factors, which is a simplification if P < M

the loading matrix, C: N samples by P factors, with $C_{ik}$ the amount of factor k in sample i

it specifies the mix of factors for each sample

summary

$\mathbf{S} = \mathbf{C}\mathbf{F}$, that is, $S_{ij} = \sum_{k=1}^{P} C_{ik} F_{kj}$

samples contain factors, and factors contain elements
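A minimal sketch of the mixing model $\mathbf{S} = \mathbf{C}\mathbf{F}$, written in Python rather than MatLab and using invented numbers: two hypothetical factors with three elements each, and three samples whose loadings sum to one. Matrices are stored as lists of rows, and `matmul` is a helper written here, not a library function.

```python
def matmul(A, B):
    """Matrix product of lists of rows: (AB)[i][j] = sum over k of A[i][k] * B[k][j]."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

# Hypothetical factor matrix F: P = 2 factors by M = 3 elements
F = [[0.6, 0.3, 0.1],   # factor 1, e.g. source A
     [0.1, 0.2, 0.7]]   # factor 2, e.g. source B

# Hypothetical loading matrix C: N = 3 samples by P = 2 factors
C = [[1.0, 0.0],        # sample 1 is pure factor 1
     [0.5, 0.5],        # sample 2 is an even mix of both factors
     [0.2, 0.8]]        # sample 3 is mostly factor 2

# Sample matrix S: N = 3 samples by M = 3 elements
S = matmul(C, F)
for row in S:
    print([round(x, 3) for x in row])
```

Each row of S is a weighted sum of the rows of F, with the weights taken from the matching row of C.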

an important issue

how many factors are needed to represent the samples?

at most P = M are needed (the M elements themselves can always serve as factors), but is P < M?

simple example using ternary diagrams

[Figure: samples plotted on a ternary diagram whose corners are three elements.]

the samples fall on a line, which implies only 2 factors, so P = 2

[Figure: the same ternary diagram, with two factors plotted along with the samples.]
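One way to check numerically that samples lying on a line need only two factors, sketched in Python with invented compositions (fractions of three elements that sum to one): three samples built by mixing two hypothetical end members give a 3×3 sample matrix with zero determinant, so only 2 factors are needed. Moving one sample off the line gives a nonzero determinant. The helpers `det3` and `mix` are names chosen for this illustration.

```python
def det3(A):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (A[0][0] * (A[1][1] * A[2][2] - A[1][2] * A[2][1])
            - A[0][1] * (A[1][0] * A[2][2] - A[1][2] * A[2][0])
            + A[0][2] * (A[1][0] * A[2][1] - A[1][1] * A[2][0]))

def mix(f1, f2, t):
    """Sample made of a fraction t of end member f1 and 1 - t of end member f2."""
    return [t * a + (1 - t) * b for a, b in zip(f1, f2)]

# Hypothetical end members (factors), as fractions of three elements
f1 = [0.7, 0.2, 0.1]
f2 = [0.1, 0.3, 0.6]

on_line = [mix(f1, f2, t) for t in (0.9, 0.5, 0.2)]   # samples on a line
off_line = on_line[:2] + [[0.2, 0.6, 0.2]]            # third sample moved off it

print(det3(on_line))    # zero up to rounding error: 2 factors suffice
print(det3(off_line))   # clearly nonzero: a third factor is needed
```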

[Figure: the same samples described by two different pairs of factors, f1, f2 and f'1, f'2: A) two bracketing factors; B) the most typical factor and the deviation from it.]

the data do not uniquely determine the factors

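mathematically,

$\mathbf{S} = \mathbf{C}\mathbf{F} = \mathbf{C}'\mathbf{F}'$ with $\mathbf{F}' = \mathbf{M}\mathbf{F}$ and $\mathbf{C}' = \mathbf{C}\mathbf{M}^{-1}$, where $\mathbf{M}$ is any $P \times P$ matrix with an inverse, since $\mathbf{C}'\mathbf{F}' = \mathbf{C}\mathbf{M}^{-1}\mathbf{M}\mathbf{F} = \mathbf{C}\mathbf{F}$

we must rely on prior information to choose $\mathbf{M}$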
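A small numerical check of this non-uniqueness, in Python with invented numbers. The 2×2 matrix `M` is an arbitrary invertible choice, and `matmul` and `inv2` are helpers written here. The new factors F' and loadings C' differ from the originals, yet reproduce exactly the same samples.

```python
def matmul(A, B):
    """Matrix product of lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def inv2(M):
    """Inverse of a 2x2 matrix; raises ValueError if the determinant is zero."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    if det == 0:
        raise ValueError("matrix has no inverse")
    return [[M[1][1] / det, -M[0][1] / det],
            [-M[1][0] / det, M[0][0] / det]]

# Hypothetical factors (P = 2 by M = 3) and loadings (N = 2 by P = 2)
F = [[0.6, 0.3, 0.1],
     [0.1, 0.2, 0.7]]
C = [[0.5, 0.5],
     [0.2, 0.8]]

# Any invertible P x P matrix will do; this one is arbitrary
M = [[2.0, 1.0],
     [1.0, 1.0]]

F_new = matmul(M, F)          # F' = M F
C_new = matmul(C, inv2(M))    # C' = C M^-1

S = matmul(C, F)
S_new = matmul(C_new, F_new)
print(C_new)   # one loading is negative: valid mathematically, odd physically
```

The negative loading in C' illustrates why prior information (for instance, that loadings cannot be negative) is needed to choose among equally good factorizations.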

a method to determine the minimum number of factors, P, and one possible set of factors

a digression, but an important one

suppose that we have an N×N square matrix, M, and we experiment with it by multiplying “input” vectors, v, by it to create “output” vectors, w:

$\mathbf{w} = \mathbf{M}\mathbf{v}$

surprisingly, the answer to the question

when is the output parallel to the input?

tells us everything about the matrix

if w is parallel to v, then $\mathbf{w} = \lambda\mathbf{v}$, where λ is a proportionality factor

the equation $\mathbf{w} = \mathbf{M}\mathbf{v}$ is then $\lambda\mathbf{v} = \mathbf{M}\mathbf{v}$, or $(\mathbf{M} - \lambda\mathbf{I})\mathbf{v} = \mathbf{0}$

but if $(\mathbf{M} - \lambda\mathbf{I})\mathbf{v} = \mathbf{0}$, then it would seem that

$\mathbf{v} = (\mathbf{M} - \lambda\mathbf{I})^{-1}\mathbf{0} = \mathbf{0}$

which is not a very interesting solution: w is parallel to v when v is zero

to make an interesting solution, you must choose λ so that $(\mathbf{M} - \lambda\mathbf{I})^{-1}$ does not exist, which is equivalent to choosing λ so that

$\det(\mathbf{M} - \lambda\mathbf{I}) = 0$

since a matrix with zero determinant has no inverse

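in the 2×2 case,

$\det(\mathbf{M} - \lambda\mathbf{I}) = (M_{11} - \lambda)(M_{22} - \lambda) - M_{12}M_{21} = \lambda^2 - (M_{11} + M_{22})\lambda + (M_{11}M_{22} - M_{12}M_{21}) = 0$

this is a quadratic equation in λ and so has two solutions, $\lambda_1$ and $\lambda_2$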
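The 2×2 case can be solved directly with the quadratic formula applied to λ² − (M11 + M22)λ + (M11M22 − M12M21) = 0. A minimal Python sketch; the function name `eigenvalues_2x2` is hypothetical, and `cmath.sqrt` is used so that complex roots come back when the discriminant is negative, which can happen for non-symmetric matrices.

```python
import cmath

def eigenvalues_2x2(M):
    """Both roots of det(M - lam I) = lam**2 - trace*lam + det = 0."""
    trace = M[0][0] + M[1][1]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    disc = cmath.sqrt(trace * trace - 4 * det)   # complex square root
    return (trace + disc) / 2, (trace - disc) / 2

print(eigenvalues_2x2([[2.0, 1.0], [1.0, 2.0]]))   # symmetric: zero imaginary parts
print(eigenvalues_2x2([[0.0, -1.0], [1.0, 0.0]]))  # rotation: complex pair
```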

in the N×N case,

$\det(\mathbf{M} - \lambda\mathbf{I}) = 0$

is an N-th order polynomial equation in λ and so has N solutions (counting repeated roots), $\lambda_1, \lambda_2, \ldots, \lambda_N$, the “eigenvalues”

each corresponds to a different v, namely $\mathbf{v}^{(1)}, \mathbf{v}^{(2)}, \ldots, \mathbf{v}^{(N)}$, the “eigenvectors”

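N×N matrix, M, with $\mathbf{w} = \mathbf{M}\mathbf{v}$: when is the output parallel to the input?

N different cases:

$\mathbf{M}\mathbf{v}^{(1)} = \lambda_1\mathbf{v}^{(1)}, \quad \mathbf{M}\mathbf{v}^{(2)} = \lambda_2\mathbf{v}^{(2)}, \quad \ldots, \quad \mathbf{M}\mathbf{v}^{(N)} = \lambda_N\mathbf{v}^{(N)}$

simplify the notation by placing the eigenvectors side by side as the columns of V and the eigenvalues on the diagonal of Λ:

$\mathbf{M}\mathbf{V} = \mathbf{V}\boldsymbol{\Lambda}$, with $\mathbf{V} = [\mathbf{v}^{(1)}, \mathbf{v}^{(2)}, \ldots, \mathbf{v}^{(N)}]$ and $\boldsymbol{\Lambda} = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_N)$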

In the text it is shown that if M is symmetric, then

all λ's are real

the v's can be chosen to be orthonormal: $\mathbf{v}^{(i)T}\mathbf{v}^{(j)} = 1$ if $i = j$, and $0$ if $i \neq j$

which implies $\mathbf{V}^T\mathbf{V} = \mathbf{V}\mathbf{V}^T = \mathbf{I}$

$\mathbf{M}\mathbf{V} = \mathbf{V}\boldsymbol{\Lambda}$; post-multiplying by $\mathbf{V}^T$ and using $\mathbf{V}\mathbf{V}^T = \mathbf{I}$ gives

$\mathbf{M} = \mathbf{V}\boldsymbol{\Lambda}\mathbf{V}^T$

M can be constructed from V and Λ, so the answer to “when is the output parallel to the input?” tells you everything about M
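For a symmetric 2×2 matrix the eigenvalues and orthonormal eigenvectors have closed forms, which makes it easy to check numerically that $\mathbf{V}^T\mathbf{V} = \mathbf{I}$ and $\mathbf{M} = \mathbf{V}\boldsymbol{\Lambda}\mathbf{V}^T$. A minimal Python sketch; `sym_eig_2x2` and `rebuild` are hypothetical helper names (in MatLab one would simply call `eig()`).

```python
import math

def sym_eig_2x2(a, b, d):
    """Eigen-decomposition of the symmetric matrix [[a, b], [b, d]].

    Returns (lams, V): the eigenvalues, largest first, and a 2x2 matrix
    whose columns are the matching unit-length eigenvectors.
    """
    mean = (a + d) / 2
    half_gap = math.hypot((a - d) / 2, b)   # real, so the eigenvalues are real
    lams = [mean + half_gap, mean - half_gap]
    if b == 0:                              # matrix is already diagonal
        V = [[1.0, 0.0], [0.0, 1.0]] if a >= d else [[0.0, 1.0], [1.0, 0.0]]
        return lams, V
    cols = []
    for lam in lams:
        x, y = b, lam - a                   # solves (a - lam) x + b y = 0
        n = math.hypot(x, y)
        cols.append((x / n, y / n))
    V = [[cols[0][0], cols[1][0]],
         [cols[0][1], cols[1][1]]]
    return lams, V

def rebuild(lams, V):
    """Compute V Lambda V^T, i.e. the sum over k of lams[k] * v(k) v(k)^T."""
    return [[sum(V[i][k] * lams[k] * V[j][k] for k in range(2)) for j in range(2)]
            for i in range(2)]

lams, V = sym_eig_2x2(2.0, 1.0, 2.0)
VtV = [[sum(V[k][i] * V[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
print(lams)              # eigenvalues
print(VtV)               # identity, up to rounding
print(rebuild(lams, V))  # reproduces [[2, 1], [1, 2]]
```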

now here’s what this has to do with factors

suppose S is square (N = M) and symmetric; then

$\mathbf{S} = \mathbf{C}\mathbf{F} = \mathbf{V}\boldsymbol{\Lambda}\mathbf{V}^T$, with $\mathbf{C} = \mathbf{V}\boldsymbol{\Lambda}$ and $\mathbf{F} = \mathbf{V}^T$

S can be represented by M mutually perpendicular factors, F

furthermore, suppose that only P eigenvalues are nonzero

the eigenvectors with zero eigenvalues can be thrown out of the equation

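we can reduce the number of factors from M to P:

$\mathbf{S} = \mathbf{C}\mathbf{F} = \mathbf{V}_P\boldsymbol{\Lambda}_P\mathbf{V}_P^T$, with $\mathbf{C} = \mathbf{V}_P\boldsymbol{\Lambda}_P$ and $\mathbf{F}_P = \mathbf{V}_P^T$

where $\mathbf{V}_P$ holds the P eigenvectors with nonzero eigenvalues and $\boldsymbol{\Lambda}_P$ is the P×P diagonal matrix of those eigenvalues

S can be represented by P mutually perpendicular factors, $\mathbf{F}_P$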
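A tiny illustration in Python, using a hand-picked symmetric matrix whose determinant is zero, so one of its two eigenvalues is zero and P = 1 factor is enough. The eigenpairs were worked out by hand for this particular example; the code checks them and then rebuilds S from the single retained eigenvector with $\mathbf{C} = \mathbf{V}_P\boldsymbol{\Lambda}_P$ and $\mathbf{F}_P = \mathbf{V}_P^T$.

```python
import math

# Hypothetical symmetric matrix with det(S) = 4*1 - (-2)*(-2) = 0
S = [[4.0, -2.0],
     [-2.0, 1.0]]

# Eigenpairs worked out by hand for this particular S
lam1, v1 = 5.0, [2 / math.sqrt(5), -1 / math.sqrt(5)]
lam2, v2 = 0.0, [1 / math.sqrt(5), 2 / math.sqrt(5)]

def apply(A, v):
    """Matrix-vector product A v."""
    return [sum(A[i][k] * v[k] for k in range(len(v))) for i in range(len(A))]

# Residuals of S v = lam v; both are zero up to rounding
res1 = max(abs(a - lam1 * b) for a, b in zip(apply(S, v1), v1))
res2 = max(abs(a - lam2 * b) for a, b in zip(apply(S, v2), v2))

# Keep only the eigenvector with a nonzero eigenvalue (P = 1)
C = [[v1[0] * lam1], [v1[1] * lam1]]   # C = V_P Lambda_P, a 2 x 1 matrix
F = [v1]                               # F_P = V_P^T, a 1 x 2 matrix
S_rebuilt = [[C[i][0] * F[0][j] for j in range(2)] for i in range(2)]
print(res1, res2, S_rebuilt)
```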

unfortunately …

S is usually neither square nor symmetric

so a patch in the methodology is needed

the trick …

$\mathbf{S}^T\mathbf{S}$ is an M×M square matrix, and it is also symmetric, since $(\mathbf{S}^T\mathbf{S})^T = \mathbf{S}^T\mathbf{S}$
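A quick Python check, assuming an invented 4×2 sample matrix: $\mathbf{S}^T\mathbf{S}$ comes out 2×2 (M×M) and symmetric even though S itself is neither square nor symmetric. The helper name `gram` is chosen here; in MatLab the same product is written `S'*S`.

```python
def gram(S):
    """Return S^T S for an N x M matrix S stored as a list of N rows."""
    M = len(S[0])
    return [[sum(row[i] * row[j] for row in S) for j in range(M)]
            for i in range(M)]

# Hypothetical sample matrix: N = 4 samples by M = 2 elements
S = [[1.0, 2.0],
     [3.0, 1.0],
     [0.0, 4.0],
     [2.0, 2.0]]

G = gram(S)
print(G)   # M x M, with G[i][j] == G[j][i]
```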

suppose that $\mathbf{S}^T\mathbf{S}$ has P nonzero eigenvalues, $\boldsymbol{\Lambda}_P$, and corresponding eigenvectors, $\mathbf{V}_P$

$\mathbf{S}^T\mathbf{S}$ written in terms of its eigenvalues and eigenvectors:

$\mathbf{S}^T\mathbf{S} = \mathbf{V}_P\boldsymbol{\Lambda}_P\mathbf{V}_P^T$

write $\boldsymbol{\Lambda}_P$ as the product of its square roots, which are real because the eigenvalues of $\mathbf{S}^T\mathbf{S}$ are never negative ($\mathbf{v}^T\mathbf{S}^T\mathbf{S}\mathbf{v} = |\mathbf{S}\mathbf{v}|^2 \geq 0$):

$\mathbf{S}^T\mathbf{S} = \mathbf{V}_P\boldsymbol{\Lambda}_P^{1/2}\boldsymbol{\Lambda}_P^{1/2}\mathbf{V}_P^T$

insert the identity matrix, $\mathbf{I}$, between the square roots, and write $\mathbf{I} = \mathbf{U}_P^T\mathbf{U}_P$, with $\mathbf{U}_P$ as yet unknown:

$\mathbf{S}^T\mathbf{S} = \mathbf{V}_P\boldsymbol{\Lambda}_P^{1/2}\mathbf{U}_P^T\mathbf{U}_P\boldsymbol{\Lambda}_P^{1/2}\mathbf{V}_P^T$

group, and write the first group as the transpose of its transpose:

$\mathbf{S}^T\mathbf{S} = \left(\mathbf{U}_P\boldsymbol{\Lambda}_P^{1/2}\mathbf{V}_P^T\right)^T\left(\mathbf{U}_P\boldsymbol{\Lambda}_P^{1/2}\mathbf{V}_P^T\right)$

compare with the left-hand side, $\mathbf{S}^T\mathbf{S}$

so

$\mathbf{S} = \mathbf{U}_P\boldsymbol{\Lambda}_P^{1/2}\mathbf{V}_P^T$

and, defining $\boldsymbol{\Sigma}_P = \boldsymbol{\Lambda}_P^{1/2}$, this becomes

$\mathbf{S} = \mathbf{U}_P\boldsymbol{\Sigma}_P\mathbf{V}_P^T$

this is called the “singular value decomposition” of S, and the diagonal elements of $\boldsymbol{\Sigma}_P$ are called the “singular values”; post-multiplying by $\mathbf{V}_P$ gives $\mathbf{U}_P = \mathbf{S}\mathbf{V}_P\boldsymbol{\Sigma}_P^{-1}$, which indeed satisfies $\mathbf{U}_P^T\mathbf{U}_P = \boldsymbol{\Sigma}_P^{-1}\boldsymbol{\Lambda}_P\boldsymbol{\Sigma}_P^{-1} = \mathbf{I}$

now the non-square, non-symmetric matrix, S, is represented as a mix of P mutually perpendicular factors:

$\mathbf{S} = \mathbf{C}\mathbf{F}$, with $\mathbf{C} = \mathbf{U}_P\boldsymbol{\Sigma}_P$, the matrix of loadings, and $\mathbf{F} = \mathbf{V}_P^T$, the matrix of factors

since C depends on $\boldsymbol{\Sigma}_P$, the samples contain more of the factors with large singular values than of the factors with small singular values
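The derivation can be followed step by step in code. A pure-Python sketch, assuming an N×2 sample matrix so that the eigenproblem for $\mathbf{S}^T\mathbf{S}$ has a closed form; the helper names and the cutoff `rel_tol` for treating an eigenvalue as zero are hypothetical, and in practice one would call a library routine such as MatLab's `svd()`. It keeps the P nonzero eigenvalues, sets $\boldsymbol{\Sigma}_P = \boldsymbol{\Lambda}_P^{1/2}$ and $\mathbf{U}_P = \mathbf{S}\mathbf{V}_P\boldsymbol{\Sigma}_P^{-1}$, then forms the loadings $\mathbf{C} = \mathbf{U}_P\boldsymbol{\Sigma}_P$ and factors $\mathbf{F} = \mathbf{V}_P^T$.

```python
import math

def gram(S):
    """S^T S for an N x M matrix stored as a list of rows."""
    M = len(S[0])
    return [[sum(r[i] * r[j] for r in S) for j in range(M)] for i in range(M)]

def sym_eig_2x2(a, b, d):
    """Eigenvalues (largest first) and unit eigenvectors (columns of V) of [[a, b], [b, d]]."""
    mean, half_gap = (a + d) / 2, math.hypot((a - d) / 2, b)
    lams = [mean + half_gap, mean - half_gap]
    if b == 0:   # already diagonal
        return lams, ([[1.0, 0.0], [0.0, 1.0]] if a >= d else [[0.0, 1.0], [1.0, 0.0]])
    cols = []
    for lam in lams:
        n = math.hypot(b, lam - a)           # (b, lam - a) solves (a - lam) x + b y = 0
        cols.append((b / n, (lam - a) / n))
    return lams, [[cols[0][0], cols[1][0]], [cols[0][1], cols[1][1]]]

def svd_via_gram(S, rel_tol=1e-12):
    """Sketch of S = U_P Sigma_P V_P^T for an N x 2 matrix S, via the eigenvectors of S^T S."""
    G = gram(S)
    lams, V = sym_eig_2x2(G[0][0], G[0][1], G[1][1])
    keep = [k for k in range(2) if lams[k] > rel_tol * lams[0]]   # the P nonzero eigenvalues
    sig = [math.sqrt(lams[k]) for k in keep]                       # Sigma_P = Lambda_P^(1/2)
    VP = [[V[i][k] for k in keep] for i in range(2)]               # 2 x P
    # U_P = S V_P Sigma_P^-1, one column per retained factor
    UP = [[sum(row[i] * VP[i][p] for i in range(2)) / sig[p] for p in range(len(keep))]
          for row in S]
    return UP, sig, VP

S = [[1.0, 2.0], [3.0, 1.0], [0.0, 4.0], [2.0, 2.0]]   # hypothetical N = 4 by M = 2
UP, sig, VP = svd_via_gram(S)
P = len(sig)
C = [[UP[n][p] * sig[p] for p in range(P)] for n in range(len(S))]   # loadings C = U_P Sigma_P
F = [[VP[i][p] for i in range(2)] for p in range(P)]                  # factors F = V_P^T
S_rebuilt = [[sum(C[n][p] * F[p][j] for p in range(P)) for j in range(2)]
             for n in range(len(S))]
print(P, sig)
```

Forming $\mathbf{S}^T\mathbf{S}$ explicitly squares the condition number and so loses precision for ill-conditioned S, which is one reason library SVD routines avoid it; the sketch mirrors the derivation rather than a production algorithm.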

in MatLab

svd() computes all M factors, returning the singular values in decreasing order (you must decide how many to use)
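Deciding how many factors to keep is left to the analyst. One simple rule, sketched in Python as an assumption rather than as the lecture's prescription, keeps the singular values that exceed a user-chosen fraction of the largest one; the cutoff `rel_tol` and the example values are invented, not Atlantic Rock results.

```python
def choose_p(singular_values, rel_tol=0.01):
    """Number of factors to keep: singular values above rel_tol times the largest.

    rel_tol is a hypothetical cutoff chosen by the user, not a standard value.
    """
    s = sorted(singular_values, reverse=True)
    if not s or s[0] <= 0:
        return 0
    return sum(1 for x in s if x > rel_tol * s[0])

# Invented singular values, already sorted into decreasing order
example = [5000.0, 1200.0, 600.0, 300.0, 150.0, 1e-3, 5e-4, 1e-4]
print(choose_p(example))
```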

[Figure: singular values, $S_{ii}$, of the Atlantic Rock dataset, sorted into order of size and plotted against index, i; those close to zero are discarded.]

factors of the Atlantic Rock dataset

factor 1 is the “typical factor”

factor 2: as MgO increases, Al2O3 and CaO decrease

factor 3: as Al2O3 increases, FeO and CaO increase

[Figure: graphical representation of factors 2 through 5 (f2, f3, f4, f5) in terms of the oxides SiO2, TiO2, Al2O3, FeOtotal, MgO, CaO, Na2O and K2O.]

[Figure: factor loadings C2 through C4 plotted in 3D.]

factors 2 through 4 capture most of the variability of the rocks

[Figure, panels A) to D): plots involving the oxides Al2O3, TiO2, SiO2, K2O, FeO and MgO.]