Principal Components Analysis

May 13, 2017

Transcript
Page 1: Principal Components Analysis

Principal Components Analysis

Page 2: Principal Components Analysis

Outline

• Data reduction
• PCA vs. FA
• Assumptions and other issues
• Multivariate analysis in terms of eigenanalysis
• PCA basics
• Examples

Page 3: Principal Components Analysis

Ockham’s Razor1

• One of the hallmarks of good science is parsimony and elegance of theory

• However in analysis it is also often desirable to reduce data in some fashion in order to get a better understanding of it, or simply for ease of analysis

• In MR this was done more implicitly

• Reduce the predictors to a single composite
▫ The sum of the weighted variables

• Note the correlation (Multiple R) and its square

Presenter
Presentation Notes
1. There are various spellings of Ockham, I tend to float among them.
Page 4: Principal Components Analysis

Principal Components Analysis

• Conceptually the goal of PCA is to reduce the number of variables of interest into a smaller set of components
• PCA analyzes all the variance in the variables and reorganizes it into a new set of components equal to the number of original variables
• Regarding the new components:
▫ They are independent
▫ They decrease in the amount of variance in the originals they account for
 The first component captures the most variance, the second the next most, and so on until all the variance is accounted for
▫ Only some will be retained for further study (dimension reduction)
 Since the first few capture most of the variance, they are typically the focus

Page 5: Principal Components Analysis

PCA vs. Factor Analysis

• It is easy to make the mistake of assuming that these are the same technique; in some ways exploratory factor analysis and PCA are similar, and in general both can be seen as factor analytic techniques
• However they are typically used for different reasons, are not mechanically the same, nor do they have the same underlying linear model

Page 6: Principal Components Analysis

PCA/FA

• Principal Components Analysis
▫ Extracts all the factors underlying a set of variables
▫ The number of factors = the number of variables
▫ Completely explains the variance in each variable
• Factor Analysis
▫ Analyzes only the shared variance
 Error is estimated apart from shared variance

Page 7: Principal Components Analysis

FA vs. PCA conceptually

• FA produces factors; PCA produces components
• Factors cause variables; components are aggregates of the variables
• The underlying causal model is fundamentally distinct between the two
▫ Some do not consider PCA as part of the FA family1

[Diagrams: FA – a latent factor with arrows pointing to indicators I1, I2, I3; PCA – arrows from indicators I1, I2, I3 pointing to the component]

Presenter
Presentation Notes
1. Just my opinion but to me the sign of a good multivariate text is one that does not lump PCA and FA together in one chapter (hence my reason for giving you the chapter I did), and a good stat package would not add to such confusion.
Page 8: Principal Components Analysis

Contrasting the underlying models

• PCA
▫ Extraction is the process of forming PCs as linear combinations of the measured variables, as we have done with our other techniques

PC1 = b11X1 + b21X2 + … + bk1Xk
PC2 = b12X1 + b22X2 + … + bk2Xk
PCf = b1fX1 + b2fX2 + … + bkfXk

• Common factor model
▫ Each measure X has two contributing sources of variation: the common factor ξ and the specific or unique factor δ:

X1 = λ1ξ + δ1
X2 = λ2ξ + δ2
Xf = λfξ + δf
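To make the PCA side concrete, here is a minimal R sketch; R's built-in USArrests data is used purely as a stand-in for "the measured variables". It forms the PCs as exactly these weighted sums, with the b weights taken from the eigenvectors of the correlation matrix:

Z  <- scale(USArrests)               # standardize the measured variables
b  <- eigen(cor(USArrests))$vectors  # columns hold the weights b1j, b2j, ..., bkj
PC <- Z %*% b                        # each column is PCj = b1j*X1 + b2j*X2 + ... + bkj*Xk
round(cor(PC), 3)                    # off-diagonals ~0: the components are independent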

Page 9: Principal Components Analysis

FA vs. PCA

• PCA
▫ PCA is mathematically precise in orthogonalizing dimensions
▫ PCA redistributes all variance into orthogonal components
▫ PCA uses all variable variance and treats it as true variance
• FA
▫ FA distributes common variance into orthogonal factors
▫ FA is conceptually realistic in identifying common factors
▫ FA recognizes measurement error and true factor variance

Page 10: Principal Components Analysis

FA vs. PCA

• In some sense, PCA and FA are not so different conceptually from what we have been doing since multiple regression
▫ Creating linear combinations
▫ PCA especially falls more along the lines of what we've already been doing
• What is different from previous methods is that there is no IV/DV distinction
▫ Just a single set of variables

Page 11: Principal Components Analysis

FA vs. PCA Summary

• PCA goal is to analyze variance and reduce the observed variables

• PCA reproduces the R matrix perfectly

• PCA – the goal is to extract as much variance as possible with the fewest components

• PCA gives a unique solution

• FA analyzes covariance (communality)

• FA is a close approximation to the R matrix

• FA – the goal is to explain as much of the covariance with a minimum number of factors that are tied specifically to assumed constructs

• FA can give multiple solutions depending on the method and the estimates of communality

Page 12: Principal Components Analysis

Questions Regarding PCA

• Which components account for the most variance?

• How well does the component structure fit a given theory?

• What would each subject’s score be if they could be measured directly on the components?

• What is the percentage of variance in the data accounted for by the components?

Page 13: Principal Components Analysis

Assumptions/Issues

• Assumes reliable variables/correlations
▫ Very much affected by missing data, outlying cases and truncated data
▫ Data screening methods (e.g. transformations) may improve poor factor analytic results
• Normality
▫ Univariate – normally distributed variables make the solution stronger but are not necessary if we are using the analysis in a purely descriptive manner
▫ Multivariate – is assumed when assessing the number of factors

Page 14: Principal Components Analysis

Assumptions/Issues

• No outliers
▫ Influence on correlations would bias results
• Variables as outliers
▫ Some variables don't work
▫ Explain very little variance
▫ Relate poorly with primary components
▫ Low squared multiple correlation as DV with the other items as predictors
▫ Low loadings

Page 15: Principal Components Analysis

Assumptions/Issues

• Factorable R matrix
▫ Need inter-item/variable correlations > .30 or PCA/FA isn't going to do much for you
▫ Large inter-item correlations do not guarantee a solution either
 While two variables may be highly correlated, they may not be correlated with others
▫ The matrix of partial correlations (adjusted for the other variables) and Kaiser's measure of sampling adequacy can help assess factorability (see the sketch after this list)
 Kaiser's measure is the ratio of the sum of squared correlations to the sum of squared correlations plus the sum of squared partial correlations
 It approaches 1 if the partials are small; values of about .6+ are typically desired
• Multicollinearity/Singularity
▫ In traditional PCA it is not a problem; no matrix inversion is necessary
 As such it is one solution for dealing with collinearity in regression
▫ Investigate tolerances, det(R)
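A minimal sketch of checking factorability in R, assuming the psych package is installed; USArrests is used only as placeholder data:

R <- cor(USArrests)   # inter-variable correlation matrix
round(R, 2)           # look for a reasonable number of correlations > .30
det(R)                # a very small determinant signals strong linear dependence
psych::KMO(R)         # Kaiser's measure of sampling adequacy; overall MSA of ~.6+ desired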

Page 16: Principal Components Analysis

Assumptions/Issues

• Sample Size and Missing Data
▫ True missing data are handled in the usual ways
▫ Factor analysis via Maximum Likelihood needs large samples, and this is one of its few drawbacks
 The more reliable the correlations are, the smaller the number of subjects needed
 Need enough subjects for stable estimates
▫ How many?
▫ Depends on the nature of the data and the number of parameters to be estimated
 For example, a simple setting with few variables and clean data might not need as much
 Even several hundred data points for a more complex solution, with messy data and lower correlations among the variables, might not provide a meaningful result (PCA) or even converge upon a solution (FA)

Page 17: Principal Components Analysis

Other issues

• No readily defined criteria by which to judge the outcome
▫ Before, we had R2 for example
• Choice of rotation depends entirely on the researcher's judgment of interpretability
• Often used when other outcomes/analyses are not so hot, just to have something to talk about1

Presenter
Presentation Notes
1. I’m actually quite suspicious of an analysis that is simply PCA for psychological analysis of inventories (e.g. not doing a PCA to use the scores for something else). It suggests to me they really wanted to do factor analysis, which is preferable to psychologists dealing with a lot of measurement error, but that their data was so crappy they couldn’t get the maximum likelihood approach to work. Furthermore, with factor analysis itself most are wanting to confirm a model, but lack the appropriate data or sample size to pull it off, so they end up doing a regular (exploratory) FA, and then when that doesn’t converge a PCA.
Page 18: Principal Components Analysis

Extraction Methods

• PCA
▫ Extracts maximum variance with each component
▫ First component is a linear combination of variables that maximizes component score variance for the cases
▫ The second (etc.) extracts the maximum variance from the residual matrix left over after extracting the first component (and is therefore orthogonal to the first)
▫ If all components are retained, all variance is explained

Page 19: Principal Components Analysis

PCA

• Components are linear combinations of variables
▫ These combinations are based on weights (eigenvectors) developed by the analysis
• As we will see later, PCA is not much different than canonical correlation in terms of generating variates from linear combinations of variables
▫ Although in PCA there are no "sides" of the equation, and you're not necessarily correlating the "factors", "components", "variates", etc.
• The loading for each item/variable is the correlation between it and the component (i.e., the underlying shared variance)
• However, unlike many of the analyses you have been exposed to, there is no statistical criterion to compare the linear combination to
▫ In MANOVA we create linear combinations that maximally differentiate groups
▫ In canonical correlation one linear combination is used to maximally correlate with another
▫ PCA is a form of 'unsupervised' learning

Page 20: Principal Components Analysis

PCA

• With multivariate research we come to eigenvalues and eigenvectors
• Eigenvalues
▫ Conceptually can be considered to measure the strength (relative length) of an axis in N-dimensional space
▫ Derived via eigenanalysis of a square symmetric matrix
 The covariance or correlation matrix
• Eigenvector
▫ Each eigenvalue has an associated eigenvector. While an eigenvalue is the length of an axis, the eigenvector determines its orientation in space.
▫ The values in an eigenvector are not unique, because any coordinates that describe the same orientation would be acceptable.
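In R, the eigenanalysis described above is one line. This sketch uses the built-in women data (15 height/weight pairs) as a stand-in for the example data on the next slide:

R <- cor(women)   # square symmetric correlation matrix of height and weight
e <- eigen(R)     # eigenanalysis
e$values          # eigenvalues: the 'length' (variance) along each new axis
e$vectors         # eigenvectors: the orientation of each axis (sign/scaling not unique)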

Page 21: Principal Components Analysis

Data

• Example data of women’s height and weight

height  weight  Zheight  Zweight
57      93      -1.77    -1.97
58      110     -1.47    -0.87
60      99      -0.86    -1.58
59      111     -1.17    -0.81
61      115     -0.56    -0.55
60      122     -0.86    -0.10
62      110     -0.26    -0.87
61      116     -0.56    -0.49
62      122     -0.26    -0.10
63      128      0.05     0.28
62      134     -0.26     0.67
64      117      0.35    -0.42
63      123      0.05    -0.04
65      129      0.65     0.35
64      135      0.35     0.73
66      128      0.96     0.28
67      135      1.26     0.73
66      148      0.96     1.57
68      142      1.56     1.18
69      155      1.87     2.02
(z-scores rounded to two decimals)

Page 22: Principal Components Analysis

Data transformation

• Consider two variables, height and weight
• X would be our data matrix, w our eigenvector (coefficients)
• Multiplying our original data by these weights1 results in a column vector of values
▫ z1 = Xw
• Multiplying a matrix by a vector results in a linear combination
• The variance of this linear combination is the eigenvalue
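A minimal check of these statements in R, again using the built-in women data rather than the slide's data set:

Z  <- scale(women)                    # standardized height and weight (the X in z1 = Xw)
w  <- eigen(cor(women))$vectors[, 1]  # first eigenvector of R (the weights)
z1 <- Z %*% w                         # the linear combination: one value per case
var(z1)                               # equals eigen(cor(women))$values[1], the first eigenvalue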

Presenter
Presentation Notes
1. The actual formula to produce standardized scores, which are often the default output, involves the matrix of eigenvalues also: PC = XTL^(-1/2), where X is the mean-centered data, T the eigenvector matrix, and L the diagonal eigenvalue matrix [Ev1 0; 0 Ev2]. Note L^(-1/2): What are eigenvalues? Variance. What's the square root of (i.e. taking to the power of 1/2) variance? Standard deviation. What does dividing by (i.e. multiplying by the inverse of) the standard deviation do? Standardizes the variable in question.
Page 23: Principal Components Analysis

Data transformation

• Consider a woman 5' tall and 122 pounds
• She is -.86 sd from the mean height and -.10 sd from the mean weight for this data
• The first eigenvector associated with the normalized data1 is [.707, .707]; as such the resulting value for that data point is -.68
• So with the top graph we have taken the original data point and projected it onto a new axis, -.68 units from the origin
• Now if we do this for all data points we will have projected them onto a new axis/component/dimension/factor/linear combination
• The length of the new axis is the eigenvalue

\mathbf{a}'\mathbf{b} = \begin{bmatrix} a_1 & a_2 \end{bmatrix}\begin{bmatrix} b_1 \\ b_2 \end{bmatrix} = a_1 b_1 + a_2 b_2
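Plugging in the numbers from this slide (standardized scores of -.86 and -.10, first eigenvector [.707, .707]) as a quick check in R:

z <- c(-0.86, -0.10)   # standardized height and weight for the 60-inch, 122-pound case
w <- c(0.707, 0.707)   # first eigenvector of the correlation matrix
sum(z * w)             # a'b = a1*b1 + a2*b2 = about -.68, her score on the new axis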

Presenter
Presentation Notes
1. Correlation matrix. Alternatively one could use the covariance matrix as a starting point, and all packages will allow you to specify which to use. Here you can see it in action with R (the original dataset is called datanotes):

# create the mean-centered dataset, then obtain the eigenvalues and eigenvectors
datanotesMC <- data.frame(datanotes$HEIGHT - mean(datanotes$HEIGHT),
                          datanotes$WEIGHT - mean(datanotes$WEIGHT))
evect <- eigen(cov(datanotesMC))$vectors
eigen(cov(datanotesMC))$values
evalue <- matrix(c(250.725081, 0, 0, 2.609130), ncol = 2)   # diagonal matrix of eigenvalues
# project the mean-centered data onto the eigenvectors to get the components
components <- as.matrix(datanotesMC) %*% evect
pca <- princomp(datanotes)
components
pca$scores
Page 24: Principal Components Analysis

Data transformation

• Suppose we have more than one dimension/factor?
• In our discussion of the techniques thus far, we have said that each component or dimension is independent of the previous one
• What does independent mean?
▫ r = 0
• What does this mean geometrically in the multivariate sense?
• It means that the next axis specified is perpendicular to the previous
• Note how r is represented even here
• The cosine of the 90° angle formed by the two axes is… 0
• Had the lines been on top of each other (i.e. perfectly correlated) the angle formed by them would be zero, whose cosine is 1
▫ r = 1

Page 25: Principal Components Analysis

Data transformation

• The other eigenvector associated with the data is (-.707, .707)
• Doing as we did before, we'd create that second axis, and then could plot the data points along these new axes1
• We now have two linear combinations, each of which is interpretable as the vector of projections of the original data points onto a directed line segment
• Note how the basic shape of the original data has been perfectly maintained
• The effect has been to rotate the configuration (45°) to a new orientation while preserving its essential size and shape
▫ It is an orthogonal transformation
▫ Note that we have been talking of specifying/rotating axes, but rotating the points themselves would give us the same result

Presenter
Presentation Notes
1. Such plots are very commonly used, and can be used e.g. to identify which cases are ‘behaving’ similarly to others. plot(pca$scores, ylim=c(-30,30))
Page 26: Principal Components Analysis

Meaning of “Principal Components”

• “Component” analyses are those that are based on the “full” correlation matrix
▫ 1.00s in the diagonal
• “Principal” analyses are those for which each successive factor…
▫ accounts for maximum available variance
▫ is orthogonal (uncorrelated, independent) with all prior factors
▫ full solution (as many factors as variables), i.e. accounts for all the variance

Page 27: Principal Components Analysis

Application of PC analysis

• Components analysis is a kind of “data reduction”
▫ start with an inter-related set of “measured variables”
▫ identify a smaller set of “composite variables” that can be constructed from the “measured variables” and that carry as much of their information as possible
• A “Full components solution”…
▫ has as many components as variables
▫ accounts for 100% of the variables’ variance
▫ each variable has a final communality of 1.00
• A “Truncated components solution”…
▫ has fewer components than variables
▫ accounts for <100% of the variables’ variance
▫ each variable has a communality < 1.00

Page 28: Principal Components Analysis

The steps of a PC analysis

• Compute the correlation matrix
• Extract a full components solution
• Determine the number of components to “keep”
▫ total variance accounted for
▫ variable communalities
▫ interpretability
▫ replicability
• “Rotate” the components and interpret (name) them
• Compute “component scores”
• “Apply” the components solution
▫ theoretically – understand the meaning of the data reduction
▫ statistically – use the component scores in other analyses

Page 29: Principal Components Analysis

PC Extraction

• Extraction is the process of forming PCs as linear combinations of the measured variables, as we have done with our other techniques

PC1 = b11X1 + b21X2 + … + bk1Xk
PC2 = b12X1 + b22X2 + … + bk2Xk
PCf = b1fX1 + b2fX2 + … + bkfXk

• The goal is to reproduce as much of the information in the measured variables with as few PCs as possible
• Here’s the thing to remember…
▫ We usually perform factor analyses to “find out how many groups of related variables there are”, however…
▫ The mathematical goal of extraction is to “reproduce the variables’ variance, efficiently”

Page 30: Principal Components Analysis

3 variable example

• Consider 3 variables with the correlations displayed
• In a 3d sense we might envision their relationship as this, with the shadows showing what the scatterplots would roughly look like for each bivariate relationship

[Figure: 3D scatterplot of X1, X2, X3 with bivariate 'shadow' scatterplots]

Page 31: Principal Components Analysis

The first component identified

Page 32: Principal Components Analysis

• The variance of this component, its eigenvalue, is 2.063
• In other words it accounts for twice as much variance as any single variable1
• Note with 3 variables, 2.063/3 = .688, i.e. about 69% of the variance is accounted for by this first component1

Presenter
Presentation Notes
1. This is because the correlation matrix, rather than the covariance matrix, was used for the analysis. Each variable, being standardized, has a variance of 1, so the total variance = # of variables = 3
Page 33: Principal Components Analysis

PCA

• In principal components, we extract as many components as there are variables
• As mentioned previously, each component by default is uncorrelated with the previous
• If we save the component scores and were to look at their graph, it would resemble something like this

Page 34: Principal Components Analysis

How do we interpret the components?

• The component loadings can inform us as to their interpretation
• They are the original variable's correlation with the component
• In this case, all load nicely on the first component, which, since the others do not account for nearly as much variance, is probably the only one to interpret
• Depending on the type of PCA, the rotation, etc., you may see different loadings, although often the general pattern will remain
• With PCA it is as much the overall pattern that is to be considered as the sign or absolute values of the loadings
▫ Which variables load on to which components in general?

Page 35: Principal Components Analysis

• Here is an example of magazine readership from the chapter handout

• Underlined loadings are > .30
• How might this be interpreted?

Page 36: Principal Components Analysis

Applied example

• Six items
▫ Three sadness, three relationship quality
▫ N = 300

• PCA

Page 37: Principal Components Analysis

Start with the Correlation Matrix

Page 38: Principal Components Analysis

Communalities are ‘Estimated’

• A measure of how much variance of the original variables is accounted for by the observed components/factors
• Uniqueness is 1 - communality
• With PCA with all components retained (as opposed to a truncated solution), communality will always equal 1
• Why 1.0?
▫ PCA analyzes all the variance for each variable
• As we'll see with FA, the approach will be different
▫ The initial value is the multiple R2 for the association between an item and all the other items in the model
▫ FA analyzes shared variance

Page 39: Principal Components Analysis

What are we looking for?

• Any factor whose eigenvalue is less than 1.0 is in most cases not going to be retained for interpretation
▫ Unless it is very close or has a readily understood and interesting meaning
• Loadings1 that are:
▫ more than .5 are typically considered strong
▫ between .3 and .5 are acceptable
▫ less than .3 are typically considered weak
• Matrix reproduction
▫ All the information about the correlation matrix is maintained
▫ Correlations can be reproduced exactly in PCA
 Sum of cross loadings

Presenter
Presentation Notes
1. As with all heuristics, this allows one to not actually think about their research problem. Go by what makes sense.
Page 40: Principal Components Analysis

Assessing the variance accounted for

Eigenvalue is an index of the strength of the component, the amount of variance it accounts for. It is also the sum of the squared loadings for that component

Proportion of variance accounted for = eigenvalue / N of items or variables

Page 41: Principal Components Analysis

Loadings

Eigenvalue of factor 1 = .609² + .614² + .593² + .728² + .767² + .764² = 2.80

Page 42: Principal Components Analysis

Reproducing the correlation matrix (R)

• Sum the products of the loadings for two variables on all factors
▫ For RQ1 and RQ2:
▫ (.61 * .61) + (.61 * .57) + (-.12 * -.41) + (-.45 * .33) + (.06 * .05) + (.20 * -.16) = .59
▫ If we kept the first two factors only, the reproduced correlation = .72
• Note that an index of the quality of a factor analysis (as opposed to PCA) is the extent to which the factor loadings can reproduce the correlation matrix1. With PCA, the correlation matrix is reproduced exactly if all components are retained; when we don't retain all of them, we can use a similar approach to assess 'fit'.

[Slide shows the original correlation for comparison]
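The same reproduction can be done in matrix form. A minimal R sketch (with USArrests again standing in, since the slide's full loading matrix isn't reproduced here): loadings are eigenvectors rescaled by the square roots of the eigenvalues, and their cross-products rebuild R:

e    <- eigen(cor(USArrests))
load <- e$vectors %*% diag(sqrt(e$values))    # loadings = eigenvector * sqrt(eigenvalue)
round(load %*% t(load) - cor(USArrests), 6)   # full solution: R reproduced exactly (all ~0)
round(load[, 1:2] %*% t(load[, 1:2]), 2)      # first two components: approximate reproduction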

Presenter
Presentation Notes
1. Note that your reproduced R matrix will not necessarily match what you get from just obtaining correlations via the SPSS menu. That is by default a pairwise approach while the PCA default is casewise i.e. the same N for all correlations is used. The reproduced R provided by the PCA will perfectly match the complete case original R matrix.
Page 43: Principal Components Analysis

Variance Accounted For

• For items
▫ The sum of the squared loadings (i.e., weights) across the components is the amount of variance accounted for in each item
▫ Item 1: .61² + .61² + (-.12)² + (-.45)² + .06² + .20² = .37 + .37 + .015 + .20 + .004 + .04 ≈ 1.0
▫ For the first two factors: .74
• For components
▫ How much variance is accounted for by the components that will be retained?
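Continuing the sketch above, the item- and component-level sums of squared loadings can be read off the same (stand-in) loading matrix:

e    <- eigen(cor(USArrests))
load <- e$vectors %*% diag(sqrt(e$values))
rowSums(load^2)          # per item: communality, 1.0 when all components are kept
rowSums(load[, 1:2]^2)   # per item: variance accounted for by the first two components
colSums(load^2)          # per component: the eigenvalues (variance each accounts for)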

Page 44: Principal Components Analysis

When is it appropriate to use PCA?

• PCA is largely a descriptive procedure
• In our examples, we are looking at variables with decent correlations. However, if the variables are largely uncorrelated, PCA won't do much for you
▫ It may just provide components that reflect each individual variable, i.e. nothing is gained
• One may use Bartlett's sphericity test to determine whether such an approach is appropriate
• It tests the null hypothesis that the R matrix is an identity matrix (ones on the diagonal, zeros off the diagonal)
• When the determinant of R is small (recall from before this implies strong correlation), the chi-square statistic will be large, we reject H0, and PCA would be appropriate for data reduction
• One should note though that it is a powerful test, and will usually result in rejection with typical sample sizes
• One may instead refer to an estimate of practical effect rather than a statistical test
▫ Are the correlations worthwhile?

\chi^2 = -\left[(n-1) - \frac{2p+5}{6}\right]\ln|\mathbf{R}|, \qquad df = \frac{p(p-1)}{2}

where p = number of variables, n = number of observations, and ln|R| = natural log of the determinant of R
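A direct implementation of this formula in R; this is a sketch, the helper name is mine, and psych::cortest.bartlett offers a packaged version (USArrests is placeholder data):

bartlett_sphericity <- function(R, n) {        # R: correlation matrix, n: sample size
  p     <- ncol(R)
  chisq <- -((n - 1) - (2 * p + 5) / 6) * log(det(R))
  df    <- p * (p - 1) / 2
  c(chisq = chisq, df = df, p.value = pchisq(chisq, df, lower.tail = FALSE))
}
bartlett_sphericity(cor(USArrests), n = nrow(USArrests))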

Page 45: Principal Components Analysis

How should the data be scaled?

• In most of our examples we have been using the R matrix instead of the var-covar matrix
• As PCA seeks to maximize variance, it can be sensitive to scale differences across variables
• Variables with a larger range of scores would thus have more of an impact on the linear combination created
• As such, the R matrix will typically be used, except perhaps in cases where the items are on the same scale (e.g. Likert)
• The values involved will change (e.g. eigenvalues), though the general interpretation may not

Page 46: Principal Components Analysis

How many components should be retained?

• There are many ways to determine this1

▫ "Solving the number of factors problem is easy, I do it everyday before breakfast. But knowing the right solution is harder" (Kaiser)

• Kaiser’s Rule
▫ What we’ve already suggested, i.e. eigenvalues over 1
▫ The idea is that any component should account for at least as much variance as a single variable
• Another perspective on this is to retain as many components as will account for X% of variance in the original variables
▫ Practical approach
• Scree Plot
▫ Look for the elbow2
 Look for the point after which the remaining eigenvalues decrease in linear fashion and retain only those ‘above’ the elbow
▫ Not really a good primary approach, though it may be consistent with others
• Chi-square
▫ Null hypothesis is that X number of components is sufficient
▫ Want a nonsignificant result
• Horn’s Procedure (a minimal sketch follows this list)
▫ This is a different approach which suggests creating a set of random data of the same size N and p variables
▫ The idea is that in maximizing the variance accounted for, PCA has a good chance of capitalizing on chance
▫ Even with random data, the first eigenvalue will be > 1
▫ As such, retain components with eigenvalues greater than that produced by the largest component of the random data
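A bare-bones sketch of Horn's idea in R, following the slide's version (compare against the largest eigenvalue from random data of the same dimensions); psych::fa.parallel is a fuller implementation, and USArrests is again only placeholder data:

set.seed(42)
n <- nrow(USArrests); p <- ncol(USArrests)   # dimensions of the data at hand
rand_max <- mean(replicate(200,              # average largest eigenvalue from random data
  eigen(cor(matrix(rnorm(n * p), n, p)))$values[1]))
obs <- eigen(cor(USArrests))$values          # observed eigenvalues
which(obs > rand_max)                        # retain components exceeding the random benchmark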

Presenter
Presentation Notes
1. The VSS function in the psych library for R has 8 alone. 2. Just where this elbow actually is can be surprisingly different
Page 47: Principal Components Analysis

Rotation

• Sometimes our loadings will be a little difficult to interpret initially
• In such a case we can ‘rotate’ the solution such that the loadings perhaps make more sense
▫ This is typically done in factor analysis but is possible here too
• An orthogonal rotation is just a shift to a new set of coordinate axes in the same space spanned by the principal components

Page 48: Principal Components Analysis

Rotation

• You can think of it as shifting the axes or rotating the ‘egg’ in our previous graphic
• The gist is that the relations among the items are maintained, while maximizing their more natural loadings and minimizing ‘off-loadings’1
• Note that as PCA is a technique that initially creates independent components, orthogonal rotations that maintain this independence are typically used
▫ Loadings will be either large or small, with little in between
• Varimax is the most common rotation utilized
▫ Maximizes the variance of the squared loadings for each component
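A minimal sketch of a varimax rotation in R (stats::varimax ships with R; loadings come from placeholder data, keeping two components):

e    <- eigen(cor(USArrests))
load <- e$vectors[, 1:2] %*% diag(sqrt(e$values[1:2]))   # unrotated loadings, 2 components kept
varimax(load)$loadings                                   # rotated loadings: large or small, little in between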

Presenter
Presentation Notes
1. Not to be confused with the rugby term “off-load”.
Page 49: Principal Components Analysis

Other issues: How do we assess validity?

• Usual suspects
• Cross-validation
▫ Holdout sample, as we have discussed before
▫ About a 2/3, 1/3 split
▫ Using eigenvectors from the original components, we can create new components with the new data and see how much variance each accounts for
▫ Hope it's similar to the original solution
• Jackknife
▫ With smaller samples, conduct PCA multiple times, each with a specific case held out
▫ Using the eigenvectors, calculate the component score for the value held out
▫ Compare the eigenvalues for the components involved
• Bootstrap
▫ In the absence of a holdout sample, we can create a bootstrapped sample to perform the same function
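A holdout cross-validation sketch in R, along the lines of the first bullet (USArrests is again just placeholder data; a real application would have far more cases):

set.seed(1)
idx   <- sample(nrow(USArrests), round(2/3 * nrow(USArrests)))   # ~2/3 training sample
w     <- eigen(cor(USArrests[idx, ]))$vectors                    # eigenvectors from training data
train <- scale(USArrests[idx, ])
test  <- scale(USArrests[-idx, ], center = attr(train, "scaled:center"),
               scale = attr(train, "scaled:scale"))
apply(train %*% w, 2, var)   # variance accounted for by each component, training cases
apply(test  %*% w, 2, var)   # hope these are similar for the holdout cases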

Page 50: Principal Components Analysis

Other issues: Factoring items vs. factoring scales1

• Items are often factored as part of the process of scale development

• Check if the items “go together” like the scale’s author thinks

• Scales (composites of items) are factored to…
▫ examine construct validity of “new” scales
▫ test “theory” about what constructs are interrelated

• Remember, the reason we have scales is that individual items are typically unreliable and have limited validity

Presenter
Presentation Notes
1. Really more of a concern for FA, but I went ahead and kept them in the PCA notes also
Page 51: Principal Components Analysis

Other issues: Factoring items vs. factoring scales

• The limited reliability and validity of items means that they will be measured with less precision, and so their intercorrelations for any one sample will be “fraught with error”
• Since factoring starts with R, factoring of items is likely to yield spurious solutions – replication of item-level factoring is very important!
• Is the issue really “items vs. scales”?
▫ No – it is really the reliability and validity of the “things being factored”, scales having these properties more than items

Page 52: Principal Components Analysis

Other issues: When is it appropriate to use PCA?

• Another reason to use PCA, which obviously isn’t a great one, is that the maximum likelihood estimation involved in an Exploratory Factor Analysis does not converge
• PCA will always give a result (it does not require matrix inversion) and so can often be used in such a situation
• We’ll talk more on this later, but in data reduction situations EFA is typically to be preferred for social scientists and others that use imprecise measures

Page 53: Principal Components Analysis

Other issues: Selecting Variables for Analysis

• Sometimes a researcher has access to a data set that someone else has collected – an “opportunistic data set”
• While this can be a real money/time saver, be sure to recognize the possible limitations
• Be sure the sample represents a population you want to talk about
• Carefully consider variables that “aren’t included” and the possible effects their absence has on the resulting factors
▫ this is especially true if the data set was chosen to be “efficient” – variables chosen to cover several domains
• You should plan to replicate any results obtained from opportunistic data

Page 54: Principal Components Analysis

Other issues: Selecting the Sample for Analysis

• How many?
• Keep in mind that R, and so the factor solution, is computed the same way no matter how many cases are used – so the point is the representativeness and stability of the correlations
• Advice about the subject/variable ratio varies pretty dramatically
▫ 5-10 cases per variable
▫ 300 cases minimum (maybe + # of items)1
• Consider that, as for other statistics, your standard error for a correlation decreases with increasing sample size

Presenter
Presentation Notes
1. This size would be more for confirmatory factor analysis/sem (and still probably too small for many data situations).
Page 55: Principal Components Analysis

A note about SPSS

• SPSS does provide a means for principal components analysis
• However, its presentation (much like many textbooks for that matter) blurs the distinction between PCA and FA, such that they are easily confused
• Although they are both data dimension reduction techniques, they go about the process differently, have different implications regarding the results, and can even come to different conclusions

Page 56: Principal Components Analysis

A note about SPSS

• In SPSS, the menu is ‘factor’ analysis (even though ‘principal components’ is the default technique setting)
• Unlike other programs, PCA isn’t even a separate procedure (it’s all in the Factor syntax)
• In order to perform PCA, make sure you have principal components selected as your extraction method, analyze the correlation matrix, and specify that the number of factors to be extracted equals the number of variables
• Even then, your loadings will be different from those of other programs, which scale them such that the sum of their squared values = 1
• In general be cautious when using SPSS

Page 57: Principal Components Analysis

PCA in R1

• Package name
▫ Function name
• base
▫ princomp
• psych
▫ principal
▫ VSS
• pcaMethods
▫ As the name implies this package is all about PCA, and from a modern approach. It will automatically estimate missing values (via traditional, robust, or Bayesian methods) and is useful just for that for any analysis.
▫ pca
▫ Q2 for cross-validation
• FactoMineR R-commander plugin
▫ PCA
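A minimal usage sketch for two of the functions named above, assuming the psych package is installed (USArrests is only placeholder data):

pc1 <- princomp(USArrests, cor = TRUE)   # base R PCA on the correlation matrix
summary(pc1)                             # variance accounted for by each component
unclass(pc1$loadings)                    # the weight (eigenvector) matrix
head(pc1$scores)                         # component scores per case

pc2 <- psych::principal(USArrests, nfactors = 2, rotate = "varimax")   # truncated, rotated solution
pc2$loadings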

Presenter
Presentation Notes
1. PCA and Factor analysis are techniques, like regression, I simply would not do in SPSS personally nor recommend. Far too limited and out of touch with developments of the past 50 years.