Transcript
Structural Equation Modeling and Confirmatory Factor Analysis
Advanced Statistics for Researchers, Session 3
Dr. Chris Rakes
Website: http://csrakes.yolasite.com
Email: [email protected]
Twitter: @RakesChris
Types of Variables
Nominal: names, categories, ID numbers
Ordinal: ranks
Interval: dichotomous, polytomous (No
SEM
Causal processes can be represented by structural equations (regression equations: dependent variables predicted by independent variables).
A model of these structural relations can be generated and represented pictorially (a path diagram).
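As a hedged illustration (the symbols below are generic and not taken from the slides), a single structural equation looks like an ordinary regression equation, with a dependent variable predicted by independent variables plus a residual:

$$\eta = \gamma_1\,\xi_1 + \gamma_2\,\xi_2 + \zeta$$

A full structural model is simply a system of such equations, one per dependent variable, and the same system can be drawn as a path diagram.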
SEM Variables
Observed (manifest, measured) variables: X’s or Y’s.
Latent variables (factors): constructs that cannot be directly observed (or measured). Latent variables are estimated through hypothesized relationships with observed variables.
◦ Exogenous latent variables: independent variables that “cause” changes in other latent variables in the model. They are taken as given by the model under consideration, and any changes in exogenous variables are due to factors outside the model.
◦ Endogenous latent variables: dependent variables that are influenced by exogenous variables in the model. These are the outcomes the SEM model seeks to explain.
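A minimal sketch of how these pieces are specified in software, assuming the Python package semopy (an assumption; the slides mention Mplus and AMOS, not Python). All variable names and the simulated data are hypothetical.

# Hedged sketch: one exogenous latent (xi) measured by observed X indicators,
# one endogenous latent (eta) measured by observed Y indicators, and a
# structural path from xi to eta.
import numpy as np
import pandas as pd
from semopy import Model

# Simulate hypothetical data: xi drives x1-x3, eta (influenced by xi) drives y1-y3.
rng = np.random.default_rng(3)
n = 300
xi = rng.normal(size=n)
eta = 0.6 * xi + rng.normal(scale=0.8, size=n)
data = pd.DataFrame({f"x{j}": 0.7 * xi + rng.normal(scale=0.7, size=n) for j in (1, 2, 3)})
for k in (1, 2, 3):
    data[f"y{k}"] = 0.7 * eta + rng.normal(scale=0.7, size=n)

desc = """
xi =~ x1 + x2 + x3
eta =~ y1 + y2 + y3
eta ~ xi
"""

model = Model(desc)
model.fit(data)            # iterative estimation of loadings and the structural path
print(model.inspect())     # parameter estimates, standard errors, p-values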
[Path diagram: an exogenous latent variable and an endogenous latent variable; the exogenous latent is measured by observed X indicators and the endogenous latent by observed Y indicators, each with factor loadings and error terms; the endogenous latent variable carries a residual.]
Factor Analysis
Used to identify the factor structure or model for a set of variables (Stevens, 2012).
Two types: Exploratory (EFA) and Confirmatory (CFA).
Exploratory Factor Analysis
Several methods:
◦ Principal Components Analysis (PCA): Each successive component accounts for the largest amount of remaining unexplained variance.
◦ Principal Axis Factoring: Identical to PCA, except that the factors are extracted from a correlation matrix with “communality estimates” on the main diagonal rather than 1’s, as in PCA (see the sketch after this list).
◦ Unweighted Least Squares: Minimizes the sum of squared differences between the observed and model-implied off-diagonal correlation matrices.
◦ Generalized Least Squares: Correlations are weighted by the inverse of their uniqueness; higher uniqueness means less weight.
◦ Alpha: Maximizes the Cronbach’s alpha (i.e., reliability) of the factors.
◦ Image: Factors are defined by their linear regression on variables not associated with the hypothetical factors.
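A minimal sketch of the PCA-versus-PAF distinction, assuming NumPy (the slides themselves show no code); the data matrix is hypothetical random data used only to make the example run.

# Hedged sketch: PCA eigendecomposes the correlation matrix with 1's on the
# diagonal; PAF first replaces the diagonal with communality estimates.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))             # hypothetical data: 200 cases, 6 variables
R = np.corrcoef(X, rowvar=False)          # observed correlation matrix (1's on diagonal)

# PCA: eigendecompose R as-is; each successive component explains the largest
# share of the remaining variance.
pca_eigvals, pca_eigvecs = np.linalg.eigh(R)

# PAF: replace the 1's on the diagonal with initial communality estimates,
# here the squared multiple correlations (SMC), then eigendecompose.
smc = 1 - 1 / np.diag(np.linalg.inv(R))   # SMC of each variable with the others
R_reduced = R.copy()
np.fill_diagonal(R_reduced, smc)
paf_eigvals, paf_eigvecs = np.linalg.eigh(R_reduced)

print("PCA eigenvalues:", np.round(pca_eigvals[::-1], 3))
print("PAF eigenvalues:", np.round(paf_eigvals[::-1], 3))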
Maximum Likelihood Estimation
Attempts to find the population parameter values from which the observed data are most likely to have arisen.
The likelihood-based fit function quantifies the discrepancy between the observed and model-implied variance/covariance matrices, assuming multivariate normality.
Closed-form solutions for the parameters usually do not exist, so iterative algorithms are used in practice for parameter estimation.
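A minimal sketch of the iterative idea, assuming Python with scipy (an assumption; the slides do not name software for this step). The data are simulated, and this simple case actually has a closed-form answer; the point is only to show an optimizer searching for the most likely parameter values.

# Hedged sketch: iterative maximum likelihood estimation of a normal mean and
# standard deviation by minimizing the negative log likelihood.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(1)
data = rng.normal(loc=5.0, scale=2.0, size=500)   # hypothetical observed scores

def neg_log_likelihood(params):
    mu, log_sigma = params                         # log-sigma keeps sigma positive
    return -np.sum(norm.logpdf(data, loc=mu, scale=np.exp(log_sigma)))

# An iterative optimizer searches for the parameter values under which the
# observed data are most likely to have arisen.
result = minimize(neg_log_likelihood, x0=[0.0, 0.0], method="Nelder-Mead")
mu_hat, sigma_hat = result.x[0], np.exp(result.x[1])
print(mu_hat, sigma_hat)                           # should be close to 5.0 and 2.0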
The Model Fitting Process
Let S = the sample variance/covariance matrix of observed scores from p variables.
Let Σ = the variance/covariance matrix of the population.
Let θ represent the vector of model parameters. Therefore, Σ(θ) represents the restricted variance/covariance matrix implied by the model.
We are testing the hypothesis that the restricted matrix holds in the population. Null hypothesis: Σ = Σ(θ).
SEM computes a minimum discrepancy function, Fmin.
Understanding the Fmin Function

$$F_{min} = \log|\Sigma(\theta)| - \log|S| + \mathrm{trace}\big(S\,\Sigma(\theta)^{-1}\big) - p$$

As Σ(θ) approaches S, the difference log|Σ(θ)| − log|S| approaches 0.
Trace: the sum of the diagonal elements of a matrix.
An inverse matrix times the original matrix equals the identity matrix (I). So, as Σ(θ) approaches S, Σ(θ)⁻¹S approaches I, and the trace of that matrix approaches the number of observed variables, p.
So, as Σ(θ) approaches S, the difference between the trace and p approaches 0.
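A minimal sketch of this behavior, assuming NumPy (not mentioned in the slides); the sample covariance matrix below is hypothetical.

# Hedged sketch: evaluating F_min = log|Sigma(theta)| - log|S| + trace(S Sigma(theta)^-1) - p.
import numpy as np

def f_min(S, Sigma_theta):
    p = S.shape[0]
    _, logdet_model = np.linalg.slogdet(Sigma_theta)
    _, logdet_sample = np.linalg.slogdet(S)
    return logdet_model - logdet_sample + np.trace(S @ np.linalg.inv(Sigma_theta)) - p

# Hypothetical sample covariance matrix of p = 3 observed variables.
S = np.array([[1.00, 0.45, 0.30],
              [0.45, 1.00, 0.35],
              [0.30, 0.35, 1.00]])

print(f_min(S, Sigma_theta=np.eye(3)))   # poorly fitting model: F_min > 0
print(f_min(S, Sigma_theta=S.copy()))    # Sigma(theta) equals S: F_min is 0 (perfect fit)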
Maximum Likelihood Estimation (Cont’d.)
The shape of the multivariate normal curve is defined by:
$$\ell_i = \frac{1}{(2\pi)^{p/2}\,|\Sigma|^{1/2}} \exp\!\Big[-\tfrac{1}{2}(x_i - \mu)'\,\Sigma^{-1}(x_i - \mu)\Big]$$
Substituting an individual’s vector of scores yields the likelihood of that set of scores given the population mean vector μ and covariance matrix Σ
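A minimal sketch of that substitution, assuming scipy (not mentioned in the slides); the mean vector, covariance matrix, and score vector are hypothetical.

# Hedged sketch: the likelihood of one case's score vector given mu and Sigma.
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([10.0, 12.0, 11.0])            # hypothetical population means
Sigma = np.array([[4.0, 1.5, 1.0],
                  [1.5, 3.0, 0.8],
                  [1.0, 0.8, 2.5]])           # hypothetical population covariances

x_i = np.array([9.0, 13.0, 10.5])             # one individual's vector of scores
likelihood_i = multivariate_normal.pdf(x_i, mean=mu, cov=Sigma)
print(likelihood_i)                           # the density (likelihood) of this case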
Maximum Likelihood Estimation (Cont’d.)
A model’s final parameter estimates are those that yield model-implied variances and covariances (and means) that maximize the combined likelihood of all n cases.
$$L(\mu, \Sigma) = \ell_1 \times \ell_2 \times \ell_3 \times \cdots \times \ell_n$$

$$\ell_i = \frac{1}{(2\pi)^{p/2}\,|\Sigma|^{1/2}} \exp\!\Big[-\tfrac{1}{2}(x_i - \mu)'\,\Sigma^{-1}(x_i - \mu)\Big]$$
Casewise Log Likelihoods
Likelihoods tend to be very small numbers, and hence their products become practically infinitesimal.
Taking the natural log of the likelihood makes things a bit more manageable.
$$\log L = \log(\ell_1 \times \ell_2 \times \cdots \times \ell_n) = \log\ell_1 + \log\ell_2 + \cdots + \log\ell_n$$

$$\log\ell_i = -\frac{p}{2}\log(2\pi) - \frac{1}{2}\log|\Sigma| - \frac{1}{2}(x_i - \mu)'\,\Sigma^{-1}(x_i - \mu)$$
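A minimal sketch of summing casewise log likelihoods, assuming scipy and NumPy (not from the slides); the data, mean vector, and covariance matrix are hypothetical.

# Hedged sketch: one log likelihood per case, summed into the overall log likelihood.
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(2)
mu = np.zeros(3)
Sigma = np.eye(3)
X = rng.multivariate_normal(mu, Sigma, size=100)   # hypothetical data: 100 cases, 3 variables

casewise_ll = multivariate_normal.logpdf(X, mean=mu, cov=Sigma)   # one log ell_i per case
overall_ll = casewise_ll.sum()                                    # log L = sum of the log ell_i
print(casewise_ll[:5], overall_ll)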
Casewise Log Likelihoods (Cont’d.)
With complete data, each case’s contribution to the overall log likelihood (LL) is:
$$LL_i = -\frac{p}{2}\log(2\pi) - \frac{1}{2}\log|\Sigma| - \frac{1}{2}(x_i - \mu)'\,\Sigma^{-1}(x_i - \mu)$$
In the missing data context, each case’s contribution to the log likelihood is:
$$LL_i = -\frac{p_i}{2}\log(2\pi) - \frac{1}{2}\log|\Sigma_i| - \frac{1}{2}(x_i - \mu_i)'\,\Sigma_i^{-1}(x_i - \mu_i)$$
Data and parameter arrays can vary for each ith case.
The ith case’s contribution to the overall likelihood is based only on those variables for which that case has complete data.
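A minimal sketch of that idea, assuming scipy and NumPy (not from the slides); the parameter values and the case's scores are hypothetical.

# Hedged sketch: a single case's log-likelihood contribution under missing data,
# using only the variables that case actually has (FIML logic).
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([10.0, 12.0, 11.0])
Sigma = np.array([[4.0, 1.5, 1.0],
                  [1.5, 3.0, 0.8],
                  [1.0, 0.8, 2.5]])

x_i = np.array([9.0, np.nan, 10.5])         # this case is missing the second variable
observed = ~np.isnan(x_i)                   # mask of variables with complete data

# Subset the mean vector and covariance matrix to the observed variables only,
# so p_i, mu_i, and Sigma_i vary from case to case.
mu_i = mu[observed]
Sigma_i = Sigma[np.ix_(observed, observed)]
ll_i = multivariate_normal.logpdf(x_i[observed], mean=mu_i, cov=Sigma_i)
print(ll_i)                                 # this case's contribution to the log likelihood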
Maximum Likelihood in SEM
A model’s final parameter estimates are those that yield model-implied variances and covariances (and means) that maximize the aggregated casewise log likelihoods:

$$\log L(\mu, \Sigma) = \sum_{i=1}^{n}\Big[-\frac{p_i}{2}\log(2\pi) - \frac{1}{2}\log|\Sigma_i| - \frac{1}{2}(x_i - \mu_i)'\,\Sigma_i^{-1}(x_i - \mu_i)\Big]$$

In FIML, no data are ever imputed. Parameters and their standard errors are estimated directly using all observed data. FIML is the default in many software packages (e.g., Mplus, Amos).
Confirmatory Factor Analysis
Cannot be run easily in basic statistics packages such as SPSS, which let you specify only the number of factors rather than force particular variables to load on particular factors.
SEM software easily accommodates CFA models, e.g., MPlus, AMOS, EQS, LISREL.
Psychological Distress CFA
[Diagrams: First-Order CFA and Second-Order CFA models of psychological distress.]
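A hedged sketch of how two such models could be specified, again assuming the semopy package (not from the slides); the factor and indicator names are hypothetical stand-ins for the psychological distress measures.

# Hedged sketch: first-order vs second-order CFA specifications.
from semopy import Model

# First-order CFA: each observed item is forced to load on one specific factor.
first_order = """
anxiety =~ a1 + a2 + a3
depression =~ d1 + d2 + d3
"""

# Second-order CFA: a higher-order "distress" factor accounts for the
# correlation between the first-order factors.
second_order = """
anxiety =~ a1 + a2 + a3
depression =~ d1 + d2 + d3
distress =~ anxiety + depression
"""

# Usage (hypothetical data frame with columns a1..a3, d1..d3):
# model = Model(second_order); model.fit(data); print(model.inspect())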
Psychological Distress CFA Results
[Table columns: Model | Model Description | N | AIC | DF]