L5 – Repeated Measures and Longitudinal Analyses using MIXED
Review of GLM Repeated / Intro to Longitudinal Analyses
(Done in SPSS. R can also do traditional repeated measures, e.g., with aov() and an Error() term, but the SPSS GLM Repeated Measures procedure is used here.)
The Chapter 5 data file, ch5hortest . . . The hor stands for horizontal. We’ll see what that means shortly.
test1, test2, test3: Scores on identical achievement tests taken at 3 different times with approximately equal intervals between tests.
effective is a dichotomous variable equal to 1 if the teacher is perceived to be effective and 0 if not.
courses ??? I don’t know what this represents.
female is a dichotomous variable equal to 1 if teacher is female?
ses is student SES as a Z-score.
ses_mean is mean SES of all students in a school.
courses_mean Mean of courses for a school. I don’t know what this represents.
Questions to ask (p. 142)
1. Is there change in average achievement across the 3 test periods?
2. If there is change, is it systematic as opposed to merely random?
3. If it’s systematic, what is the shape of the curve of change across time?
Linear? Quadratic? Cubic?
4. Is change related to student ses or teacher effectiveness?
The above are all basic questions that can be asked of repeated measures data.
All such questions can be answered using traditional repeated measures analyses.
As we’ll see, they can also be answered by treating repeated measures data as if they were multilevel data.
Analyses of the horizontal file data using the GLM Repeated Measures procedure, p. 151
(These kinds of analyses should be vaguely familiar to you from last semester.)
These analyses require the horizontal file, ch5hortest.sav.
I called the repeated measures factor, “time”.
I’ve not mentioned this before, but you can give the variable that is being measured across time periods a unique name.
Type that name into the “Measure Name:” field. Then click [Add].
Here it is called “test”. That is, I’m measuring something called “test” at each of three time periods.
Obviously, I could have named the measure, time. But, in a more elaborate dataset, several things might be measured at different time periods – test, height, depression, etc.
This says that what we’re measuring at the three time periods are test scores.
The data editor columns corresponding to the three times are specified to SPSS.
I recommend, as usual, that you ask for as many visual displays of the data as are practicable.
I recommend, as usual, that you ask for descriptive statistics, effect sizes, and observed power.
Since we have no covariates, the estimated means will be identical to the observed means.
The output of the analysis using traditional repeated measures.
GLM test1 test2 test3
  /WSFACTOR=time 3 Polynomial
  /MEASURE=test
  /METHOD=SSTYPE(3)
  /PLOT=PROFILE(time)
  /EMMEANS=TABLES(time)
  /PRINT=DESCRIPTIVE ETASQ OPOWER
  /CRITERIA=ALPHA(.05)
  /WSDESIGN=time.
General Linear Model
[DataSet1] G:\MdbT\P595C(Multilevel)\Multilevel and Longitudinal Modeling with IBM SPSS\Ch5Datasets&ModelSyntaxes\ch5hortest.sav
A nice feature of the GLM Repeated Measures analysis procedure is its automatic test of the shape of the curve of means across time periods. That test assumes that the time periods are equally spaced, however, so don’t rely on it if they are not. We don’t know, for sure, whether the time periods in this dataset are equally spaced. We’ll assume they are.
As we’ll see, if we want such tests in MIXED, we’ll have to create them using polynomials (ugh). But analyzing the shape using multilevel techniques WILL allow the time periods to be unequally spaced.
The test below tells us that the shape of the curve of means over the three time periods is not precisely linear, but curvilinear. Alas, it doesn’t tell us in this table, what the nature of the curve is – whether it is curved downward or curved upward.
Tests of Within-Subjects Contrasts
Measure: test
Source  time    Type III Sum of Squares  df  Mean Square  F         Sig.  Partial Eta Squared  Noncent. Parameter  Observed Power(a)
time    Linear  310416.221               1   310416.221   4801.239  .000  .356                 4801.239            1.000
Be careful!! There is no between-subjects factor, but the Tests of Between-Subjects Effects table will always be displayed. It’s simply telling us that the overall mean of all the scores is significantly different from 0.
Estimated Marginal Means
These are means of the dependent variable within each group computed assuming that all covariates had the same value.
Since there are no covariates, these means will be the observed means.
time
Measure: test
time  Mean    Std. Error  95% CI Lower Bound  95% CI Upper Bound
1     48.632  .104        48.428              48.837
2     53.107  .106        52.899              53.315
3     57.094  .106        56.886              57.303
Profile Plots
Note that even though the overall relationship looks very nearly linear, the quadratic component was significant, suggesting that the very slight downward bend in the curve is a significant one.
Recall that the sample size was 8000+, meaning that even very small real effects will be statistically significant.
(Plot annotation: very slight downward curvature.)
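To see where the "slight downward bend" comes from, here is a minimal sketch in Python (not the SPSS used for the analysis itself). It applies the usual unnormalized polynomial contrast weights for three equally spaced periods to the estimated marginal means reported above. The weights (-1, 0, 1) and (1, -2, 1) are standard, but SPSS rescales them internally, so only the signs and relative sizes are comparable to the output.

```python
# Orthogonal polynomial contrast weights for 3 equally spaced time periods
# (unnormalized; SPSS uses rescaled versions of the same weights).
LINEAR = (-1, 0, 1)
QUADRATIC = (1, -2, 1)

# Estimated marginal means of test at times 1, 2, 3 (from the table above).
means = (48.632, 53.107, 57.094)

def contrast(weights, values):
    """Weighted sum of the means; the weights sum to zero."""
    return sum(w * v for w, v in zip(weights, values))

linear = contrast(LINEAR, means)        # overall rise from time 1 to time 3
quadratic = contrast(QUADRATIC, means)  # second gain minus first gain

gain_1_to_2 = means[1] - means[0]
gain_2_to_3 = means[2] - means[1]

print(f"gain 1->2: {gain_1_to_2:.3f}, gain 2->3: {gain_2_to_3:.3f}")
print(f"linear contrast: {linear:.3f}, quadratic contrast: {quadratic:.3f}")
```

The quadratic contrast is negative because the second gain is a little smaller than the first, which is the downward bend in the plot.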
Traditional repeated measures analyses with student / person factors.
1) Teaching effectiveness:
0 = teacher of the student not judged to be effective; 1 = teacher judged to be effective
2) Student SES as a continuously varying quantity potentially different for each student.
Enter the same responses as those entered when there were no between-subjects covariates.
I could have entered effective as a covariate, since it has only two values.
But I wanted to create graphs with separate lines for each value of effective. For that, it has to be a factor.
Note that since we have a continuous covariate in the model, it makes sense to get the parameters of the equation corresponding to that covariate. That’s why I checked the “Parameter Estimates” box.
GLM test1 test2 test3 BY effective WITH ses
  /WSFACTOR=time 3 Polynomial
  /MEASURE=test
  /METHOD=SSTYPE(3)
  /PLOT=PROFILE(time time*effective)
  /EMMEANS=TABLES(time) WITH(ses=MEAN)
  /EMMEANS=TABLES(effective*time) WITH(ses=MEAN)
  /PRINT=DESCRIPTIVE ETASQ OPOWER PARAMETER
  /CRITERIA=ALPHA(.05)
  /WSDESIGN=time
  /DESIGN=ses effective.
General Linear Model Output
[DataSet1] G:\MdbT\P595C(Multilevel)\Multilevel and Longitudinal Modeling with IBM SPSS\Ch5Datasets&ModelSyntaxes\ch5hortest.sav
Within-Subjects Factors
Measure: test
time  Dependent Variable
1     test1
2     test2
3     test3

Between-Subjects Factors
                                        N
effective Teacher effectiveness  .00    3901
                                 1.00   4769

Descriptive Statistics
        effective Teacher effectiveness  Mean     Std. Deviation  N
test1   .00                              46.9255  12.14551        3901
        1.00                             50.0284   6.82068        4769
        Total                            48.6323   9.71254        8670
test2   .00                              50.5716  12.39563        3901
        1.00                             55.1815   6.51977        4769
        Total                            53.1073   9.88757        8670
test3   .00                              51.7330  10.02157        3901
        1.00                             61.4799   7.28562        4769
        Total                            57.0944   9.89402        8670
a. Exact statistic
b. Computed using alpha = .05
c. Design: Intercept + ses + effective
   Within Subjects Design: time
Time: There were significant differences in mean test scores across the three time periods.
Note that SPSS automatically tested two interactions – the time*ses interaction and the time*effective interaction. Thanks, SPSS. Remember that interactions are moderation. We’re asking, does ses moderate the increase in test scores over time? And does effective moderate the increase in test scores over time?
Time*ses: Since it’s not significant, it tells us that change across time was the same for low ses kids as it was for high ses kids. No moderation.
Time*effective: Since it is significant, it tells us that the difference in mean test scores across times was different for kids with less effective teachers than it was for kids with more effective teachers. That is, effective moderates the Test~Time relationship.
The data have to pass Mauchly’s test of sphericity in order for us to be able to interpret the Sphericity Assumed line below.
Mauchly's Test of Sphericity(b)
Measure: test
                                                                  Epsilon(a)
Within Subjects Effect  Mauchly's W  Approx. Chi-Square  df  Sig.  Greenhouse-Geisser  Huynh-Feldt  Lower-bound
time                    .969         269.218             2   .000  .970                .971         .500
Tests the null hypothesis that the error covariance matrix of the orthonormalized transformed dependent variables is proportional to an identity matrix.
a. May be used to adjust the degrees of freedom for the averaged tests of significance. Corrected tests are displayed in the Tests of Within-Subjects Effects table.
b. Design: Intercept + ses + effective
   Within Subjects Design: time
Sphericity did not hold (we failed Mauchly’s test), so we must ignore the “Sphericity Assumed” results that follow.
(You must ignore them. Don’t mess with the God of Statistics!!)
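For what the epsilon values in the Mauchly table do, here is a minimal Python sketch (Python is used only for illustration; it is not part of the SPSS run). It assumes the standard rule that an epsilon correction multiplies the degrees of freedom of the within-subjects F test, and that the lower bound for a factor with k levels is 1/(k - 1). The function names are my own.

```python
def lower_bound_epsilon(k):
    """Smallest possible epsilon for a within-subjects factor with k levels."""
    return 1.0 / (k - 1)

def corrected_df(df, epsilon):
    """Epsilon corrections multiply the degrees of freedom of the F test
    (the error df is multiplied by the same epsilon)."""
    return epsilon * df

k = 3            # three time periods
df_time = k - 1  # uncorrected numerator df for time

# Greenhouse-Geisser and Huynh-Feldt values are copied from the table above.
epsilons = {
    "Greenhouse-Geisser": 0.970,
    "Huynh-Feldt": 0.971,
    "Lower-bound": lower_bound_epsilon(k),
}
for name, eps in epsilons.items():
    print(f"{name}: epsilon = {eps:.3f}, corrected df for time = {corrected_df(df_time, eps):.3f}")
```

With epsilon this close to 1, the corrected tests barely differ from the uncorrected ones, even though Mauchly's test is significant with a sample this large.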
The results here are the same as found in the Multivariate Results table.
1) There is an effect of time; 2) there is no time*ses interaction; and 3) there is a time*effective interaction.
So mean test performance changes across time, and it changes in a different way for kids with effective teachers than it does for kids with ineffective teachers.
The kid’s ses does not affect how his/her test scores change over time.
What about the effect of ses on test scores overall? That answer is in the Test of Between Subjects Effects below.
But first, the nature of the changes over time . . . linear? Quadratic?
Tests of Within-Subjects Contrasts
Measure: test
Source  time    Type III Sum of Squares  df  Mean Square  F         Sig.  Partial Eta Squared  Noncent. Parameter  Observed Power(a)
time    Linear  283779.090               1   283779.090   4795.547  .000  .356                 4795.547            1.000
The dependent variable in these tests of Between-Subjects Effects is the mean across all levels of the repeated measures variable – the mean across the three time periods.
The mean scores across the 3 times were not related to student SES values (p = .427).
(Hmm, this is different from the result with the ch3 data.)
The average scores across the 3 times were related to teaching effectiveness.
High effective: Average of test1, test2, test3 is large
Low effective: Average of test1, test2, test3 is low
The following are reported because of the presence of a covariate, ses, and because I asked for Parameter Estimates.
At each time period, test scores were regressed onto SES. The relationship was NS for each regression.
b. This parameter is set to zero because it is redundant.
So, the average of all three test scores was not related to ses, as shown above in the Tests of Between-Subjects Effects.
This table adds to that by telling us that none of the individual test scores was related to ses – neither test1, nor test2, nor test3, as shown here.
There is no corresponding test for effective because SPSS was told that it’s a factor, not a continuous variable.
Estimated Marginal Means
1. time
Measure: test
time  Mean       Std. Error  95% CI Lower Bound  95% CI Upper Bound
1     48.477(a)  .103        48.274              48.680
2     52.877(a)  .104        52.673              53.080
3     56.606(a)  .093        56.424              56.789
a. Covariates appearing in the model are evaluated at the following values: ses = .0370.
The estimated marginal means at each time are computed as if all participants had the same SES score: the average of all SES scores, .0370.
These could be referred to as mean test scores adjusted for ses.
2. Teacher effectiveness * time
Measure: test
Teacher effectiveness  time  Mean  Std. Error  95% Confidence Interval
a. Covariates appearing in the model are evaluated at the following values: ses = .0370.
The estimated marginal means at each combination of effectiveness level and time period are computed as if all participants had the same SES score, .0370.
These could be referred to as test means at each combination of time period and effectiveness level adjusted for ses.
Since ses was not significant, these are virtually identical to the observed means reported several pages ago. They may not be in cases when the covariate is significantly related to the dependent variable.
Profile Plots
Overall plot across effective groups.
The graph below explains the slight downward bend in the overall curve.
Students with effective teachers actually gained momentum from the 2nd to the 3rd test.
Students with ineffective teachers were not able to continue upward at the same rate – they began falling behind even more.
Note that students with less effective teachers did more poorly from the beginning. Presumably, lack of effectiveness affected performance on the first test.
Obviously, these results could have huge policy implications.
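The pattern in the plot can be checked directly from the observed group means in the Descriptive Statistics table. The following is a minimal Python sketch (Python is used only for illustration, not as part of the SPSS analysis) that computes the gain between successive tests for each group and the gap between the groups at each time.

```python
# Observed means from the Descriptive Statistics table (test1, test2, test3).
means = {
    "less effective (0)": (46.9255, 50.5716, 51.7330),
    "effective (1)": (50.0284, 55.1815, 61.4799),
}

def gains(scores):
    """Change between successive time periods."""
    return [later - earlier for earlier, later in zip(scores, scores[1:])]

for group, scores in means.items():
    first, second = gains(scores)
    print(f"{group}: gain 1->2 = {first:.2f}, gain 2->3 = {second:.2f}")

# Gap between the two groups at each time period.
gaps = [e - l for l, e in zip(means["less effective (0)"], means["effective (1)"])]
print("gap (effective - less effective):", [round(g, 2) for g in gaps])
```

The second gain is larger than the first for students with effective teachers and smaller for the other group, so the gap widens at every test. That difference in gains is what the time*effective interaction is picking up.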
Overall Summary . . .
The SPSS GLM Repeated Measures procedure does a great job of giving you valuable information concerning your data.
I strongly recommend that you analyze your data with this procedure, if it is appropriate for your data.
Let’s see how well the MIXED procedure can provide similar answers.
Longitudinal Analysis Using MIXED I, 2E p. 191 ff
We’ve analyzed these data using the traditional repeated measures techniques. Now we’ll see how the same data can be analyzed using the multilevel MIXED procedure.
Horizontal vs. Vertical arrangement
Traditional repeated measures analyses require that the repeated measures occupy different columns of the data editor. This arrangement will be called the horizontal arrangement. Here are the first few cases of the Ch 5 data arranged horizontally . . .
The multilevel R procedures and the SPSS MIXED procedure, on the other hand, require that each repeated measure occupy a different row of the data editor.
This will be called a vertical arrangement.
It’s also called person.period or ppt arrangement.
Here are the Ch5 data arranged vertically.
(Some other variables, e.g., time, have been added. More on those later.)
Major points about going from horizontal to vertical . . .
1. Each time period is a row in the vertical arrangement.
2. Values which changed from time to time from column to column in the same row now change from row to row in the same column.
3. Values which were constant across time are copied from row to row.
The major differences between the “hor” and “ver” versions of the data are that
1) test1, test2, and test3 have been replaced by a single column, test, with the values of test1, test2, and test3 placed on successive lines of the data file.
2) All variables that applied to the person whose values did not change from one time period to the next, e.g., effective and ses, were copied downwards so they appear at each time period.
3) a variable called time, with values 0, 1, and 2 for each of the three successive lines of data has been added.
How do we get from one form to the other? Later. Trust me.
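In the meantime, here is a minimal sketch of the idea in plain Python (not the SPSS or R used in these notes). It follows the three points listed above: one new row per time period, the test columns stacked into a single test column, constant variables copied down, and a time variable coded 0, 1, 2. The function name and the two example students are made up for illustration; they are not taken from ch5hortest.sav.

```python
def wide_to_long(rows, repeated=("test1", "test2", "test3"), value_name="test"):
    """Turn one row per person into one row per person per time period."""
    long_rows = []
    for row in rows:
        # Variables that do not change over time are copied onto every new row.
        constant = {k: v for k, v in row.items() if k not in repeated}
        for time, column in enumerate(repeated):  # time = 0, 1, 2
            long_rows.append({**constant, "time": time, value_name: row[column]})
    return long_rows

# Two made-up students (hypothetical values, not from the Chapter 5 file).
horizontal = [
    {"id": 1, "effective": 1, "ses": 0.25, "test1": 48, "test2": 53, "test3": 59},
    {"id": 2, "effective": 0, "ses": -0.50, "test1": 45, "test2": 49, "test3": 50},
]

vertical = wide_to_long(horizontal)
for row in vertical:
    print(row)
```

Two horizontal rows become six vertical rows, three per student.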
Longitudinal Analyses as Multilevel analyses
Level 1 data:
The individual observations from a given person at different time periods are the Level 1 Y values.
The times at which the observations were taken are the X values. So, invariably, time is the level 1 predictor.
Characteristics that vary from one time to the next are Level 1 characteristics. We won’t actually consider such characteristics here.
Level 1 Equation
Y = Intercept + Slope * Time + Residual
Level 2 data:
People are the Level 2 entities – analogous to groups in cross-sectional analysis. Each person gives us a group of scores.
Characteristics of people – their ses, sex, etc, are Level 2 characteristics
Typically, these characteristics are assumed to affect the intercepts of the equations relating Y to X (time) and the slopes of the same equations.
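One way to picture the Level 1 equation is to fit a separate straight line to each person's scores. The sketch below does that in plain Python with ordinary least squares, using two made-up students (hypothetical scores, not from the Chapter 5 data). This is only a picture of the idea: MIXED and lme do not literally fit one regression per person, they estimate the average intercept and slope and the variability around them in a single model.

```python
def fit_line(times, scores):
    """Ordinary least squares intercept and slope for one person's scores."""
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(scores) / n
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, scores))
    sxx = sum((t - mean_t) ** 2 for t in times)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_t
    return intercept, slope

times = [0, 1, 2]
# Hypothetical students (made-up scores, for illustration only).
people = {"A": [40, 45, 50], "B": [52, 53, 57]}

for person, scores in people.items():
    intercept, slope = fit_line(times, scores)
    # Residual = observed score minus (Intercept + Slope * Time).
    residuals = [y - (intercept + slope * t) for t, y in zip(times, scores)]
    print(person, round(intercept, 2), round(slope, 2), [round(r, 2) for r in residuals])
```

Each person gets an intercept (the predicted score at time 0) and a slope (the change per time period). Those per-person intercepts and slopes are the quantities that Level 2 characteristics are assumed to affect.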
New Notation (ARGH!!!)
In this chapter, Heck et al, 2E p 187, introduce a new set of symbols for old concepts. So does everyone else, so we can’t blame just Heck and his co-authors.
p replaces B in the Level 1 equations.
B replaces g in the Level 2 models of intercept and slope.
I was not responsible for this. The reason probably involves different traditions doing the same analyses.
The cross-sectional tradition used the B + g notation.
The longitudinal tradition uses the p + B notation.
So, for a basic model with a Level 1 predictor (time), a Level 2 Intercept predictor (person characteristic) and a Level 2 slope predictor (person characteristic). . .
(t=time; i=person) or (i=observation at a time period; j=person)
Level 1 Model
Yti = p0i + p1i*ati + eti
(a is time or a time-related characteristic; it’s usually just time)
Level 2 Model of Intercept
p0i = B00 + B01*Personcharacteristic i + u0i
Level 2 Model of Slope
p1i = B10 + B11*Personcharacteristic i + u1i
Combined
Yti = B00 + B01*Personcharacteristic i + u0i + (B10 + B11*Personcharacteristic i + u1i)*ati + eti
Really!!??
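If the combined equation looks intimidating, it may help to see that it is nothing more than the two Level 2 models substituted into the Level 1 model. The Python sketch below computes Y both ways and shows they agree. All of the numbers are hypothetical, chosen only for illustration, and x stands for the person characteristic.

```python
def two_step(B00, B01, B10, B11, x, u0, u1, a, e):
    """Level 2 models first, then the Level 1 model."""
    p0 = B00 + B01 * x + u0   # Level 2 model of the intercept
    p1 = B10 + B11 * x + u1   # Level 2 model of the slope
    return p0 + p1 * a + e    # Level 1 model

def combined(B00, B01, B10, B11, x, u0, u1, a, e):
    """Combined equation with the Level 2 models substituted in."""
    return B00 + B01 * x + u0 + (B10 + B11 * x + u1) * a + e

# Hypothetical values chosen only for illustration.
params = dict(B00=48.0, B01=2.0, B10=4.0, B11=0.5, x=1.0, u0=-1.5, u1=0.25, e=0.75)
for a in (0, 1, 2):
    print(a, two_step(a=a, **params), combined(a=a, **params))
```

Both columns of output are the same at every time point, which is all the combined equation is saying.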
A simple random intercepts model of Ch5Vertical.
Linear change over time with random intercepts.
Level 1 Model: Y = Intercept + slope * time + error
In symbols: Yti = p0i + p1i*timet + eti
Level 2 Model of intercept:
Intercept = constant + random deviate
In symbols: p0i = B00 + u0i
Level 2 Model of slope:
Slope = constant
In symbols: p1i = B10
Combined model:
Y = constant + random deviate + slope*time + error
Yti = B00 + u0i + B10*timet + eti
R code and SPSS code to apply this model follow
R application of ch5Vertical data (more on creating vertical data later)
Is there Linear change over time with random intercepts?
R Rcmdr nlme import data ch5growthdata-vertical
> library(nlme)   # provides lme() and intervals()
> ch5a <- lme(fixed = test ~ time, random = ~1|id, data=ch5vert)
> summary(ch5a)
Linear mixed-effects model fit by REML
 Data: ch5vert
       AIC      BIC    logLik
  189388.4 189421.1 -94690.22

Random effects:
 Formula: ~1 | id
        (Intercept) Residual
StdDev:    6.039054 7.759078

Fixed effects: test ~ time
               Value  Std.Error    DF  t-value p-value
(Intercept) 48.71361 0.09996515 17339 487.3059       0
time         4.23105 0.05892309 17339  71.8062       0
 Correlation:
     (Intr)
time -0.589

Standardized Within-Group Residuals:
        Min          Q1         Med          Q3         Max
-4.99170789 -0.35203890 -0.06226746  0.30966008  6.64026936

Number of Observations: 26010
Number of Groups: 8670
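To connect the fixed effects to the earlier results, here is a minimal Python sketch (Python only for illustration) that plugs the two estimates into the combined model with the random terms set to zero, and compares the predicted means at time 0, 1, and 2 with the observed means of test1, test2, and test3 from the GLM Descriptive Statistics table.

```python
# Fixed-effect estimates copied from the lme summary above.
B00 = 48.71361  # (Intercept): predicted mean at time 0
B10 = 4.23105   # time: predicted change per time period

# Observed means of test1, test2, test3 from the GLM Descriptive Statistics table.
observed = [48.6323, 53.1073, 57.0944]

predicted = [B00 + B10 * time for time in (0, 1, 2)]
for time, (pred, obs) in enumerate(zip(predicted, observed)):
    print(f"time {time}: predicted {pred:.3f}, observed {obs:.3f}, difference {obs - pred:+.3f}")
```

The straight line sits slightly above the observed means at the first and last tests and slightly below at the middle test. That small mismatch is the same slight downward curvature that the GLM quadratic contrast picked up, and a linear-only model cannot show it.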
I discovered a new R command, intervals, that will display confidence intervals for all estimates. We can use these to determine if the estimates are significantly different from 0.
> intervals(ch5a)
Approximate 95% confidence intervals

 Random Effects:
  Level: id
                  lower     est.    upper
sd((Intercept)) 5.89741 6.039054 6.184101

 Within-group standard error:
   lower     est.    upper
7.677904 7.759078 7.841110

Same Analysis in SPSS
Syntax
MIXED test WITH time
  /FIXED=time | SSTYPE(3)
  /METHOD=REML
  /PRINT=G SOLUTION TESTCOV
  /RANDOM=INTERCEPT | SUBJECT(id) COVTYPE(UN)
  /REPEATED=time | SUBJECT(id) COVTYPE(ID).
(<--- Note the new line, /REPEATED.)
Mixed Model Analysis
Fixed Effects
Estimates of Fixed Effects(a)
Parameter  Estimate  Std. Error  df  t  Sig.  95% Confidence Interval
Note that while the fixed parameter estimates in nlme and SPSS are identical, there are small differences in the variance parameters. More on this later.
6.039054^2 = 36.47017
7.759078^2 = 60.20329
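Since lme reports standard deviations and SPSS reports variances, the two have to be put on the same scale before comparing them. A minimal Python sketch of that conversion (Python only for illustration), using the standard deviations and the intervals() limits reported above:

```python
# Standard deviations reported by lme for the random intercepts model.
sd_intercept = 6.039054
sd_residual = 7.759078

var_intercept = sd_intercept ** 2
var_residual = sd_residual ** 2
print(f"intercept variance: {var_intercept:.5f}")
print(f"residual variance:  {var_residual:.5f}")

# 95% limits from intervals(), squared to put them on the variance scale too.
sd_limits = {"intercept": (5.89741, 6.184101), "residual": (7.677904, 7.841110)}
for name, (lower, upper) in sd_limits.items():
    print(f"{name}: variance between {lower ** 2:.3f} and {upper ** 2:.3f}")
```

Squaring the limits is legitimate here because squaring preserves order for positive numbers, so the interval for the variance is just the squared interval for the standard deviation.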
2nd Model 1 of Ch5Vertical.
Linear change over time with random intercepts and random slopes.
Level 1 Model: Y = Intercept + slope * time + error
In symbols: Yti = p0i + p1i*timet + eti
Level 2 Model of intercept:
Intercept = constant + random deviate
In symbols: p0i = B00 + u0i
Level 2 Model of slope:
Slope = constant + random deviate
In symbols: p1i = B10 + u1i
Combined model:
Y = constant + deviate + (constant+deviate)*time + error
Y = B00 + u0i + (B10 + u1i)*time + eti
Multiplying through by time . . .
Y = B00 + u0i + B10*time + u1i*time + eti
(Here B00 + u0i is p0i, and B10 + u1i is p1i.)
R application of ch5Vertical data
Linear change over time with random intercepts and slopes
R Rcmdr nlme import data ch5growthdata-vertical
Note that I’ve unchecked the “Convert value labels to factor levels” box. I’ve found that especially useful when working with labeled dichotomous variables, which are to be treated as covariates.
The call to lme is nearly identical to the call for the random intercepts only model. The only change is “random = ~time|id” is substituted for “random = ~1|id” to tell lme that the slope with respect to time is to vary randomly from person to person.
Recall that lme assumes that the intercepts will vary randomly whenever slopes vary randomly.
From Rcmdr, importing the data
> ch5vert <-
+   read.spss("G:/MDBO/html2/p5520/p5520 Data/ch5growthdata-vertical.sav",
+   use.value.labels=FALSE, max.value.labels=Inf, to.data.frame=TRUE)
> colnames(ch5vert) <- tolower(colnames(ch5vert))
Call to lme
> ch5b <- lme(fixed = test ~ time, random = ~time|id, data=ch5vert)
> summary(ch5b)
Linear mixed-effects model fit by REML
 Data: ch5vert
       AIC      BIC    logLik
  189295.6 189344.6 -94641.81

Random effects:
 Formula: ~time | id
 Structure: General positive-definite, Log-Cholesky parametrization
            StdDev   Corr
(Intercept) 5.997067 (Intr)
time        2.109559 -0.097
Residual    7.466786

Fixed effects: test ~ time
               Value  Std.Error    DF  t-value p-value
(Intercept) 48.71361 0.09750377 17339 499.6074       0
time         4.23105 0.06106200 17339  69.2910       0
 Correlation:
     (Intr)
time -0.564

Standardized Within-Group Residuals:
        Min          Q1         Med          Q3         Max
-4.46673565 -0.33153341 -0.06519042  0.29046169  6.37548055

Number of Observations: 26010
Number of Groups: 8670
Recall that the Corr value in the Random effects section (-0.097) is the correlation between random intercept deviations and random slope deviations.
Note that while the fixed effects estimates are exactly identical, there are small differences in the random effects variances.
Note that the correlation between random intercepts and random slopes is negligible, at -0.097.
Variances (added by me): 35.96482 for the intercepts, 4.450237 for the time slopes.
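The same conversion extends to the correlation: multiplying it by the two standard deviations gives the covariance between random intercepts and random slopes. The Python sketch below (Python only for illustration) builds the 2 x 2 covariance matrix of the random effects from the lme output. I am assuming this is the quantity SPSS displays when /PRINT=G is requested with COVTYPE(UN); treat that correspondence as an assumption to check against your own output.

```python
# Random-effects estimates copied from the lme summary above.
sd_intercept = 5.997067
sd_slope = 2.109559
corr = -0.097

var_intercept = sd_intercept ** 2
var_slope = sd_slope ** 2
cov_intercept_slope = corr * sd_intercept * sd_slope

# 2 x 2 covariance matrix of the random effects (intercept first, then time).
G = [
    [var_intercept, cov_intercept_slope],
    [cov_intercept_slope, var_slope],
]
for row in G:
    print([round(value, 4) for value in row])
```

Because the correlation is printed to only three decimals, the covariance computed this way is approximate.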
What have we found so far?
When we assume that intercepts (initial math scores) of students may vary randomly and that slopes of the change in scores across times may vary randomly, we find that test scores increase significantly across the three time periods, increasing by an average of 4.2 points per time period. Note that the traditional repeated measures analysis did not give us this average change over time value.
We find that there is significant random variability in residuals of the individual test values from the predicted values based on time alone, leaving room for more predictors. The traditional analysis did not tell us about that variability.
We find that the variation in intercepts is significantly greater than 0, as is the variation in slopes. Neither of these could have been discovered in the traditional analysis.
We find little correlation between random variation in intercepts and random variation in slopes. The traditional analysis did not mention anything about that.
NOTHING ABOUT QUADRATIC CHANGE OR EFFECTIVE OR SES BECAUSE THESE ARE LEVEL 2 SYSTEMATIC EFFECTS THAT WERE NOT INCLUDED IN THE ABOVE ANALYSES.