ED VUL | UCSD Psychology | 201ab Quantitative Methods | Linear Mixed Effects Models
Our expanding model landscape
Special cases nest inside progressively broader model classes:
- General linear model: equal-variance t-tests, multiple regression, ANCOVA, OLS regression, ANOVA
- Generalized linear models: logistic regression, Poisson regression
- Linear mixed models: repeated measures ANOVA, mixed design (split-plot) ANOVA
- Generalized linear mixed models
- Hierarchical Bayesian models
As we saw, and in general, special cases of broader model classes are usually favored despite being less flexible because they are simpler and allow for easier estimation and inference.
A few of the methods we covered don't fit into this scheme:
- Binomial test
- Pearson's chi-square tests
- Unequal-variance t-tests
There are other generalizations:
- Multivariate linear model
- General(ized) additive model
Motivating data
We have pre- and post-class exam scores for 10 males and 10 females. The exam is divided into a qualitative and a quantitative section, with 5 parts in each section.
str(test.data)
'data.frame': 400 obs. of 6 variables:
 $ student: Factor w/ 20 levels "S.1","S.10","S.11",..: 1 1 1 1 1 1 1 1 ...
 $ sex    : Factor w/ 2 levels "female","male": 2 2 2 2 2 2 2 2 2 2 ...
 $ test   : Factor w/ 2 levels "post","pre": 2 2 2 2 2 2 2 2 2 2 ...
 $ part   : Factor w/ 10 levels "P.1","P.10","P.2",..: 1 3 4 5 6 7 8 9 ...
 $ section: Factor w/ 2 levels "qualitative",..: 1 1 1 1 1 2 2 2 2 2 ...
 $ score  : num 53 50 79 67 70 68 68 62 65 79 ...
head(test.data)
      student  sex test part      section score
S.1       S.1 male  pre  P.1  qualitative    53
S.1.2     S.1 male  pre  P.2  qualitative    50
S.1.3     S.1 male  pre  P.3  qualitative    79
S.1.4     S.1 male  pre  P.4  qualitative    67
S.1.5     S.1 male  pre  P.5  qualitative    70
S.1.6     S.1 male  pre  P.6 quantitative    68
Motivation
We have pre- and post-class exam scores for 10 males and 10 females. The exam is divided into a qualitative and a quantitative section, with 5 parts in each section.
So: 20 (students) x 20 (time x parts) = 400 measurements.
What’s the correlation structure?
How do you analyze this?
Motivation
We have pre- and post-class exam scores for 10 males and 10 females. The exam is divided into a qualitative and a quantitative section, with 5 parts in each section. So: 20 (students) x 20 (time x parts) = 400 measurements.
1) Do students improve from pre to post?
2) Do females outperform males?
3) Is qualitative or quantitative easier?
4) Is improvement different for males and females? For qual. vs quant.?
5) Does qual./quant. improve more? Is either easier for males/females?
6) Is the male/female difference different for qual/quant? Pre/post?
7) Does learning [pre vs post] alter any qual/quant disparity between males/females?
8) Are some parts easier or harder?
9) Are some parts easier for males/females? Do some improve more?
10) Do some parts improve more for males than females?
11) Do some students do better or worse?
12) Are some students better at qual/quant? Do some improve more?
13) Do some students improve more on qual/quant?
Crossed random effects
We have pre- and post-class exam scores for 10 males and 10 females. The exam is divided into a qualitative and a quantitative section, with 5 parts in each section. So: 20 (students) x 20 (time x parts) = 400 measurements.
• Repeated measures ANOVA worldview: two options:
– Students as the unit of analysis: sex as a "between student" factor; section and time as "within student" factors.
– Test part as the unit of analysis: section as a "between part" factor; sex and time as "within part" factors.
Analysis by pooling (subject)
summary(aov(data=test.data,
    score~sex*test*section + Error(student/(test*section))))
Error: student
          Df Sum Sq Mean Sq F value Pr(>F)
sex        1   3570    3570   1.443  0.245
Residuals 18  44539    2474

Error: student:test
          Df Sum Sq Mean Sq F value   Pr(>F)
test       1   0.90    0.90   0.879    0.361
sex:test   1  33.06   33.06  32.195 2.21e-05 ***
Residuals 18  18.48    1.03

Error: student:section
            Df Sum Sq Mean Sq F value   Pr(>F)
section      1   5048    5048   661.6 1.20e-15 ***
sex:section  1   1604    1604   210.2 2.27e-11 ***
Residuals   18    137       8

Error: student:test:section
                 Df Sum Sq Mean Sq F value   Pr(>F)
test:section      1 1242.6  1242.6  1238.1  < 2e-16 ***
sex:test:section  1   61.6    61.6    61.4 3.29e-07 ***
Residuals        18   18.1     1.0

Error: Within
           Df Sum Sq Mean Sq F value Pr(>F)
Residuals 320  31652   98.91
Analysis by pooling (item)
summary(aov(data=test.data,
    score~sex*test*section + Error(part/(test*sex))))
Error: part
          Df Sum Sq Mean Sq F value Pr(>F)
section    1   5048    5048   1.302  0.287
Residuals  8  31014    3877

Error: part:test
             Df Sum Sq Mean Sq F value   Pr(>F)
test          1    0.9     0.9   0.046    0.836
test:section  1 1242.6  1242.6  62.891 4.65e-05 ***
Residuals     8  158.1    19.8

Error: part:sex
            Df Sum Sq Mean Sq F value   Pr(>F)
sex          1   3570    3570  205.83 5.44e-07 ***
sex:section  1   1604    1604   92.48 1.14e-05 ***
Residuals    8    139      17

Error: part:test:sex
                 Df Sum Sq Mean Sq F value   Pr(>F)
sex:test          1  33.06   33.06   23.53 0.001270 **
sex:test:section  1  61.62   61.62   43.86 0.000165 ***
Residuals         8  11.24    1.41

Error: Within
           Df Sum Sq Mean Sq F value Pr(>F)
Residuals 360  45042   125.1
Analysis by item/subject pooling
summary(aov(data=test.data, score~sex*test*section + Error(student/(test*section))))
summary(aov(data=test.data, score~sex*test*section + Error(part/(test*sex))))
(by-student and by-part ANOVA outputs as shown on the previous two slides)
Both strategies give us the wrong answer because they neglect one source of covariation.
The subject analysis (pooling over parts) ignores within-section, among-part variation, which is most relevant for assessing differences between sections.
The item analysis (pooling over students) ignores within-sex, among-student variation, which is most relevant for assessing differences between sexes.
Analysis by explicit pooling
We have pre- and post-class exam scores for 10 males and 10 females. The exam is divided into a qualitative and a quantitative section, with 5 parts in each section.
The ANOVA "by student" (disregarding the part random effect) and the ANOVA "by part" (disregarding the student random effect) give conflicting results.
This is because they respect some correlation structure (due to students, or due to parts) but not the other.
Crossed random effects
• Repeated measures ANOVA worldview: two options:
– Students as the unit of analysis: sex as a "between student" factor; section and time as "within student" factors.
– Test part as the unit of analysis: section as a "between part" factor; sex and time as "within part" factors.
• Both of these omit the other source of correlated noise!
We want both
summary(aov(data=test.data, score~sex*test*section + Error(student/(test*section))))
summary(aov(data=test.data, score~sex*test*section + Error(part/(test*sex))))
But we can't do this:
summary(aov(data=test.data,
    score~sex*test*section + Error(student/(test*section)) + Error(part/(test*sex))))
Need a more flexible way to specify correlation structure.
lme4::lmer
install.packages('lme4')
Syntax:
- Specify the various random effects yourself, manually!
- (thing that varies randomly | grouping variable it varies over)
- E.g., (1|student): the intercept varies randomly across students;
  (section|student): the section effect varies across students.
m = lmer(data=test.data, score~sex*test*section +
    (1|student)+(1|student:test)+(1|student:section)+(1|student:test:section)+
    (1|part)+(1|part:test)+(1|part:sex)+(1|part:test:sex)+(1|student:part))
Random vs fixed effects: lay theory
(questions 1-13 as listed on the Motivation slide)
Fixed effects:
- pre/post
- male/female
- qual/quant
Random effects:
- student
- part
Random vs fixed effects: lay theory
Fixed effects: factor levels are of general relevance. We care about the offsets for specific levels.
Random effects: factor levels are specific to our study. They are random samples of possible levels that might occur in the world. Maybe we care about the variance across levels in general, but not about the actual offsets for specific levels.
Random vs fixed effects: lay theory
(questions 1-13 as listed on the Motivation slide)
We can classify these kinds of questions as:
- Fixed main effects
- Fixed 2-way interactions
- Fixed 3-way interactions
- Random main effects
- Random 2-way interactions
- Random 3-way interactions
Complete / No / Partial Pooling
• Complete pooling:
  y_i = α + βX_i + e_i
  We estimate an overall mean starting weight and an overall weight/time slope. These are completely pooled over all people, disregarding individual differences.
• No pooling:
  y_i = α_j[i] + βX_i + e_i,  α_j = anything
  We estimate an overall weight/time slope (complete pooling for the slope), and individual-specific starting weights with no constraints on their relationship (no pooling for the intercepts).
• Partial pooling:
  y_i = α_j[i] + βX_i + e_i,  α_j ~ Normal(γ, σ_α)
  We estimate an overall weight/time slope (complete pooling for the slope), and individual-specific starting weights with the understanding that they are somewhat similar, and thus modeled as Normally distributed around a common mean (partial pooling for the intercepts).
Complete / No / Partial Pooling
• Complete pooling:  y_i = α + βX_i + e_i
  Consequently, we get one intercept.
• No pooling:  y_i = α_j[i] + βX_i + e_i,  α_j = anything
  Consequently, we get lots of intercepts that may differ a lot and be noisy.
• Partial pooling:  y_i = α_j[i] + βX_i + e_i,  α_j ~ Normal(γ, σ_α)
  Consequently, we get lots of intercepts, but they will be drawn toward the overall mean by an amount that scales with our uncertainty about them.
Complete / No / Partial Pooling
Gelman's example: our estimates of coefficients for counties with little data are noisy. In the no-pooling analysis these counties look extreme because of the noise. In the partial-pooling analysis, this noisiness yields uncertainty, and the coefficients are adjusted to look more like the overall mean.
[Figure: estimated county coefficients plotted against sample size per county, under no pooling (left) and partial pooling (right)]
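The size of that adjustment can be made precise. For the simple Normal-Normal case (a standard result, sketched here ignoring any slope terms; $\bar{y}_j$ is group $j$'s observed mean, $n_j$ its sample size, $\sigma_e$ the residual SD, and $\gamma, \sigma_\alpha$ are as in the partial-pooling model above):

$$
\hat{\alpha}_j \;\approx\; \frac{(n_j/\sigma_e^2)\,\bar{y}_j \;+\; (1/\sigma_\alpha^2)\,\gamma}{n_j/\sigma_e^2 \;+\; 1/\sigma_\alpha^2}
$$

Groups with little data (small $n_j$) are pulled strongly toward the overall mean $\gamma$; groups with lots of data stay close to their own mean $\bar{y}_j$.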
Random vs fixed effects: formal
Random effect coefficients are thought to come from some distribution. Their variation is treated as noise. They are 'partially pooled', so they 'shrink' toward the average (0).
Fixed effects are parameters. There is no distribution over them. They are not pooled at all, and they do not "shrink".
Back to the design we couldn't handle
m = lmer(data=test.data, score~sex*test*section + (1|student), REML = F)
Linear mixed model fit by maximum likelihood ['lmerMod']
Formula: score ~ sex * test * section + (1 | student)
   Data: test.data
     AIC      BIC    logLik deviance df.resid
2991.907 3031.822 -1485.953 2971.907      390
Random effects:
 Groups   Name        Std.Dev.
 student  (Intercept) 10.352
 Residual              9.152
Number of obs: 400, groups: student, 20
Fixed Effects:
                 (Intercept)                              sexmale
                       69.18                                -2.18
                     testpre                  sectionquantitative
                       -3.64                                 6.80
             sexmale:testpre          sexmale:sectionquantitative
                        0.42                                -6.44
 testpre:sectionquantitative  sexmale:testpre:sectionquantitative
                        8.62                                -3.14
Here we allow the intercept to vary with student: +(1|student)
Back to the design we couldn't handle
m = lmer(data=test.data, score~sex*test*section +
    (1|student)+(1|student:test)+(1|student:section)+(1|student:test:section))
This is the full "student" random effect structure.
m = lmer(data=test.data, score~sex*test*section +
    (1|part)+(1|part:test)+(1|part:sex)+(1|part:test:sex))
This is the full "part" random effect structure.
m = lmer(data=test.data, score~sex*test*section +
    (1|student)+(1|student:test)+(1|student:section)+(1|student:test:section)+
    (1|part)+(1|part:test)+(1|part:sex)+(1|part:test:sex)+(1|student:part))
This is the full random effect structure of the design.
Linear Mixed Effects Analysis
m0 = lmer(data=test.data, score~1
    + (1|student) + (1|part) + (1|student:test) + (1|student:section)
    + (1|part:test) + (1|part:sex) + (1|part:sex:test)
    + (1|student:test:section) + (1|student:part), REML = FALSE)
This is the exhaustive list of random effects that might exist in our data.
This exhaustive strategy makes sense for such a factorial design, with random intercepts capturing the sources of correlated error in our data (to get the by-item and by-subject ANOVAs together). But it is not something you would do with complex real-world data with variable slopes, etc. In those cases the usually advised procedure is to start with the fixed effects, and add random effects so long as they improve the model.
Linear Mixed Effects Analysis
m0 = lmer(data=test.data, score~1
    + (1|student) + (1|part) + (1|student:test) + (1|student:section)
    + (1|part:test) + (1|part:sex) + (1|part:sex:test)
    + (1|student:test:section) + (1|student:part), REML = FALSE)
mF = lmer(data=test.data, score~sex*test*section
    + (1|student) + (1|part) + (1|student:test) + (1|student:section)
    + (1|part:test) + (1|part:sex) + (1|part:sex:test)
    + (1|student:test:section) + (1|student:part), REML = FALSE)
This is a null model that includes only the random effects, no fixed effects of interest.
This is the full model including all the fixed effects.
anova(m0,mF)
   Df    AIC    BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)
m0 11 1494.4 1538.3 -736.21   1472.4
mF 18 1440.4 1512.3 -702.22   1404.4 67.977      7  3.783e-12 ***
The full model is significantly better than null, so something is going on with our fixed effects.
Interpreting random effect SDs
summary(m0)
Random effects:
 Groups               Name  Std.Dev.
 student:part         (Int)  0.88364
 student:test:section (Int)  0.36681
 part:sex:test        (Int)  0.95617
 student:section      (Int)  0.71859
 student:test         (Int)  0.01777
 part:sex             (Int)  2.94028
 part:test            (Int)  2.54545
 student              (Int) 11.08337
 part                 (Int)  9.27407
 Residual                    0.60322
Let’s look at the random effects (errors) to see what happened.
These partition all the variation in our data into independent sources that add up to yield the full covariance. The SDs tell us how much variation is attributable to each source.
Can I test whether these reflect significantly non-zero variance?
- Why do it? They reflect the structure of your experiment, not really something worth testing.
- Better to just compare the amount of variation in one source to another (e.g., students vs. residuals).
But I really want to!
- You could drop the one you are interested in from the model, and see whether a model that includes it is better (AIC, or LRT).
- You could extract the coefficients with ranef() and their standard errors with se.ranef(), then calculate X = sum((ranef/se.ranef)^2) and compare to a chi-squared distribution... Not really common.
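A minimal sketch of the drop-one LRT approach (the reduced random effect structure here is hypothetical, chosen just to illustrate the comparison; se.ranef() lives in the `arm` package):

```r
library(lme4)

# Model including the random effect of interest...
m.with <- lmer(score ~ sex*test*section
               + (1|student) + (1|part) + (1|student:part),
               data = test.data, REML = FALSE)

# ...and the same model with (1|student:part) dropped.
m.without <- lmer(score ~ sex*test*section
                  + (1|student) + (1|part),
                  data = test.data, REML = FALSE)

# Likelihood ratio test: does (1|student:part) improve the model?
anova(m.without, m.with)
```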
Interpreting random effect SDs.
summary(m0)
Random effects:
 Groups               Name  Std.Dev.
 student:part         (Int)  0.88364
 student:test:section (Int)  0.36681
 part:sex:test        (Int)  0.95617
 student:section      (Int)  0.71859
 student:test         (Int)  0.01777
 part:sex             (Int)  2.94028
 part:test            (Int)  2.54545
 student              (Int) 11.08337
 part                 (Int)  9.27407
 Residual                    0.60322

summary(mF)
Random effects:
 Groups               Name  Std.Dev.
 student:part         (Int)  0.88375
 student:test:section (Int)  0.34807
 part:sex:test        (Int)  0.28981
 student:section      (Int)  0.70360
 student:test         (Int)  0.04714
 part:sex             (Int)  0.80917
 part:test            (Int)  0.85788
 student              (Int) 10.69397
 part                 (Int)  8.95206
 Residual                    0.60474
Let’s look at the random effects (error) SDs to see what happened.
SDs reduced by a large proportion tell us that fixed effects explained that variation: part:test, part:sex, part:sex:test.
Which fixed effects could account for this reduction?
Random effect SDs and residuals
summary(aov(data=test.data, score~sex*test*section + Error(student/(test*section))))
summary(aov(data=test.data, score~sex*test*section + Error(part/(test*sex))))
(by-student and by-part ANOVA outputs as shown earlier)
These give us estimates roughly comparable to the random effect SDs:
SD(student) = 10.69 ~= sqrt(2474/20) = 11.12 ~= sqrt(125.1) = 11.19
SD(part) = 8.95 ~= sqrt(3877/40) = 9.84 ~= sqrt(98.91) = 9.94
However, even these are higher in aov, because in the full mixed model we have more error terms among which to partition the variance, and because partial pooling draws estimates closer together. This is even more true for the other residual terms.
Interpreting fixed effects
Caution! It's contentious whether it is sensible to get p-values from those t-values!
In general, we assess significance by either:
- Dropping a coefficient from the full model and comparing it to the full model via LRT (makes sense in cases that are not a factorial experiment), or
- Adopting a sequential, ANOVA-like procedure of adding coefficients in sequence and assessing whether they improve the model.
summary(mF)
Fixed effects:
                                    Estimate Std. Error t value
(Intercept)                          69.1800     5.2767  13.110
sexmale                              -2.1800     4.8309  -0.451
testpre                              -3.6400     0.6060  -6.006
sectionquantitative                   6.8000     5.7284   1.187
sexmale:testpre                       0.4200     0.3818   1.100
sexmale:sectionquantitative          -6.4400     0.9640  -6.681
testpre:sectionquantitative           8.6200     0.8565  10.064
sexmale:testpre:sectionquantitative  -3.1400     0.5384  -5.833
These are the various fixed effect coefficients. In this case, because we have a 2x2x2 fixed design, we have one coefficient per main-effect/interaction.
What do these coefficients mean?
A sequential ANOVA approach
m0 = lmer(data=test.data, score~1 + (1|student) + (1|part) + (1|student:test)
    + (1|student:section) + (1|part:test) + (1|part:sex) + (1|part:sex:test)
    + (1|student:test:section) + (1|student:part), REML=FALSE)
m0.S = lmer(data=test.data, score~sex + (1|student) + (1|part) + (1|student:test)
    + (1|student:section) + (1|part:test) + (1|part:sex) + (1|part:sex:test)
    + (1|student:test:section) + (1|student:part), REML=FALSE)
m0.ST = lmer(data=test.data, score~sex+test + (1|student) + (1|part) + (1|student:test)
    + (1|student:section) + (1|part:test) + (1|part:sex) + (1|part:sex:test)
    + (1|student:test:section) + (1|student:part), REML=FALSE)
m0.STS = lmer(data=test.data, score~sex+test+section + (1|student) + (1|part) + (1|student:test)
    + (1|student:section) + (1|part:test) + (1|part:sex) + (1|part:sex:test)
    + (1|student:test:section) + (1|student:part), REML=FALSE)
m0.STS.I1 = lmer(data=test.data, score~sex+test+section+test:section + (1|student) + (1|part)
    + (1|student:test) + (1|student:section) + (1|part:test) + (1|part:sex) + (1|part:sex:test)
    + (1|student:test:section) + (1|student:part), REML=FALSE)
m0.STS.I2 = lmer(data=test.data, score~sex+test+section+test:section+sex:test + (1|student) + (1|part)
    + (1|student:test) + (1|student:section) + (1|part:test) + (1|part:sex) + (1|part:sex:test)
    + (1|student:test:section) + (1|student:part), REML=FALSE)
m0.STS.I3 = lmer(data=test.data, score~sex+test+section+test:section+sex:test+sex:section + (1|student)
    + (1|part) + (1|student:test) + (1|student:section) + (1|part:test) + (1|part:sex) + (1|part:sex:test)
    + (1|student:test:section) + (1|student:part), REML=FALSE)
mF = lmer(data=test.data, score~sex+test+section+sex:test+sex:section+test:section+sex:test:section
    + (1|student) + (1|part) + (1|student:test) + (1|student:section) + (1|part:test) + (1|part:sex)
    + (1|part:sex:test) + (1|student:test:section) + (1|student:part), REML=FALSE)

          Df    AIC    BIC  logLik deviance   Chisq Chi Df Pr(>Chisq)
m0        11 1494.4 1538.3 -736.21   1472.4
m0.S      12 1496.4 1544.3 -736.20   1472.4  0.0065      1    0.93594
m0.ST     13 1497.0 1548.9 -735.51   1471.0  1.3915      1    0.23815
m0.STS    14 1497.6 1553.5 -734.78   1469.6  1.4457      1    0.22922
m0.STS.I1 15 1477.9 1537.7 -723.93   1447.9 21.7012      1  3.186e-06 ***
m0.STS.I2 16 1476.0 1539.8 -721.98   1444.0  3.9105      1    0.04799 *
m0.STS.I3 17 1455.6 1523.5 -710.82   1421.6 22.3145      1  2.315e-06 ***
mF        18 1440.4 1512.3 -702.22   1404.4 17.2073      1  3.351e-05 ***
So we made models of ever-increasing complexity and did sequential nested comparisons. This is the sort of thing the anova() command does for us for linear models.
This is not particularly advisable in LMM analyses that are more naturalistic, less factorial, or that involve variable slopes, etc. In those cases it is generally advisable to start with fixed effects, then add random effects, with the goal of building a good model (not ascertaining significance!).
(each model adds one term: 0, + sex, + test, + section, + test:section, + sex:test, + sex:section, + sex:test:section)
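The same ladder of models can be built more compactly with update(), which refits a model with a modified formula (a sketch; `mF2` is a hypothetical name used to avoid clobbering mF):

```r
library(lme4)

# Each call adds one fixed-effect term to the previous model's formula;
# the random effect structure is carried along unchanged.
m0.S      <- update(m0,        . ~ . + sex)
m0.ST     <- update(m0.S,      . ~ . + test)
m0.STS    <- update(m0.ST,     . ~ . + section)
m0.STS.I1 <- update(m0.STS,    . ~ . + test:section)
m0.STS.I2 <- update(m0.STS.I1, . ~ . + sex:test)
m0.STS.I3 <- update(m0.STS.I2, . ~ . + sex:section)
mF2       <- update(m0.STS.I3, . ~ . + sex:test:section)

# Sequential nested LRTs in a single call.
anova(m0, m0.S, m0.ST, m0.STS, m0.STS.I1, m0.STS.I2, m0.STS.I3, mF2)
```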
Mixed model vs subject/item
summary(aov(data=test.data, score~sex*test*section + Error(student/(test*section))))
summary(aov(data=test.data, score~sex*test*section + Error(part/(test*sex))))
(by-student and by-part ANOVA outputs as shown earlier)
(sequential model comparison table as shown on the previous slide)
The mixed model gives us the "worst" (most conservative) of both the by-subject and by-item analyses, by appropriately recognizing that here part-variance is relevant for assessing section, and subject-variance for assessing sex.
Crossed random effects
• Taking into account only one random effect (separating subject and item analyses) gives us incorrect answers.
• We need to take both random effects into account jointly.
• lmer() allows us to add many random effects and their interactions.
• For such factorial designs, I favor a sequential model-building comparison:
  – Include all random effects from the design.
  – Add explanatory variables sequentially, and do nested model comparison via LRT in anova(), like a regular Type I ANOVA.
What kind of random effect structure?
M = lmer(data=test.data, score~sex*test*section
    + (1|student)+(1|part)+(1|student:test)+(1|student:section)
    + (1|part:test)+(1|part:sex)+(1|part:sex:test)
    + (1|student:test:section)+(1|student:part))

This is a dummy-coded random effect error structure. It will only work for categorical factors, and it makes a few assumptions that may be violated. It is most similar to what the 'repeated measures' aov() command does. It will be anti-conservative in a few particular cases with odd covariances for the subject:X interactions.

M = lmer(data=test.data, score~sex*test*section
    + (test*section|student) + (sex*test|part) + (1|student:part))

This is the full "varying intercept and varying slope" random effect structure. Folks who think deeply about these things believe this to be the most generally effective specification. It uses quite a few more parameters to capture the (more flexible) random effect covariance structure, so it estimates more parameters and is sometimes harder to fit.
In practice: you may run into estimation problems before you’ve managed to add the full random effect structure. You probably should avoid such highly factorial crossed, structured designs. (If you keep the design simple, the latter specification is better).
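When those estimation problems do arise, lme4 offers a few diagnostics and knobs (a sketch; the optimizer switch is just a common first resort, not a cure-all):

```r
library(lme4)

# Try a different optimizer, with a higher iteration cap, if the
# default fit fails to converge.
M <- lmer(score ~ sex*test*section
          + (test*section|student) + (sex*test|part) + (1|student:part),
          data = test.data,
          control = lmerControl(optimizer = "bobyqa",
                                optCtrl = list(maxfun = 1e5)))

# A singular fit (variances estimated at 0, or correlations at +/-1)
# suggests the random effect structure is too rich for the data.
isSingular(M)
```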
Other factors calling for LMM
• Repeated measures designs that are unbalanced (e.g., due to missing data).
• Need to:
  – Account for lower-level variation when estimating higher-level coefficients.
  – Model variation of lower-level coefficients between groups.
  – Estimate coefficients for specific groups that might have little data, by partially pooling across all groups.
  – Make predictions at different grouping levels.
Data structure decomposition

s.id  A[s.id]  B[s.id]  ...  C1[s.id]  ...  Y[s.id]

s.id       SEX[s.id]  MAJOR[s.id]  IQ[s.id]  SCORE[s.id]
JohnDoe    m          psychology   110       60
JaneSmith  f          business     125       75
SueChang   f          psychology   105       40
...

Effects: eA[A]  eB[B]  eAB[A,B]  ...  eC  ...

eSEX[]:   m = -1, f = +1
eMAJOR[]: psychology = -1, business = +3, cogsci = -0.5, linguistics = -1.5
eIQ: 0.13
Single table decomposition: LM

s.id  A[s.id]  B[s.id]  ...  C1[s.id]  ...  Y[s.id]
eA[A]  eB[B]  eAB[A,B]  ...  eC  ...
• Every t-test/ANOVA/ANCOVA (LM/GLM) data structure can be reduced to this single-table “3rd normal form”.
• Each measurement (Y) is uniquely associated with a particular subject/unit.
• Each subject may have arbitrarily many other properties (like factor-level categories, covariate values, etc.)
Repeated and mixed-design ANOVA

s.id  sA[s.id]  sB[s.id]  ...  sC1[s.id]  ...
eSA[A]  eSB[B]  eSAB[A,B]  ...  eSC1 ...

m.id  s.id[m.id]  Q[m.id]  R[m.id]  ...  W[m.id]  Y[m.id]
eMQ[Q]  eMR[R]  eQR[Q,R]  ...  eSW ...

• Data structures for repeated measures and mixed-design ANOVA can be reduced to such a 2-table form*
• Every measurement is associated with a particular subject, and might have some measurement categorization scheme (within-subject factors).
• In the course of factoring into error strata, we effectively estimate lots of "random effects": Es[s.id], EQs[s.id,Q].
Predictors at many levels

[Figure: a nested design. Schools are East Coast vs. West Coast and Private vs. Public; classes within schools use Textbook1 vs. Textbook2 and Written vs. Oral exams; students are nested in classes.]

Schools:
SchoolA, Private, East, $1000/student
SchoolB, Public, East, $2500/student
...

Classes (in Schools):
SchoolB, Class.B.1, Textbook1, Written, 24 students
SchoolB, Class.B.2, Textbook1, Oral, 32 students
...

Students (in Classes)
Predictors at many levels
We can manage this design by partitioning variance into strata and proceeding with a mixed-design ANOVA. However, repeated measures ANOVA can't handle all the possible structures we may need to deal with.
Mixed-design ANOVA will fail us if we...
- add explanatory factors to students
- consider crossed random effects (substitute teachers)
- add other levels (districts; ...; teachers within schools, classes within teachers, ...; etc.)
Something we can't do: more than two explanatory levels
E.g., students (with explanatory variables), in schools (with explanatory variables), in districts (with explanatory variables).

d.id  dA[d.id]  dB[d.id]  ...
eDA[A]  eDB[B]  eDAB[A,B]  ...

s.id  d.id[s.id]  sG[s.id]  sH[s.id]  ...
eSG[G]  eSH[H]  eSGH[G,H]  ...

m.id  s.id[m.id]  Q[m.id]  R[m.id]  ...  W[m.id]  Y[m.id]
eMQ[Q]  eMR[R]  eQR[Q,R]  ...  eSW ...
Something we can't do: crossed "random effects"
E.g., each measurement is associated with a particular student (with explanatory variables) and a particular item (with explanatory variables).

s.id  sG[s.id]  sH[s.id]  ...
eSG[G]  eSH[H]  eSGH[G,H]  ...

i.id  iA[i.id]  iB[i.id]  ...
eIA[A]  eIB[B]  eIAB[A,B]  ...

m.id  s.id[m.id]  i.id[m.id]  Q[m.id]  ...  Y[m.id]
eMQ[Q]  eMR[R]  eQR[Q,R]  ...  eSW ...
Data structures in need of LMM
- Normal LM/GLM structure: each unit is uniquely associated with a measurement.
- Repeated measures structure: explanatory variables at only two levels (e.g., "between subject" and "within subject").
- More levels! Explanatory variables at more than two levels (e.g., classes in schools in districts).
- Crossed random effects! (e.g., subjects with explanatory variables crossed with items with explanatory variables)
The last two are things we can't do appropriately without mixed-effects models.
Motivation
We have pre- and post-class exam scores for 10 males and 10 females. The exam is divided into a qualitative and a quantitative section, with 5 parts in each section. So: 20 (students) x 20 (time x parts) = 400 measurements.

s.id  SEX[s.id]
t.id  s.id[t.id]  TIME[t.id]
p.id  SECTION[p.id]
m.id  t.id[m.id]  p.id[m.id]  SCORE[m.id]
Motivation
We have pre- and post-class exam scores for 10 males and 10 females. The exam is divided into a qualitative and a quantitative section, with 5 parts in each section. So: 20 (students) x 20 (time x parts) = 400 measurements.
[Figure: the full data grid, students (rows, female/male) by parts (columns, qualitative/quantitative), for the pre and post tests.]