Repeated Measures ANOVA (RM ANOVA) and Mixed Effects … · RM ANOVA: Greenhouse-Geisser / Huynh-Feldt Epsilon It is not uncommon that repeated measures data violate the compound

Post on 26-May-2019

237 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

Transcript

Lukas Meier (most material based on lecture notes and slides from H.R. Roth)

Repeated Measures ANOVA (RM ANOVA)

and Mixed Effects Models

Repeated Measures ANOVA (RM ANOVA)

Now we want to model everything in “one go”.

We use (multiple) ANOVA approaches. Split-plot model Mixed effects models

1

RM ANOVA: Growth Curves

We have three factors: sex (2 levels) age (4 levels) person (27 levels)

We treat age as a categorical variable. This gives us maximal flexibility as we

do not have to care about the functional form of the age effect.

We set up a model of the form

𝑌𝑖𝑗𝑘 = 𝜇 + 𝛼𝑖 + 𝛿𝑗(𝑖) + 𝛽𝑘 + 𝛼𝛽 𝑖𝑘 + 휀𝑖𝑗𝑘

𝑖: sex, 𝑗: person, 𝑘: time-point

2

effect of sexdistance effect of time-

point 𝑘interaction

time × sex

deviation (or

error) of

person 𝑗

boy girl

16

20

24

28

32

8 10 12 14 8 10 12 14

age

dis

tance

person

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

27

RM ANOVA: Growth Curves

Person is a block factor.

We cannot have both sex and person as fixed effects in the model because

this model would not be identifiable anymore.

Here it seems quite natural to treat person as a so called random effect (better:

think of a random error per person). That is, we assume 𝛿𝑗(𝑖) ∼ 𝑁(0, 𝜎𝛿2).

This is nothing else than the split-plot model that some have seen in the

ANOVA class. It is not a split-plot design, because age was not randomized!

We have

whole plot = person split plot = age

3

12 12 12

8

10

8

10

8

10

8

10 …

14 141414

12 12 12

8

10

8

10

8

10

8

10 …

14 141414

12

Boy 1 Girl 1Boy 2 Girl 2

12not randomized

here

RM ANOVA: Growth Curves

We therefore have a so called mixed effects model (containing random and

fixed effects).

We can fit this in R with the lmer function in package lmerTest.

Note that the denominator degrees of freedom for sex are only 25 as we only

have 27 observations on the whole-plot level (patients!).

You can think of doing a two-sample 𝑡-test with two groups having 16 and 11

observations, respectively: 25 = 16 + 11 – 2.

4

RM ANOVA: Growth Curves

Going back to the original questions: Are the profiles of girls and boys parallel? check interaction

Does dental distance change over time? check effect of age

Is the average level of boys and girls the same or not? check effect of sex

5

RM ANOVA: Induced Correlation Structure

Let us have a closer look at the model

𝑌𝑖𝑗𝑘 = 𝜇 + 𝛼𝑖 + 𝛿𝑗(𝑖) + 𝛽𝑘 + 𝛼𝛽 𝑖𝑘 + 휀𝑖𝑗𝑘

with 𝛿𝑗(𝑖) ∼ 𝑁(0, 𝜎𝛿2) and 휀𝑖𝑗𝑘 ∼ 𝑁(0, 𝜎2).

We get Var 𝑌𝑖𝑗𝑘 = 𝜎𝛿

2 + 𝜎2

Cor 𝑌𝑖𝑗𝑘 , 𝑌𝑖𝑗𝑘∗ = 𝜎𝛿2/(𝜎𝛿

2 + 𝜎2) if 𝑘 ≠ 𝑘∗ (same person but different time-points)

Cor 𝑌𝑖𝑗𝑘 , 𝑌𝑖∗𝑗∗𝑘∗ = 0 if 𝑖 ≠ 𝑖∗ or 𝑗 ≠ 𝑗∗ (different persons)

Observations of the same person are correlated. The correlation is constant,

i.e., does not depend on how far away the time-points lie (this is not always a

meaningful assumption for growth curves!).

Observations from different persons are independent.

6

RM ANOVA: Induced Correlation Structure

Hence, we have the following correlation matrix per person

1 𝜌 𝜌 𝜌𝜌 1 𝜌 𝜌𝜌 𝜌 1 𝜌𝜌 𝜌 𝜌 1

where 𝜌 = 𝜎𝛿2/(𝜎𝛿

2 + 𝜎2) is the so-called intra-class correlation.

This correlation structure is called compound symmetry.

Here, the empirical correlation matrix is

7

RM ANOVA: Greenhouse-Geisser / Huynh-Feldt Epsilon

It is not uncommon that repeated measures data violate the compound

symmetry assumption.

There are measures which describe the deviation from the compound

symmetry model.

Greenhouse-Geisser Epsilon: 휀GG (rather conservative) Huynh-Feldt Epsilon: 휀HF

We have 휀GG ≤ 휀HF ≤ 1, where “= 1” means no deviation.

Correction is being performed by multiplying both the numerator and the

denominator degrees of freedom of the 𝐹-distribution with 휀GG (or 휀HF).

Program “by hand” or use function Anova in package car.

This only affects within-subjects factors!

Alternative: Use model nesting approach? (structured vs. unstructured

approach?)8

Example: Pulse (Spector, 1987)

3 groups of 8 patients each Drug 1 Drug 2 Control (placebo)

Measure pulse at 5, 10, 15 and 20 minutes after taking medication.

9

control drug1 drug2

60

70

80

90

5 10 15 20 5 10 15 20 5 10 15 20

period

y

subject

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

Example: Pulse

We treat time as a categorical predictor to be flexible enough.

We use the model

𝑌𝑖𝑗𝑘 = 𝜇 + 𝛼𝑖 + 𝛽𝑘 + 𝛼𝛽 𝑖𝑘 + 휀𝑖𝑗𝑘

We use this example to illustrate how one can use other correlation structures.

The random effects (errors) at first sight disappeared!

They are now fully integrated into the error term 휀𝑖𝑗 ∼ 𝑁(0, Σ) ∈ ℝ4

When dealing with longitudinal data it is quite common to use an

autoregressive correlation structure.

10

effect of drugpulseeffect of

time-point 𝑘interaction

drug × time

error term

(correlated)

Example: Pulse

An 𝑨𝑹(𝟏) structure would be (exponential decay)

Σ = 𝜎2

1 𝜙 𝜙2 𝜙3

𝜙 1 𝜙 𝜙2

𝜙2 𝜙 1 𝜙

𝜙3 𝜙2 𝜙 1

The compound symmetry model would be (slightly other notation)

Σ = 𝜎2

1 𝜌 𝜌 𝜌𝜌 1 𝜌 𝜌𝜌 𝜌 1 𝜌𝜌 𝜌 𝜌 1

Many more choices possible.

11

Example: Pulse

For such models, the older package nlme is more user friendly than lme4 or

lmerTest.

We use the function gls (generalized least sq.) or lme from package nlme.

12

Random Intercept / Random Slope Model: Growth Curves

Due to the “nice” profiles we can also try to model the growth curves with a

linear regression model approach (see also summary statistic approach).

We us a random intercept / random slope model:

𝑌𝑖𝑗𝑘 = 𝜇 + 𝛼𝑖 + 𝛿𝑗(𝑖) + (𝛽𝑖 + 𝛽𝑗 𝑖 )𝑥𝑖𝑗𝑘 + 휀𝑖𝑗𝑘

Here: 𝑥𝑖𝑗𝑘 ∈ {8, 10, 12 , 14} is time (as a continuous predictor variable).

It is natural to center time.

That means we use −3,−1, 1 , 3 instead of 8, 10, 12 , 14 , otherwise intercept

and slope estimates will always be correlated.

13

effect of sexdistanceslope in

group 𝑖

person specific deviation from

population slope: 𝑁(0, 𝜎𝛽2).

person specific

deviation from

population

intercept : 𝑁(0, 𝜎𝛿2).

boy girl

16

20

24

28

32

8 10 12 14 8 10 12 14

age

dis

tance

person

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

27

Random Intercept / Random Slope Model: Growth Curves

We can model the two random effects 𝛿𝑗(𝑖) and 𝛽𝑗(𝑖) as independent random

variables or we can allow any correlation structure.

Most general model is multivariate normal with unspecified covariance

matrix:

(𝛿𝑗 𝑖 , 𝛽𝑗(𝑖)) ∼ 𝑁(0, Σ) ∈ ℝ2,

where Σ is a 2 × 2 covariance matrix.

14

Random Intercept / Random Slope Model: Growth Curves

15

Model with arbitrary correlation structure:

ො𝜎𝛽

ො𝜎

ො𝜎𝛿

Check if model was

interpreted correctly

Corr(𝛿𝑗 𝑖 , 𝛽𝑗 𝑖 )

Random Intercept / Random Slope Model: Growth Curves

16

Model with independence assumption:

ො𝜎𝛿

ො𝜎

ො𝜎𝛽 Check if model was

interpreted correctly

Random Intercept / Random Slope Model: Growth Curves

We can compare the nested models with the anova command.

Smaller model (independent intercept and slope) seems to be complex enough.

However, results are very close anyway.

Compare results with summary statistic approach!

Actually, we can further simplify this model (see R-Code).

17

Random Intercept / Random Slope Model: Additional Insights …

For time points 𝑡𝑘 the expressions for the variances and covariances of the

observations 𝑌𝑖𝑗𝑘 are

Var 𝑌𝑖𝑗𝑘 = 𝜎𝛿2 + 2𝑡𝑘Cov 𝛿𝑗 𝑖 , 𝛽𝑗 𝑖 + 𝑡𝑘

2𝜎𝛽2 + 𝜎2

Cov 𝑌𝑖𝑗𝑘 , 𝑌𝑖𝑗𝑘∗ = 𝜎𝛿2 + 𝑡𝑘 + 𝑡𝑘∗ Cov 𝛿𝑗 𝑖 , 𝛽𝑗 𝑖 + 𝑡𝑘𝑡𝑘∗ 𝜎𝛽

2

This means that for time points

𝑡𝑘 > −Cov 𝛿𝑗 𝑖 ,𝛽𝑗 𝑖

𝜎𝛽2

the variance Var 𝑌𝑖𝑗𝑘 increases (and decreases before that time point).

Of course the data does not always follow this assumption.

See also R-File.

18

Example: Rats (Box, 1950)

19

Weight gain of rats under three different experimental conditions.

27 rats were randomly assigned to three treatment groups:(1) Control(2) Drinking water with thyroxin(3) Drinking water with thiourocil

Each animal was kept separately (important!).

Weight was recorded every week, the first time at the beginning of the

experiment:

Week 0 Week 1 Week 2 Week 3 Week 4

Example: Rats

20

Example: Rats

See R-File.

21

top related