
University of Copenhagen, Department of Biostatistics

Faculty of Health Sciences

Basics of repeated measurements
Analysis of repeated measurements, NFA 2016

Julie Lyng Forman & Lene Theil Skovgaard
Department of Biostatistics, University of Copenhagen

Contents

- Basic concepts for correlated and clustered data
- Descriptive statistics
- The multivariate normal distribution
- Analysis of balanced longitudinal data
- Baseline-follow up studies

2 / 72


Outline

What are repeated measurements?

Basics of longitudinal data (FLW chapters 1 & 2)

The multivariate normal distribution

Analysis of response profiles (FLW chapters 3 & 5)

SAS proc mixed (FLW section 5.8)

Baseline adjustment (FLW section 5.6)

Appendix: Supplementary SAS-code

3 / 72

What are repeated measurements?

Repeated measurements refer to data where the same outcome has been measured several times, in different situations or at different spots, on the same subjects.

- Special case: longitudinal data, where the outcome is measured repeatedly over time.

4 / 72

What is clustered data?

Repeated measurements are termed clustered data when the same outcome is measured on groups of subjects that are somehow related.

5 / 72

Statistical analysis must account for repetitions

The usual assumption is that observations are independent.

If you have repeated or clustered measurements . . .
- the assumption of independence is violated.

Ignoring the repetitions/clustering most often leads to:
- p-values that are too small or too large.
- confidence intervals that are too wide or too narrow.

It is wrong to analyse repeated measurements data with an ordinary GLM or ANOVA model!!!

6 / 72

Example: A pre-post study

Average daily dietary intake for 11 women over 10 pre-menstrual and 10 post-menstrual days.

Subject  Pre-menstrual  Post-menstrual  Difference
1        5260           3910            1350
2        5470           4220            1250
3        5640           3885            1755
4        6180           5160            1020
5        6390           5645             745
6        6515           4680            1835
7        6805           5265            1540
8        7515           5975            1540
9        7515           6790             725
10       8230           6900            1330
11       8770           7335            1435

Mean     6753.6         5433.2          1320.5
SD       1142.1         1216.8           366.7

D.G. Altman: Practical Statistics for Medical Research, Section 9.5

7 / 72

Paired data

The simplest example of clustered or repeated measurements.
- Two replicates per subject or two subjects per cluster.

Examples of paired data:
- Same person with treatment and placebo.
- A baseline and a follow-up measurement.
- Twin study.
- Comparison of two measurement methods or reliability of a measurement method.

Quantitative outcomes are analysed with the paired t-test.

8 / 72

Example: Paired vs unpaired comparison

To compare pre-menstrual and post-menstrual dietary intake:
- Test H0: µ1 = µ2.
- Find a confidence interval for µ1 − µ2.

Note the very different results:

Analysis           Estimate (95% CI)   P-value
paired t-test      1320 (1074; 1567)   0.0000003
two-sample t-test  1320 (271; 2370)    0.01625
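The two confidence intervals can be reproduced from the table of intakes. The sketch below uses only the Python standard library. The critical values 2.228 (10 degrees of freedom) and 2.086 (20 degrees of freedom) are hard-coded assumptions taken from standard t-tables, and the function names are illustrative, not part of the course material.

```python
import math
import statistics

# Daily dietary intake for the 11 subjects in the pre-post table.
pre = [5260, 5470, 5640, 6180, 6390, 6515, 6805, 7515, 7515, 8230, 8770]
post = [3910, 4220, 3885, 5160, 5645, 4680, 5265, 5975, 6790, 6900, 7335]

# 97.5% quantiles of the t-distribution, used for 95% intervals
# (assumed values from standard t-tables).
T_CRIT_DF10 = 2.228
T_CRIT_DF20 = 2.086


def paired_ci(x, y, t_crit):
    """Mean difference and 95% CI based on within-subject differences."""
    diffs = [a - b for a, b in zip(x, y)]
    mean_diff = statistics.mean(diffs)
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return mean_diff, mean_diff - t_crit * se, mean_diff + t_crit * se


def two_sample_ci(x, y, t_crit):
    """Mean difference and 95% CI treating the samples as independent (pooled SD)."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * statistics.variance(x)
                  + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    se = math.sqrt(pooled_var * (1 / nx + 1 / ny))
    mean_diff = statistics.mean(x) - statistics.mean(y)
    return mean_diff, mean_diff - t_crit * se, mean_diff + t_crit * se


paired = paired_ci(pre, post, T_CRIT_DF10)
unpaired = two_sample_ci(pre, post, T_CRIT_DF20)
print("paired:     %.0f (%.0f; %.0f)" % paired)
print("two-sample: %.0f (%.0f; %.0f)" % unpaired)
```

Both analyses give the same point estimate. Only the standard error differs, because the paired analysis removes the between-subject variation.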

9 / 72

Explanation

The two-sample t-test assumes the two samples are independent. But there is a strong dependence between pre- and post-intake for the same woman (correlation 0.95, 95% CI: 0.83-0.99).
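A minimal sketch of where the correlation of 0.95 comes from, and why it matters: the variance of the within-subject differences equals Var(pre) + Var(post) − 2·Cov(pre, post), so a strong positive correlation makes the differences much less variable than the raw measurements. The helper name `pearson` is illustrative; only the Python standard library is used.

```python
import math
import statistics

pre = [5260, 5470, 5640, 6180, 6390, 6515, 6805, 7515, 7515, 8230, 8770]
post = [3910, 4220, 3885, 5160, 5645, 4680, 5265, 5975, 6790, 6900, 7335]


def pearson(x, y):
    """Pearson correlation computed from sums of squares and cross-products."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


r = pearson(pre, post)

# Variance of the differences, computed directly ...
diffs = [a - b for a, b in zip(pre, post)]
var_diff_direct = statistics.variance(diffs)

# ... and from the variances and the covariance.
cov = r * statistics.stdev(pre) * statistics.stdev(post)
var_diff_formula = statistics.variance(pre) + statistics.variance(post) - 2 * cov

print(round(r, 2), round(var_diff_direct), round(var_diff_formula))
```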

10 / 72


Case: A baseline follow-up study

A randomized clinical trial was conducted to compare Eplerenone with standard treatment of patients with chronic kidney disease.

Outcome: Augmentation index (aix), smaller is better.

Repeated measurements at:
- Baseline,
- after 12 weeks (safety),
- after 24 weeks (end point).

Note: The study was planned with 37 subjects in each group, but only 25 and 26, respectively, could be treated within the time limit.

Boesby et al: Eplerenone Attenuates Pulse Wave Reflection in Chronic Kidney Disease Stage 3–4 - A Randomized Controlled Study, PLOS ONE 2013.

12 / 72

Typical set-up for longitudinal measurements

Two or more groups of subjects
- Often receiving different treatments
- Possibly randomised at baseline.

Longitudinal measurements, typically as a function of
- duration (of treatment or disease)
- age

Do the time courses differ between the groups?

13 / 72

Merits of longitudinal studies

In longitudinal studies measurements are taken repeatedly on the same subjects over time.

- This allows us to study changes over time within subjects and factors that influence these changes, e.g. treatment.

- By comparing each subject's responses at two or more occasions we eliminate extraneous but unavoidable sources of variability among subjects. Thus we obtain more accurate estimates and more certain conclusions about changes over time than in cross-sectional studies.

14 / 72

Visualizing data: Spaghetti plots

Good for visual inspection because replicates are connected!

Note: Missing data due to failed measurements, side effects, relapse or other illness (missing data discussed further in lecture 4).

15 / 72

Why visualization is so important

Graphical description of the data is useful for:

- Exploratory data analysis and hypothesis generation.
- Aiding interpretation of planned analyses.
- Presentation, i.e. saying it in figures rather than in numbers.
- Spotting outliers that could otherwise spoil your analysis.
- Rough assessment of model assumptions such as normal distribution or linear trend over time.

Note: Having a large dataset is no excuse for forgetting graphical description. You can divide your data into subgroups or at least look at a random subsample.

16 / 72

Balanced and complete data

In a planned study the times of measurements will usually be the same for all subjects. We then have a balanced design.

In practice data is most often somewhat unbalanced due to drop-out, missed visits, failed measurements.

- In this case we say that data is incomplete.
- But still the design is balanced.

Data from (retrospective) observational studies are most often unbalanced both by design and in practice.

Unbalanced designs are treated in lecture 2.
Missing data is treated in lecture 4.

17 / 72


The distribution of repeated outcomes

Repeated measurements are characterized by being
- mutually dependent or correlated.

We need to characterize their joint distribution.

Standard model for quantitative data: The multivariate normal
- Location: mean-vector
- Variability: covariance-matrix

19 / 72

The multivariate normal distribution

Source: Wikipedia.

20 / 72

The multivariate normal distribution

Source: Wikipedia.

21 / 72

Eplerenone: Scatter plots

Left: Eplerenone. Right: Controls.

Better check of normal distribution: use residual diagnostics (lecture 4).

22 / 72

Eplerenone: Summary statistics

Means and std.devs for the three time points:

trt=0
Simple Statistics

Variable   N   Mean       Std Dev    Sum         Minimum     Maximum
aix0       24  24.64583    9.37559   591.50000    -2.50000   35.00000
aix1       24  25.31250   10.60333   607.50000     8.00000   49.50000
aix2       24  27.33333    8.70490   656.00000     8.50000   44.50000

trt=1
Simple Statistics

Variable   N   Mean       Std Dev    Sum         Minimum     Maximum
aix0       26  22.28846   11.11321   579.50000    -5.50000   43.00000
aix1       24  19.93750   13.69966   478.50000   -16.50000   38.50000
aix2       22  20.38636   11.43192   448.50000   -10.00000   39.00000

Note: this does not tell us anything about the joint distribution.

23 / 72

Eplerenone: Correlations

Recall: 0 for independence vs ±1 for perfect linear association.

trt=0: Pearson Correlation Coefficients
Number of Observations

        aix0      aix1      aix2
aix0    1.00000   0.78707   0.76148
        24        23        23
aix1    0.78707   1.00000   0.79525
        23        24        24
aix2    0.76148   0.79525   1.00000
        23        24        24

trt=1: Pearson Correlation Coefficients
Number of Observations

        aix0      aix1      aix2
aix0    1.00000   0.67942   0.72694
        26        24        22
aix1    0.67942   1.00000   0.81741
        24        24        22
aix2    0.72694   0.81741   1.00000
        22        22        22

Note: correlations can be misleading if data is not normal.

24 / 72

Basic concepts: covariance and correlation

Both are used to describe the linear association between two variables assumed to have a joint normal distribution.

- The covariance between two measurements is:

$$\mathrm{Cov}(Y_1, Y_2) = E\{(Y_1 - \mu_1)(Y_2 - \mu_2)\}$$

. . . in squared units of the original measurements.

- The correlation between two measurements is:

$$\mathrm{Cor}(Y_1, Y_2) = \frac{\mathrm{Cov}(Y_1, Y_2)}{\mathrm{SD}(Y_1)\,\mathrm{SD}(Y_2)}$$

. . . it has no units - interpretation is free of scale.

Note: SAS PROC MIXED and most other statistical software display the covariances, not correlations, as default output.

25 / 72

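Matrix notation

Covariances and correlations of the 3D (normal) distribution:

$$\mathrm{Cov} = \begin{pmatrix} \sigma_1^2 & \sigma_{12} & \sigma_{13} \\ \sigma_{21} & \sigma_2^2 & \sigma_{23} \\ \sigma_{31} & \sigma_{32} & \sigma_3^2 \end{pmatrix}, \quad \mathrm{Cor} = \begin{pmatrix} 1 & \rho_{12} & \rho_{13} \\ \rho_{21} & 1 & \rho_{23} \\ \rho_{31} & \rho_{32} & 1 \end{pmatrix}$$

NOTE:
- Variances $\sigma_1^2, \sigma_2^2, \sigma_3^2$ along the diagonal in Cov.
- 1's along the diagonal in Cor.
- Both are symmetric: $\sigma_{ij} = \sigma_{ji}$ and $\rho_{ij} = \rho_{ji}$.
- Note the relation $\rho_{ij} = \sigma_{ij} / \sqrt{\sigma_i^2 \cdot \sigma_j^2}$.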

26 / 72

What if data is not normally distributed?

The usual assumption is that outcomes from the same subject follow a multivariate normal distribution.

But linear mixed models for repeated outcomes are robust.
- If sample size is not too small.
- If the distribution of the data is not too skewed.

So your data doesn’t have to be perfectly normal.

Highly skewed data should always be transformed.

Models for counts are treated in lecture 5.
Models for binary outcomes are treated in lecture 6.

27 / 72


Analysis of response profiles

Comparison of change over n time points within g groups of subjects (e.g. different treatments).

- Similar to two-way ANOVA only with correlated data.
- Covariates: group and time (both categorical)
- Balanced design, but possibly incomplete data.
- Do the groups evolve differently with time?

Interest is in the mean parameters (systematic effects)

          group = Control   group = Eplerenone
time=0    µ11               µ21
time=12   µ12               µ22
time=24   µ13               µ23

29 / 72

Eplerenone: Averages over time

Seeming improvement over time with Eplerenone.
- But what about statistical uncertainty?
- We also need to consider the (co)variance . . .

30 / 72

Main hypothesis

Analysis of response profiles allows for testing a large number of different hypotheses about the mean parameters. Which hypotheses are relevant of course depends on the subject matter.

Example: The scientific hypothesis was that there would be a positive effect of Eplerenone compared to the standard treatment at final follow-up.

The relevant statistical null hypothesis is:

H0: µ13 − µ11 = µ23 − µ21,

i.e. the same change in means from baseline to last follow-up in the two groups.

31 / 72

Eplerenone: results of analysis

Changes in mean AIX (%) since baseline estimated by the analysis of response profiles

Week  Control             Eplerenone           Difference           P-value
12    1.09 (-2.47; 4.65)  -0.86 (-4.38; 2.66)  -1.95 (-6.96; 3.06)  0.44
24    3.09 (0.07; 6.11)   -0.51 (-3.56; 2.53)  -3.61 (-7.90; 0.68)  0.10

There is a seeming improvement at last follow-up with Eplerenone compared to standard treatment with a mean difference in change in AIX of -3.61 (95% CI: -7.90 to 0.68, P = 0.10).

Note: The difference between the treatments is not significant.

32 / 72

Merits of analysis of response profiles

We can use a linear mixed model (PROC MIXED in SAS) to describe differences between groups at any time point or changes between any two time points (explanation follows).

Computationally this is an advantage compared to making many different t-tests. Everything is computed in one go.

Linear mixed models handle data that are missing at random optimally whereas t-tests may be biased (more on this in lecture 4).

There is a gain in statistical power when doing baseline adjustment in the analysis of randomized studies (more on this later today).

33 / 72

Linear mixed models (LMMs)

We use linear mixed models to analyze quantitative repeated measurements.

Systematic effects (means) are modeled similar to general linear models (GLM) including relevant explanatory variables such as time, treatment, age, gender, etc.

Additional specification of a model for the covariance is needed due to the repeated measurements. We will consider many such models either given in terms of

- So-called covariance pattern models for the residual covariance
- So-called random effects (e.g. in multi-level models)
- Or a mixture of these for more complex data.

(More about linear mixed models and applications in lectures 2-4).

34 / 72

The unstructured covariance

With a balanced design and few different time points we don’t have to make any specific assumptions about the covariance; an unstructured covariance pattern model is assumed.

- One variance parameter for each time point
- One correlation parameter for each pair of time points
- $n + \frac{n(n-1)}{2}$ parameters in total with n time points.

Usually all groups are assumed to have the same covariance, but this assumption can be relaxed.

Note: Not feasible with many time points or groups.
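To see how quickly the unstructured model grows, the parameter count n + n(n−1)/2 can be tabulated for a few values of n. This is a small illustrative sketch (the function name is hypothetical); with the three time points of the Eplerenone study it gives 6 covariance parameters.

```python
def unstructured_param_count(n_times):
    """Number of variances plus pairwise correlations: n + n(n-1)/2."""
    return n_times + n_times * (n_times - 1) // 2


# Parameter counts for 3, 5 and 10 time points.
counts = {n: unstructured_param_count(n) for n in (3, 5, 10)}
print(counts)
```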

35 / 72

Two-way ANOVA type model for the means

Describe means for the six time-treatment combinations as:

          group = Control   group = Eplerenone
time=0    β1                β1 + β4
time=12   β1 + β2           β1 + β2 + β4 + β5
time=24   β1 + β3           β1 + β3 + β4 + β6

- Mean of standard treatment at baseline is reference (intercept, β1)
- Change over time with standard treatment (time estimates, β2 and β3)
- Difference between groups at baseline (group estimate, β4)
- Differences in time effects (interaction estimates, β5 and β6)

36 / 72

SAS output (program on slide 43)

MODEL aix = week treat week*treat / SOLUTION;

Effect      week  treat  Estimate  StdError  DF    t Value  Pr > |t|
Intercept                24.3431   2.0793    49.4  11.71    <.0001
week        12           1.0887    1.7694    46.2  0.62     0.5414
week        24           3.0895    1.4995    44.5  2.06     0.0452
week        0            0         .         .     .        .
treat             1      -2.0547   2.8999    48.9  -0.71    0.4820
treat             0      0         .         .     .        .
week*treat  12    1      -1.9493   2.4871    45.8  -0.78    0.4372
week*treat  12    0      0         .         .     .        .
week*treat  24    1      -3.6078   2.1298    45.3  -1.69    0.0971
week*treat  24    0      0         .         .     .        .
week*treat  0     1      0         .         .     .        .
week*treat  0     0      0         .         .     .        .

Type 3 Tests of Fixed Effects

Effect      Num DF  Den DF  F Value  Pr > F
week        2       44.5    0.99     0.3794
treat       1       47      1.84     0.1817
week*treat  2       44.5    1.43     0.2490

(Confidence intervals omitted due to lack of space)

37 / 72
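The estimates in the SAS output translate into the six cell means by adding up the relevant terms of the two-way ANOVA parametrisation. The following sketch does this arithmetic in Python; the dictionary keys and the function name are illustrative assumptions, and treat=1 is taken to be the Eplerenone group, as the results table suggests.

```python
# Estimates copied from the "Solution for Fixed Effects" table.
beta = {
    "intercept": 24.3431,
    "week12": 1.0887,
    "week24": 3.0895,
    "treat1": -2.0547,
    "week12_treat1": -1.9493,
    "week24_treat1": -3.6078,
}


def cell_mean(week, treat):
    """Model-based mean for a given week (0, 12, 24) and treatment (0, 1)."""
    mean = beta["intercept"]
    if week in (12, 24):
        mean += beta["week%d" % week]
    if treat == 1:
        mean += beta["treat1"]
        if week in (12, 24):
            mean += beta["week%d_treat1" % week]
    return mean


means = {(w, t): cell_mean(w, t) for w in (0, 12, 24) for t in (0, 1)}

# Change from baseline to week 24 within each group, and their difference.
change_24 = {t: means[(24, t)] - means[(0, t)] for t in (0, 1)}
diff_in_change_24 = change_24[1] - change_24[0]

for key in sorted(means):
    print(key, round(means[key], 2))
print(round(change_24[0], 2), round(change_24[1], 2), round(diff_in_change_24, 2))
```

The difference in change at week 24 is just the week 24 interaction estimate, which is why the main hypothesis can be read directly off that row of the output.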

About the test of the group∗time-interaction

MODEL aix = time treat time*treat / SOLUTION CL;

By testing H0: No group*time-interaction we test that
- mean change over time is identical in all groups . . .
- . . . at all follow-up times.

If we aim to show that there is a treatment effect we will get more power by focusing on a specific time point:
- The time point where the difference is expected to be largest.

(A so-called one degree of freedom test, FLW section 5.5)

38 / 72

Post hoc testing

That the group∗time interaction is significant indicates that there is a difference in changes over time between the groups, but
- not between which time points.
- not between which groups, if there are more than two.

To find out where differences occur we have to look at estimated differences between specific groups at specific time points.

- The total number of comparisons may become large, in particular if there are many time points (or several groups).

- Shouldn’t we adjust for multiple testing?

Learn to do this in P.H. Westfall, R.D. Tobias & R.D. Wolfinger: Multiple Comparisons and Multiple Tests Using SAS (2nd edition), SAS Press, 2011.

39 / 72


Preparing data for analysis

Most often raw data is stored in the wide format (e.g. in Excel).
- one row per subject
- several columns with the outcomes for different occasions

Example:

id  sex  age  treat  aix0  aix1  aix2
1   1    57   0      10.5  17.5  25.0
2   1    48   0      -2.5   8.0   8.5
3   2    54   1      18.0  24.0  23.5
...

To fit a linear mixed model with any statistical software, data must be in the so-called long format . . .

41 / 72

The long format

- Each row contains only one observation of the outcome.
- A time-variable identifies the time of measurement.
- An id-variable identifies measurements from same subject.

Obs  id  sex  age  treat  week  aix
1    1   1    57   0      0     10.5
2    1   1    57   0      12    17.5
3    1   1    57   0      24    25.0
4    2   1    48   0      0     -2.5
5    2   1    48   0      12     8.0
6    2   1    48   0      24     8.5
7    3   2    54   1      0     18.0
8    3   2    54   1      12    24.0
9    3   2    54   1      24    23.5
10   4   2    46   1      0     26.0
...

42 / 72
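The reshaping from wide to long format can be sketched in a few lines. The example below is a standard-library Python illustration using the three subjects shown above; it is not the course's SAS code, and the mapping from column names to weeks (`WEEK_OF_COLUMN`) and the function name are assumptions made for the example.

```python
# Wide format: one row per subject, one column per occasion.
wide = [
    {"id": 1, "sex": 1, "age": 57, "treat": 0, "aix0": 10.5, "aix1": 17.5, "aix2": 25.0},
    {"id": 2, "sex": 1, "age": 48, "treat": 0, "aix0": -2.5, "aix1": 8.0, "aix2": 8.5},
    {"id": 3, "sex": 2, "age": 54, "treat": 1, "aix0": 18.0, "aix1": 24.0, "aix2": 23.5},
]

# Assumed mapping from outcome column to measurement week.
WEEK_OF_COLUMN = {"aix0": 0, "aix1": 12, "aix2": 24}


def wide_to_long(rows):
    """Return one row per subject and occasion, keeping the subject-level covariates."""
    long_rows = []
    for row in rows:
        for column, week in WEEK_OF_COLUMN.items():
            long_rows.append({
                "id": row["id"],
                "sex": row["sex"],
                "age": row["age"],
                "treat": row["treat"],
                "week": week,
                "aix": row.get(column),  # None if the measurement is missing
            })
    return long_rows


long_rows = wide_to_long(wide)
for r in long_rows:
    print(r["id"], r["sex"], r["age"], r["treat"], r["week"], r["aix"])
```

Rows with a missing outcome are kept here (with `None`) rather than dropped, so the number of rows stays at subjects times occasions.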

Syntax: Analysis of response profiles

PROC MIXED DATA=ckd PLOTS=all;
  CLASS id week (ref='0') treat (ref='0');
  MODEL aix = week treat treat*week
        / SOLUTION CL DDFM=KR OUTPM=ckdfit;
  REPEATED week / SUBJECT=id TYPE=UN R RCORR;
RUN;

- Syntax is similar to PROC GLM with a MODEL specifying the (linear) relation between outcome and covariates.
- Categorical variables must be declared with CLASS.
- The model for the covariance (UN=unstructured) is specified in a separate REPEATED-statement.
- Fitted values and residuals are saved in a dataset ckdfit.
- Use the PLOTS-option to get some diagnostic plots.

43 / 72

The option DDFM=KENWARDROGERS (aka KR)
(or DDFM=SATTERTHWAITE).

A technical option intended to improve the statistical performance of the t-tests and F-tests.

- It has no effect on balanced data.
- In unbalanced situations (i.e. for almost all observational studies and in case of missing observations) degrees of freedom are computed by a more complicated formula.
- The computations may require a little more time, but in most cases this will not be noticeable.

When in doubt, use it!

44 / 72

SAS: proc mixed output

The Mixed Procedure

Model Information

Data Set                   WORK.CKD
Dependent Variable         aix
Covariance Structure       Unstructured
Subject Effect             id
Estimation Method          REML
Residual Variance Method   None
Fixed Effects SE Method    Kenward-Roger
Degrees of Freedom Method  Kenward-Roger

Class Level Information

Class  Levels  Values
id     51      1 2 3 4 5 6 7 8 9 10 11 12 13
               14 15 16 17 18 19 20 21 22 23
               24 25 26 28 29 30 31 32 33 34
               35 36 37 38 39 40 41 42 43 45
               46 47 48 49 51 52 53 54
week   3       12 24 0
treat  2       1 0

45 / 72

SAS: proc mixed output

Dimensions

Covariance Parameters  6
Columns in X           12
Columns in Z           0
Subjects               51
Max Obs Per Subject    3

Number of Observations

Number of Observations Read      153
Number of Observations Used      144
Number of Observations Not Used  9

Iteration History

Iteration  Evaluations  -2 Res Log Like  Criterion
0          1            1070.85454941
1          2            982.86560047     0.00144735
2          1            982.26253864     0.00009905
3          1            982.22468047     0.00000061
4          1            982.22445749     0.00000000

Convergence criteria met.

Always check that the numerical optimisation has converged.

46 / 72

SAS: proc mixed output

Options R and RCORR make SAS print the estimated covariance and correlation matrices.

Estimated R Matrix for id 1

Row  Col1     Col2     Col3
1    106.23   96.3802  80.1893
2    96.3802  159.64   106.48
3    80.1893  106.48   106.38

Estimated R Correlation Matrix for id 1

Row  Col1    Col2    Col3
1    1.0000  0.7401  0.7544
2    0.7401  1.0000  0.8171
3    0.7544  0.8171  1.0000
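The two matrices carry the same information: each correlation is the covariance divided by the product of the two standard deviations. As an illustrative check in standard-library Python (the function name is hypothetical), the printed R matrix can be converted to correlations; small deviations in the fourth decimal are expected because the printed covariances are rounded.

```python
import math

# Estimated R matrix as printed by PROC MIXED (rounded values).
R = [
    [106.23, 96.3802, 80.1893],
    [96.3802, 159.64, 106.48],
    [80.1893, 106.48, 106.38],
]


def cov_to_corr(cov):
    """Divide each covariance by the product of the two standard deviations."""
    sd = [math.sqrt(cov[i][i]) for i in range(len(cov))]
    return [[cov[i][j] / (sd[i] * sd[j]) for j in range(len(cov))]
            for i in range(len(cov))]


corr = cov_to_corr(R)
for row in corr:
    print(["%.4f" % value for value in row])
```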

47 / 72

SAS: proc mixed output

Fit Statistics

-2 Res Log Likelihood     982.2
AIC (smaller is better)   994.2
AICC (smaller is better)  994.9
BIC (smaller is better)   1005.8

Null Model Likelihood Ratio Test

DF  Chi-Square  Pr > ChiSq
5   88.63       <.0001

Used for comparison of different models*.

* Make sure to use the PROC MIXED METHOD=ML-option if you want to use this to test nested models for the mean-structure (lecture 2).

48 / 72
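The fit statistics can be traced back to the iteration history. The sketch below assumes, as the printed numbers suggest, that iteration 0 corresponds to the null model with independent residuals of equal variance, and that AIC under REML adds twice the number of covariance parameters to -2 Res Log Likelihood. Variable names are illustrative.

```python
# Values copied from the iteration history and the dimensions table.
neg2_res_loglik_null = 1070.85454941   # iteration 0 (assumed to be the null model)
neg2_res_loglik_final = 982.22445749   # iteration 4, at convergence
n_cov_params = 6                       # unstructured covariance, 3 time points

# Likelihood ratio test of the unstructured covariance against the null model.
lrt_chi_square = neg2_res_loglik_null - neg2_res_loglik_final
lrt_df = n_cov_params - 1              # the null model has one variance parameter

# AIC based on the residual log likelihood (assumption stated above).
aic = neg2_res_loglik_final + 2 * n_cov_params

print(round(lrt_chi_square, 2), lrt_df, round(aic, 1))
```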

SAS: proc mixed output

At last what is most interesting: estimates and tests.

Solution for Fixed Effects

Effect      week  treat  Estimate  StdError  DF    t Value  Pr > |t|
Intercept                24.3431   2.0793    49.4  11.71    <.0001
week        12           1.0887    1.7694    46.2  0.62     0.5414
week        24           3.0895    1.4995    44.5  2.06     0.0452
week        0            0         .         .     .        .
treat             1      -2.0547   2.8999    48.9  -0.71    0.4820
treat             0      0         .         .     .        .
week*treat  12    1      -1.9493   2.4871    45.8  -0.78    0.4372
week*treat  12    0      0         .         .     .        .
week*treat  24    1      -3.6078   2.1298    45.3  -1.69    0.0971
week*treat  24    0      0         .         .     .        .
week*treat  0     1      0         .         .     .        .
week*treat  0     0      0         .         .     .        .

Type 3 Tests of Fixed Effects

Effect      Num DF  Den DF  F Value  Pr > F
week        2       44.5    0.99     0.3794
treat       1       47      1.84     0.1817
week*treat  2       44.5    1.43     0.2490

(confidence intervals omitted due to lack of space)

49 / 72

SAS proc mixed output

Standardized (aka Studentized) residuals: Normal distribution?

(Boxplots of residuals vs time and group omitted)

50 / 72

Plotting the estimated response profiles

Use the output data (ckdfit) from PROC MIXED in:

PROC SGPLOT DATA=ckdfit;
  /* One subject from each treatment group is enough, since the
     predicted means are the same for all subjects in a group. */
  WHERE id IN (1,3);
  SERIES x = week y = pred / GROUP = treat MARKERS;
RUN;

Note: Not identical to averages over time (due to missing data).

51 / 72

Alternative parametrisations

The same model can be phrased differently to highlight differences between groups at specific time points or changes over time.

To compare change over time between groups:
- Include both main effects and the interaction term.

  MODEL aix = time treat time*treat / SOLUTION CL;

To get mean differences between groups at each time point:
- Omit the main effect of group and the intercept.

  MODEL aix = time time*treat / NOINT SOLUTION CL;

To get the means for all combinations of group and time:
- Include only the interaction term and omit the intercept.

  MODEL aix = time*treat / NOINT SOLUTION CL;

Note: This can be combined with LSMEANS . . .

52 / 72

Alternative syntax: LSMEANS

PROC MIXED DATA=ckd;
  CLASS id week treat;
  MODEL aix = treat*week / NOINT DDFM=KR;
  LSMEANS treat*week / DIFF SLICE=week CL;
  REPEATED week / SUBJECT=id TYPE=UN R RCORR;
RUN;

- Estimates the means for all times and treatments,
- . . . and all possible differences between them (DIFF-option).
- NOINT means that the model does not include an intercept (so there is no need to specify reference groups).
- Use SLICE=week to test for overall differences between multiple groups at each time separately (one-way ANOVA).

53 / 72

u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s

output from LSMEANS

Differences of Least Squares Means

Effect week treat _week _treat Estimate Error DF t Value Pr > |t|week*treat 12 1 12 0 -4.0040 3.5909 45.5 -1.12 0.2707week*treat 12 1 24 1 -0.3423 1.5282 46.8 -0.22 0.8237week*treat 12 1 24 0 -6.0048 3.2739 54.8 -1.83 0.0721week*treat 12 1 0 1 -0.8606 1.7478 45.4 -0.49 0.6248week*treat 12 1 0 0 -2.9153 3.2720 62 -0.89 0.3764week*treat 12 0 24 1 3.6617 3.2988 55.4 1.11 0.2718week*treat 12 0 24 0 -2.0008 1.4868 44.9 -1.35 0.1852week*treat 12 0 0 1 3.1434 3.2554 60.1 0.97 0.3381week*treat 12 0 0 0 1.0887 1.7694 46.2 0.62 0.5414week*treat 24 1 24 0 -5.6625 2.9505 46 -1.92 0.0612week*treat 24 1 0 1 -0.5183 1.5125 46 -0.34 0.7334week*treat 24 1 0 0 -2.5730 2.9485 63.7 -0.87 0.3861week*treat 24 0 0 1 5.1442 2.9019 60.7 1.77 0.0813week*treat 24 0 0 0 3.0895 1.4995 44.5 2.06 0.0452week*treat 0 1 0 0 -2.0547 2.8999 48.9 -0.71 0.4820

Tests of Effect Slices

Effect      week  Num DF  Den DF  F Value  Pr > F
week*treat    12       1    45.5     1.24  0.2707
week*treat    24       1      46     3.68  0.0612
week*treat     0       1    48.9     0.50  0.4820
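With only two treatment groups, each slice test has one numerator degree of freedom, so it should coincide with the corresponding between-group t-test in the table of differences. As a rough check (using the rounded t-values printed above, so small discrepancies in the last digit are expected):

```latex
\[
F_{\text{slice}} = t^2:\qquad
\begin{aligned}
\text{week } 12 &: (-1.12)^2 \approx 1.25 \quad (\text{reported } F = 1.24,\ p = 0.2707)\\
\text{week } 24 &: (-1.92)^2 \approx 3.69 \quad (\text{reported } F = 3.68,\ p = 0.0612)\\
\text{week } 0  &: (-0.71)^2 \approx 0.50 \quad (\text{reported } F = 0.50,\ p = 0.4820)
\end{aligned}
\]
```

The p-values agree exactly with those of the three within-week comparisons (treat 1 vs 0) in the table of differences. With more than two groups the slice test would instead be a joint test of all group differences at that time.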


Outline

What are repeated measurements?

Basics of longitudinal data (FLW chapters 1 & 2)

The multivariate normal distribution

Analysis of response profiles (FLW chapters 3 & 5)

SAS proc mixed (FLW section 5.8)

Baseline adjustment (FLW section 5.6)

Appendix: Supplementary SAS-code


Baseline in randomized studies

Example: CKD patients were randomized to Eplerenone or standard treatment.

- We know that the two treatment groups cannot differ systematically at baseline since they represent two random samples from the same population.

- We are wasting statistical power when estimating the difference between the baseline means.

So should we leave out the baseline measurement?
- No, then we lose information about changes over time and again the power of the test of treatment effect is reduced.


Solution: Constrained model (cLMM)

          group = Control    group = Eplerenone
time=0    β1                 β1 + 0
time=12   β1 + β2            β1 + β2 + 0 + β4
time=24   β1 + β3            β1 + β3 + 0 + β5

- Intercept (β1).
- Time effect with standard treatment (β2, β3).
- Difference between groups at baseline = 0!
- Differences in time-effects (interaction: β4, β5).
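The table above can also be written as a single mean model. The following is a sketch in indicator notation; the subscripts i (subject) and j (visit) and the indicator symbols are introduced here for illustration and do not appear on the slide:

```latex
\[
E[Y_{ij}] = \beta_1
 + \beta_2\,\mathbf{1}\{t_j = 12\}
 + \beta_3\,\mathbf{1}\{t_j = 24\}
 + \beta_4\,\mathbf{1}\{t_j = 12\}\,\mathbf{1}\{\text{group}_i = \text{Eplerenone}\}
 + \beta_5\,\mathbf{1}\{t_j = 24\}\,\mathbf{1}\{\text{group}_i = \text{Eplerenone}\}
\]
```

There is no main effect of group, which is exactly the constraint that the two groups share the baseline mean β1. The treatment effects at weeks 12 and 24 are then β4 and β5.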

How do we get rid of the redundant parameter in SAS?


PROC MIXED with baseline adjustment

To fit the constrained linear mixed model (cLMM):
1. Redefine the treatment variable by joining the groups at baseline.

DATA ckd;
  SET ckd;
  treatadj = treat;
  IF week = 0 THEN treatadj = 0;   /* everyone counts as "control" at baseline */
RUN;

PROC MIXED DATA=ckd;
  CLASS id week (ref='0') treatadj (ref='0');
  MODEL aix = week treatadj*week / SOLUTION CL DDFM=KR;
  REPEATED week / SUBJECT=id TYPE=UN R RCORR;
RUN;
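Before fitting the model it may be worth checking that the recoding did what was intended. A minimal sketch, assuming the long-format data set ckd with the variables week, treat and treatadj created in the DATA step above:

```sas
/* Cross-tabulate the original and the redefined treatment variable by week. */
/* At week 0 every row should have treatadj = 0; at weeks 12 and 24          */
/* treatadj should equal treat.                                              */
PROC FREQ DATA=ckd;
  TABLES week*treat*treatadj / LIST MISSING;
RUN;
```

The LIST option prints one line per observed combination, which makes an unintended combination (for example week 0 with treatadj = 1) easy to spot.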


Eplerenone: cLMM results

Changes in mean AIX (%) since baseline (with adjustment).

Week  Control             Eplerenone           Difference            P-value
12    1.20 (-2.38;4.78)   -0.95 (-4.50;2.59)   -2.16 (-7.24;2.92)    P=0.40
24    3.36 (0.42;6.30)    -0.76 (-3.75;2.22)   -4.12 (-8.24;-0.01)   P=0.049

Note: Significant difference at last follow-up in favour of Eplerenone.

SAS-output:

Effect         week  treatadj  Estimate  StdError    DF  t Value  Pr > |t|  Alpha
Intercept                       23.2879    1.4430    50    16.14    <.0001   0.05
week             12              1.2017    1.7816  46.8     0.67    0.5033   0.05
week             24              3.3608    1.4643  48.3     2.30    0.0261   0.05
week              0                   0         .     .        .         .      .
week*treatadj    12         1   -2.1552    2.5240    46    -0.85    0.3976   0.05
week*treatadj    12         0         0         .     .        .         .      .
week*treatadj    24         1   -4.1247    2.0436  45.9    -2.02    0.0494   0.05
week*treatadj    24         0         0         .     .        .         .      .
week*treatadj     0         0         0         .     .        .         .      .

(confidence intervals omitted)
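The Eplerenone column of the results table is not printed directly in the SAS output, but its point estimates follow by adding the time effect and the interaction term, in the β-notation of the cLMM table. A worked check using the estimates printed above:

```latex
\[
\begin{aligned}
\text{Week 12:}\quad \hat\beta_2 + \hat\beta_4 &= 1.2017 + (-2.1552) = -0.9535 \approx -0.95,\\
\text{Week 24:}\quad \hat\beta_3 + \hat\beta_5 &= 3.3608 + (-4.1247) = -0.7639 \approx -0.76.
\end{aligned}
\]
```

The control changes are the week estimates themselves (1.20 and 3.36), and the differences are the interaction estimates (-2.16 and -4.12). The confidence intervals for the Eplerenone changes cannot be obtained by this simple addition, since they depend on the covariance between the two estimates.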


cLMM: predicted response profiles


Classical approaches for handling baseline

1. Two-sample t-test on the end point measurements.
   - Has less power than the others if the correlation is strong.

2. Two-sample t-test on the changes.
   - Has less power than the others if the correlation is weak.

3. ANCOVA model including baseline as a covariate.
   - Always has optimal power (when there are no missing data).

Vickers & Altman, Analysing controlled clinical trials with baseline follow-up measurements, BMJ 323, 1123–1124.
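The power ranking can be motivated by comparing the variances of the three treatment effect estimates. The following is a sketch under simplifying assumptions that are not stated on the slide: n subjects per group, the same variance σ² at baseline and follow-up, correlation ρ between the two measurements, and a large-sample approximation for ANCOVA.

```latex
\[
\begin{aligned}
\text{End point:}\quad & \operatorname{Var}(\hat\delta) = \frac{2\sigma^2}{n},\\[2pt]
\text{Change:}\quad & \operatorname{Var}(\hat\delta) = \frac{2\sigma^2}{n}\cdot 2(1-\rho),\\[2pt]
\text{ANCOVA:}\quad & \operatorname{Var}(\hat\delta) \approx \frac{2\sigma^2}{n}\cdot(1-\rho^2).
\end{aligned}
\]
```

Since 1 − ρ² ≤ 1 and 1 − ρ² = (1 − ρ)(1 + ρ) ≤ 2(1 − ρ), ANCOVA is never worse than either t-test under these assumptions. The change analysis beats the end point analysis only when 2(1 − ρ) < 1, that is when ρ > 1/2.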


ANCOVA with multiple times of follow-up

- Include baseline as a covariate in the data.
- Use change-since-baseline as outcome for direct quantification.
- It is suggested to center the baseline variable around its mean for ease of interpretation, and to omit the main effect of treatment and the intercept in the model.
- Include the baseline*time interaction in the model.

PROC MIXED DATA=ckd;
  WHERE week > 0;   /* follow-up visits only; baseline enters as a covariate */
  CLASS id week treat (ref='0');
  MODEL aixchange = week treat*week baseline*week
        / NOINT SOLUTION CL DDFM=KR;
  REPEATED week / SUBJECT=id TYPE=UN R RCORR;
RUN;
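The slides do not show how aixchange and the centered baseline were constructed. One possible way is sketched below; it assumes that the wide data set ckd_wide holds the baseline value in aix0 and that aix0 is still present in the long data set ckd. The macro variable name mbase is chosen here for illustration.

```sas
/* Mean baseline AIX over all subjects (one row per subject in the wide data) */
PROC SQL NOPRINT;
  SELECT MEAN(aix0) INTO :mbase FROM ckd_wide;
QUIT;

DATA ckd;
  SET ckd;
  aixchange = aix - aix0;       /* change since baseline (0 at week 0)  */
  baseline  = aix0 - &mbase;    /* baseline centered at its overall mean */
RUN;
```

With the baseline centered in this way, the week estimates in the ANCOVA model refer to a subject with average baseline value. Rows with missing aix or aix0 get missing values and are dropped by PROC MIXED.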


Eplerenone: ANCOVA results

Expected change in AIX (%) since baseline . . .

Week  Control             Eplerenone           Difference            P-value
12    1.40 (-2.24;5.03)   -0.95 (-4.56;2.66)   -2.35 (-7.51;2.82)    P=0.36
24    3.47 (0.57;6.38)    -0.76 (-3.73;2.21)   -4.23 (-8.42;-0.05)   P=0.048

. . . for subjects with average baseline value - !!!!

SAS-output:

                                         Standard
Effect         week  treat  Estimate        Error    DF  t Value  Pr > |t|  Alpha
week             12           1.3967       1.8030    44     0.77    0.4427   0.05
week             24           3.4738       1.4392    43     2.41    0.0201   0.05
week*treat       12      1   -2.3494       2.5643    44    -0.92    0.3646   0.05
week*treat       12      0         0            .     .        .         .      .
week*treat       24      1   -4.2329       2.0775  43.7    -2.04    0.0477   0.05
week*treat       24      0         0            .     .        .         .      .
baseline*week    12         -0.09247       0.1320    44    -0.70    0.4872   0.05
baseline*week    24          -0.2430       0.1058  43.2    -2.30    0.0266   0.05

(confidence intervals omitted)


ANCOVA: predicted changes over time


ANCOVA vs cLMM

The two models have somewhat different interpretations.

- cLMM estimates the population mean response.
- ANCOVA estimates the expected response depending on the baseline value.

The two models estimate the same treatment effect* with similar accuracy/power.

- Estimates and p-values are very similar.
- Except that cLMM can better handle missing data, while ANCOVA merely deletes subjects with missing baseline or missing series of follow-up.

* The feature that treatment effect is the same on the subject mean and the population mean is particular to linear models (lectures 5+6).


Baseline in observational studies

Compare the outcomes for individuals from different groups (e.g. gender or illness groups):

- The groups are likely to differ in many respects . . . including the baseline outcome value!

- Differences in response profiles may be due to many factors, and quantifications will depend on which of these factors are included in the model.

- Adjust for the covariates that are sensible in the context.

Is the baseline measurement a sensible covariate?


Baseline in observational studies

Fitzmaurice et al. (2011)[Section 5.6]:

For example, in an observational study examining gender differences in weight gain of infants between 12 months (baseline) and 24 months (...) At baseline boys are on average 1 1/2 pounds heavier than girls, but there is no evidence of a gender effect on the 12 month change in body weight, with boys and girls both gaining approximately 5 1/4 pound. In contrast the analysis of covariance of the same data reveals a discernible gender effect with boys showing more weight gain than girls.

(...) the analysis of covariance is directed at the conditional question of whether boys are expected to gain more weight than girls given that they have the same initial weight at 12 months. (...) The reasoning is that if a boy and girl have the same initial weight at 12 months, then there are two possibilities: (1) the girl is initially overweight and is expected to gain less weight or (2) the boy is initially underweight and is expected to gain more weight over the 12 months.

We advise readers to employ the analysis of covariance approach in longitudinal settings only if the approach and its implications are fully understood.
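The discrepancy described in the quotation can be made explicit with a little algebra. In the sketch below the notation is introduced for illustration only: X is the weight at 12 months, Y the weight at 24 months, bars denote group means for boys (B) and girls (G), and b is the pooled within-group regression slope of Y on X.

```latex
\[
\begin{aligned}
\hat\delta_{\text{change}} &= (\bar Y_B - \bar Y_G) - (\bar X_B - \bar X_G),\\
\hat\delta_{\text{ANCOVA}} &= (\bar Y_B - \bar Y_G) - b\,(\bar X_B - \bar X_G)
 = \hat\delta_{\text{change}} + (1 - b)\,(\bar X_B - \bar X_G).
\end{aligned}
\]
```

With no gender difference in change and a baseline difference of about 1 1/2 pounds, the ANCOVA estimate is roughly 1.5(1 − b) pounds, which is positive whenever b < 1. The value of b is not given in the quotation. In a randomized study the baseline difference is zero in expectation, so the two analyses target the same effect.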


Outline

What are repeated measurements?

Basics of longitudinal data (FLW chapters 1 & 2)

The multivariate normal distribution

Analysis of response profiles (FLW chapters 3 & 5)

SAS proc mixed (FLW section 5.8)

Baseline adjustment (FLW section 5.6)

Appendix: Supplementary SAS-code


From wide to long format

Data was transformed from the wide to the long format with:

DATA ckd (DROP = aix1-aix2);
  SET ckd_wide;
  week = 0;  aix = aix0; OUTPUT;
  week = 12; aix = aix1; OUTPUT;
  week = 24; aix = aix2; OUTPUT;
RUN;

Note: Check that the resulting data looks ok.
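A few quick checks of the long data could look like the sketch below. It assumes that the treatment variable treat was carried over from ckd_wide, as it is used with the long data elsewhere in these slides.

```sas
/* First rows of the long data: expect three rows (week 0, 12, 24) per subject */
PROC PRINT DATA=ckd (OBS=9);
RUN;

/* Number of rows per treatment group and week */
PROC FREQ DATA=ckd;
  TABLES treat*week / NOROW NOCOL NOPERCENT;
RUN;

/* Observed and missing AIX values per group and week */
PROC MEANS DATA=ckd N NMISS MEAN STD;
  CLASS treat week;
  VAR aix;
RUN;
```

The row counts should be identical across weeks within each group, since the DATA step outputs a row for every visit even when the measurement is missing.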


Making spaghettiplots

The spaghettiplots were made with:

PROC SGPANEL DATA=ckd;
  PANELBY treat;
  SERIES x = week y = aix / GROUP=id;
RUN;

Note: Applies to data in the long format.


Summary statistics and pairwise scatterplots

/* Wide data: one row per subject with aix0, aix1, aix2 */
PROC SORT DATA=ckd_wide;
  BY treat;
RUN;

ODS GRAPHICS ON;

PROC CORR DATA=ckd_wide PLOT=MATRIX(HISTOGRAM) NOPROB;
  BY treat;
  VAR aix0-aix2;
RUN;

Note: Applies to data in the wide format.


Plotting averages over time

The plot of time-averages was made with:

PROC MEANS DATA=ckd NWAY NOPRINT;
  CLASS treat week;
  VAR aix;
  OUTPUT OUT=ckdmeans MEAN=average;
RUN;

PROC SGPLOT DATA=ckdmeans;
  SERIES x = week y = average / GROUP = treat MARKERS;
RUN;

Note: Applies to data in the long format.
