Analysis of Clustered and Longitudinal Data Module 2 Vocabulary Slam and an Introduction to Linear Mixed Models (LMMs)

Dec 23, 2015


Marlene Newman

Analysis of Clustered and Longitudinal Data

Module 2

Vocabulary Slam and an Introduction to Linear Mixed Models (LMMs)

Agenda

• Vocabulary Slam

– We begin this module by defining “clustered” and “longitudinal” data and addressing other terms used to identify dependent data structures.

– We define and depict clustered and longitudinal data using the multilevel framework.

• Introduction to Linear Mixed Models (LMMs)

– We give a brief overview of LMMs and introduce the concept of a random effect.

Biostat 512

What is Clustered Data?

• Clustered Data

– An outcome is measured once for each subject, and subjects belong to (or are “nested” in) clusters, such as families, schools, or neighborhoods.

– The number of subjects in each cluster may vary from cluster to cluster.

– Outcomes measured for members of these groups are likely to be correlated.

• Examples

What is Longitudinal Data?

• Longitudinal data

– An outcome is measured for the same person repeatedly over a period of time.

– Different subjects may have different numbers of observations, and these may be taken at different time points.

– Observations made on the same person are likely to be correlated.

• Examples

What is Clustered-Longitudinal Data?

• Clustered-Longitudinal data

– An outcome is measured repeatedly for the same subject over time, and subjects are clustered within some unit.

– Subjects may have different numbers of measures, and clusters may have differing numbers of subjects.

– The outcome values for different time points in the same subject are assumed to be correlated.

– Measurements for subjects from within the same cluster are assumed to be correlated.

• Examples

What is Repeated Measures Data?

• Repeated measures data

– Multiple observations are made for the same person over time, space, or another dimension.

– Each subject need not have all measurements.

– Outcomes measured for the same person are likely to be correlated.

• Examples

What is “Multilevel” Data?

• Clustered/longitudinal/repeated measures data is more generally known as “multilevel” data.

• Levels 1, 2, 3, …

• Level 1 is the lowest or most granular level of the data, and is where the outcome variable of interest is measured.

• Levels 2, 3, … capture higher-level information

– Cluster levels for clustered data

– Subject level for longitudinal data

– Subject and cluster levels for clustered-longitudinal data

• We will illustrate the multilevel concept for 2- and 3-level data structures.

Two-Level Clustered Data Example

• A research study in education aims to assess the impact of school type (public vs. Catholic) as well as student gender and student SES on student-level math achievement scores. Scores are measured once for each student in a school.

Two-Level Clustered Data (Students Nested within Schools)

Level 2: School 1, School 2, …

Level 1: Student 1, Student 2, Student 3 (in School 1); Student 1, Student 2, … (in School 2)

Level 1 Variables: Student Achievement Score, Gender, Student’s SES, …

Level 2 Variables: Public or Catholic School, …
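One way to picture this structure is as a "long" table with one row per student, where the school-level variable repeats on every row for that school. The sketch below is only an illustration: the column names (`school`, `sector`, `ses`, `math`) and all values are invented, not taken from the study described above.

```python
from collections import Counter

# Hypothetical long-format data: one row per student (Level 1).
# "sector" is a Level 2 variable, so it repeats within each school.
students = [
    {"school": 1, "sector": "public", "student": 1, "female": 1, "ses": -0.4, "math": 48.0},
    {"school": 1, "sector": "public", "student": 2, "female": 0, "ses": 0.2, "math": 55.5},
    {"school": 1, "sector": "public", "student": 3, "female": 1, "ses": 0.9, "math": 61.0},
    {"school": 2, "sector": "catholic", "student": 1, "female": 0, "ses": 0.1, "math": 58.5},
    {"school": 2, "sector": "catholic", "student": 2, "female": 1, "ses": 0.6, "math": 63.0},
]


def cluster_sizes(rows, cluster_key):
    """Count the Level 1 rows that fall inside each Level 2 unit."""
    return dict(Counter(row[cluster_key] for row in rows))


sizes = cluster_sizes(students, "school")
print(sizes)  # {1: 3, 2: 2}
```

The unequal counts (3 students in one school, 2 in the other) mirror the point that the number of subjects may vary from cluster to cluster.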

Two-Level Longitudinal Data Example

• Researchers are studying the effect of a mother’s vocabulary and the child’s gender on the child’s vocabulary growth.

Longitudinal Data (Vocabulary Measured Over Time)

Level 2: Child 1, Child 2, …

Level 1: Vocab Measured at Time 1, Time 2, …, Time n1 (for Child 1); Vocab Measured at Time 1, Time 2, … (for Child 2)

Level 1 Variables (Time-Varying): Child Vocabulary Count, Age at each measurement

Level 2 Variables (Time-Invariant): Mother’s Vocabulary, Child’s Gender
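The time-varying versus time-invariant distinction can be checked mechanically in a long-format data set: a variable that takes a single value within every child behaves like a Level 2 variable, and one that changes across occasions is Level 1. The following is a minimal sketch under that assumption; the helper `classify_levels`, the column names, and the numbers are all hypothetical.

```python
from collections import defaultdict

# Hypothetical data: one row per measurement occasion (Level 1).
# Children have different numbers of visits at different ages.
visits = [
    {"child": 1, "age_months": 12, "vocab": 10, "mother_vocab": 95, "female": 1},
    {"child": 1, "age_months": 18, "vocab": 60, "mother_vocab": 95, "female": 1},
    {"child": 1, "age_months": 24, "vocab": 210, "mother_vocab": 95, "female": 1},
    {"child": 2, "age_months": 14, "vocab": 5, "mother_vocab": 80, "female": 0},
    {"child": 2, "age_months": 26, "vocab": 150, "mother_vocab": 80, "female": 0},
]


def classify_levels(rows, subject_key):
    """Label a variable 2 if it is constant within every subject, else 1."""
    values = defaultdict(lambda: defaultdict(set))
    for row in rows:
        for name, value in row.items():
            if name != subject_key:
                values[name][row[subject_key]].add(value)
    return {
        name: 2 if all(len(seen) == 1 for seen in by_subject.values()) else 1
        for name, by_subject in values.items()
    }


levels = classify_levels(visits, "child")
print(levels)
```

A check like this only describes the sample at hand: a variable that could change over time but happens not to in these rows would still be labeled 2, so the study design should have the final say.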

Two-Level Repeated Measures Example

• Researchers are studying the effect of two different treatments on nucleotide bonding in three regions of the brain in rats. Measurements are taken from the same three regions of the brain of each rat, after each of the two different treatments.

Repeated Measures Data (Rat Brain Example)

Level 2: Rat 1, Rat 2, …

Level 1 (for each rat): Chemical Measured in Region 1, Treatment A; Region 2, Treatment A; …; Region 3, Treatment B

Level 1 Variables (Varying): Nucleotide bonding measurement, Brain region, Treatment

Level 2 Variables (Invariant): Rat gender

Three-Level Clustered Data Example

• A research study in education aims to assess the impact of school, classroom, and student-level variables on student achievement.

Three-Level Clustered Data (Students nested in classrooms nested in schools)

Level 3: School 1, …

Level 2: Classroom 1, Classroom 2, …

Level 1: Student 1, Student 2, …, Student n1 (in Classroom 1); Student 1, Student 2, … (in Classroom 2)

Level 1 Variables: Student Achievement Score, Gender, Student’s SES, …

Level 2 Variables: Teacher experience, Class size, …

Level 3 Variables: School locale (Rural or Urban), School percent low income

Three-Level Clustered-Longitudinal Data Example

• Math skills are measured for the same student each year from grades 1 through 6, with students clustered within schools. The goal is to model how student characteristics, such as ethnicity and gender, and school characteristics, such as school size and percent low-income students, affect the math scores of students over time.

Three-Level Clustered-Longitudinal Data (Math scores measured over time for students nested in schools)

Level 3: School 1, …

Level 2: Student 1, Student 2, …

Level 1: Math Score at Grade 1, Grade 2, …, Grade 6 (for Student 1); Math Score at Grade 2, Grade 3, … (for Student 2)

Level 1 Variables (Time-Varying): Student’s math score, Grade at each measurement

Level 2 Variables (Time-Invariant): Student’s Ethnicity, Student’s Gender

Level 3 Variables (Time-Invariant): School size, Educational Intervention at School Level

What Constitutes a “Level” in Multilevel Data?

• In a clustered data set, each “level” represents a factor that can be thought of as a random sample from a larger population.

– The students in a two-level clustered data set can be thought of as a random sample of students within each school.

– The schools in a two-level clustered data set can be thought of as a random sample from a larger population of schools.

• We want to make inferences to the larger population of students and schools, not confine our inference to the particular students and schools included in this study.

• In a longitudinal data set, Level 1 represents the “occasions” within a subject and Level 2 is the subject.

– We think of the subjects as being representative of a larger population of subjects.

What is Not a “Level”?

• Factors such as Treatment or Gender are not considered to be Levels of data, because they cannot be thought of as a random sample from a larger population.

• We wish to make inferences only about the specific values of Treatment or Gender that are included in our study…not to a larger population of treatments or genders.

Why all this talk about “levels”?

• Understanding the multilevel nature of your dataset is a critical start to the analysis process.

– Is the data “clustered”, “longitudinal” or “clustered-longitudinal”?

– How many levels are there? 2, 3, more?

– What defines each level?

– What is the outcome of interest and is it measured at Level 1?

– What other variables are of interest at each level?

• The answers to these questions will drive the entire analysis.

Models for Multilevel Data

• Data are often hierarchical in nature, especially in the social sciences, and we should not ignore this.

• Using single-level (OLS, GLM) analysis leads to:

– Unit of analysis problem

  • School or child?

– Aggregation bias

  • School SES vs. child SES?

– Incorrectly estimated precision / standard errors

  • Results in incorrect p-values and incorrect conclusions

• Linear Mixed Models can appropriately address these problems.
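To give a feel for how large the precision problem can be, the sketch below uses the Kish design effect, 1 + (m − 1) × ICC, which is not part of these slides and is brought in here purely as an illustration. It assumes equal cluster sizes m and applies to the variance of a sample mean. The function names and the numbers (50 schools of 20 students, intraclass correlation 0.10) are hypothetical.

```python
def design_effect(cluster_size, icc):
    """Kish design effect for equal-sized clusters: 1 + (m - 1) * ICC."""
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must lie between 0 and 1")
    return 1.0 + (cluster_size - 1) * icc


def effective_sample_size(n_total, cluster_size, icc):
    """Number of independent observations carrying the same information."""
    return n_total / design_effect(cluster_size, icc)


# Hypothetical study: 50 schools x 20 students, ICC = 0.10.
deff = design_effect(20, 0.10)
n_eff = effective_sample_size(1000, 20, 0.10)
print(round(deff, 2), round(n_eff, 1))  # 2.9 344.8
```

Under these assumed values, 1,000 correlated students carry roughly the information of 345 independent ones, so an analysis that treats them as independent would report standard errors that are too small.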

What are Linear Mixed Models (LMMs)?

LMMs are:

• Also known as multilevel models, hierarchical models, random effects models, mixed models

• For a continuous outcome variable, Y

• Linear in the parameters (β’s)

• For multilevel data, where outcomes measured for the same cluster/subject are assumed to be correlated and/or the error variance is not constant. In other words, for situations where the GLM assumption (below) is violated.

• Composed of both fixed and random effects, hence, “mixed”

• Not the only modeling option for multilevel data with a continuous outcome. Another option is a marginal model, which we will discuss later in the course.

GLM assumption: $\varepsilon_i \overset{\text{iid}}{\sim} N(0, \sigma^2)$

Fixed Effects in a LMM

• Are usually the focus of the analysis

• Can be thought of as similar to parameters in an ordinary regression model (the Betas)

• Can be taken from any level of the data

• Help us to explain the variance in Y at each level of the data

• Examples of fixed effects:

– Age, sex, treatment, brain region, marital status, teaching experience

Random Effects in a LMM

• Are usually not the primary focus of the analysis, but…

• Allow us to account for correlation among observations within the same level-2 or higher units (e.g., correlations among observations within the same school)

• Allow us to partition the total variance of Y into levels that correspond with the multilevel structure of the data

– How much of the variation in student math achievement scores can be attributed to student-level variability (level 1) versus school-level variability (level 2)?

• Are summarized by their variances, and also by their covariances if there is more than one random effect in the LMM
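The variance-partitioning idea can be written out for the simplest case, a two-level model with a single random intercept for each school. The notation below is an assumption made for illustration (the course introduces its own LMM notation in the next module): $u_j$ is the random intercept for school $j$, and $\varepsilon_{ij}$ is the random error for student $i$ in school $j$.

```latex
% Two-level random-intercept sketch (assumed notation)
Y_{ij} = \beta_0 + u_j + \varepsilon_{ij}, \qquad
u_j \sim N(0, \sigma_u^2), \qquad
\varepsilon_{ij} \sim N(0, \sigma^2), \qquad
u_j \text{ independent of } \varepsilon_{ij}

% Total variance splits into a school-level and a student-level part
\operatorname{Var}(Y_{ij}) = \sigma_u^2 + \sigma^2

% Share of variance at the school level (intraclass correlation),
% which is also the correlation between two students in the same school
\mathrm{ICC} = \frac{\sigma_u^2}{\sigma_u^2 + \sigma^2}
             = \operatorname{Corr}(Y_{ij}, Y_{i'j}), \quad i \neq i'
```

In this sketch, the same quantity answers both questions raised above: it is the share of variance attributable to schools and the correlation induced among students who share a school.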

Random Effects in a LMM (cont.)

• Come in two flavors:

– Random intercepts

– Random slopes

• Are explicitly specified in the model. This is in contrast to the random errors, which are never explicitly specified when a model is fit, but which always exist and whose variance is always estimated.

• We will introduce the LMM notation and assumptions in the next module.
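Ahead of the formal notation, a small simulation may help make the two flavors concrete. The sketch below generates data in which each cluster gets its own shift in the intercept and its own shift in the slope, on top of shared fixed effects. Everything here is an assumption for illustration: the function `simulate_lmm`, the parameter names and values, and the simplification that the random intercept and slope are drawn independently.

```python
import random


def simulate_lmm(n_clusters, n_per_cluster, beta0, beta1,
                 sd_intercept, sd_slope, sd_error, seed=0):
    """Simulate y = (beta0 + u0_j) + (beta1 + u1_j) * x + e for each row."""
    rng = random.Random(seed)
    rows = []
    for j in range(n_clusters):
        # Cluster-specific deviations, drawn once per cluster (random effects).
        u0 = rng.gauss(0.0, sd_intercept)  # random intercept
        u1 = rng.gauss(0.0, sd_slope)      # random slope
        for i in range(n_per_cluster):
            x = float(i)                   # e.g. occasion 0, 1, 2, ...
            e = rng.gauss(0.0, sd_error)   # random error, drawn per row
            y = (beta0 + u0) + (beta1 + u1) * x + e
            rows.append({"cluster": j, "x": x, "y": y, "u0": u0, "u1": u1})
    return rows


rows = simulate_lmm(n_clusters=3, n_per_cluster=4, beta0=50.0, beta1=2.0,
                    sd_intercept=5.0, sd_slope=0.5, sd_error=1.0)
for row in rows[:4]:
    print(row)
```

Setting `sd_intercept` and `sd_slope` to zero removes the random effects and leaves an ordinary regression with independent errors; setting only `sd_slope` to zero gives a random-intercept model.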

In Conclusion

• Dependent data structures go by many names: longitudinal, clustered, repeated measures, multilevel.

• Understanding the multilevel nature of a dataset is critical to any analysis.

• OLS regression is not an appropriate technique for modeling multilevel data.

• A Linear Mixed Model is one approach that can be used for dependent data.
