Multilevel modelling short course Mark Tranmer, CCSR
Mar 28, 2015
What is multilevel analysis?
• Many populations have a group structure of some kind: hierarchical or non-hierarchical.
• For example, pupils can be grouped into schools.
• Individuals can be grouped into areas.
• Pupils can be grouped by school, and by neighbourhood.
• Suppose we wish to assess area variations in income, possibly with respect to other factors.
• If we have district level data we can estimate a district level relationship.
• E.g. average income and average age in each district
• If we have individual level data we can estimate an individual level relationship
• E.g. we can relate a person’s income to a person’s age.
• But how do we assess the relationships at the district level and the individual level at the same time?
• We can do this with a multilevel model.
• We can fit this kind of model with specialist software such as MLwiN, which we will use today.
The ecological fallacy
• We could assume that an equation we estimate at the district level also holds at the individual level; that is, we could make a cross-level inference.
• But this is generally not sensible – individuals vary within each district with respect to the variables we wish to relate.
• Hence we could well make invalid inferences about the relationship at the individual level.
• This phenomenon is referred to as ‘the ecological fallacy’.
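The point can be illustrated with a small sketch. The data below are entirely hypothetical (three districts of three people each) and are constructed so that the district-level slope and the within-district slopes have opposite signs; `ols_slope` is a helper written for this illustration only.

```python
from statistics import mean


def ols_slope(xs, ys):
    """Slope of the least-squares line of ys on xs."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical data: within each district y falls as x rises,
# but districts with a higher average x also have a higher average y.
districts = {}
for d in range(3):
    xs = [10 * d + offset for offset in (-1, 0, 1)]
    ys = [10 * d - offset for offset in (-1, 0, 1)]
    districts[d] = (xs, ys)

# District-level analysis: regress district mean y on district mean x.
district_slope = ols_slope(
    [mean(xs) for xs, _ in districts.values()],
    [mean(ys) for _, ys in districts.values()],
)

# Individual-level analysis carried out separately within each district.
within_slopes = [ols_slope(xs, ys) for xs, ys in districts.values()]

print("district-level slope:", district_slope)
print("within-district slopes:", within_slopes)
```

In this constructed example the district-level slope is positive while every within-district slope is negative, so applying the district-level equation to individuals would give the wrong sign.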
Problems of ignoring population structure
• If we carry out the analysis at the individual level only, we do not recognise in our analysis that ‘similar’ individuals tend to live within the same small sub-areas of our population.
• That is, ‘clustering’ occurs.
• Ignoring this clustering may lead to biased estimates of summary statistics, especially variances, standard deviations and standard errors.
• Hence we might falsely attribute statistical significance (or non significance) to results if we ignore the clustering.
Examples of multilevel relationships
Some substantive multilevel examples
• Schools. Variations in exam performance.
Level 3: school
Level 2: class
Level 1: pupils
Variations in exam score. ‘School effectiveness’
• Areas: Variations in health
Level 3: Counties
Level 2: Districts
Level 1: People
• People: Dental data
Level 2: People’s mouths
Level 1: Teeth
• Time as a level.
Level 2: Person
Level 1: Occasion
• Multivariate.
Level 2: Pupil
Level 1: Subject of exam score
Terminology
• Nesting. Level k-1 units are contained in level k units. E.g. classes at level 2 nested in schools at level 3. Classes are the level 2 units, schools are the level 3 units.
• Cross-classification. Non-nested higher level units, e.g. school and neighbourhood at level 2, pupil at level 1.
Continuous and Binary Response variables
• For a continuous response we use a multilevel model that is an extension of the standard multiple regression model – as we will see this morning.
• For a binary response we use a multilevel model that is an extension of the logistic regression model – as we will see this afternoon.
Data requirements
• What are the data requirements for multilevel modelling?
• The standard requirement is a dataset that includes an indicator of the group to which each individual unit belongs.
• For example information for a sample of pupils that includes an indicator of the school that they attend.
• Another example is a sample of individuals that includes an indicator of the area in which they live.
Fixed effects
• What about fixed effects analysis?
• If we had information on pupils that attended three schools, we could carry out a fixed effects analysis to compare the three schools based on these sample data.
• We would do this by doing an analysis that includes two dummy variables that allow us to compare the schools.
• We could make inferences from our results about how the three schools compare but we would not want to make wider inferences about ‘all schools’ based on information on only 3 schools.
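As a rough sketch of the dummy-variable coding described above: with three schools, one school is taken as the reference category and the other two each get a 0/1 indicator. The school labels and the helper `dummy_code` are hypothetical and for illustration only.

```python
def dummy_code(groups, reference):
    """Return one 0/1 indicator column per non-reference group."""
    levels = sorted(set(groups) - {reference})
    return {
        level: [1 if g == level else 0 for g in groups]
        for level in levels
    }


# Hypothetical school membership for six pupils in three schools.
schools = ["A", "A", "B", "C", "C", "B"]

# School "A" is the reference, so three schools need only two dummies.
dummies = dummy_code(schools, reference="A")

for level, column in dummies.items():
    print(f"school_{level}:", column)
```

The coefficient of each dummy in a regression then compares that school with the reference school.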
Multilevel modelling
• For multilevel modelling we would have information on a ‘reasonable number of higher level units’
• What is ‘reasonable’? Snijders and Bosker (1999) recommend at least 10 groups. 20 or more is better.
• We essentially assume we have a representative sample of higher level units in multilevel modelling, so 30 is a good number to have in mind.
• Suppose we had data for pupils based on 30 schools.
• We could carry out a fixed effects analysis on these data by using 29 dummy variables.
• Or we could use multilevel modelling, which assumes the schools are themselves a sample. Hence we do not need to estimate so many model parameters using multilevel modelling, and it is desirable in this situation.
• Multilevel modelling also takes into account group size in estimation – estimates of residuals for groups with small populations – e.g. a school with 2 pupils – are ‘shrunken’ towards the mean.
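The shrinkage can be sketched numerically. For a variance components model, the usual estimate of a school residual is the raw mean residual for that school multiplied by a factor that depends on the school-level variance, the pupil-level variance and the number of pupils in the school. The variance values below are hypothetical, and the function names are chosen for this illustration.

```python
def shrinkage_factor(sigma_u2, sigma_e2, n_j):
    """Shrinkage factor for a group of size n_j in a variance components model.

    sigma_u2: between-group variance, sigma_e2: within-group variance.
    """
    return sigma_u2 / (sigma_u2 + sigma_e2 / n_j)


def shrunken_residual(raw_mean_residual, sigma_u2, sigma_e2, n_j):
    """Raw group mean residual pulled towards zero (the overall mean)."""
    return shrinkage_factor(sigma_u2, sigma_e2, n_j) * raw_mean_residual


# Hypothetical variance estimates, for illustration only.
SIGMA_U2, SIGMA_E2 = 0.2, 0.8

# Two schools with the same raw mean residual but very different sizes.
small = shrunken_residual(1.0, SIGMA_U2, SIGMA_E2, n_j=2)
large = shrunken_residual(1.0, SIGMA_U2, SIGMA_E2, n_j=200)

print("school with 2 pupils:  ", round(small, 3))
print("school with 200 pupils:", round(large, 3))
```

The school with 2 pupils is pulled much further towards the mean than the school with 200 pupils, because its raw residual is based on very little information.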
Theory: Single level models
• Suppose we have data for 4059 pupils in 65 schools.
• How could we model the data?
• Model 1: pupil level model based on the 4059 pupils
$$y_i = \beta_0 + \beta_1 x_i + e_i$$
$$\mathrm{Var}(y_i) = \sigma^2$$
Single level models
• Model 2: Or a school level model based on aggregate data for the 65 schools; that is, the school means.
$$y_j = \beta_0 + \beta_1 x_j + e_j$$
Multilevel models: model 3 ‘variance components’ model
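$$y_{ij} = \beta_0 + u_j + e_{ij}$$
$$\mathrm{Var}(y_{ij}) = \sigma_u^2 + \sigma_e^2 = \sigma^2$$
i is the pupil subscript; j is the school subscript.
$\sigma_u^2$ measures variation between schools.
$\sigma_e^2$ measures variation between pupils within schools.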
Intra-‘class’ correlation
$\sigma_u^2 / \sigma^2$ = the intra-class correlation: the proportion of the overall variation in exam score attributable to schools, i.e. how similar exam scores are within schools.
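A minimal sketch of the calculation, assuming we already have estimates of the school-level and pupil-level variances from a variance components model. The two values used here are hypothetical.

```python
def intra_class_correlation(sigma_u2, sigma_e2):
    """Share of the total variance that lies between groups."""
    total_variance = sigma_u2 + sigma_e2
    return sigma_u2 / total_variance


# Hypothetical variance estimates: 0.2 between schools, 0.8 between pupils.
icc = intra_class_correlation(sigma_u2=0.2, sigma_e2=0.8)

print(f"intra-class correlation: {icc:.2f}")
```

With these values, 20% of the variation in exam score would be attributable to schools. A value of 0 would mean schools make no difference; a value of 1 would mean all pupils within a school have the same score.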
Random intercepts model
Model 4: 2 level model: pupils in schools, with an explanatory variable.
$$y_{ij} = \beta_0 + \beta_1 x_{ij} + u_j + e_{ij}$$
Random slopes model
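Model 5: random slopes
$$y_{ij} = \beta_0 + \beta_{1j} x_{ij} + u_{0j} + e_{ij}$$
where the ‘random slopes’ coefficient is:
$$\beta_{1j} = \beta_1 + u_{1j}$$
Or alternatively, but equivalently, we can write the model as:
$$y_{ij} = \beta_0 + \beta_1 x_{ij} + u_{1j} x_{ij} + u_{0j} + e_{ij}$$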
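The equivalence of the two ways of writing the random slopes model can be checked with a short sketch. It computes the predicted school line (leaving out the pupil-level error) in both forms. The fixed coefficients are the estimates reported in the SPSS output in these slides; the school residuals for the intercept and slope are hypothetical values chosen for illustration.

```python
# Fixed-effect estimates taken from the SPSS output in these slides.
BETA0 = -0.0116529
BETA1 = 0.5565333


def predict_school_specific_slope(x, u0, u1):
    """First form: the slope itself varies by school."""
    beta1_j = BETA1 + u1
    return BETA0 + beta1_j * x + u0


def predict_expanded(x, u0, u1):
    """Second form: average slope plus a school-specific departure."""
    return BETA0 + BETA1 * x + u1 * x + u0


# Hypothetical intercept and slope residuals for two schools.
schools = {"school_1": (0.30, 0.10), "school_2": (-0.25, -0.05)}

predictions = {}
for name, (u0, u1) in schools.items():
    for x in (-1.0, 0.0, 1.0):
        first = predict_school_specific_slope(x, u0, u1)
        second = predict_expanded(x, u0, u1)
        predictions[(name, x)] = (first, second)
        print(name, x, round(first, 4), round(second, 4))
```

Both forms give the same prediction for every school and every value of x; they differ only in how the school-specific part of the slope is written.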
Group level variables
• We can also add group level variables to the model, e.g. the type of school (mixed or single sex), or the percentage of pupils taking free school meals in the school.
$$y_{ij} = \beta_0 + \beta_{1j} x_{ij} + \beta_2 w_j + \beta_3 z_j + u_{0j} + e_{ij}$$
Binary response variables
• Many response variables are binary (also called ‘0/1’ or dichotomous).
• E.g. whether or not a person is unemployed, or whether or not a person has a limiting long term illness.
• Risk of unemployment may be associated with personal characteristics and/or where people live. We can use multilevel logistic models to investigate these issues.
• Let’s suppose we are looking at the risk of people being unemployed given some demographic characteristics, and also given some information about the area in which they live.
• We can look at this problem using multilevel logistic regression models
Multilevel logistic regression models
Model 6: The basic (two level) multilevel model for a binary response is written as follows:
$$y_{ij} = p_{ij} + e_{ij}$$
where $y_{ij}$ takes the value 0 or 1 for each individual i in group j (0 = not unemployed, 1 = unemployed), $p_{ij}$ is the predicted probability of unemployment for individual i in area j, and $e_{ij}$ is an individual level error.
$$\mathrm{Logit}(p_{ij}) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p + u_j$$
where $\beta_0$ is the ‘intercept’ and $\beta_1$ to $\beta_p$ are the coefficients of the p explanatory variables.
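To turn the logit back into a predicted probability we apply the inverse logit function. The sketch below does this for one individual in two different areas. All coefficient values, the explanatory variable values and the area residuals are hypothetical; they are not estimates from any dataset.

```python
import math


def inv_logit(eta):
    """Convert a value on the logit scale to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def predicted_probability(beta0, betas, xs, u_j):
    """Predicted probability from intercept, coefficients, x values and area residual."""
    eta = beta0 + sum(b * x for b, x in zip(betas, xs)) + u_j
    return inv_logit(eta)


# Hypothetical values, for illustration only.
beta0 = -2.0            # intercept on the logit scale
betas = [0.5, -0.3]     # coefficients of two explanatory variables
xs = [1.0, 2.0]         # one individual's values on those variables

# The same individual placed in an area with a low and a high residual.
p_low_area = predicted_probability(beta0, betas, xs, u_j=-0.4)
p_high_area = predicted_probability(beta0, betas, xs, u_j=0.4)

print("area with negative residual:", round(p_low_area, 3))
print("area with positive residual:", round(p_high_area, 3))
```

An area with a positive residual raises the predicted probability for everyone living in it, and an area with a negative residual lowers it.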
MLwiN for binary response variables
• MLwiN could be used to fit a multilevel model based on the example of unemployment as a response variable and some demographic information as explanatory variables.
• For this analysis we could use 1991 UK Census data from the Samples of Anonymised Records (SAR).
• The MLwiN procedure for binary response variables is slightly more involved than that for continuous response variables.
• See chapter 9 of the MLwiN user guide: www.cmm.bristol.ac.uk/MLwiN/download/userman_2005.pdf
SPSS for multilevel modelling
• In SPSS version 11.5 and later it is possible to fit multilevel models for dependent variables measured on an interval scale.
• The syntax on the next slide shows how variance components, random intercepts and random intercepts/slopes models can be fitted for a 2-level example - pupils in schools.
Random intercepts and slopes (on standlrt) model for pupils in schools (normexam is the continuous response; standlrt is a continuous explanatory variable). Syntax is as follows.
mixed normexam with standlrt
  / print = solution
  / fixed standlrt
  / random intercept standlrt | subject(school) covtype(UN).
[ To access via SPSS menus: Analyze > Mixed Models ]
Model Dimension (b)

                                          Number of  Covariance    Number of   Subject
                                          Levels     Structure     Parameters  Variables
Fixed Effects   Intercept                 1                        1
                STANDLRT                  1                        1
Random Effects  Intercept + STANDLRT (a)  2          Unstructured  3           SCHOOL
Residual                                                           1
Total                                     4                        6

a. As of version 11.5, the syntax rules for the RANDOM subcommand have changed. Your command syntax may yield results that differ from those produced by prior versions. If you are using SPSS 11 syntax, please consult the current syntax reference guide for more information.
b. Dependent Variable: NORMEXAM.
Estimates of Fixed Effects (a)

                                                        95% Confidence Interval
Parameter  Estimate   Std. Error  df      t       Sig.  Lower Bound  Upper Bound
Intercept  -.0116529  .0401111    60.653  -.291   .772  -.0918693    .0685635
STANDLRT   .5565333   .0201139    56.343  27.669  .000  .5162458     .5968209

a. Dependent Variable: NORMEXAM.
Estimates of Covariance Parameters (a)

Parameter                                          Estimate  Std. Error
Residual                                           .5536372  .0124922
Intercept + STANDLRT [subject = SCHOOL]  UN (1,1)  .0921177  .0187573
                                         UN (2,1)  .0183415  .0070894
                                         UN (2,2)  .0149670  .0046961

a. Dependent Variable: NORMEXAM.
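One way to read the covariance parameters is sketched below. With an unstructured covariance for the random intercept and the random slope on standlrt, UN (1,1) is the school-level intercept variance, UN (2,1) is the intercept-slope covariance and UN (2,2) is the slope variance. The numbers are copied from the table; the function and variable names are chosen for this illustration.

```python
import math

# Estimates copied from the 'Estimates of Covariance Parameters' table.
VAR_INTERCEPT = 0.0921177   # UN (1,1)
COV_INT_SLOPE = 0.0183415   # UN (2,1)
VAR_SLOPE = 0.0149670       # UN (2,2)
VAR_RESIDUAL = 0.5536372    # pupil-level (residual) variance

# Correlation between school intercepts and school slopes.
corr_int_slope = COV_INT_SLOPE / math.sqrt(VAR_INTERCEPT * VAR_SLOPE)


def between_school_variance(x):
    """School-level variance of the response at a given value of standlrt."""
    return VAR_INTERCEPT + 2 * COV_INT_SLOPE * x + VAR_SLOPE * x ** 2


print("intercept-slope correlation:", round(corr_int_slope, 3))
for x in (-1.0, 0.0, 1.0):
    share = between_school_variance(x) / (between_school_variance(x) + VAR_RESIDUAL)
    print(f"standlrt = {x:+.0f}: school variance = "
          f"{between_school_variance(x):.4f}, share of total = {share:.3f}")
```

The positive covariance means that schools with higher intercepts tend to have steeper slopes, so under this model the variation between schools is larger for pupils with high standlrt than for pupils with low standlrt.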
Variance components model only:
mixed normexam
  / print = solution
  / random intercept | subject(school) covtype(UN).
Random intercepts model only:
mixed normexam with standlrt
  / print = solution
  / fixed standlrt
  / random intercept | subject(school) covtype(UN).
Reading list
Books:
• Plewis, I. (1997) ‘Statistics in Education’. Edward Arnold.
• Snijders, T. and Bosker, R. (1999) ‘An Introduction to Basic and Advanced Multilevel Modelling’. Sage Publications.
• Goldstein, H. (1995) ‘Multilevel Statistical Models’. Edward Arnold.
Web:
• http://www.cmm.bristol.ac.uk
• NB: New version of MLwiN (2.10) just released: see website.