Some questions (and a few answers) about multilevel models

Andrew Gelman
Department of Statistics and Department of Political Science
Columbia University

3 May 2005

Outline: structured data and multilevel models · understanding multilevel models and variance components · conclusions
Themes

- Multilevel models are necessary
- Tools are needed to build, fit, check, and understand multilevel models (MLMs)
- Analogy to linear regression
- MLM as regression with categorical inputs
Fitting and understanding multilevel models

- Some of my experiences with multilevel models
- Some challenges and solutions
- Lots of time for questions
- Collaborators:
  - Iain Pardoe, Dept of Decision Sciences, University of Oregon
  - David Park, Joseph Bafumi, Boris Shor, Dept of Political Science, Columbia University
  - Samantha Cook, Zaiying Huang, Jouni Kerman, Shouhao Zhao, Dept of Statistics, Columbia University
  - Phillip Price, Energy and Environment Division, Lawrence Berkeley National Laboratory
Plan of talk

- Rodents in NYC: apartments within buildings within neighborhoods
- State-level opinions from national polls: MLM and poststratification
- MLM when the number of groups is small
- Finite-population and superpopulation inference
- Understanding a fitted multilevel regression: ANOVA, average predictive effects, partial pooling, and R^2
- Why I don't use the terms "fixed" and "random" effects
- Questions . . .
NYC Dept of Health study

- Survey of 16,000 apartments in 9,000 buildings in 55 neighborhoods in NYC
- Try to fit in WinBUGS, but too slow! Solutions:
  - Fit to a subset of the data (900 apartments in 500 buildings)
  - Fit to all the data, with a separate model for each neighborhood
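The apartments-within-buildings-within-neighborhoods structure is a three-level varying-intercept model. A minimal simulation sketch of that nesting, with made-up group sizes and variance components (not the study's actual values):

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical sizes and standard deviations at each level (made up).
n_nbhd, bldg_per_nbhd, apt_per_bldg = 10, 20, 4
mu, sd_nbhd, sd_bldg, sd_apt = 0.0, 0.5, 0.7, 1.0

# One effect per neighborhood, one per building within neighborhood,
# and apartment-level noise.
a = rng.normal(0, sd_nbhd, size=n_nbhd)
b = rng.normal(0, sd_bldg, size=(n_nbhd, bldg_per_nbhd))
e = rng.normal(0, sd_apt, size=(n_nbhd, bldg_per_nbhd, apt_per_bldg))

# Latent apartment-level outcome (e.g., propensity for rodent infestation):
# broadcasting adds the neighborhood and building effects to every apartment.
y = mu + a[:, None, None] + b[:, :, None] + e

# Total variance decomposes (approximately) across the three levels.
print(y.shape, round(float(y.var()), 2))
```

Fitting such a model partially pools each building's estimate toward its neighborhood mean, and each neighborhood toward the citywide mean.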
National opinion trends

[Figure: percentage support for the death penalty in national polls, by year, 1940-2000 (y-axis: 50-80%).]
State-level opinion trends

- Goal: estimating time series within each state
- One poll at a time: small-area estimation
- It works! Validated for pre-election polls
- Combining surveys: model for parallel time series
- Multilevel modeling + poststratification
- Poststratification cells: sex × ethnicity × age × education × state
Multilevel modeling of opinions

- Logistic regression: Pr(y_i = 1) = logit^{-1}((Xβ)_i)
- X includes demographic and geographic predictors
- Group-level model for the 16 age × education predictors
- Group-level model for the 50 state predictors
- Bayesian inference, summarized by posterior simulations of β:

      Simulation   θ_1   ...   θ_75
      1            **    ...   **
      ...          ...   ...   ...
      1000         **    ...   **
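The inverse-logit transform and the posterior-simulation summary above can be sketched as follows. This is a minimal illustration with made-up numbers standing in for MCMC output, not the talk's actual fit:

```python
import numpy as np

rng = np.random.default_rng(0)

def inv_logit(x):
    """logit^{-1}(x) = exp(x) / (1 + exp(x)), mapping the reals to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical design matrix X (n respondents x k predictors) and 1000
# posterior simulation draws of beta (rows = draws, cols = coefficients).
n, k, n_sims = 5, 3, 1000
X = rng.normal(size=(n, k))
beta_draws = rng.normal(size=(n_sims, k))   # stand-in for MCMC output

# Each draw implies Pr(y_i = 1) = inv_logit((X beta)_i) for every respondent.
prob_draws = inv_logit(X @ beta_draws.T)    # shape (n, n_sims)

# Summarize the simulation table, e.g., posterior mean and 95% interval
# of the predicted probability for respondent 0.
mean0 = prob_draws[0].mean()
lo0, hi0 = np.quantile(prob_draws[0], [0.025, 0.975])
print(round(float(mean0), 3), round(float(lo0), 3), round(float(hi0), 3))
```

Any quantity of interest is computed draw by draw, so its uncertainty comes for free from the 1000 rows of the simulation table.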
Interlude: why "multilevel" ≠ "hierarchical"

- Logistic regression: Pr(y_i = 1) = logit^{-1}((Xβ)_i)
- X includes demographic and geographic predictors
- Group-level model for the 16 age × education predictors
- Group-level model for the 50 state predictors
- Crossed (nonnested) structure of age, education, state
- Several overlapping "hierarchies"
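In the crossed structure, a respondent's linear predictor adds an age × education effect and a state effect, and no factor nests inside another (the same age group appears in every state). A toy sketch of that indexing, with hypothetical effect draws:

```python
import numpy as np

rng = np.random.default_rng(4)

# Each respondent has an age group, an education group, and a state;
# these factors cross rather than nest, unlike apartments, which belong
# to exactly one building, which belongs to one neighborhood.
n = 8
age = rng.integers(0, 4, size=n)      # 4 age groups
edu = rng.integers(0, 4, size=n)      # 4 education groups
state = rng.integers(0, 50, size=n)   # 50 states

# Hypothetical group-level effects, one array per crossed factor.
age_eff = rng.normal(0, 1, size=4)
edu_eff = rng.normal(0, 1, size=4)
state_eff = rng.normal(0, 1, size=50)

# The linear predictor sums one effect from each factor per respondent.
lin_pred = age_eff[age] + edu_eff[edu] + state_eff[state]
print(lin_pred.round(2))
```

Because the factors overlap rather than forming a single tree, "multilevel" describes the model better than "hierarchical."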
Poststratification to estimate state opinions

- Implied inference for θ_j = logit^{-1}((Xβ)_j) in each of 3264 cells j (e.g., black female, age 18-29, college graduate, Georgia)
- Poststratification:
  - Within each state s, average over 64 cells: Σ_{j∈s} N_j θ_j / Σ_{j∈s} N_j
  - N_j = population in cell j (from the Census)
  - The 1000 simulation draws propagate the uncertainty in each θ_j to the state estimates
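The poststratification average, Σ_{j∈s} N_j θ_j / Σ_{j∈s} N_j, is just a population-weighted mean of cell estimates, applied draw by draw. A minimal sketch for one state with hypothetical Census counts and stand-in posterior draws:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical: one state with 64 poststratification cells
# (2 sexes x 2 ethnicities x 4 age groups x 4 education groups).
n_cells, n_sims = 64, 1000
N = rng.integers(1_000, 50_000, size=n_cells)          # made-up Census count per cell
theta_draws = rng.beta(2, 2, size=(n_sims, n_cells))   # stand-in for posterior draws of theta_j

# State estimate for each simulation draw: sum_j N_j theta_j / sum_j N_j.
state_draws = theta_draws @ N / N.sum()

# The 1000 draws carry the uncertainty in each theta_j through to the state level.
estimate = state_draws.mean()
lo, hi = np.quantile(state_draws, [0.025, 0.975])
print(round(float(estimate), 3), round(float(lo), 3), round(float(hi), 3))
```

Weighting by Census counts corrects for the survey reaching some cells (e.g., young men) less often than their share of the state population.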
CBS/New York Times pre-election polls from 1988

- Validation study: fit the model on poll data and compare to election results
- Competing estimates:
  - No pooling: separate estimate within each state
  - Complete pooling: no state predictors
  - Hierarchical model and poststratify
- Mean absolute state errors:
  - No pooling: 10.4%
  - Complete pooling: 5.4%
  - Hierarchical model with poststratification: 4.5%
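The three competing estimates differ in how far each state's raw poll mean is shrunk toward the national mean. A sketch of the no-pooling, complete-pooling, and partial-pooling estimators for binomial state polls, on simulated data (the talk's actual model is a multilevel logistic regression, so this is only the underlying idea):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical state polls: n_j respondents, y_j Bush supporters in state j.
n = rng.integers(30, 400, size=50)                        # small states -> noisy polls
true_p = np.clip(rng.normal(0.54, 0.06, size=50), 0.01, 0.99)
y = rng.binomial(n, true_p)

p_hat = y / n                  # no pooling: each state on its own
p_bar = y.sum() / n.sum()      # complete pooling: one national estimate

# Partial pooling (normal approximation): weight each state's estimate by its
# sampling precision relative to the between-state variance tau^2, so noisy
# small-state estimates shrink more toward the national mean.
sigma2 = (p_hat * (1 - p_hat)).clip(1e-4, None) / n       # within-state variance
tau2 = max(p_hat.var() - sigma2.mean(), 1e-6)             # crude between-state variance
w = (1 / sigma2) / (1 / sigma2 + 1 / tau2)
p_partial = w * p_hat + (1 - w) * p_bar

# Compare mean absolute error against the simulated truth.
for name, est in [("no pooling", p_hat),
                  ("complete pooling", np.full(50, p_bar)),
                  ("partial pooling", p_partial)]:
    print(name, round(float(np.abs(est - true_p).mean()), 4))
```

The multilevel estimator interpolates between the two extremes, which is why it tends to win the error comparison above.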
Structured data and multilevel modelsUnderstanding multilevel models and variance components
Conclusions
RodentsOpinionsMLM with few groups
Validation study: comparison of state errors

[Figure: three scatterplots of actual 1988 election outcome vs. estimated Bush support, one per method: no pooling of state effects, complete pooling (no state effects), and multilevel model. Both axes run from 0.0 to 1.0.]
How many groups do you need to fit an mlm?

- 9000 buildings, 55 neighborhoods, 50 states: that's ok
- But why do mlm with only 4 categories?
  - Age: 18–29, 30–44, 45–64, 65+
  - Education: less than HS, HS, some college, college grad
- Simple to set up as an mlm
- No need to choose a "baseline" category
- Extends to interactions (16 age × education categories)
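With only 4 groups, the multilevel estimate is still just the usual precision-weighted compromise between each group mean and the common mean, applied symmetrically to all groups (so no baseline is singled out). A sketch with invented group means, sample sizes, and variance components; in a real fit σ_age would be estimated, not fixed:

```python
import statistics

# invented survey summaries: mean response by age group
ybar = {"18-29": 0.42, "30-44": 0.50, "45-64": 0.55, "65+": 0.61}
n = {"18-29": 120, "30-44": 180, "45-64": 160, "65+": 90}
sigma_y = 0.5      # assumed within-group sd
sigma_age = 0.05   # assumed between-group sd (estimated in a real mlm fit)

mu = statistics.mean(ybar.values())   # overall level; no baseline category
shrunk = {}
for j in ybar:
    w = (n[j] / sigma_y**2) / (n[j] / sigma_y**2 + 1 / sigma_age**2)
    shrunk[j] = w * ybar[j] + (1 - w) * mu   # pulled toward the common mean
```

Each group's estimate lands strictly between its raw mean and the common mean, with smaller groups shrunk more.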
Finite-population and superpopulation estimands

- Consider the 4 coefficients, β^age_1, ..., β^age_4
- Finite-population centering:
    β̃^age_j = β^age_j − β̄^age, for j = 1, ..., 4
    β̃_0 = β_0 + β̄^age
- Adjusted parameters are more precisely estimated
  - Especially when the number of groups is small
- Sd of group effects:
  - β^age_j ~ N(0, σ²_age), for j = 1, ..., 4
  - Superpopulation sd: σ_age
  - Finite-population sd: sqrt((1/3) Σ_{j=1}^4 (β^age_j − β̄^age)²)
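Both estimands can be computed from posterior simulations. A minimal sketch with simulated draws (the value of σ_age and the intercept distribution are invented): within each draw, the finite-population sd is the spread of the 4 realized coefficients, while the superpopulation sd is the parameter σ_age itself; centering leaves the linear predictor unchanged.

```python
import math
import random
import statistics

random.seed(2)
sigma_age = 0.3                 # invented superpopulation sd
J, n_draws = 4, 2000

fp_vars = []
for _ in range(n_draws):
    beta0 = random.gauss(1.0, 0.1)                         # intercept draw (invented)
    beta = [random.gauss(0, sigma_age) for _ in range(J)]  # age coefficients
    bbar = statistics.mean(beta)
    beta_tilde = [b - bbar for b in beta]   # finite-population centering
    beta0_tilde = beta0 + bbar              # mean absorbed into the intercept
    fp_vars.append(sum(b ** 2 for b in beta_tilde) / (J - 1))

# the finite-population sd varies from draw to draw, but its square
# averages to sigma_age^2 (the superpopulation variance)
fp_sd = math.sqrt(statistics.mean(fp_vars))
```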
Example of finite-pop and superpop ests

[Figure: estimates for 8 airports, k = 1, ..., 8. Left panel: zero-centered parameters δ_k^adj; right panel: uncentered parameters δ_k. Vertical axes run from −0.5 to 0.5.]
Redundant parameterization

- Data model: Pr(y_i = 1) = logit^(−1)(β_0 + β^age_age(i) + β^state_state(i))
- Usual model for the coefficients:
    β^age_j ~ N(0, σ²_age), for j = 1, ..., 4
    β^state_j ~ N(0, σ²_state), for j = 1, ..., 50
- Additively redundant model:
    β^age_j ~ N(μ_age, σ²_age), for j = 1, ..., 4
    β^state_j ~ N(μ_state, σ²_state), for j = 1, ..., 50
- Why add the redundant μ_age, μ_state?
  - The iterative algorithm moves more smoothly
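The redundancy is easy to see numerically: adding a constant μ_age to every age coefficient and subtracting it from the intercept leaves every fitted probability unchanged. A sketch with invented coefficient values (only 2 of the 50 states shown):

```python
import math

def inv_logit(x):
    return 1 / (1 + math.exp(-x))

# invented coefficient values (only 2 of the 50 states shown)
beta0 = 0.2
beta_age = {1: -0.3, 2: -0.1, 3: 0.1, 4: 0.3}
beta_state = {"NY": -0.4, "TX": 0.5}

def p_yes(age, state, b0=beta0, b_age=beta_age):
    return inv_logit(b0 + b_age[age] + beta_state[state])

# additive redundancy: shift every age coefficient by mu_age and absorb
# the shift into the intercept; every fitted probability is unchanged
mu_age = 0.25
beta0_alt = beta0 - mu_age
beta_age_alt = {j: b + mu_age for j, b in beta_age.items()}
```

Because only the sums enter the likelihood, a sampler can move μ_age freely while the identified, centered quantities mix well.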
Framework for average predictive effects

- Regression model, E(y | x, θ)
- Predictors come from "input variables"
  - Example: regression on age, sex, age × sex, and age²
  - 5 linear predictors (including the constant term), but only 4 inputs
- Compute the APE for each input variable, one at a time, with all others held constant
  - Scalar input u: the "input of interest"
  - Vector v: all other inputs
Defining predictive effects

- Predictive effect:
    δ_u(u^(1) → u^(2), v, θ) = [E(y | u^(2), v, θ) − E(y | u^(1), v, θ)] / (u^(2) − u^(1))
- Average over:
  - The transition, u^(1) → u^(2)
  - The other inputs, v
  - The regression coefficients, θ
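For a continuous input the predictive effect depends on where the transition u^(1) → u^(2) happens, which is exactly why averaging over transitions matters. A sketch with a made-up logistic model (coefficients and the age transitions are hypothetical):

```python
import math

def inv_logit(x):
    return 1 / (1 + math.exp(-x))

# invented logistic model: Pr(y=1) = inv_logit(b0 + b_age*age + b_f*female)
theta = {"b0": -3.0, "b_age": 0.05, "b_f": 0.3}

def Ey(u, v, th):          # u = age (input of interest), v = female indicator
    return inv_logit(th["b0"] + th["b_age"] * u + th["b_f"] * v)

def pred_effect(u1, u2, v, th):
    return (Ey(u2, v, th) - Ey(u1, v, th)) / (u2 - u1)

# on the probability scale the effect depends on where the transition happens
delta_young = pred_effect(20, 30, 0, theta)
delta_old = pred_effect(60, 70, 0, theta)
```

Here the same 10-year transition has a larger effect near the middle of the probability scale than in the tail, so a single summary requires averaging over transitions, v, and θ.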
Average predictive effects for binary inputs

- Predictive effect:
    δ_u(u^(1) → u^(2), v, θ) = [E(y | u^(2), v, θ) − E(y | u^(1), v, θ)] / (u^(2) − u^(1))
- Binary input u:
  - Predictive effect: δ_u(0 → 1, v, θ) = E(y | 1, v, θ) − E(y | 0, v, θ)
  - Average over v_1, ..., v_n in the data (or a weighted average if desired)
  - Average over θ from inferential simulations
  - Standard error of the APE from uncertainty in θ
Scenarios for average predictive effects

- Predictive effect:
    δ_u(u^(1) → u^(2), v, θ) = [E(y | u^(2), v, θ) − E(y | u^(1), v, θ)] / (u^(2) − u^(1))
- Continuous inputs
- Unordered discrete inputs
- Variance components
- Interactions
- Inputs that are not always active
R² for multilevel models

- How much of the variance is "explained" by the model?
- Separate R² for each level
- Classical R² = 1 − (variance of the residuals)/(variance of the data)
- Multilevel model: at each level, with units k: θ_k = (Xβ)_k + ε_k
- At each level: R² = 1 − (variance among the ε_k's)/(variance among the θ_k's)
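The at-each-level formula can be checked on simulated quantities (the predictors, coefficient, and error sd below are invented): build θ_k = (Xβ)_k + ε_k and compare the variance of the errors to the variance of the θ_k's.

```python
import random
import statistics

random.seed(4)
K = 200                                        # units at this level (invented)
x = [random.uniform(-1, 1) for _ in range(K)]
beta = 1.0                                     # invented coefficient
linpred = [beta * xi for xi in x]              # (X beta)_k
eps = [random.gauss(0, 0.4) for _ in range(K)] # level-specific errors
theta = [l + e for l, e in zip(linpred, eps)]  # theta_k = (X beta)_k + eps_k

# at this level: R^2 = 1 - var(eps_k) / var(theta_k)
r2 = 1 - statistics.variance(eps) / statistics.variance(theta)
```

Applying the same calculation to every level of the model gives a separate R² per level, as on the slide.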
Bayesian R2
I At each level: θk = (Xβ)k + εk
I R2 = 1 − (variance among the εk's) / (variance among the θk's)
I Numerator and denominator estimated by their posterior means
I Posterior distribution automatically accounts for uncertainty
I Bayesian generalization of classical “adjusted R2”
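A minimal sketch of the posterior-mean version, assuming we have simulation draws of the εk's and θk's at one level (the two draws below are fabricated for illustration only):

```python
from statistics import pvariance

def bayes_r2(eps_draws, theta_draws):
    """1 - E[var(eps_k's)] / E[var(theta_k's)], with each expectation
    estimated by averaging over posterior simulation draws."""
    num = sum(pvariance(d) for d in eps_draws) / len(eps_draws)
    den = sum(pvariance(d) for d in theta_draws) / len(theta_draws)
    return 1 - num / den

eps_draws = [[0.1, -0.1, 0.0], [0.2, -0.2, 0.0]]  # fake draws of the eps_k's
theta_draws = [[1.0, 2.0, 3.0], [1.1, 2.1, 2.9]]  # fake draws of the theta_k's
print(bayes_r2(eps_draws, theta_draws))
```

Averaging the variances over draws (rather than plugging in point estimates) is what makes this a Bayesian analogue of the classical adjusted R2.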
Example of partial pooling
[Figure: log radon level vs. basement indicator (0 or 1), one panel per county: LAC QUI PARLE, AITKIN, KOOCHICHING, DOUGLAS, CLAY, STEARNS, RAMSEY, ST LOUIS; y-axis runs from −1 to 3.]
Partial pooling factors
I At each level of the model: θk = (Xβ)k + εk
I λ = 0 if no pooling of ε’s
I λ = 1 if complete pooling of ε’s to 0
I Multilevel generalization of Bayesian pooling factor
I Can’t simply compare to the “complete pooling” and “no