Definitions Estimation Inference Challenges & open questions References
Generalized linear mixed models
Ben Bolker
McMaster University, Mathematics & Statistics and Biology
26 September 2014
Acknowledgments
lme4: Doug Bates, Martin Mächler, Steve Walker
Data: Josh Banta, Adrian Stier, Sea McKeon, David Julian, Jada-Simone White
NSERC (Discovery)
SHARCnet
Outline
1 Examples and definitions
2 Estimation: Overview, Methods
3 Inference
4 Challenges & open questions
(Generalized) linear mixed models
(G)LMMs: a statistical modeling framework incorporating:
combinations of categorical and continuous predictors, and interactions
(some) non-Normal responses (e.g. binomial, Poisson, and extensions)
(some) nonlinearity (e.g. logistic, exponential, hyperbolic)
non-independent (grouped) data
[Figure: diagram of relationships among model families. General linear models (linear regression, ANOVA, analysis of covariance, multiple linear regression) extend to generalized linear models (logistic regression, binomial regression, log-linear models) through non-normal errors and nonlinearity, and to mixed models (repeated-measures models; time series (ARIMA)) through random effects and correlation; the GLMM combines both. Also shown: nonlinear least-squares, generalized additive models (smooth nonlinearity), quasilikelihood and negative binomial models (scaled variance), nonlinear time series models, and thresholds, mixtures, compound distributions, etc.]
Coral protection from seastars (Culcita) by symbionts (McKeon et al., 2012)
[Figure: number of blocks (0 to 10) for each symbiont treatment (none, shrimp, crabs, both), broken down by number of predation events (0, 1, 2).]
Environmental stress: Glycera cell survival (D. Julian unpubl.)
[Figure: grid of panels for Osm = 12.8, 22.4, 32, 41.6, 51.2 under normoxia and anoxia; axes are H2S (0, 0.03, 0.1, 0.32) and copper (0, 33.3, 66.6, 133.3); scale runs from 0.0 to 1.0.]
Arabidopsis response to fertilization & herbivory (Banta et al., 2010)
[Figure: log(1 + fruit set) (0 to 5) for unclipped vs. clipped plants; panels: nutrient 1, nutrient 8.]
Coral demography (J.-S. White unpubl.)
[Figure: mortality probability (0.00 to 1.00) vs. previous size (0 to 50 cm); panels: Before, Experimental; treatment: Present, Removed.]
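Technical definition
\[
\underbrace{Y_i}_{\text{response}} \sim \overbrace{\text{Distr}}^{\text{conditional distribution}} \Big( \underbrace{g^{-1}(\eta_i)}_{\text{inverse link function}},\ \underbrace{\phi}_{\text{scale parameter}} \Big)
\]
\[
\underbrace{\eta}_{\text{linear predictor}} = \underbrace{X\beta}_{\text{fixed effects}} + \underbrace{Zb}_{\text{random effects}}
\]
\[
\underbrace{b}_{\text{conditional modes}} \sim \text{MVN}\big(0,\ \underbrace{\Sigma(\theta)}_{\text{variance-covariance matrix}}\big)
\]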
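As a rough illustration of the definition above, the following sketch simulates data from one simple special case: a logistic GLMM with a single random intercept per group. The function name, the choice of a Bernoulli response with logit link (where the scale parameter is fixed at 1), and the uniform predictor are all assumptions made for illustration; they are not taken from the talk.

```python
import math
import random


def simulate_binary_glmm(n_groups, n_per_group, beta, sigma_b, seed=1):
    """Simulate from a random-intercept logistic GLMM (illustrative sketch).

    eta = X beta + Z b, with b_g ~ Normal(0, sigma_b^2) for each group, and
    Y_i ~ Bernoulli(g^{-1}(eta_i)), where g^{-1} is the inverse logit.
    """
    rng = random.Random(seed)
    # One random intercept per group: b ~ MVN(0, sigma_b^2 * I)
    b = [rng.gauss(0.0, sigma_b) for _ in range(n_groups)]
    rows = []
    for g in range(n_groups):
        for _ in range(n_per_group):
            x = rng.uniform(-1.0, 1.0)           # continuous predictor (assumed)
            eta = beta[0] + beta[1] * x + b[g]   # linear predictor: X beta + Z b
            mu = 1.0 / (1.0 + math.exp(-eta))    # inverse link g^{-1}(eta)
            y = 1 if rng.random() < mu else 0    # draw from conditional distribution
            rows.append((g, x, y))
    return b, rows


b, rows = simulate_binary_glmm(n_groups=5, n_per_group=4,
                               beta=(0.5, 1.0), sigma_b=1.0)
```

Setting `sigma_b=0.0` removes the random effects and reduces the simulation to an ordinary logistic regression.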
What are random effects?
A method for . . .
accounting for among-individual, within-block correlation
compromising between complete pooling (no among-block variance) and fixed effects (infinite among-block variance)
handling levels selected at random from a larger population
sharing information among levels (shrinkage estimation)
estimating variability among levels
allowing predictions for unmeasured levels
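The compromise between complete pooling and fixed effects can be sketched for the simplest case: Normal group means with both variances treated as known. This is a simplification (a real mixed model estimates the variances and the overall mean jointly), and the function name and example numbers are hypothetical.

```python
def partial_pooling(group_means, group_sizes, sigma2, tau2):
    """Shrink each group mean toward the size-weighted grand mean.

    sigma2: within-group (residual) variance, assumed known.
    tau2:   among-group variance, assumed known.
    """
    total = sum(group_sizes)
    grand = sum(m * n for m, n in zip(group_means, group_sizes)) / total
    estimates = []
    for m, n in zip(group_means, group_sizes):
        if tau2 == 0:
            weight = 0.0  # complete pooling: no among-group variance
        else:
            # Equivalent to tau2 / (tau2 + sigma2 / n); tends to 1 as tau2 grows
            weight = 1.0 / (1.0 + (sigma2 / n) / tau2)
        estimates.append(grand + weight * (m - grand))
    return grand, estimates


means = [2.0, 5.0, 11.0]   # hypothetical group means
sizes = [10, 10, 2]        # hypothetical group sizes
grand, pooled = partial_pooling(means, sizes, sigma2=4.0, tau2=0.0)
_, partial = partial_pooling(means, sizes, sigma2=4.0, tau2=1.0)
_, fixed = partial_pooling(means, sizes, sigma2=4.0, tau2=float("inf"))
```

With `tau2=0.0` every estimate equals the grand mean, with infinite `tau2` the raw group means are returned, and intermediate values pull small groups toward the grand mean more strongly than large ones.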
Maximum likelihood estimation
Best fit is a compromise between two components (consistency of data with fixed effects and conditional modes; consistency of random effect with RE distribution)
Goodness-of-fit integrates over conditional modes
[Figure: observations y (-2 to 2) plotted against f (1 to 5).]
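One way to write out the integral referred to on this slide, using the symbols from the technical definition. The density notation $f$ and $f_{\text{MVN}}$ is an assumption introduced here for illustration.

```latex
\[
L(\beta, \theta, \phi \mid y)
  = \int \Big[ \prod_i f\big(y_i \mid g^{-1}(\eta_i), \phi\big) \Big]\,
         f_{\text{MVN}}\big(b \mid 0, \Sigma(\theta)\big)\, db,
\qquad \eta = X\beta + Zb
\]
```

The first factor measures consistency of the data with the fixed effects and conditional modes; the second measures consistency of the random effects with their distribution.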
Shrinkage: Arabidopsis conditional modes
[Figure: mean fruit set (log scale, 0 to 20) by genotype (0 to 25), showing group means and shrinkage estimates.]
Estimation methods
deterministic: various approximate integrals (Breslow, 2004)
Penalized quasi-likelihood, Laplace, Gauss-Hermite quadrature, . . .
flexibility and speed vs. accuracy
. . .
stochastic (Monte Carlo): frequentist and Bayesian (Booth and Hobert, 1999; Ponciano et al., 2009; Sung, 2007)
usually slower but flexible and accurate
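To make the trade-off concrete, here is a minimal sketch comparing a Laplace approximation and a simple Monte Carlo estimate against a brute-force numerical integral. It covers one toy case: the marginal likelihood contribution of a single group with k successes in n binary trials and a Normal random intercept. The brute-force trapezoid rule stands in for an accurate quadrature rule; it is not Gauss-Hermite quadrature, and penalized quasi-likelihood is not shown. All function names and numbers are hypothetical.

```python
import math
import random


def log_integrand(b, k, n, beta0, sigma):
    """log of [binomial kernel for k of n successes] * Normal(b; 0, sigma^2)."""
    eta = beta0 + b
    log_p = -math.log1p(math.exp(-eta))   # log inverse-logit(eta)
    log_q = -math.log1p(math.exp(eta))    # log(1 - inverse-logit(eta))
    log_prior = -0.5 * (b / sigma) ** 2 - 0.5 * math.log(2 * math.pi * sigma ** 2)
    return k * log_p + (n - k) * log_q + log_prior


def brute_force(k, n, beta0, sigma, lo=-10.0, hi=10.0, steps=20000):
    """Reference value: trapezoid rule on a fine grid."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        w = 0.5 if i in (0, steps) else 1.0
        total += w * math.exp(log_integrand(lo + i * h, k, n, beta0, sigma))
    return total * h


def laplace(k, n, beta0, sigma):
    """Laplace approximation: Gaussian approximation around the conditional mode."""
    b = 0.0
    for _ in range(50):  # Newton's method to find the mode of the log-integrand
        p = 1.0 / (1.0 + math.exp(-(beta0 + b)))
        grad = k - n * p - b / sigma ** 2
        hess = -n * p * (1 - p) - 1.0 / sigma ** 2
        b -= grad / hess
    p = 1.0 / (1.0 + math.exp(-(beta0 + b)))
    hess = -n * p * (1 - p) - 1.0 / sigma ** 2
    return math.exp(log_integrand(b, k, n, beta0, sigma)) * math.sqrt(2 * math.pi / -hess)


def monte_carlo(k, n, beta0, sigma, draws=20000, seed=1):
    """Simple Monte Carlo: average the conditional likelihood over draws of b."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(draws):
        p = 1.0 / (1.0 + math.exp(-(beta0 + rng.gauss(0.0, sigma))))
        total += p ** k * (1 - p) ** (n - k)
    return total / draws


exact = brute_force(4, 5, 0.0, 1.0)
approx_laplace = laplace(4, 5, 0.0, 1.0)
approx_mc = monte_carlo(4, 5, 0.0, 1.0)
```

The Laplace version needs only a handful of Newton steps, while the Monte Carlo version needs many draws but makes no Gaussian assumption about the shape of the integrand.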
Estimation: Culcita (McKeon et al., 2012)
[Figure: estimated log-odds of predation (-6 to 2) for the Symbiont, Crab vs. Shrimp, and Added symbiont effects, compared across GLM (fixed), GLM (pooled), PQL, Laplace, and AGQ.]
Wald tests
typical results of summary
exact for ANOVA, regression; approximation for GLM(M)s
fast
approximation is sometimes awful (Hauck-Donner effect)
[Figure: log-likelihood as a function of a parameter.]
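The Hauck-Donner effect can be seen in a toy case with no random effects at all: testing whether a binomial log-odds equals zero. The sketch below is an illustration under that assumption; the function name and counts are hypothetical. It computes the Wald statistic, which relies on the curvature at the estimate, and the likelihood-ratio statistic for the same data.

```python
import math


def wald_and_lrt(successes, trials):
    """Wald and likelihood-ratio statistics for H0: log-odds = 0 (p = 0.5).

    Assumes 0 < successes < trials so that the MLE of the log-odds is finite.
    """
    p_hat = successes / trials
    log_odds = math.log(p_hat / (1 - p_hat))
    # Standard error from the curvature of the log-likelihood at the MLE
    se = math.sqrt(1.0 / (trials * p_hat * (1 - p_hat)))
    wald = (log_odds / se) ** 2
    lrt = 2 * (successes * math.log(p_hat / 0.5)
               + (trials - successes) * math.log((1 - p_hat) / 0.5))
    return wald, lrt


for y in (20, 23, 24):
    w, l = wald_and_lrt(y, 25)
    print(f"{y}/25 successes: Wald = {w:.2f}, LRT = {l:.2f}")
```

Moving from 23 to 24 successes out of 25 is stronger evidence against the null, and the likelihood-ratio statistic rises accordingly, yet the Wald statistic falls because the standard error grows faster than the estimate.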
2D profiles for Culcita data
[Figure: scatter plot matrix of pairwise profiles for .sig01, (Intercept), tttcrabs, tttshrimp, and tttboth.]
Likelihood ratio tests
better than Wald, but still have two problems:
"denominator degrees of freedom" (when estimating scale)
for GLMMs, distributions are approximate anyway (Bartlett corrections)
Kenward-Roger correction? (Stroup, 2014)
Profile confidence intervals: expensive/fragile
Parametric bootstrapping
fit null model to data
simulate "data" from null model
fit null and working model, compute likelihood difference
repeat to estimate null distribution
should be OK but ??? not well tested (assumes estimated parameters are "sufficiently" good)
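The steps above can be sketched on a deliberately simple stand-in: comparing two binomial groups, where both the null model (one common probability) and the working model (one probability per group) have closed-form fits. This is not a GLMM and involves no random effects; it only shows the shape of the procedure. Function names and data are hypothetical.

```python
import math
import random


def binom_loglik(y, n, p):
    """Binomial log-likelihood without the constant term (0 * log 0 taken as 0)."""
    ll = 0.0
    if y > 0:
        ll += y * math.log(p)
    if n - y > 0:
        ll += (n - y) * math.log(1 - p)
    return ll


def lik_difference(y1, n1, y2, n2):
    """2 * (log-likelihood of working model - log-likelihood of null model)."""
    p0 = (y1 + y2) / (n1 + n2)  # null: one common probability
    null = binom_loglik(y1, n1, p0) + binom_loglik(y2, n2, p0)
    work = binom_loglik(y1, n1, y1 / n1) + binom_loglik(y2, n2, y2 / n2)
    return 2 * (work - null)


def parametric_bootstrap(y1, n1, y2, n2, n_sim=999, seed=1):
    """Bootstrap p-value for the observed likelihood difference."""
    rng = random.Random(seed)
    observed = lik_difference(y1, n1, y2, n2)
    p0 = (y1 + y2) / (n1 + n2)                    # fit null model to data
    exceed = 0
    for _ in range(n_sim):                        # repeat to build null distribution
        s1 = sum(rng.random() < p0 for _ in range(n1))   # simulate "data" from null
        s2 = sum(rng.random() < p0 for _ in range(n2))
        # refit both models to the simulated data and compare
        if lik_difference(s1, n1, s2, n2) >= observed:
            exceed += 1
    return (exceed + 1) / (n_sim + 1)


p_diff = parametric_bootstrap(3, 20, 15, 20)    # groups look different
p_same = parametric_bootstrap(10, 20, 11, 20)   # groups look alike
```

For a real GLMM each refit is a full model fit, which is where the computational cost of the method comes from.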
Parametric bootstrap results
[Figure: inferred p value vs. true p value (0.02 to 0.08); panels: Osm, Cu, H2S, Anoxia; reference distributions: normal, t(14), t(7).]
Bayesian inference
If we have a good sample from the posterior distribution (Markov chains have converged etc. etc.) we get most of the inferences we want for free by summarizing the marginal posteriors
post hoc Bayesian can work, but mode at zero causes problems
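As a minimal sketch of "summarizing the marginal posterior", the code below draws a posterior sample for a single binomial log-odds with a random-walk Metropolis sampler and then reads off a median and an equal-tailed interval. The single-parameter model, the Normal prior, the tuning values, and the function names are all assumptions for illustration; a GLMM posterior has many more parameters and needs real convergence checks.

```python
import math
import random


def log_posterior(eta, y, n, prior_sd):
    """Binomial log-likelihood in the log-odds eta plus a Normal(0, prior_sd^2) log-prior."""
    return y * eta - n * math.log1p(math.exp(eta)) - 0.5 * (eta / prior_sd) ** 2


def metropolis(y, n, prior_sd=10.0, n_iter=20000, burn_in=2000, step=0.8, seed=1):
    """Random-walk Metropolis sampler for the log-odds; returns post-burn-in draws."""
    rng = random.Random(seed)
    eta = 0.0
    current = log_posterior(eta, y, n, prior_sd)
    draws = []
    for i in range(n_iter):
        proposal = eta + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, y, n, prior_sd)
        # Accept with probability min(1, posterior ratio)
        if math.log(1.0 - rng.random()) < candidate - current:
            eta, current = proposal, candidate
        if i >= burn_in:
            draws.append(eta)
    return draws


def summarize(draws, level=0.95):
    """Posterior median and equal-tailed quantile interval from the draws."""
    s = sorted(draws)
    tail = (1.0 - level) / 2.0
    lower = s[int(tail * (len(s) - 1))]
    upper = s[int((1.0 - tail) * (len(s) - 1))]
    return s[len(s) // 2], lower, upper


draws = metropolis(y=15, n=20)
median, lower, upper = summarize(draws)
```

Once the draws exist, the interval costs only a sort, which is the sense in which the inference comes "for free".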
Culcita confidence intervals
[Figure: effect estimates (log-odds of predation, -15 to 15) with confidence intervals for block, (Intercept), tttboth, tttcrabs, and tttshrimp; CI type: Wald, profile, boot, MCMC; fitting function: glmer, glmmADMB, MCMCglmm.]
On beyond R
Julia: MixedModels package
SAS: PROC MIXED, NLMIXED
AS-REML
Stata (GLLAMM, xtmelogit)
AD Model Builder
HLM, MLWiN
Challenges
Small/medium data: inference, singular fits (blme, MCMCglmm)
Big data: speed!
Worst case: large n, small N (e.g. telemetry/genomics)
Model diagnosis
Confidence intervals accounting for uncertainty in variances
See also: http://rpubs.com/bbolker/glmmchapter, https://groups.nceas.ucsb.edu/non-linear-modeling/projects
Spatial and temporal correlations
Sometimes blocking takes care of non-independence ...
but sometimes there is temporal or spatial correlation within blocks
. . . also phylogenetic . . . (Ives and Zhu, 2006)
"G-side" vs. "R-side" effects
tricky to implement for GLMMs, but new possibilities on the horizon (Rousset and Ferdy, 2014; Rue et al., 2009)
Next steps
Complex random effects: regularization, model selection, penalized methods (lasso/fence)
Flexible correlation and variance structures
Flexible/nonparametric random effects distributions
hybrid & improved MCMC methods
Reliable assessment of out-of-sample performance
References
Banta, J.A., Stevens, M.H.H., and Pigliucci, M., 2010. Oikos, 119(2):359-369. ISSN 1600-0706. doi:10.1111/j.1600-0706.2009.17726.x.
Booth, J.G. and Hobert, J.P., 1999. Journal of the Royal Statistical Society, Series B, 61(1):265-285. doi:10.1111/1467-9868.00176.
Breslow, N.E., 2004. In D.Y. Lin and P.J. Heagerty, editors, Proceedings of the second Seattle symposium in biostatistics: Analysis of correlated data, pages 1-22. Springer. ISBN 0387208623.
Ives, A.R. and Zhu, J., 2006. Ecological Applications, 16(1):20-32.
McKeon, C.S., Stier, A., et al., 2012. Oecologia, 169(4):1095-1103. ISSN 0029-8549. doi:10.1007/s00442-012-2275-2.
Ponciano, J.M., Taper, M.L., et al., 2009. Ecology, 90(2):356-362. ISSN 0012-9658.
Rousset, F. and Ferdy, J.B., 2014. Ecography, page no-no. ISSN 1600-0587. doi:10.1111/ecog.00566.
Rue, H., Martino, S., and Chopin, N., 2009. Journal of the Royal Statistical Society, Series B, 71(2):319-392.
Stroup, W.W., 2014. Agronomy Journal, 106:1-17. doi:10.2134/agronj2013.0342.
Sung, Y.J., 2007. The Annals of Statistics, 35(3):990-1011. ISSN 0090-5364. doi:10.1214/009053606000001389.