Hierarchical Bayesian Modeling of Heterogeneity in the
Association between Milk Production and Reproductive Performance of
Dairy Cows
Beyond the Generalized Linear Mixed Model: a Hierarchical Bayesian Perspective
Robert J. Tempelman, Professor
Department of Animal Science, Michigan State University, East Lansing, MI, USA
KSU Conference on Applied Statistics in Agriculture, April 30, 2012
"It is safe to say that improper attention to the presence of random effects is one of the most common and serious mistakes in the statistical analysis of data."
Littell, R.C., W.W. Stroup and R.J. Freund. SAS System for Linear Models (2002), pg. 92.
This statement was likely intended to apply to biologists analyzing their own data.
It surely does not apply to the experts... right?
Mixed models in genetics & genomics and agriculture
Have we often thought carefully about how we use mixed models?
Do we sometimes mis-state the appropriate scope of inference? (lapses in both design and analyses?)
Do we always fully appreciate/stipulate what knowledge we are conditioning on in data analyses?
Are there too many other efficiencies going untapped? Shrinkage is a good thing (Allison et al., 2006).
Hierarchical/mixed model inference should be carefully calibrated to exploit data structure while maintaining the integrity of the inference scope and being upfront about any conditioning specifications. Particularly important with GLMM as opposed to LMM?
Research and Public Credibility
A disheartening article: "Raise standards for preclinical cancer research" by Glenn Begley in Nature (March 29, 2012).
Out of 53 papers deemed to be landmark studies in the last decade, scientific findings were confirmed in only 6 cases. Some of these non-reproducible studies have spawned entire fields (and were cited hundreds of times!) and triggered subsequent clinical trials.
What happened?
1) some studies based on a small number of cell lines (narrow scope!);
2) an obsession to provide a perfectly clean story;
3) poor data analyses too?
Would having the data available in the public domain help? Maybe not; consider the case of gene expression microarrays.
Data from microarray studies are routinely deposited in public repositories (GEO and ArrayExpress), most based on very simple designs.
Ioannidis, J.P.A. et al. 2009. Repeatability of published microarray gene expression analyses. Nature Genetics 41: 149-155.
Out of 18 articles on microarray-based gene expression profiling published in Nature Genetics in 2005-2006, only two analyses could be reproduced in principle.
Why? Generally because of incomplete data annotation or specification of data processing and analysis.
Outline of talk
The scope of inference issue! It's a little murky sometimes.
How scope depends on proper variance component (VC) estimation.
(Generalized) linear mixed models: the implications of good/poor VC estimates on the proper scope of inference.
Bayesian methods.
Beyond the generalized linear mixed model: hierarchical extensions provide a potentially richer class of models, and a calibration of the scope of inference that may best match the intended scope.
A ridiculously obvious example
Treatment A vs. B; two mice per treatment. Suppose you weigh each mouse 3 times:

Weighing    A1       B1       A2       B2
1           20.1     21.2     22.1     21.5
2           20.0     21.3     22.2     21.5
3           20.1     21.3     22.2     21.5
Mean        20.067   21.267   22.167   21.5

How many experimental (biological) replicates? Duh, Rob... it's n = 2.
Are the subsamples (technical replicates) useful? Oh yeah, well sure... it helps control measurement error.
Rethinking experimental replication
(Based on a true story!)
Suppose you have one randomly selected (representative) litter with 12 piglets; assign 4 piglets at random to each of 3 treatments (A), (B), and (C).
[Diagram: 12 piglets from one litter, 4 each assigned to A, B, and C.]
Is there any biological replication??? Well, actually no: n = 1.
4 piglets/trt is better than 2 piglets/trt, but you're only controlling for measurement error or subsampling with one litter.
Ok, well let's now replicate.
Have three randomly selected (representative) litters with 6 pigs each; assign 2 pigs at random to each of 3 treatments (A), (B), and (C) within each litter: 6 pigs per treatment.
[Diagram: three litters of 6 pigs, each litter containing 2 A's, 2 B's, and 2 C's.]
How many experimental replicates per trt?
Scope of inference
Well, it might depend on your scope of inference.
It's n = 6 (pigs) if the intended scope of inference is just those three litters (narrow scope).
It's n = 3 (litters) if the intended scope of inference is the population of litters from which the three litters are a random sample (broad scope).
Analysis better respects the experimental design and intended scope.
McLean, R.A., W.L. Sanders, and W.W. Stroup. 1991. A unified approach to mixed linear models. American Statistician 45: 54-64.
How do we properly decipher scope of inference?
Hierarchical statistical modeling; specifically, (generalized) linear mixed models.
Proper distinction of, say, litters as fixed versus random effects.
Previous (RCBD with subsampling) example:
Litter effects as fixed: narrow scope (sometimes desired!).
Litter effects as random: broad scope; specify Litter*Treatment as the experimental error -> determines n.
Proper mixed model analysis helps delineate true (experimental) from pseudo (subsampling) replication.
Scope of inference
The focus of much agricultural research is ordinary replication (Johnson, 2006; Crop Science): all replication conducted at a single research farm/station.
If the inference scope is for all farms for which the study farm is representative (i.e., farms are random effects), then a single-farm study has no replication in principle: treatment x herd serves as the experimental error term. Treatment inferences may be influenced by management nuances.
A similar criticism could be levied against university research. Hence, the continued need for multi-state research projects.
Meta scope of inference
Treatments (A, B, and C) cross-classified with 6 random farms.
Specify farm as fixed: n = 12 cows per trt.
Specify farm as random: n = 6 trt*farm replicates per group.
Shouldn't treatment effects be expected to be consistent across farms?.....
You can never tell even in the best of cases.
THE CURSE OF ENVIRONMENTAL STANDARDIZATION: "environmental standardization is a cause rather than a cure for poor reproducibility of experimental outcomes."
We shouldn't aspire for environmental standardization in agricultural research, for the sake of agricultural sustainability & organismic (plant and animal) plasticity. We just can't!
2 billion MORE people to feed by 2050, on less land!
"Management systems & environments are changing more rapidly than animal populations can adapt to such changes through natural selection" (Hohenboken et al., 2005), e.g.:
Energy policy (corn distillers grain)
More intensive management (larger farms)
Climate change
What are the implications for agricultural statisticians? Even greater importance in terms of using reliable inference procedures AND careful calibration of the scope of inference.
Desired calibration of mixed model analyses requires reliable inference procedures
The broader the scope, the greater the importance.
Under classical assumptions (CA), inference on treatment effects depends on the reliability of fixed effects and variance component inference.
Linear mixed models (LMM) under CA: no real issues; we've already got great software (e.g., PROC MIXED).
E-GLS based on REML works reasonably well. ANOVA (METHOD=TYPE3) for balanced designs might be even better (Littell et al., 2006).
A common tool for many agricultural statistical consulting centers.
Analysis of non-normal data (i.e., binary/binomial/count data): maybe a different story.
Fixed effects models (no design structure): generalized linear model (GLM) inference is based on Wald tests / likelihood ratio tests, with nice asymptotic/large-sample properties.
Mixed effects models (treatment and design structure): more at stake with generalized linear mixed models (GLMM). Asymptotic inference on fixed effects is conditioned upon asymptotic inference on variance components (VC).
From Rod Little's 2005 ASA Presidential address: does asymptotic behavior really depend on just n?
What is the impact of design complexity and the number of fixed effects / random effects factors/levels relative to n? Murky sub-asymptotial forests... how many more to reach the promised land of asymptotia?
Status of VC inference in GLMM
Default method in SAS PROC GLIMMIX: RSPL (a PQL-based method). PQL has been discouraged (McCulloch and Searle, 2002), especially with binary data: generally downward bias!
Transformations of count data followed by LMM analyses are sometimes even advocated (Young et al., 1998).
METHOD=LAPLACE and METHOD=QUAD might be better suited. Aggh... but ML-like rather than REML-like. Also, can't use QUAD for all model specifications.
How big is this issue? PQL (RSPL) inference (Bolker et al., 2009): as a rule, works poorly for Poisson data when the mean < 5 or for binomial data when y and n-y are both < 5. Yet 95% of analyses of binary responses (n=205), 92% of Poisson responses with means < 5 (n=48), and 89% of binomial responses with y, n-y < 5 used PQL anyway.
Split Plot -> Animal Science Example
[Diagram: three pens of 4 animals each under Diet 1 and three pens of 4 animals each under Diet 2.]
PEN serves a dual role: experimental unit for Diet; block for Drug.
Animal is the experimental unit for drug. 4 animals per pen; 1 of 4 drugs assigned to one animal within each pen. Pens numbered within diet.
Split Plot -> Plant Science Example
[Diagram: three fields of 4 plots each under Irrigation level 1 and three fields of 4 plots each under Irrigation level 2.]
Field serves a dual role: experimental unit for irrigation level; block for variety.
Plot is the experimental unit for variety. 4 plots per field; 1 of 4 corn varieties assigned to one plot within each field. Fields numbered within irrigation level.
Split plot ANOVA (LMM)
So inference on Treatment A (whole-plot factor) effects should be more sensitive to VC inference than Treatment B (sub-plot factor) effects.
Since σ²e is constrained for binary data in GLMM (logit or probit link), it should be even less of an issue for Treatment B there... right?
A simulation study: simulate data from a split plot design.
A: whole-plot factor, a = 3 levels. B: sub-plot factor, b = 3 levels.
[Diagram: for each level of A (A1, A2, A3), n = 3 whole plots, each containing sub-plots B1, B2, B3 in randomized order.]
n = number of whole plots (WP) per whole-plot factor level; n = 3 in the figure.
Note: if data are binary at the sub-plot level, they are binomial at the whole-plot level.
Simulation study details
Let's simulate normal (l) and binary (y) data: σ²e = 1.00 (let's assume known), σ²wp = 0.5.
Convert normal to binary data as y = I(l > 0.5).
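A minimal stdlib-only sketch of this data-generating step (variable names, the n = 8 whole plots per A level, and the seed are my own choices, not from the talk):

```python
import math
import random

random.seed(42)

# True cell means mu_ij = a_i + b_j for the 3 x 3 split plot simulated above
a_eff = {"A1": -0.5, "A2": 0.0, "A3": 0.5}
b_eff = {"B1": -0.5, "B2": 0.0, "B3": 0.5}
s2_wp, s2_e = 0.5, 1.0   # whole-plot and residual variances
n_wp = 8                 # whole plots per A level

records = []
for a_lab, a in a_eff.items():
    for rep in range(n_wp):
        wp = random.gauss(0.0, math.sqrt(s2_wp))   # shared whole-plot error
        for b_lab, b in b_eff.items():
            l = a + b + wp + random.gauss(0.0, math.sqrt(s2_e))  # latent normal
            y = int(l > 0.5)                       # binary conversion
            records.append((a_lab, b_lab, rep, l, y))
```

Each whole plot contributes one shared draw of the whole-plot error to its three sub-plot records, which is exactly what makes the binary data binomial at the whole-plot level.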
Note that with binary data, σ²e = 1.00 is not identifiable in a probit link GLMM (likewise for the logit link). Whole-plot factor: binomial; sub-plot factor: binary.
Let's compare standard errors of differences (A1 vs. A2, B1 vs. B2) as functions of σ²wp.
μij:
        A1      A2      A3
B1     -1.0    -0.5     0.0
B2     -0.5     0.0     0.5
B3      0.0     0.5     1.0

Prob(l > 0.5 | μij):
        A1      A2      A3
B1     0.067   0.158   0.308
B2     0.158   0.308   0.500
B3     0.308   0.500   0.692

SED of differences (A1 vs A2; B1 vs B2) for conventional LMM analysis of Gaussian data (n=8)
[Figure: standard errors of A1 vs A2 and B1 vs B2 as functions of σ²wp.]
No surprises here: whole-plot factor inferences are sensitive to σ²wp; sub-plot factor inferences are insensitive to σ²wp.
SED of differences (A1 vs A2; B1 vs B2) for conventional (asymptotic) GLMM analysis of binary data (n=8)
[Figure: standard errors of A1 vs A2 and B1 vs B2 as functions of σ²wp.]
Whole-plot factor inferences are sensitive to σ²wp. But so are sub-plot factor inferences (albeit less so)!
Misspecification of VC may have stronger implications for GLMM than for LMM?
Implications
If you underestimate σ²wp, then you understate standard errors and inflate Type I errors in conventional GLMM inference on marginal mean comparisons involving both whole-plot AND sub-plot factors! The obvious opposite implications hold whenever σ²wp is overestimated.
What kind of performance on VC estimation do we get with METHOD = RSPL (PQL), LAPLACE, or QUAD?
And what about Bayesian (MCMC) methods? Should you use the Bayesian posterior mean? median? Others?
Back to the simulation study
Consider the same 3 x 3 split plot under two different scenarios: n = 16 whole plots per A level and n = 4 whole plots per A level; 20 replicated datasets for each comparison.
Compare LAPLACE, QUAD, and RSPL with Bayesian estimates (posterior mean or median; others?) of σ²wp.
Prior on σ²wp: [prior specification not shown]; the prior variance is not defined for ν <= 4.
Scatterplot of VC estimates from 20 replicated datasets, n = 16 whole plots per A level: everything lines up pretty well!
Boxplots of VC estimates from 20 replicated datasets, n = 16 whole plots per A level.
Proportion of reps with convergence: 19/20, 16/20, 17/20.
RSPL biased downwards (conventional wisdom).
Scatterplot of VC estimates from 20 replicated datasets, n = 4 whole plots per A level: much less agreement between methods.
Boxplots of VC estimates from 20 replicated datasets, n = 4 whole plots per A level.
Proportion of reps with convergence: 16/20, 7/20, 0/20. Influenced by the prior?
Are Bayesian point estimators better/worse than other GLMM VC estimators? No clear answers; I could have tried different non-informative priors and actually gotten rather different point estimates of VC for n = 4. Implications then for Bayesian inference on fixed treatment effects?
Embarrassment of riches: inference is more than just point estimates; it involves entire posterior densities! Bayesian inferences on fixed treatment effects average out (integrate over) the uncertainty on VC.
n = 4 might be so badly underpowered that the point is moot for this simulation study... but recall Bolker's review!!!
Any published formal comparisons between GLS/REML/EB(M/PQL) and MCMC for GLMM? Check Browne and Draper (2006).
Normal data (LMM): generally, inferences based on GLS/REML and MCMC are sufficiently close. Since GLS/REML is faster, it is the method of choice under classical assumptions.
Non-normal data (GLMM): quasi-likelihood based methods are particularly problematic in the bias of point estimates and the interval coverage of variance components. Some fixed effects are poorly estimated too! Bayesian methods with certain diffuse priors are well calibrated for both properties for all parameters. Comparisons with Laplace not done yet (Masters project, anyone?).
Browne, W.J. and Draper, D. 2006. A comparison of Bayesian and likelihood-based methods for fitting multilevel models. Bayesian Analysis 1: 473-514.
Why do (some) animal breeders do Bayesian analysis?
Consider the linear mixed model (Henderson et al., 1959; Biometrics): Y = Xb + Zu + e; e ~ N(0, Iσ²e), u ~ N(0, Aσ²u), with Y being n x 1 and b being p x 1 (p > n); i.e., more animals to genetically evaluate than have data: animal models.
Somewhat pathological models, but mixed model inference is viable (Tempelman and Rosa, 2004): borrowing of information. REML seems to work just fine for Gaussian data. Put Bayes on the shelf...
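For the Gaussian case, the machinery behind this model is Henderson's mixed model equations, which jointly yield the BLUE of b and the (shrunken) BLUP of u. A toy, stdlib-only sketch; the records, design matrices, and λ = σ²e/σ²u are invented, and A = I (no pedigree) purely to keep it self-contained:

```python
# Toy Henderson mixed model equations (MME):
#   [X'X      X'Z     ] [b]   [X'y]
#   [Z'X  Z'Z + A^-1*lam] [u] = [Z'y],   lam = sigma2_e / sigma2_u
y = [3.0, 2.0, 4.0, 3.5]                           # 4 records
X = [[1, 0], [1, 0], [0, 1], [0, 1]]               # 2 fixed-effect levels
Z = [[1, 0, 0], [0, 1, 0], [0, 1, 0], [0, 0, 1]]   # 3 random (animal) effects
lam = 2.0                                          # assumed known here

def t(M):
    return [list(r) for r in zip(*M)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

W = [xr + zr for xr, zr in zip(X, Z)]              # [X Z]
C = matmul(t(W), W)                                # MME coefficient matrix
for i in range(2, 5):                              # add lam * A^-1 = lam * I
    C[i][i] += lam
rhs = [row[0] for row in matmul(t(W), [[v] for v in y])]

def solve(A, b):                                   # Gaussian elimination
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(M[r][i]))
        M[i], M[p] = M[p], M[i]
        for r in range(i + 1, n):
            f = M[r][i] / M[i][i]
            for c in range(i, n + 1):
                M[r][c] -= f * M[i][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

sol = solve(C, rhs)
b_hat, u_hat = sol[:2], sol[2:]   # BLUE of b; BLUP of u, shrunk toward 0 by lam
```

In real animal models A is the (huge) numerator relationship matrix and u can be far longer than y, which is exactly where the borrowing of information comes from.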
Greatest interest is in u, σ²u, and σ²e.
For GLMM in animal breeding, it's hard not to be Bayesian: binary or ordinal categorical data; probit link animal models (Tempelman and Rosa, 2004). PQL methods are completely unreliable in animal models; restricted Laplace is a little better but still biased (Tempelman, 1998; Journal of Dairy Science). Fully Bayesian inference using MCMC is the most viable.
Our models are becoming increasingly pathological! Tens/hundreds of thousands of genetic markers (see later) on each animal for both normal and non-normal traits. In that case, Bayesian methods become increasingly important, even for normal data; i.e., asymptotic inference issues due to increasing p for the same n.
Tempelman, R.J. and G.J.M. Rosa (2004). Empirical Bayes approaches to mixed model inference in quantitative genetics. In: Genetic Analysis of Complex Traits Using SAS (A.M. Saxton, ed.). Cary, NC, SAS Institute Inc.: 149-176.
Where is the greatest need for Bayesian modeling?
Multi-stage hierarchical models: when the classical distributional assumptions do not fit, e.g., e is NOT ~ N(0, Rσ²e) or u is NOT ~ N(0, Aσ²u).
Examples:
Heterogeneous variances and covariances across environments (a scope of inference issue?)
Different distributional forms (e.g., heavy-tailed or mixtures for residual/random effects)
High-dimensional variable selection models (animal genomics)
Heteroskedastic error (Kizilkaya and Tempelman, 2005)
Given: the residual has a certain heteroskedastic specification; [specification not shown] determines the nature of the heterogeneous residual variances.
Could do something similar for GLMM (with overdispersion: binomial, Poisson, or ordinal categorical data with > 3 categories).
Mixed Model for Heterogeneous Variances
Suppose the residual variance is modeled multiplicatively as σ²e * gk * vl, with:
σ²e a fixed intercept residual variance;
gk > 0 the kth fixed scaling effect (e.g., sex);
vl > 0 the lth random scaling effect (e.g., herd), with vl ~ IG(αv, αv - 1), so that E(vl) = 1 and Var(vl) = 1/(αv - 2).
Adjusting (estimating) αv calibrates the scope of inference: high αv, less heterogeneity; low αv, higher heterogeneity.
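The claimed moments of vl can be checked by simulation: if G ~ Gamma(αv, 1), then (αv - 1)/G is an IG(αv, αv - 1) draw. A stdlib-only sketch (the sampler wrapper and the αv grid are my own choices):

```python
import random

random.seed(1)

def draw_v(alpha_v):
    # v ~ Inverse-Gamma(alpha_v, alpha_v - 1): scale divided by a Gamma(alpha_v, 1) draw
    g = random.gammavariate(alpha_v, 1.0)
    return (alpha_v - 1.0) / g

for alpha_v in (5.0, 20.0, 100.0):
    vs = [draw_v(alpha_v) for _ in range(100_000)]
    mean = sum(vs) / len(vs)
    var = sum((v - mean) ** 2 for v in vs) / len(vs)
    # E(v) = 1 for every alpha_v; Var(v) = 1/(alpha_v - 2) shrinks as alpha_v
    # grows, i.e., high alpha_v means herds share nearly one residual variance.
    print(alpha_v, round(mean, 3), round(var, 3))
```

This is the sense in which αv acts as a calibration knob: the multiplicative herd effects all hover around 1, and αv controls how far they are allowed to stray.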
Birthweights in Italian Piedmontese cattle
[Figure: 95% credible intervals of residual variances for birthweights for each of 66 random herds.]
Also fitted fixed effects of calf sex.

Parameter   Post. Mean (Std)   95% Cred. Int.
            14.44 (1.03)       [12.63; 16.70]
            10.19 (0.73)       [8.89; 11.77]
            4.26 (0.53)        [3.29; 5.36]
            0.60 (0.09)        [0.46; 0.82]

The lower the [parameter not shown], the greater the shrinkage in estimated residual variances across herds: a calibrated pooling of error degrees of freedom.
Heterogeneous bivariate G-side and R-side inferences! Bello et al. (2010, 2012)
Investigated herd-level and cow-level relationships between 305-day milk production and calving interval (CI) as a function of various factors: random (herd) effects and residual (cow) effects.
It is well established that joint modeling of correlated traits provides efficiencies, especially with GLMM!
P.S. Nora has also done this for bivariate Gaussian-binary analyses too; see Bello et al. (2012b) in Biometrical Journal.
Herd-Specific and Cow-Specific (Co)variances
[Herd-k and cow-j (co)variance specifications not shown.]
Model each of these (co)variance terms as functions of fixed and random effects (in addition to the classical b and u)!
Random effect variability in RESIDUAL associations between traits across herds
[Figure: increase in # of days of CI per 100 kg of herd milk yield on a 0.0-1.0 scale, with 0.16 and 0.86 marked as the extreme herd-years.]
DIC(M0) - DIC(M1) = 243.
Expected range between extreme herd-years: 0.7 d / 100 kg (Ott and Longnecker, 2001).
So... is research irreproducibility sometimes heterogeneity across studies? and/or a failure to distinguish or calibrate between narrow versus broad scope of inference?
Recall treatment*station (or study) as the error term for treatment in a meta-replicated study. But heterogeneity might exist at other levels as well: heterogeneous residual and random effects (co)variances across farms; even overdispersion???
So inference may require proper calibration at various levels.
Discrete (binary/count) data: recall the problem with VC inference in GLMM? We actually may need to go even deeper than that!
Rethinking that perfect story
Should model heterogeneity (of means, variances, covariances) across studies/farms/times, etc. explicitly with multi-stage models. Estimates of the corresponding hyperparameters will indicate how clean or messy the story really is.
Estimate low heterogeneity? HEAVY SHRINKAGE: broad scope and narrow scope inference match up more closely.
Estimate high heterogeneity? LIGHT SHRINKAGE: broad scope inference might be almost pointless; inference should be better calibrated.
Shrinkage is a good thing (Allison et al., 2006)! Neither too broad nor too narrow... but calibrated. Borrow information across studies/environments.
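A minimal sketch of this calibration idea (all numbers invented for illustration): each study-specific estimate is pulled toward the meta-estimate by a weight that depends on the estimated between-study heterogeneity variance τ² relative to the within-study sampling variance:

```python
# Toy empirical-Bayes style shrinkage: weight tau2 / (tau2 + se2) is the
# fraction of a study's own estimate that survives pooling.
def shrink(effects, tau2, se2):
    grand = sum(effects) / len(effects)   # meta-estimate (broad scope)
    w = tau2 / (tau2 + se2)               # 0 = full pooling, 1 = no pooling
    return [grand + w * (e - grand) for e in effects]

study_effects = [0.9, 1.4, 0.6, 1.1]      # study-specific estimates (invented)
se2 = 0.04                                # within-study sampling variance

low_het = shrink(study_effects, tau2=0.001, se2=se2)   # heavy shrinkage
high_het = shrink(study_effects, tau2=1.0, se2=se2)    # light shrinkage
```

With low estimated heterogeneity, every study is reported close to the meta-estimate (broad and narrow scope agree); with high estimated heterogeneity, each study keeps most of its own estimate and a single broad-scope summary says little.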
[Figure: for parameters θ1 and θ2, study-specific estimates versus the meta-estimate under high-heterogeneity calibration (light shrinkage) and low-heterogeneity calibration (heavy shrinkage); broad-scope (K'b) versus narrow-scope (K'b + M'u) targets.]
Mixture (including point mass on 0) priors could be considered as well.
What would it take to model informative heterogeneity? Useful estimates require a moderate to large number of environments!
Revisit the utility/rigor of public data repositories (e.g., through journals): data AND source code should be provided (Peng et al., 2011) for ordinary replication studies. Consider the Biostatistics journal reproducibility review standard. Worried about somebody else's prior, or even model? You could then reassess for yourself.
Concluding comments
Mixed model inference continues to have the primary role for inference in agriculture, genetics, and other fields.
Multi-stage hierarchical Bayesian extensions offer shrinkage-based inference that may better calibrate scope for ordinary replication (K'b + M'u), yet provide reliable broad-scope inference (K'b).
It might be useful to retrospectively identify covariates contributing to heterogeneity to facilitate further shrinkage and hence even better agreement between broad and narrow scope inference.
Let's be careful about inferences that condition on other estimates as if they were true values (e.g., empirical Bayes, conventional GLMM). Otherwise we may be overselling the precision of our inferences, particularly with ordinary replication!
Strong implications for the technology transfer mission in agriculture.

Split plot ANOVA (LMM) table (n whole plots per level of A):

Source                   df             SS        MS        EMS
A                        a-1            SSA       MSA
Whole Plot Error WP(A)   a(n-1)         SSWP(A)   MSWP(A)
B                        b-1            SSB       MSB
A*B                      (a-1)(b-1)     SSAB      MSAB
Subplot Error            a(n-1)(b-1)    SSE       MSE
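As a quick consistency check on the split plot ANOVA degrees of freedom above (with n whole plots per level of A, a standard bookkeeping identity is that the df sum to abn - 1):

```python
def split_plot_df(a, b, n):
    # n = number of whole plots per level of A, as on the simulation slides
    return {
        "A": a - 1,
        "Whole Plot Error WP(A)": a * (n - 1),
        "B": b - 1,
        "A*B": (a - 1) * (b - 1),
        "Subplot Error": a * (n - 1) * (b - 1),
    }

df = split_plot_df(a=3, b=3, n=8)   # the simulated 3 x 3 split plot, n = 8
# Sanity check: entries must sum to the total corrected df, a*b*n - 1
assert sum(df.values()) == 3 * 3 * 8 - 1
```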