Bayesian measures of model complexity and fit, by D. J. Spiegelhalter, N. G. Best, B. P. Carlin and A. van der Linde (2002). Presented by Ilaria Masiani, TSI-EuroBayes student, Université Paris Dauphine. Reading seminar on Classics, October 21, 2013.
Reading "Bayesian measures of model complexity and fit"
Introduction | Complexity | Forms for pD | Diagnostics for fit | Model comparison criterion | Examples | Conclusion
Bayesian measures of model complexity and fit
by D. J. Spiegelhalter, N. G. Best, B. P. Carlin and A. van der Linde, 2002
presented by Ilaria Masiani
TSI-EuroBayes student, Université Paris Dauphine
Reading seminar on Classics, October 21, 2013
Ilaria Masiani October 21, 2013
Presentation of the paper
Bayesian measures of model complexity and fit, by David J. Spiegelhalter, Nicola G. Best, Bradley P. Carlin and Angelika van der Linde.
Published in 2002 in J. Royal Statistical Society, series B, vol. 64, Part 4, pp. 583-639.
Outline
1 Introduction
2 Complexity of a Bayesian model
3 Forms for pD
4 Diagnostics for fit
5 Model comparison criterion
6 Examples
7 Conclusion
Introduction
Model comparison involves:
- a measure of fit (e.g. the deviance statistic)
- a measure of complexity (the number of free parameters in the model)
=⇒ A trade-off between these two quantities
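Some usual model comparison criteria:
- Akaike information criterion: $\mathrm{AIC} = -2\log\{p(y \mid \hat\theta)\} + 2p$
- Bayesian information criterion: $\mathrm{BIC} = -2\log\{p(y \mid \hat\theta)\} + p\log(n)$
Here $\hat\theta$ is the maximum likelihood estimate, $p$ the number of free parameters and $n$ the sample size.
The problem: both require knowing $p$, which is sometimes not clearly defined, e.g. in complex hierarchical models.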
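A minimal sketch of the two criteria above, assuming an i.i.d. normal model whose mean and variance are both estimated by maximum likelihood (so p = 2). The function names and the data are illustrative only and do not come from the paper.

```python
import math

def gaussian_max_loglik(y):
    """Maximised log-likelihood of an i.i.d. N(mu, sigma^2) model (MLE plug-in)."""
    n = len(y)
    mu_hat = sum(y) / n
    sigma2_hat = sum((y_i - mu_hat) ** 2 for y_i in y) / n  # the MLE divides by n
    return -0.5 * n * (math.log(2 * math.pi * sigma2_hat) + 1.0)

def aic(loglik, p):
    """AIC = -2 log-likelihood + 2 p."""
    return -2.0 * loglik + 2.0 * p

def bic(loglik, p, n):
    """BIC = -2 log-likelihood + p log(n)."""
    return -2.0 * loglik + p * math.log(n)

y = [4.9, 5.1, 5.6, 4.4, 5.0, 5.3, 4.7, 5.2]  # made-up data
loglik = gaussian_max_loglik(y)
aic_value = aic(loglik, p=2)
bic_value = bic(loglik, p=2, n=len(y))
print(aic_value, bic_value)
```

Both penalties need the count p as an input, which is exactly what is hard to pin down in a hierarchical model.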
=⇒ This paper suggests Bayesian measures of complexity and fit that can be combined to compare complex models.
Complexity of a Bayesian model
Complexity reflects the ’difficulty in estimation’.
A measure of complexity may depend on:
- prior information
- observed data
True model
’All models are wrong, but some are useful’Box (1976)
- $p^t(Y)$: 'true' distribution of unobserved future data $Y$
- $\theta^t$: 'pseudotrue' parameter value
- $p(Y \mid \theta^t)$: likelihood specified by $\theta^t$
Residual information
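Residual information in data $y$ conditional on $\theta$, up to a multiplicative constant (Kullback and Leibler, 1951):
$-2\log\{p(y \mid \theta)\}$
With an estimator $\tilde\theta(y)$ of $\theta^t$, the excess of the true over the estimated residual information is:
$d_\Theta\{y, \theta^t, \tilde\theta(y)\} = -2\log\{p(y \mid \theta^t)\} + 2\log[p\{y \mid \tilde\theta(y)\}]$
1 Classical approach: attempts to estimate the sampling expectation of $d_\Theta$
2 Bayesian approach: direct calculation of the posterior expectation of $d_\Theta$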
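The Bayesian route can be sketched on a toy case. The assumptions here are not from the slides: i.i.d. observations y_i ~ N(θ, σ²) with known σ², a conjugate prior θ ~ N(μ0, τ²), and the posterior mean as plug-in estimator. Under these assumptions the posterior expectation of the excess residual information has the closed form n·v/σ², where v is the posterior variance, so the Monte Carlo estimate can be checked against it.

```python
import math
import random

def posterior(y, sigma2, mu0, tau2):
    """Conjugate posterior N(m, v) for the mean of N(theta, sigma2) data."""
    n = len(y)
    v = 1.0 / (n / sigma2 + 1.0 / tau2)
    m = v * (sum(y) / sigma2 + mu0 / tau2)
    return m, v

def residual_information(y, theta, sigma2):
    """-2 log p(y | theta) for i.i.d. N(theta, sigma2) observations."""
    n = len(y)
    return n * math.log(2 * math.pi * sigma2) + sum((y_i - theta) ** 2 for y_i in y) / sigma2

def excess_information_mc(y, sigma2, mu0, tau2, draws=20000, seed=1):
    """Monte Carlo posterior mean of -2 log p(y|theta) + 2 log p(y|posterior mean)."""
    rng = random.Random(seed)
    m, v = posterior(y, sigma2, mu0, tau2)
    sd = math.sqrt(v)
    mean_info = sum(
        residual_information(y, rng.gauss(m, sd), sigma2) for _ in range(draws)
    ) / draws
    return mean_info - residual_information(y, m, sigma2)

y = [4.6, 5.3, 5.1, 4.8, 5.4, 4.9, 5.2, 4.7, 5.0, 5.0]  # made-up data
for tau2 in (100.0, 0.1):  # vague prior, then informative prior
    m, v = posterior(y, 1.0, 5.0, tau2)
    print(tau2, excess_information_mc(y, 1.0, 5.0, tau2), len(y) * v / 1.0)
```

With the vague prior the value is close to 1, the single free parameter; with the informative prior it shrinks, which illustrates how the measure depends on prior information.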
Model comparison criterion
Classical criteria for model comparison
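Expected optimism: $\pi(\theta^t) = E_{Y \mid \theta^t}[c_\Theta\{Y, \theta^t, \tilde\theta(Y)\}]$
Here $c_\Theta$ denotes the optimism of the estimator: the expected loss on replicate data $Y_{rep}$ minus the loss on the observed data.
All criteria for model comparison are based on minimizing
$\hat E_{Y_{rep} \mid \theta^t}[L\{Y_{rep}, \tilde\theta(y)\}] = L\{y, \tilde\theta(y)\} + \hat\pi(\theta^t)$
Efron (1986) derived $\pi(\theta^t)$ for the log-loss function: $\pi_E(\theta^t) \approx 2p$.
This corresponds to a plug-in estimate of fit plus twice the effective number of parameters in the model.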
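The 2p approximation can be checked by simulation. The following is a sketch in a hypothetical setting that is not taken from the slides: p groups of m normal observations with known unit variance, with the group means as estimator. Under log-loss the constant terms cancel, so the optimism reduces to a difference of sums of squares.

```python
import random

def simulate_expected_optimism(p=3, m=10, reps=10000, seed=0):
    """Monte Carlo estimate of E[L(Y_rep, theta_hat(Y)) - L(Y, theta_hat(Y))].

    Hypothetical setting: p groups of m observations, Y ~ N(theta_k, 1) in
    group k, theta_hat = group means (the MLE). With known unit variance the
    log-loss -2 log p is a sum of squares plus a constant that cancels.
    """
    rng = random.Random(seed)
    theta_true = [float(k) for k in range(p)]  # arbitrary true group means
    total = 0.0
    for _ in range(reps):
        for theta in theta_true:
            y = [rng.gauss(theta, 1.0) for _ in range(m)]
            y_rep = [rng.gauss(theta, 1.0) for _ in range(m)]
            theta_hat = sum(y) / m
            in_sample = sum((v - theta_hat) ** 2 for v in y)
            out_sample = sum((v - theta_hat) ** 2 for v in y_rep)
            total += out_sample - in_sample
    return total / reps

optimism = simulate_expected_optimism()
print(optimism)
```

In this setting the expected optimism is exactly 2p, so with p = 3 the printed average should land near 6, up to Monte Carlo error.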
Bayesian criteria for model comparison
Aim: identify models that best explain the observed data, but with the expectation that they minimize uncertainty about observations generated in the same way.
Deviance information criterion (DIC)
Definition
$\mathrm{DIC} = D(\bar\theta) + 2p_D = \bar D + p_D$, so that $p_D = \bar D - D(\bar\theta)$
- A classical estimate of fit plus twice the effective number of parameters
- Also a Bayesian measure of fit, penalized by the complexity $p_D$
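A short sketch of how the definition is applied to sampler output. It assumes the deviance has been evaluated at each posterior draw and once at the posterior mean of the parameters; the function name and the numbers are made up for illustration.

```python
def dic_summary(deviance_samples, deviance_at_posterior_mean):
    """Return (D_bar, p_D, DIC) from posterior deviance draws and the plug-in deviance."""
    d_bar = sum(deviance_samples) / len(deviance_samples)
    p_d = d_bar - deviance_at_posterior_mean       # effective number of parameters
    dic = deviance_at_posterior_mean + 2.0 * p_d   # same value as d_bar + p_d
    return d_bar, p_d, dic

# Made-up numbers standing in for MCMC output.
d_bar, p_d, dic = dic_summary([101.2, 99.8, 103.5, 100.9, 102.6], 98.4)
print(d_bar, p_d, dic)
```

Only the posterior draws and one extra deviance evaluation are needed, so no analytic work beyond the likelihood itself is involved.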
DIC and AIC
Akaike information criterion: $\mathrm{AIC} = 2p - 2\log\{p(y \mid \hat\theta)\}$, with $\hat\theta$ the MLE
From result (2): $p_D \approx p$ in models with negligible prior information =⇒ $\mathrm{DIC} \approx 2p + D(\bar\theta)$
Examples
Spatial distribution of lip cancer in Scotland
Data on the rates of lip cancer in 56 districts in Scotland (Clayton and Kaldor, 1987; Breslow and Clayton, 1993)
- $y_i$: observed number of cases in county $i$
- $E_i$: expected number of cases in county $i$
- $A_i$: list, for each county $i$, of its $n_i$ adjacent counties
$y_i \sim \mathrm{Pois}(\exp\{\theta_i\} E_i)$
$\exp\{\theta_i\}$: underlying true area-specific relative risk of lip cancer
Saturated deviance
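$D(\theta) = 2\sum_i [y_i \log\{y_i / (\exp(\theta_i) E_i)\} - \{y_i - \exp(\theta_i) E_i\}]$
(McCullagh and Nelder, 1989, p. 34)
obtained by taking as standardizing factor: $-2\log\{f(y)\} = -2\sum_i \log\{p(y_i \mid \hat\theta_i)\} = 208.0$, where $\hat\theta_i = \log(y_i / E_i)$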
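The saturated deviance above can be written as a small function. This is a sketch with made-up counts rather than the Scottish data; the convention that y log y is 0 when y = 0 is an assumption added here to handle districts with no observed cases.

```python
import math

def saturated_poisson_deviance(y, expected, theta):
    """2 * sum_i [y_i log(y_i / mu_i) - (y_i - mu_i)] with mu_i = exp(theta_i) * E_i."""
    total = 0.0
    for y_i, e_i, theta_i in zip(y, expected, theta):
        mu_i = math.exp(theta_i) * e_i
        # Convention: the term y log(y / mu) is taken as 0 when y = 0.
        log_term = y_i * math.log(y_i / mu_i) if y_i > 0 else 0.0
        total += log_term - (y_i - mu_i)
    return 2.0 * total

y = [9, 0, 5]                # made-up observed counts
expected = [4.0, 2.0, 5.0]   # made-up expected counts
theta = [0.5, -1.0, 0.2]     # made-up log relative risks
print(saturated_poisson_deviance(y, expected, theta))
```

Setting each theta_i to log(y_i / E_i) for a district with y_i > 0 makes its contribution vanish, which is what standardizing by the saturated model means.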
Results
For each model, two independent MCMC chains (WinBUGS) were run for 15000 iterations each (burn-in: 5000 iterations)
Deviance summaries using three alternative parameterizations(mean, canonical, median).
Deviance calculations
- $\bar D$: mean of the posterior samples of the saturated deviance
- $D(\bar\mu)$: obtained by plugging the posterior mean of $\mu_i = \exp(\theta_i) E_i$ into the saturated deviance
- $D(\bar\theta)$: obtained by plugging the posterior means of $\alpha_0, \alpha_i, \gamma_i, \delta_i$ into the linear predictor $\theta_i$
- $D(\mathrm{med})$: obtained by plugging the posterior median of $\theta_i$ into the saturated deviance
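The three plug-in choices above can be compared on a toy case. This sketch makes several simplifying assumptions that are not in the slides: a single district with y > 0, synthetic normal draws standing in for MCMC samples of θ, and θ used directly instead of a linear predictor built from separate effects.

```python
import math
import random
import statistics

def poisson_deviance(y, mu):
    """Saturated Poisson deviance contribution of one observation with y > 0."""
    return 2.0 * (y * math.log(y / mu) - (y - mu))

def deviance_summaries(y, expected, theta_samples):
    """Return D_bar and p_D under the mean, canonical and median plug-ins."""
    mu_samples = [math.exp(t) * expected for t in theta_samples]
    d_bar = statistics.fmean([poisson_deviance(y, mu) for mu in mu_samples])
    plug_ins = {
        "mean": statistics.fmean(mu_samples),                               # for D(mu_bar)
        "canonical": math.exp(statistics.fmean(theta_samples)) * expected,  # for D(theta_bar)
        "median": math.exp(statistics.median(theta_samples)) * expected,    # for D(med)
    }
    p_d = {name: d_bar - poisson_deviance(y, mu) for name, mu in plug_ins.items()}
    return d_bar, p_d

rng = random.Random(0)
theta_samples = [rng.gauss(0.5, 0.3) for _ in range(5000)]  # stand-in for MCMC draws
d_bar, p_d = deviance_summaries(y=9, expected=4.0, theta_samples=theta_samples)
print(d_bar, p_d)
```

The mean deviance is the same in all three cases; only the plug-in deviance changes, so the resulting complexity values differ with the parameterization.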
Observations on pDs results
From result (2): $p_D \approx p$
- pooled model 1: $p_D = 1.0$
- saturated model 5: $p_D$ from 52.8 to 55.9
- models 3-4 with spatial random effects: $p_D$ around 31
- model 2 with only exchangeable random effects: $p_D$ around 43
Comparison of DIC
DIC is subject to Monte Carlo sampling error (it is a function of stochastic quantities)
Either of models 3 or 4 is superior to the others
Models 2 and 5 are superior to model 1
Absolute measure of fit: compare $\bar D$ with $n = 56$
All models (except pooled model 1) show an adequate overall fit to the data =⇒ the comparison is essentially based on the $p_D$s
Six-cities study
Subset of data from the six-cities study: a longitudinal study of the health effects of air pollution (Fitzmaurice and Laird, 1993)
- $y_{ij}$: repeated binary measurement of the wheezing status of child $i$ at time $j$ (1, yes; 0, no), $i = 1, \dots, I$, $j = 1, \dots, J$
- $I = 537$ children living in Steubenville, Ohio
- $J = 4$ time points
- $a_{ij}$: age of child $i$ in years at measurement point $j$ (7, 8, 9, 10 years)
- $s_i$: smoking status of child $i$'s mother (1, yes; 0, no)
Conditional response model
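$Y_{ij} \sim \mathrm{Bernoulli}(p_{ij})$
$p_{ij} = \Pr(Y_{ij} = 1) = g^{-1}(\mu_{ij})$
$\mu_{ij} = \beta_0 + \beta_1 z_{ij1} + \beta_2 z_{ij2} + \beta_3 z_{ij3} + b_i$
$z_{ijk} = x_{ijk} - \bar x_{..k}$, $k = 1, 2, 3$, with $x_{ij1} = a_{ij}$, $x_{ij2} = s_i$, $x_{ij3} = a_{ij} s_i$
$b_i$: individual-specific random effects, $b_i \sim N(0, \lambda^{-1})$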
Model choice: link function g(·)
Model 1: $g(p_{ij}) = \mathrm{logit}(p_{ij}) = \log\{p_{ij} / (1 - p_{ij})\}$
Model 2: $g(p_{ij}) = \mathrm{probit}(p_{ij}) = \Phi^{-1}(p_{ij})$
Model 3: $g(p_{ij}) = \mathrm{cloglog}(p_{ij}) = \log\{-\log(1 - p_{ij})\}$
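Since the model needs the inverse link to turn a linear predictor into a probability, the three candidates can be sketched as follows. The function names are illustrative; the probit inverse is written with the standard library error function rather than a statistics package.

```python
import math

def inv_logit(mu):
    """Inverse of log{p / (1 - p)}."""
    return 1.0 / (1.0 + math.exp(-mu))

def inv_probit(mu):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(mu / math.sqrt(2.0)))

def inv_cloglog(mu):
    """Inverse of log{-log(1 - p)}."""
    return 1.0 - math.exp(-math.exp(mu))

INVERSE_LINKS = {"logit": inv_logit, "probit": inv_probit, "cloglog": inv_cloglog}

for name, g_inv in INVERSE_LINKS.items():
    print(name, [round(g_inv(mu), 3) for mu in (-2.0, 0.0, 1.0)])
```

The logit and probit inverses are symmetric around a linear predictor of 0, while the complementary log-log inverse is not, which is one reason the three models can fit differently.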
Priors and deviance form
$\beta_k$: flat priors
$\lambda \sim \mathrm{Gamma}(0.001, 0.001)$
$D = -2\sum_{i,j} \{y_{ij}\log(p_{ij}) + (1 - y_{ij})\log(1 - p_{ij})\}$
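The Bernoulli deviance above is straightforward to evaluate for one set of fitted probabilities. This sketch uses made-up values for two children at four time points, stored as nested lists; none of the numbers come from the study.

```python
import math

def bernoulli_deviance(y, p):
    """D = -2 * sum over i, j of [y_ij log p_ij + (1 - y_ij) log(1 - p_ij)]."""
    total = 0.0
    for y_i, p_i in zip(y, p):            # loop over children i
        for y_ij, p_ij in zip(y_i, p_i):  # loop over time points j
            total += y_ij * math.log(p_ij) + (1 - y_ij) * math.log(1.0 - p_ij)
    return -2.0 * total

y = [[0, 0, 1, 0], [1, 1, 0, 1]]                   # made-up wheezing indicators, J = 4
p = [[0.2, 0.2, 0.3, 0.1], [0.6, 0.7, 0.5, 0.6]]   # made-up fitted probabilities
print(bernoulli_deviance(y, p))
```

Evaluating this function at every posterior draw of the probabilities, and once at the plug-in values, gives the two ingredients needed for the deviance summaries.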
Results
Gibbs sampler run for 5000 iterations (burn-in: 1000 iterations)
Deviance summaries for the canonical and mean parameterizations.
Conclusion
- $p_D$ may not be invariant to the chosen parametrization
- Similarities to frequentist measures, but based on expectations with respect to parameters in place of sampling expectations
- DIC can be viewed as a Bayesian analogue of AIC, with a similar justification but wider applicability
- Involves Monte Carlo sampling and negligible analytic work
References
McCullagh, P. and Nelder, J. Generalized Linear Models. 2nd edn. London: Chapman and Hall, 1989.
Besag, J. Spatial interaction and the statistical analysis of lattice systems. J. R. Statist. Soc., series B, 36, 192-236, 1974.
Clayton, D.G. and Kaldor, J. Empirical Bayes estimates of age-standardised relative risk for use in disease mapping. Biometrics, 43, 671-681, 1987.
Efron, B. How biased is the apparent error rate of a prediction rule? J. Am. Statist. Ass., 81, 461-470, 1986.
Fitzmaurice, G. and Laird, N. A likelihood-based method for analysing longitudinal binary responses. Biometrika, 80, 141-151, 1993.
Kullback, S. and Leibler, R.A. On information and sufficiency. Ann. Math. Statist., 22, 79-86, 1951.
Spiegelhalter, D.J., Best, N.G., Carlin, B.P. and van der Linde, A. Bayesian measures of model complexity and fit. J. Royal Statistical Society, series B, vol. 64, Part 4, pp. 583-639, 2002.