Euroworkshop on Nonparametric models, Schloß Höhenried, November 2001
Bayesian nonparametrics and flexible structured modelling
by Peter Green (University of Bristol, [email protected])
• distributions and dependence
• Dirichlet process and relations
• mixtures
• structured modelling
• space and time
© University of Bristol, 2001
• directness of inference, appealing to non-statisticians
• integrating all sources of uncertainty
• modular: coherent introduction of nonparametric components into structured models
• sequential updating: invariance to permutation
• opportunity of using quantitative prior information if it exists
• uncovering multiple explanations
• most practical and computational objections have been eliminated
Bayesian interpretations of frequentist nonparametric procedures
• smoothing splines
• state-space models
• wavelet thresholding
– not the real focus of contemporary research, but perhaps useful in reminding us of the “quasi-Bayesian” character of prior assumptions such as smoothness expressed by a roughness functional.
Bayesian nonparametric modelling of distributions
The basic problem: given observations $Y_1, Y_2, \ldots, Y_n$ from an unknown probability distribution $F$ on a space $\Omega$, make inference about $F$.
Parametric answer: restrict $F$ to be $F_\theta$ for some finite-dimensional parameter $\theta$, place a prior $\pi$ on $\theta$ and use the posterior
$$\pi(\theta \mid Y) \propto \pi(\theta) \prod_{i=1}^{n} f_\theta(Y_i)$$
Nonparametric answer: only insist that $F$ lies in a bigger (infinite-dimensional?) space, place a prior $\pi$ on that space, and use the posterior
$$\pi(F \mid Y) \propto \pi(F) \prod_{i=1}^{n} f(Y_i)$$
Flexible priors on probability distributions
Are there classes of distributions on distributions that are (a) flexible, and (b) permit tractable posterior analysis? A basic ingredient of many of them:
The Dirichlet process
Given a ‘base’ or ‘expectation’ probability measure $F_0$ and a positive scalar parameter $c$, we write
$$F \sim D(cF_0)$$
if for every measurable partition $(B_1, B_2, \ldots, B_n)$ of $\Omega$ we have
$$(F(B_1), F(B_2), \ldots, F(B_n)) \sim \text{Dirichlet}(cF_0(B_1), cF_0(B_2), \ldots, cF_0(B_n))$$
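The defining property of the Dirichlet process can be tried out on a single fixed partition. The following is only a minimal sketch: the three-set partition, its base probabilities and the value of $c$ are hypothetical, and the Dirichlet draw is built from normalised gamma variables (a standard construction, not something stated on the slide).

```python
import random

def dirichlet_draw(alphas, rng):
    """One Dirichlet(alphas) draw, built by normalising independent Gamma(alpha_j, 1) variables."""
    gammas = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(gammas)
    return [g / total for g in gammas]

def dp_partition_draw(c, base_probs, rng):
    """Draw (F(B_1), ..., F(B_n)) for F ~ D(c F_0), where base_probs[j] = F_0(B_j)."""
    return dirichlet_draw([c * p for p in base_probs], rng)

rng = random.Random(0)
base_probs = [0.2, 0.3, 0.5]   # hypothetical F_0(B_j) for a three-set partition
c = 4.0                        # hypothetical concentration parameter
draws = [dp_partition_draw(c, base_probs, rng) for _ in range(20000)]

# Monte Carlo estimate of E(F(B_j)); it should sit close to F_0(B_j).
means = [sum(d[j] for d in draws) / len(draws) for j in range(len(base_probs))]
```

Each draw is a random probability vector over the partition, and averaging many draws recovers the base probabilities.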
Basic properties of the Dirichlet process
$$E(F(B)) = F_0(B)$$
$$\operatorname{var}(F(B)) = \frac{F_0(B)(1 - F_0(B))}{c + 1}$$
so $c$ is a measure of concentration about the base measure $F_0$.
However, $c$ is also a measure of discreteness. The random $F$ is discrete with probability 1.
If $F_0$ is continuous, and you draw $F \sim D(cF_0)$, and then $Y_1, Y_2, \ldots, Y_n \mid F \sim F$, independently, we find $P(Y_1 = Y_2) = 1/(c + 1)$.
If $c \to 0$, then $Y_1 = Y_2 = \cdots = Y_n = Y$ a.s., where $Y \sim F_0$!
If $c \to \infty$, then $F \to F_0$, and $Y_i \sim F_0$, i.i.d.
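The tie probability $1/(c+1)$ can be checked by simulation. This sketch uses the Pólya urn (Blackwell and MacQueen) representation of the marginal law of the $Y_i$, which is a standard result but is not given on the slide; the standard normal base measure and the value $c = 3$ are hypothetical choices.

```python
import random

def polya_urn_sample(n, c, base_draw, rng):
    """Draw Y_1..Y_n marginally from F ~ D(c F_0), Y_i | F ~ F, via the Polya urn."""
    ys = []
    for i in range(n):
        # With probability c / (c + i) take a fresh value from F_0;
        # otherwise copy one of the i earlier values chosen uniformly.
        if rng.random() < c / (c + i):
            ys.append(base_draw(rng))
        else:
            ys.append(rng.choice(ys))
    return ys

def tie_frequency(c, reps, rng):
    """Monte Carlo estimate of P(Y_1 = Y_2) with F_0 = N(0, 1) (an assumed base measure)."""
    ties = 0
    for _ in range(reps):
        y1, y2 = polya_urn_sample(2, c, lambda r: r.gauss(0.0, 1.0), rng)
        ties += (y1 == y2)
    return ties / reps

rng = random.Random(1)
tie_freq = tie_frequency(3.0, 40000, rng)   # theory: 1 / (c + 1) = 0.25
```

Small $c$ pushes the tie frequency towards 1 and large $c$ towards 0, matching the two limiting cases.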
Prior to posterior
The beauty of the DP model is the conjugate update:
$$D(cF_0) + \text{data}(Y_1, Y_2, \ldots, Y_n) \to D(cF_0 + nF_n)$$
where $F_n$ is the empirical distribution of $(Y_1, Y_2, \ldots, Y_n)$.
This is not only of practical benefit, but confers some ‘canonical’ status on the DP model.
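On any fixed partition the conjugate update amounts to adding the cell counts to the prior Dirichlet parameters. A minimal sketch, with hypothetical numbers; the posterior mean follows from $E(F(B)) = F_0(B)$ applied to the normalised measure $cF_0 + nF_n$.

```python
def dp_posterior_params(c, base_probs, counts):
    """Dirichlet parameters of (F(B_1), ..., F(B_m)) under the posterior D(c F_0 + n F_n).

    base_probs[j] = F_0(B_j); counts[j] = number of observations in B_j, i.e. n F_n(B_j).
    """
    return [c * p + k for p, k in zip(base_probs, counts)]

def dp_posterior_mean(c, base_probs, counts):
    """Posterior mean of each F(B_j): (c F_0(B_j) + n F_n(B_j)) / (c + n)."""
    params = dp_posterior_params(c, base_probs, counts)
    total = sum(params)   # equals c + n
    return [a / total for a in params]

# Hypothetical example: c = 2, a two-set partition with F_0 = (0.5, 0.5), counts (7, 1).
post_mean = dp_posterior_mean(2.0, [0.5, 0.5], [7, 1])
```

The posterior mean is a weighted average of the base measure and the empirical distribution, with weights $c/(c+n)$ and $n/(c+n)$.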
Relatives of the Dirichlet process
The so-called Mixture of Dirichlet Processes model (more properly Dirichlet Process Mixture) gets round the discreteness problem by introducing ‘noise’:
$$Y_i \mid \theta \sim g(\cdot \mid \theta_i)$$
where
$$\theta_1, \theta_2, \ldots, \theta_n \mid F \sim F \text{ independently}$$
and $F \sim D(cF_0)$.
The conjugacy still helps (Gibbs sampling for the $\theta_i$ is trivial), but the inflexibility of the single parameter $c$ for variability remains severe.
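To see how the added noise removes ties in the data while the $\theta_i$ still cluster, one can simulate from the model. This is a sketch under assumptions that are not in the slides: a normal kernel $g$, a normal base measure $F_0$, and the Pólya urn representation for drawing the $\theta_i$.

```python
import random

def simulate_dpm(n, c, rng, base_sd=3.0, noise_sd=0.5):
    """Simulate (theta_i, Y_i), i = 1..n, from a Dirichlet process mixture.

    Assumed for illustration: F_0 = N(0, base_sd^2) and g(. | theta) = N(theta, noise_sd^2).
    """
    thetas = []
    for i in range(n):
        if rng.random() < c / (c + i):
            thetas.append(rng.gauss(0.0, base_sd))   # new cluster drawn from F_0
        else:
            thetas.append(rng.choice(thetas))        # join an existing cluster
    ys = [rng.gauss(t, noise_sd) for t in thetas]
    return thetas, ys

rng = random.Random(2)
thetas, ys = simulate_dpm(200, 1.5, rng)
n_clusters = len(set(thetas))   # few distinct theta values
n_distinct_y = len(set(ys))     # the observations themselves do not tie
```

Only $c$ controls how many clusters appear, which is the inflexibility noted above.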
Applications of Dirichlet Process Mixtures
By choosing the underlying space $\Omega$, base measure $F_0$ and data-density $g$ appropriately, an astonishingly wide range of practical statistical methodologies has been devised within this framework, often by West and others at Duke University.
Often the DPM arises as one ingredient in a fully Bayesian hierarchical model.
• mixture modelling
• nonparametric regression
• autoregression
Connections with finite mixtures
Green and Richardson (SJS, 2001) showed and explored a close connection between the MDP model and the finite mixture model.
So far as modelling the $Y_i$ is concerned, the MDP model is just the limit of this as $k \to \infty$ and $k\delta \to c$ (and also according to other limiting regimes). Hardly nonparametric!
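One way to see the limit numerically is to compare the expected number of occupied components. The sketch below assumes that the finite mixture has $k$ components with symmetric Dirichlet($\delta, \ldots, \delta$) weights (the defining display is missing here, so this is an assumption based on the cited paper) and sets $\delta = c/k$. Both formulas are standard urn calculations rather than anything taken from the slides.

```python
def expected_clusters_dp(n, c):
    """E[number of distinct theta values among n draws] under the Dirichlet process."""
    return sum(c / (c + i) for i in range(n))

def expected_clusters_finite(n, k, delta):
    """E[number of non-empty components] among n allocations for a k-component
    mixture with Dirichlet(delta, ..., delta) weights (assumed form)."""
    p_empty = 1.0
    for i in range(n):
        # Chance that allocation i+1 avoids a given component, given the first i did.
        p_empty *= ((k - 1) * delta + i) / (k * delta + i)
    return k * (1.0 - p_empty)

n, c = 50, 2.0   # hypothetical sample size and concentration
dp_value = expected_clusters_dp(n, c)
finite_values = {k: expected_clusters_finite(n, k, c / k) for k in (5, 50, 500, 5000)}
```

As $k$ grows with $k\delta = c$ held fixed, the finite-mixture values increase towards the Dirichlet process value.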
Other relatives of the Dirichlet process
• Other neutral-to-the-right processes
• Pólya trees
• Bernoulli trips
• Quantile pyramids
• Dirichlet diffusion trees
See, for example, Walker et al. (JRSS(B), 1999); for the 4th, Hjort (HSSS, 2002); and for the last, Neal (2001).
Bayesian measurement error modelling
with Sylvia Richardson, Laurent Leblond and Isabelle Jaussent (INSERM, Paris)
Aim: to quantify the association between an outcome $Y$ and a set of covariates $X$, where the covariates are imperfectly observed and only measured through “surrogates”.
Ignoring measurement error and treating the surrogate as the true covariate may produce biased results.
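The bias from treating a surrogate as the true covariate is easy to reproduce in a toy setting. The sketch below is not the model used in this work: it assumes a simple linear regression with classical additive error, $U = X + \text{noise}$, where the naive slope is attenuated by the factor $\sigma_X^2 / (\sigma_X^2 + \sigma_{\text{noise}}^2)$. All numerical values are hypothetical.

```python
import random

def ols_slope(xs, ys):
    """Least-squares slope of ys on xs."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

rng = random.Random(3)
beta = 2.0    # hypothetical true effect of X on Y
n = 20000
x_true = [rng.gauss(0.0, 1.0) for _ in range(n)]          # latent covariate X
u_obs = [x + rng.gauss(0.0, 1.0) for x in x_true]         # surrogate U = X + error
y = [beta * x + rng.gauss(0.0, 0.5) for x in x_true]      # outcome depends on X only

slope_true = ols_slope(x_true, y)    # close to beta
slope_naive = ols_slope(u_obs, y)    # attenuated towards beta * 1 / (1 + 1)
```

With equal variances for $X$ and the error, the naive slope is roughly half the true effect.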
Why be Bayesian here?
• latent covariates with imprecisely specified prior distributions
• combining information on the measurement process from several sources
• propagating uncertainty
Model building – structural specifications
• $Y$ known outcome
• $X$ true (latent) covariate
• $U$ observed surrogate for $X$
• $C$ known covariates
Formulation of local submodels between components using
– conditional independence assumptions
– prior information on the structure of the measurement process
In regions indexed $i = 1, 2, \ldots, n$:
$y_i$ = observed count of disease incidence
$E_i$ = expected count based on population size, adjusted for age and sex, etc.
Continuously distributed MRFs for the joint distribution of the $\{\lambda_i, i = 1, 2, \ldots, n\}$: Besag, York and Mollié (1991), Clayton and Bernardinelli (1992), Best et al. (1999), Wakefield and Morris (1999)
Parameters characterising spatial dependence are constant across the entire study region:
potential risk of over-smoothing and masking of local discontinuities, due to the global effect of the parameters (concern borne out by empirical studies)
Hidden discrete-valued random fields
Common feature of several attempts to address this: replace the continuously varying random field for $\{\lambda_i\}$ by an allocation/partition model of the form
$$\lambda_i = \lambda_{z_i}$$
$\{\lambda_j, j = 1, 2, \ldots, k\}$ characterise $k$ components
$\{z_i, i = 1, 2, \ldots, n\}$ are allocation variables taking values in $\{1, 2, \ldots, k\}$
Moving spatial dependence one level higher in the hierarchy, to the $\{z_i\}$, has the potential for greater spatial adaptivity (again seen empirically).
Discreteness in the prior is not imposed on posterior inference. Under Bayesian model averaging, the posterior mean risk surface can provide a smooth estimate.
Models in this framework include
• clustering or segmentation models of Knorr-Held and Raßer (2000) and Denison and Holmes (2001)
• Green and Richardson (2000) – Potts model for $\{z_i\}$, with the number of states and strength of interaction unknown (we retain a Markovian structure for the $\{z_i\}$)
• Fernandez and Green (2000) – spatial mixture models – spatial dependence is pushed yet one level higher: the $\{z_i\}$ are conditionally independent given weights $w_{ij} = P(z_i = j)$
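For the Potts option, a small sketch may help fix ideas. It assumes the usual Potts form $p(z) \propto \exp(\psi\, U(z))$, where $U(z)$ counts like-labelled neighbouring pairs and $\psi$ is the interaction strength; the symbol $\psi$, the four-region map and the 0-based labels are illustrative choices, not taken from the slides.

```python
import math

def like_pairs(z, edges):
    """U(z): number of neighbouring pairs (i, j) that carry the same label."""
    return sum(1 for i, j in edges if z[i] == z[j])

def potts_conditional(i, z, neighbours, k, psi):
    """P(z_i = label | all other z) for labels 0..k-1 under p(z) proportional to exp(psi * U(z))."""
    weights = []
    for label in range(k):
        agree = sum(1 for nb in neighbours[i] if z[nb] == label)
        weights.append(math.exp(psi * agree))
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical map: four regions in a row, 0 - 1 - 2 - 3.
edges = [(0, 1), (1, 2), (2, 3)]
neighbours = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
z = [0, 0, 1, 1]

u = like_pairs(z, edges)                                    # pairs (0, 1) and (2, 3) agree
cond_flat = potts_conditional(1, z, neighbours, 2, 0.0)     # psi = 0: no spatial dependence
cond_edge = potts_conditional(0, z, neighbours, 2, 1.0)     # psi = 1: favours the neighbour's label
```

These full conditionals depend only on neighbouring labels, which is the Markovian structure retained for the $\{z_i\}$.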
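Hidden Markov model approach
Basic mixture set-up
$$y_i \sim \sum_{j=1}^{k} w_j f(\cdot \mid \theta_j) \quad \text{independently}$$
Equivalently, introduce latent allocation variables $\{z_i\}$ with
$$y_i \mid z \sim f(\cdot \mid \theta_{z_i})$$
$$p(z_i = j) = w_j$$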
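The latent-allocation form gives a direct recipe for simulating from the mixture: draw $z_i$, then draw $y_i$ from the allocated component. A minimal sketch, assuming normal component densities and hypothetical weights and means (labels are 0-based in the code):

```python
import random

def sample_mixture(n, weights, thetas, sd, rng):
    """Draw allocations z_i with P(z_i = j) = w_j, then y_i | z ~ N(theta_{z_i}, sd^2)."""
    zs = rng.choices(range(len(weights)), weights=weights, k=n)
    ys = [rng.gauss(thetas[z], sd) for z in zs]
    return zs, ys

rng = random.Random(4)
weights = [0.3, 0.7]    # hypothetical w_j
thetas = [-2.0, 1.0]    # hypothetical component means theta_j
zs, ys = sample_mixture(50000, weights, thetas, 1.0, rng)

mean_y = sum(ys) / len(ys)        # mixture mean: sum_j w_j * theta_j = 0.1
share_0 = zs.count(0) / len(zs)   # fraction allocated to component 0, near w_0
```

Marginalising over $z$ recovers the mixture density, so both forms describe the same distribution for the $y_i$.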
Temporal HMM set-up
As above, but $i$ now represents (discrete) time.
Data are a time series $(y_i)$, and $(z_i)$ is now a Markov chain.
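The only change from the independent mixture is that each allocation now depends on the previous one. A sketch of a simulator, again with an assumed normal emission density and a hypothetical two-state transition matrix:

```python
import random

def simulate_hmm(n, init, trans, thetas, sd, rng):
    """Simulate a hidden Markov chain z_1..z_n and observations y_i | z ~ N(theta_{z_i}, sd^2)."""
    states = range(len(init))
    zs = [rng.choices(states, weights=init, k=1)[0]]
    for _ in range(n - 1):
        # The next state is drawn from the row of the transition matrix for the current state.
        zs.append(rng.choices(states, weights=trans[zs[-1]], k=1)[0])
    ys = [rng.gauss(thetas[z], sd) for z in zs]
    return zs, ys

rng = random.Random(5)
trans = [[0.95, 0.05],   # hypothetical transition probabilities
         [0.10, 0.90]]
zs, ys = simulate_hmm(5000, [0.5, 0.5], trans, [0.0, 3.0], 1.0, rng)

# Fraction of time steps on which the hidden state does not change.
stay = sum(1 for a, b in zip(zs, zs[1:]) if a == b) / (len(zs) - 1)
```

Persistence in the hidden chain is what produces runs of similar observations in the series.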
Extension to spatial case for disease mapping
Write the relative risk as $\lambda_{z_i}$ in place of $\lambda_i$.
$$y_i \mid z \sim \text{Poisson}(\lambda_{z_i} E_i)$$
where $\{z_i\}$ is a spatially dependent random field with $z_i \in \{1, 2, \ldots, k\}$.
More commonly we would have covariates $x_i$ and use the model:
$$y_i \mid z \sim \text{Poisson}(\lambda_{z_i} E_i e^{x_i^\top \beta})$$
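The likelihood contribution of this model is a product of Poisson terms, which is straightforward to code. The sketch below evaluates the log-likelihood for given allocations and risks; the function name, the 0-based labels and the toy numbers are illustrative assumptions.

```python
import math

def poisson_loglik(y, expected, lam, z, x=None, beta=None):
    """Log-likelihood of counts y_i ~ Poisson(lam[z_i] * E_i * exp(x_i . beta)).

    If x is None the covariate term is omitted, giving y_i ~ Poisson(lam[z_i] * E_i).
    """
    total = 0.0
    for i, (yi, ei) in enumerate(zip(y, expected)):
        mu = lam[z[i]] * ei
        if x is not None:
            mu *= math.exp(sum(xv * bv for xv, bv in zip(x[i], beta)))
        # log of the Poisson probability mass function at yi with mean mu
        total += yi * math.log(mu) - mu - math.lgamma(yi + 1)
    return total

# Hypothetical two-region example.
y = [3, 0]
expected = [2.0, 1.5]
lam = [0.5, 1.5]   # component relative risks
z = [1, 0]         # region 0 uses lam[1], region 1 uses lam[0]
ll = poisson_loglik(y, expected, lam, z)
```

In an MCMC scheme this quantity would be combined with the prior on $\{z_i\}$ when updating allocations.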
Allocation models
In each case, spatial context is determined by the assumed neighbourhood structure – we say ‘adjacent’ to mean ‘have a common boundary’ ($i \sim j$). For rare diseases, more complex dependence is not justified.
Interpretation and inference in HMRFs and partition models
Do we really believe there are $k$ groups of regions with identical relative risks?
• model is being used in a ‘semi-parametric’ fashion, not to identify clusters
• inference on $\{\lambda_{z_i}\}$ rather robust to details of prior structure – ‘borrows strength’ between regions in an adaptive way (by Bayesian model averaging)
• avoid over-smoothing of relative risks
• interpret inference on $k$ and $z$ with caution (diagnostic/exploratory)
Some issues in model choice for spatial epidemiology
• objectives of the model and of the choice
• statistical paradigm
• specific criteria
One key consideration is the extent to which it is believed that all relevant covariates have been measured and included appropriately in the model.
(We can accept that ‘all models are wrong’ without accepting that all models are equally useless!)
Confounding between spatial structure of covariates and random effects
A periodically-voiced concern is whether fitting flexible spatial models in addition to covariates systematically ‘dilutes’ estimates of covariate effects (the implication being that one should be deliberately modest in allowing for unmeasured covariates, in order not to eliminate the significance of the measured ones).
This concern is probably unfounded. See the partial report of an on-going simulation study by Richardson (HSSS, 2002). If spatial correlation between covariates and random effects is generated, there will be confounding (positive or negative bias); otherwise, not.
Multiple change points in point processes
Example: cyclones hitting the Bay of Bengal
141 cyclones over a period of 100 years (a cyclone is a storm with winds exceeding [value illegible] km h$^{-1}$).
Our model is that the intensity as a function of time is a step function, with an unknown number of steps.
The number of steps $k$ is Poisson($\lambda$), with $\lambda$ = [value illegible]; the step function positions are drawn from the joint density
$$\propto s_1 (s_2 - s_1)(s_3 - s_2) \cdots (s_k - s_{k-1})(L - s_k)$$
and the step heights are independent Gamma($\alpha$, $\beta$), with $\alpha$ = [value illegible] and $\beta$ = [illegible expression in $n$ and $L$].
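A draw from this prior can be sketched as follows. The joint density for the step positions is that of the even-numbered order statistics of $2k+1$ uniform points on $[0, L]$, which is how they are generated here. The numerical hyperparameters are hypothetical stand-ins, and Gamma($\alpha$, $\beta$) is assumed to use $\beta$ as a rate.

```python
import math
import random

def sample_poisson(mean, rng):
    """Poisson draw by Knuth's multiplication method (adequate for small means)."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def sample_step_positions(k, length, rng):
    """Even-numbered order statistics of 2k+1 uniform points on [0, length]."""
    points = sorted(rng.uniform(0.0, length) for _ in range(2 * k + 1))
    return points[1::2]

def sample_step_function(length, lam, alpha, beta, rng):
    """One prior draw: k change points and k+1 heights (beta treated as a rate, an assumption)."""
    k = sample_poisson(lam, rng)
    positions = sample_step_positions(k, length, rng)
    heights = [rng.gammavariate(alpha, 1.0 / beta) for _ in range(k + 1)]
    return positions, heights

def intensity_at(t, positions, heights):
    """Value of the step function at time t."""
    return heights[sum(1 for s in positions if s <= t)]

rng = random.Random(6)
# Hypothetical hyperparameters: L = 100, lambda = 3, alpha = 1, beta = 0.5.
positions, heights = sample_step_function(100.0, 3.0, 1.0, 0.5, rng)
```

Taking every second order statistic discourages very short steps, since each gap must contain one discarded point.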
[Figure: intensity (vertical axis, 0 to 3) plotted against time (horizontal axis, 0 to 100).]
Posterior for the number of change points k
[Figure: posterior probability (vertical axis, 0 to 0.20) plotted against $k$ (horizontal axis, 0 to 12).]
Zero change points is ruled out; two values of $k$ (illegible in this copy) are more probable than under the prior.
Posterior density estimates for change-point positions
– fixed-bandwidth smoothers either over-smooth the steps, or under-smooth the plateaux.
To follow up
Hjort, N. L. (2002) Topics in nonparametric Bayesian statistics, in Highly Structured Stochastic Systems, OUP, to appear. (For details, see http://www.stats.bris.ac.uk/~peter/L2000/Announce)
Walker, S. G., Damien, P., Laud, P. W. and Smith, A. F. M. (1999) Bayesian nonparametric inference for random distributions and related functions (with discussion). J. Roy. Statist. Soc. B.
Green, P. J. and Richardson, S. (2001) Modelling heterogeneity with and without the Dirichlet process, Scandinavian Journal of Statistics, 28, 355–375.
Richardson, S. and Green, P. J. (1997) On Bayesian analysis of mixtures with an unknown number of components (with discussion), Journal of the Royal Statistical Society, B, 59, 731–792.
Green, P. J. and Richardson, S. (2001) Hidden Markov models for disease mapping.
Fernandez, C. and Green, P. J. (2001) Modelling spatially correlated data via mixtures: a Bayesian approach.
Richardson, S., Leblond, L., Jaussent, I. and Green, P. J. (2000) Mixture models in measurement error problems, with reference to epidemiological studies.
(the unpublished papers here can be found on the web page below)