School of Mathematical Sciences
Queensland University of Technology
Bayesian mixture modelling for characterising
environmental exposures and outcomes
Darren Eastwood Wraith
BCom(Econ), Post Grad Dipl Health Econ & Eval, BMath
A thesis submitted for the degree of Doctor of Philosophy in the Faculty of
Science, Queensland University of Technology according to QUT requirements.
Principal Supervisor: Prof. Kerrie Mengersen
Associate Supervisors: Assoc. Prof. Shilu Tong; Dr Clair Alston
2008
Abstract
Environmental exposure and outcome assessment is a great challenge for scientists.
Increasingly detailed data are becoming available to understand the
nature and complexity of the relationships involved. The methodology of mixture
models provides a means to understand, quantify and describe features and relationships
within complex data sets. In this thesis, we focus on a number of applied
problems to characterise complex environmental exposures and outcomes, including:
assessing the interaction between environmental exposures as risk factors for health
outcomes; identifying differing environmental outcomes across a region; and estab-
lishing patterns in the size and concentration of aerosol particles over time. Mixture
model approaches to address these problems are developed and examined for their
suitability in these contexts.
List of publications and manuscripts arising from this
thesis
This thesis comprises the following publications, which have been accepted, or
submitted, for publication in international refereed journals:
Chapter 3: Wraith D. & Mengersen K. Assessing the combined effect of asbestos exposure
and smoking on lung cancer: A Bayesian approach. Statistics in Medicine, 28
February 2007, 1150-1169
Chapter 4: Wraith D. & Mengersen K. A Bayesian approach to assess interaction be-
tween known risk factors: the risk of lung cancer from exposure to asbestos
and smoking. Statistical Methods in Medical Research. (Published online 14
August 2007)
Chapter 5: Wraith D., Mengersen K., Low Choy S., Tong S. Spatial and Temporal Mod-
elling of Ross River virus in Queensland. In Zerger, A. and Argent, R.M. (eds)
MODSIM 2005 International Congress on Modelling and Simulation. Mod-
elling and Simulation Society of Australia and New Zealand, December 2005
Chapter 6: Wraith D., Alston C., Mengersen K., & Hussein T. Bayesian mixture model
estimation of aerosol particle size distributions. Environmetrics (Submitted:
November 2007)
Chapter 7: Wraith D., Alston C., Mengersen K., & Hussein T. Bayesian estimation of
mixtures over time with application to aerosol particle size distributions. Sta-
Study | Location | Study design | Period | Study no.*
Martischnig et al. (1977) | Gateshead, England | Hospital CC in shipbuilding area | 1972-73 | 2
Blot et al. (1978) | Georgia, USA | Hospital CC in shipbuilding area | 1970-76 | 7
Hammond et al. (1979) | USA and Canada | Cohort. Asbestos insulation workers | 1967-76 | 15
Blot et al. (1980) | Virginia, USA | Hospital CC in shipbuilding area | 1972-76 | 8
Selikoff et al. (1980) | New Jersey, USA | Cohort. Amosite asbestos factory workers | 1961-77 | 14
Blot et al. (1982) | Florida, USA | Hospital based CC in shipbuilding area | 1970-75 | 9
Liddell et al. (1984) | Quebec, Canada | Cohort. Chrysotile miners and millers | 1967-75 | 17
Pastorino et al. (1984) | Lombardy, Italy | CC in industrial areas | 1976-79 | 3,4
Berry et al. (1985) | East London, England | Cohort. Asbestos factory workers | 1960-70, 1971-80 | 16,18
Kjuus et al. (1986) | Telemark and Vestfold, Norway | Hospital CC in industrial and shipbuilding areas | 1979-83 | 6
de Klerk et al. (1991) | Wittenoom, Australia | Nested CC in crocidolite miners and millers | 1979-86 | 1
Bovenzi et al. (1993) | Trieste, Italy | Decedent CC in industrial and shipbuilding area | 1979-81, 1985-86 | 5
McDonald et al. (1993) | Quebec, Canada | Cohort. Chrysotile miners and millers | 1950-92 | 10
Zhu & Wang (1993) | 8 factories, China | Cohort. Chrysotile asbestos products workers | 1972-86 | 11
Meurman et al. (1994) | North Savo, Finland | Anthophyllite miners | 1953-91 | 12
Note: *Study numbering for the purposes of this statistical review (as used by Lee (2002), except
Studies 14-19 are referenced here as Studies 13-18), and in Table 3.2. For study references, see Lee (2001).
ing relative risk estimates for exposure groups to more formal significance testing.
There was no consistent conclusion in favour of either an additive or multiplicative
relation.
Table 3.2: Reported Results of Studies
Study Author | Observed relative risk estimates (95% CI) | Covariance of
Note: Estimates quoted are the mean estimate, below which is the 95% Credible Interval.
bigger stars, indicating large values for S, V, γ and P(M).
Figure 4.2: Starplots by study (1-18) and Overall. S is the Synergy Index, V the Multiplicativity Index, PM the probability of a multiplicative relation, and gamma is the power transformation estimate from Rlg (gamma = 0 (additive), gamma = 1 (multiplicative)).
Sensitivity Analysis
Table 4.6 provides the results of the sensitivity analysis for both the relative risk
and mixture models. Estimates of γ from the relative risk models are significantly
lower for cohort studies than for case-control studies, indicating less evidence of a
simple multiplicative relationship for cohort studies. The mixture model analysis also
appears to show a clearer difference between types of study. Given a choice of
either an additive or multiplicative model, the probability of an additive model for
Table 4.5: Results of Synergy Index (S) and Multiplicativity Index (V)
Note: - indicates non-convergence and is evidence of overfitting. pD is the effective number of parameters being used in the model from DIC calculations.
component model (2,711 (k = 2) to 2,701 (k = 3)), without a large increase in the
number of effective parameters (3.59 (k = 2) to 5.06 (k = 3)).
Figure 5.5 illustrates the fitted mixture model with three components for Zone
15 against a time series of the data. In this figure, we can see the three levels of
the time series corresponding to the means of the three components. Figure 5.6
shows a comparison of the fitted mixture model against a histogram of the data. In
comparison, the results from Zone 5 indicate that two components can be fitted to
the data over time.
Figure 5.7 illustrates the fitted mixture model with two components for Zone 5
against a time series of the data, and Figure 5.8 shows a comparison of the fitted
mixture model against a histogram of the data.
The results for all zones are shown in Table 5.3. For Zones 8 and 10 the results
suggest only one component or group in the data over time; the results for Zones 4
to 7, 13 and 14 suggest two components, and the results for Zone 15 suggest three
components. For the other zones, the data were too disparate to apply a mixture
Figure 5.5: Plot of fitted mixture model for Zone 15 showing three components against the data over time (log values; y-axis is log(casesz15 + 1)). Overall fitted density is shown in black, and components in red. Blue lines indicate the estimates of µ for the three components.
Figure 5.6: Plot of fitted mixture model for Zone 15 against a histogram of the data. Overall fitted density is shown in black, and components in red.
Figure 5.7: Plot of fitted mixture model for Zone 5 showing two components against the data over time (log values; y-axis is log(casesz5 + 1)). Overall fitted density is shown in black, and components in red. Blue lines indicate the estimates of µ for the two components.
Figure 5.8: Plot of fitted mixture model for Zone 5 against a histogram of the data (density scale; x-axis is log(casesz5 + 1)). Overall fitted density is shown in black, and components in red.
model and draw any substantive conclusions (See Figure 5.4).
Note: † indicates the number of components best addressing the model choice criteria. * indicates the data range is too disparate to evaluate a mixture model. pD is the effective number of parameters being used in the model from DIC calculations.
The results from Table 5.3 suggest a spatial pattern to the data, with two
components identified for zones located on the coast (Zones 1,4-7,14,15) compared
to only one component for zones located inland. This spatial pattern for RRv is
supportive of previous evidence associating higher incidences of RRv with coastal
regions (Tong, 2004).
5.4 Discussion
We explored a Bayesian mixture model to analyse cases of RRv occurring in 15
climate zones throughout Queensland. We examined two of the zones in detail
and found a higher number of components preferred for data from the zone which
appeared to show a more distinctive pattern (Zone 15). A comparison across all the
zones suggests a higher number of components is identifiable from the data for zones
located along the coast of QLD.
There may be a number of explanations as to why we observe a number of
components or groups in the data over time. Further analysis of Zone 15 suggests
that if we take into account a possible change point in the data around 1991/92 due
to a change in notification practice, the number of components reduces from three to
two. We may also observe two or more components if there has been a substantive
increase in the magnitude of outbreaks over time.
We may also observe differences between the zones in terms of the mean (µ) and
weight (λ) associated with the components. The means of the components indicate
the change in the level of the data between the components, and the weights are
indicative of the amount of time spent in each component. Even for zones with the
same number of components, the disparity between these parameter estimates may
be quite large.
Although we have used the simplifying assumption that the data are i.i.d., this is
not without implications. The most likely implication is that the standard errors
around our estimates are biased, and likely to be understated; for this reason
no inference for the variance was made. This is in some sense a price to pay for the
approach we have adopted. Allowing for the correlated nature of the data would be
likely to lead us away from the main aim of the analysis, which is to classify the
data into groups based on changes in the level of the data, rather than to explain
the correlation structure of the data.
Explaining the correlation structure of the data is also likely to be difficult. There
does not appear to be a consistent correlation structure to the data, either due to
changes in the magnitude of the seasonal cycles or from changes in the correlation
of the data from one period to another. In this case, allowing for the correlation
structure of the data is likely to disguise any changes in the levels that we may
otherwise observe. Alternative approaches, such as a Hidden Markov Model or
a Dirichlet Process mixture, have similar inferential difficulties, and their
computational issues are substantially more involved.
The analysis could be extended in a number of ways. Other distributional forms
could be assumed to take account of the large number of zeros in the data, and
investigated to assess the difference in the results. Further analysis is also required
to compare the timing of the components across the zones, and we could further
reduce the number of zones into groupings based on the timing and number of
components observed.
Chapter 6
Bayesian mixture model estimation of aerosol particle
size distributions
In Chapters 6, 7 and 8 we examine approaches to estimate a mixture model at
both single and multiple time points for aerosol particle size distribution (PSD)
data. In this chapter, for estimation of a mixture model at a single time point, we
use Reversible Jump MCMC to estimate the mixture model parameters, including the
number of components, which is assumed to be unknown. We compare the results
of this approach to a commonly used estimation method in the aerosol physics
literature. As PSD data are often measured over time at small time intervals, we also
examine the use of an informative prior for estimation of the mixture parameters
which takes into account the correlated nature of the parameters.
6.1 Introduction
There has been recent interest in the estimation of particle size distributions of
aerosol particulate data (Makela et al., 2000; Birmili et al., 2001; Xu et al., 2002;
Whitby et al., 2002; Lu and Bowman, 2004; Hussein et al., 2005). In these papers,
the interest in estimation is largely directed at better understanding the aerosol
dynamic processes (i.e., coagulation, nucleation, condensation, and deposition) that
govern aerosol formation, growth, and evolution, which depend on the number, size,
and composition of particles. In the atmosphere, these aerosol characteristics
determine the influence of particles on health, climate, cloud formation, and visibility
termine the influence of particles on health, climate, cloud formation, and visibility
(Seinfeld and Pandis, 1998). To examine the effects of these impacts, accurate and
computationally efficient estimates of the size and composition of the distribution
are required.
A number of different mathematical representations of size distributions exist,
including discrete, spline, sectional, modal, or monodisperse (Whitby and McMurry,
1997). Two of the most common approaches for representing size distributions are
sectional and modal methods. First introduced by Whitby (1978), a modal represen-
tation treats the aerosol size distribution as a set of individual, typically lognormal,
distributions or modes. Estimation of the modal representation commonly uses an
iterative least squares method (LSM) subject to certain conditions, such as main-
taining a minimum distance between mean estimates of two adjacent components
(for example, see Hussein et al. (2005)).
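For concreteness, the modal representation writes the size distribution as a sum of lognormal modes in log₁₀(Dp). The following is a minimal numerical sketch; the function and parameter names are ours, not from the thesis or from Hussein et al.'s algorithm:

```python
import numpy as np

def modal_psd(dp, n_modes, dpg, sigma_g):
    """Evaluate a modal (sum of lognormal modes) size distribution.

    dp      : particle diameters (nm) at which to evaluate dN/dlogDp
    n_modes : mode number concentrations N_i (cm^-3)
    dpg     : geometric mean diameters of each mode (nm)
    sigma_g : geometric standard deviations of each mode
    """
    dp = np.asarray(dp, dtype=float)[:, None]
    n_modes, dpg, sigma_g = (np.atleast_1d(a).astype(float)
                             for a in (n_modes, dpg, sigma_g))
    log_sg = np.log10(sigma_g)
    # Each mode is a normal density in log10(Dp); the distribution is their sum.
    modes = (n_modes / (np.sqrt(2.0 * np.pi) * log_sg)
             * np.exp(-(np.log10(dp) - np.log10(dpg)) ** 2
                      / (2.0 * log_sg ** 2)))
    return modes.sum(axis=1)
```

Integrating the returned curve over log₁₀(Dp) recovers the total number concentration, which is a useful sanity check on any fitted modal representation.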
An alternative modal representation of the size distribution is a finite mixture
model. Mixture models have been the subject of much recent research (Diebolt
and Robert, 1994; McLachlan and Peel, 2000a; Marin et al., 2005; Richardson and
Green, 1997). The Bayesian paradigm for mixture modelling allows for probability
statements to be made directly about the unknown parameters and (perhaps) an
unknown number of components, prior knowledge and expert opinion to be included
in the analysis, and hierarchical descriptions of both local-scale and global features
of the model.
In this chapter, we analyse a sample of aerosol particulate data using a Bayesian
mixture model, and assess the performance of the method using actual and simu-
lated data. We then outline an approach for describing the evolution of the aerosol
particles over time, using an informative prior on a sample of data collected over
one day.
In Section 6.2, we briefly describe particle size distributions and provide an
illustration with actual data. In Section 6.3 we outline the methodology of mixture
models, a Gibbs sampling algorithm to estimate the mixture, and a variation to
account for the truncation of the data. In Section 6.4 we present the results of applying
the Bayesian mixture model to some simulated and actual datasets and compare the
results to those obtained by the LSM.
6.2 Particle size distribution data
One of the most important physical properties of aerosol particles is their size,
and the concentration of particles in terms of their size is referred to as the particle
size distribution. Figure 6.1 shows an example of particle size distribution data
for one measurement or time period. Because aerosol particles are often charged,
their size can be determined from their electrical mobility (McMurry, 2000). A
common instrument that utilizes this principle is the Differential Mobility Particle
Sizer (DMPS). The DMPS includes three main parts: (1) an aerosol particle charger
that produces a steady-state charge distribution for the aerosol particle sample (e.g.
Wiedensohler, 1988; Adachi et al., 1985; Hussin et al., 1983), (2) differential mobility
analyzer (DMA) that separates aerosol particles according to their electrical mobility
(e.g. Hewitt, 1957; Knutson and Whitby, 1975), and (3) a particle counter to count
the number concentrations of the separated aerosol particles after the DMA.
Based on their formation processes, aerosol particles are either primary or sec-
ondary. Primary aerosol particles are directly emitted into the atmosphere or formed
in the atmosphere by condensation or coagulation without chemical reactions. On
the other hand, secondary aerosol particles are formed in the atmosphere by gas-to-
particle conversion processes. Growth of aerosol particles occurs through coagulation
and condensation of hot vapors (e.g. Kulmala et al., 2004). However, the rate of
coagulation depends on the already existing particle number concentration whereas
the rate of condensation depends on the surface area of aerosol particles. Therefore,
particles do not normally grow above 1 µm because the condensation and
coagulation rates decrease as the particle size increases.
In this study we present, as an example, the aerosol particle evolution before,
during, and after a new particle formation event at a Boreal Forest in Southern
Finland (Figure 6.2). This dataset was selected as it provides a wide-ranging
representation of modes for particle size distributions (Dal Masso et al., 2005).

Figure 6.1: Histogram of data sampled from Hyytiala, Finland for a single time period

Because
aerosol particles are governed by formation and transformation processes, they tend
to form well-distinguishable modal features. For example, during background
conditions in the Boreal Forest the particle number size distribution of fine aerosols
(diameter < 2.5 µm) is bi-modal: an Aitken mode (below 0.1 µm) and an accumulation
mode (above 0.1 µm). During a new particle formation event a new particle
mode, commonly known as the nucleation mode, is formed in the atmosphere
with geometric mean diameter below 0.025 µm. However, in the urban atmosphere,
aerosol particles are more dynamic because of the different types and properties of
sources of aerosol particles and may show more than three lognormal modes. Typically
the number concentrations of aerosol particles in the urban background can be as
high as 5 × 10⁴ cm⁻³, and very close to a major road they often exceed 10⁵ cm⁻³.
In general, aerosol particles have direct and indirect impacts on the Earth's
climate. Investigating the modal structure of aerosol particles provides a better
understanding of their dynamic behavior in addition to their effect on the climate.
New particle formation events in the background atmosphere are among the best
case studies for understanding the dynamical changes that take place in aerosol
particles, from the very early stages of their growth through to their participation
in cloud processes. It has recently been observed that new particle formation can
take place anywhere on the globe, further increasing the importance of understanding
the processes involved.
6.3 Methods
In this section, we outline an independent approach to estimating a mixture model
at a single time point using RJMCMC, a two stage approach to estimation of a
mixture over multiple time points, and an approach to estimate a mixture of normal
distributions where there is truncation present in the data.
Figure 6.2: An illustration of a new particle formation event at a Boreal Forest site located in Southern Finland. (a) The temporal variation of the particle number size distribution and (b) selected particle number size distributions showing the different stages of the newly formed particle mode from its early stage. Note that this new particle formation occurred on a regional scale over the southern part of Finland.
6.3.1 Mixture model at a single time point
The density of data (y) given by a finite mixture model can be represented by

    p(y|θ) = Σ_{j=1}^{k} λj f(y|θj)        (6.1)

where k is the number of components in the mixture, λj represents the probability of
membership of the jth component (Σ_{j=1}^{k} λj = 1), and f(y|θj) is the density function
of component j, which has parameters θj.
As component membership of the data is unknown, the usual hierarchical frame-
work for the mixture model involves introducing the latent indicator variable z. In
this model, zi represents the unobserved component membership from which ob-
servation yi is drawn, and is treated as another parameter to be estimated in the
modelling procedure.
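To illustrate the role of z: the full conditional of each zi is a categorical distribution with probabilities proportional to λj f(yi|θj). A minimal numpy sketch (our own naming; normal components are assumed, as used later in this chapter):

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    """Normal density, written out to avoid external dependencies."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def sample_allocations(y, lam, mu, sigma, rng):
    """Draw latent labels z_i from p(z_i = j | ...) proportional to lam_j * f(y_i | theta_j)."""
    p = lam * normal_pdf(y[:, None], mu, sigma)   # (n, k) unnormalised
    p /= p.sum(axis=1, keepdims=True)
    u = rng.random((len(y), 1))
    return (p.cumsum(axis=1) < u).sum(axis=1)     # inverse-CDF categorical draw
```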
In this chapter, we transform the aerosol particle size distribution data using a
natural logarithm prior to fitting the mixture model (Whitby and McMurry, 1997).
In this case, the data (yi) are the natural log of particle diameters (nm), which are
assumed to be normally distributed, and the parameters (θj) to be estimated for each
component are therefore the mean (µj) and variance (σ²j), along with the component
weight (λj). The number of normal components, k, is also assumed to be unknown.
Priors were:

    p(µj) ∼ N(ξ, κ⁻¹)
    p(σj⁻²) ∼ Gamma(δ, β)
    p(β) ∼ Gamma(g, h)
    p(λ) ∼ Dirichlet(α₁, α₂, …, αk)
    p(k) ∼ Uniform(kmin, kmax)

where ξ, κ, δ, α, g, h, kmin and kmax are fixed hyperparameters.
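Under these priors the within-model parameter updates have standard conjugate full conditionals. The sketch below is our own (β is held fixed for brevity, whereas the model above gives it a Gamma hyperprior) and shows one such update given the current allocations z:

```python
import numpy as np

def update_parameters(y, z, mu, k, xi, kappa, delta, beta, alpha, rng):
    """One conjugate Gibbs update of (mu, sigma^-2, lambda) given labels z.

    Full conditionals: sigma_j^-2 ~ Gamma(delta + n_j/2, beta + SS_j/2),
    mu_j ~ Normal with precision kappa + n_j * sigma_j^-2, and
    lambda ~ Dirichlet(alpha + counts).
    """
    counts = np.bincount(z, minlength=k)
    lam = rng.dirichlet(np.full(k, alpha) + counts)
    new_mu = np.empty(k)
    inv_var = np.empty(k)
    for j in range(k):
        yj = y[z == j]
        ss = np.sum((yj - mu[j]) ** 2)
        # numpy's gamma takes (shape, scale); scale = 1 / rate
        inv_var[j] = rng.gamma(delta + counts[j] / 2.0, 1.0 / (beta + ss / 2.0))
        prec = kappa + counts[j] * inv_var[j]
        mean = (kappa * xi + inv_var[j] * yj.sum()) / prec
        new_mu[j] = rng.normal(mean, 1.0 / np.sqrt(prec))
    return new_mu, inv_var, lam
```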
For estimation of the mixture model, we implemented Richardson and Green's
(1997) Reversible Jump Markov Chain Monte Carlo (RJMCMC) algorithm (for
details see Appendix I). This approach is "fully" Bayesian in the sense that a posterior
distribution for the unknown number of components (k) in the mixture model is
estimated, rather than using a comparative measure, such as the Bayesian Information
Criterion, to assess the fit of mixture models of different dimensions. In Section 6.4,
we compare the results of applying the RJMCMC algorithm to results obtained
using the LSM for measurements at particular time periods.
Mixture model estimation over multiple time periods
To examine aerosol dynamic processes, measurements of aerosol particle size dis-
tribution data are often taken regularly over time with the measurement intervals
typically ranging from 5 minutes to an hour. For the smaller measurement intervals,
the data and associated parameter estimates are likely to be highly correlated
across time. Most estimation of particle size distribution data does not take this
correlation between parameters into account, other than, in most cases, using
the previous parameter estimates as starting values in an optimisation routine for
the current period. Allowing for correlation between the parameter estimates over
time is likely to lead to improvements in estimation, inference and efficiency.
One approach to this problem is to extend the RJMCMC mixture model de-
scribed in Section 6.3.1 to allow for evolution of parameters θjt over time periods t,
t = 1, . . . , T with kt, the number of mixture components at time t, also unknown
and possibly unequal. This single modelling approach requires reversible moves not
only within time periods but also across them. In our experience this was computa-
tionally very costly and required substantial pre-processing to ensure good mixing,
labelling and convergence. Moreover, further post-processing was required to obtain
adequate summary statistics and between component mapping.
As an alternative, we adopted a two-stage approach to estimation of a mixture
model over multiple time points. In the first stage, for each time period t, we
implemented the RJMCMC algorithm of Section 6.3.1 and estimated kt. We then
calculated k′ = max_{t=1,…,T}(kt). In the second stage, we fixed kt = k′ and estimated
(θjt = (µj, σj, λj); j = 1, …, k′; t = 1, …, T). As we do not observe all of the k′
components in every time period, we allowed component weights to be "effectively"
zero (inf(λt) = 0.001) if required.
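The handover between the two stages can be sketched as follows. The stage-one values kt would in practice come from the per-period RJMCMC runs, and the weight floor mirrors the "effectively zero" device described above (function names are ours):

```python
import numpy as np

def stage_two_dimension(k_t):
    """Fix the common number of components k' = max_t k_t for stage two."""
    return max(k_t)

def floor_weights(lam, eps=1e-3):
    """Clamp weights of components absent in a period to an 'effectively
    zero' value (here 0.001) and renormalise so they still sum to one."""
    lam = np.maximum(np.asarray(lam, dtype=float), eps)
    return lam / lam.sum()
```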
In the second stage of this algorithm, we considered two sets of priors. The
first was the set of independent priors: p(λ) ∼ Dirichlet(α₁, …, αk);
p(µj|σ²j) ∼ N(ξj, σ²j/nj); p(σ²j) ∼ IG(vj/2, s²j/2), where αj, ξj, nj, vj and sj
are fixed hyperparameters. The second
allowed for temporal correlation. In the case of a Gaussian mixture model, we have
three parameters (µ, σ and λ) for which we could utilise information from previous
time periods. In this chapter, we adopt one such informative prior for λ as it was
the parameter of most interest in the aerosol study. We note that alternative priors
for λ could be defined and that informative priors can be constructed for µ and σ
instead or on multiple parameters.
Gustafson and Walker (2003) proposed a prior for λ that downweights large
changes in probabilities in successive periods. For time period t, t = 2, …, T, define

    p(λt) ∼ Dirichlet(1, …, 1) × exp( −Σ_{t=2}^{T} Σ_{j=1}^{J} (λjt − λj,t−1)² / φ )        (6.2)

where smoothing increases as φ → 0.
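The exponential smoothing factor in (6.2) is straightforward to evaluate for a whole trajectory of weights. A minimal sketch (our own naming) of its log contribution:

```python
import numpy as np

def log_smoothing_term(lam, phi):
    """Log of the exponential smoothing factor in the Gustafson-Walker prior.

    lam : (T, J) array; row t holds the mixture weights at period t
    phi : smoothing parameter (smoothing increases as phi -> 0)
    """
    diffs = np.diff(lam, axis=0)            # lambda_{jt} - lambda_{j,t-1}
    return -np.sum(diffs ** 2) / phi
```

Constant weights incur no penalty; abrupt changes between periods are downweighted more strongly as φ shrinks.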
A potential advantage of using information about estimates over the whole time
period (t = 1 . . . T ) is the additional information this may provide to guide parameter
estimates in the current period. This may be an advantage at times where large
changes in the parameter estimates are occurring for single time periods.
To sample from the posterior distribution of λ, Gustafson and Walker (2003)
propose a rejection sampling algorithm in which the candidate distribution is
Beta(m(r)jt + 1, m(r)kt + 1), where m(r)jt is the number of observations allocated
to component j or k (where j ≠ k) for time period t at iteration r. A limitation
of this rejection sampling scheme is that it becomes problematic for large sample
sizes, and we discuss this issue later in relation to the results.
6.3.2 Accounting for truncated data
In this section we outline the use of a second latent variable to estimate a mixture
of normal distributions where there is truncation present in the data.
A feature of some of the data used to estimate particle size distributions is a
definitive lower and upper bound for the particle size. For example, particle
concentrations may be measured over a range of particle sizes from 3 nm to 650 nm,
depending on the measurement device used. Preliminary investigation of some sampled
data from Hyytiala (Finland) revealed the possibility of truncation of the data at
the lower and upper bounds. Figure 6.1 shows a sample of particle size distribution
data which clearly illustrates truncation at the lower bound of the data.
Measurement of aerosol particles is commonly observed in the form of a number
of distinct particle size ranges, or channels, the size and number of the channels
being governed by the type and setup of the measurement instrument (Hussein
et al., 2004). For example, in the sampled data from Hyytiala (See Section 6.4.2),
we observed 32 distinct size partitions (bins) covering the range from 3nm to 650nm.
For estimation of truncated normal distributions using Gibbs sampling, we took a
missing data latent variable approach and introduced a new variable (y = (y, y∗))
consisting of the original data (y), with largest and smallest size bins yU and yL,
and the assumed missing data (y∗) of size measurements smaller than the lower
bound and larger than the upper bound of the original data.
To extend the boundary of the range of the data included in (y), we created
an additional number of bins for sizes less than yL and greater than yU. For
example, for the sampled data used for fitting in the next section, we created four
additional bins to the left of the original lower bound, and three to the right of the
original upper bound. The spacing of the additional bins was set in proportion to
the spacing of the original bins.
We then estimated the parameters of the mixture model using the original data
and (y). At the end of each iteration y∗ was reallocated using the current parameter
values.
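Reallocating y∗ at each iteration amounts to drawing, for each component, values from its normal density restricted to the region outside the observed bounds. A simple rejection-sampling sketch (our own naming; adequate only when the excluded region carries non-negligible probability mass):

```python
import numpy as np

def sample_outside_bounds(n, mu, sigma, lower, upper, rng, max_tries=1000):
    """Draw n values from N(mu, sigma) conditioned on y < lower or y > upper.

    Plain rejection sampling: fine for a sketch, but slow whenever the
    region outside [lower, upper] has very little probability mass.
    """
    out = np.empty(0)
    for _ in range(max_tries):
        draws = rng.normal(mu, sigma, size=max(4 * n, 100))
        keep = draws[(draws < lower) | (draws > upper)]
        out = np.concatenate([out, keep])
        if len(out) >= n:
            return out[:n]
    raise RuntimeError("tail mass too small for rejection sampling")
```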
An advantage of this latent variable approach is that the algorithm described in
Section 6.3.1 can be readily applied to estimate y∗ and the mixture parameters based
on y. The approach is generalisable to other missing data assumptions we might
have about the original data, although in this chapter we confine our approach to
the issue of truncation. For the purposes of this chapter we now refer to the analysis
allowing for truncated data as ‘truncated Normal’.
6.4 Results
In this section, we present the results of applying the RJMCMC algorithm outlined
above to a simulated and an actual dataset. For both datasets we used a uniform
prior for k over the range k = 1, …, 10 and the following weakly informative
hyperparameter values: κ = 1/R², α = 2, g = 0.2, h = 10/R² and δ = 1, where R equals
the range of the data. The hyperparameter values for α and g encourage similar values
for the σj without being informative about their absolute size, and the value for κ
reflects a weak prior belief about ξ. Results were based on 200,000 iterations with a
burnin period of 100,000.
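These range-based defaults are simple to compute from the data. A sketch follows; centring ξ at the midpoint of the range follows Richardson and Green's (1997) convention and is our assumption, as the value of ξ is not stated here:

```python
import numpy as np

def range_based_hyperparameters(y):
    """Weakly informative hyperparameters keyed to the data range R."""
    y = np.asarray(y, dtype=float)
    R = y.max() - y.min()
    return {
        "xi": y.min() + R / 2.0,   # midpoint of the data (assumed)
        "kappa": 1.0 / R**2,       # weak prior belief about xi
        "alpha": 2.0,
        "g": 0.2,
        "h": 10.0 / R**2,
        "delta": 1.0,
    }
```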
6.4.1 Simulated data: single time point
In this section, we use simulated data to validate estimates from the RJMCMC
algorithm outlined above. The data comprised four components truncated on the
lower bound, with characteristics representative of the aerosol data described in
Section 6.3.2. Figure 6.3 shows the kernel density estimator of simulated data with
fitted results from normal and truncated normal approaches. The corresponding
posterior estimates of the component parameters and 95% credible intervals are
given in Table 6.1.
Due to the clear truncation of the data in Figure 6.3, we would expect the
truncated Normal distributions to fit the data better in the area of truncation
than a model that ignores this truncation. For both models the posterior estimate
for the number of components (k) was highest for four components (truncated:
P (k = 4) = 0.54, non-truncated: P (k = 4) = 0.93). The point estimates for the
parameters from the truncated Normal distribution for almost all parameters are
much closer to the true values than for estimates from the non-truncated version
(See Table 6.1). Ignoring truncation appears to result in less weight assigned to
the first component, with a mean value lower than the true value, and estimates
Figure 6.3: Kernel density estimator of simulated data (black) with fitted results from normal (dark green) and truncated normal (blue) approaches. Simulated data based on parameters: k = 4; µ = (1.40, 2.30, 3.70, 5.10); σ = 0.30; λ = (0.10, 0.10, 0.60, 0.20).
for standard deviation (σ1) and weight (λ1) smaller than the true values. For the
first component, the true value for the weight of the component (λ1) is 0.10, and
the mean estimates for the assumed non-truncated and truncated distributions are
0.0671 and 0.1017, respectively. In the non-truncated model, estimates for the
second and third components then appear to compensate for the smaller estimates from
the first component, with standard deviations and weights for these components larger
than the true values. For the second component, the true value for σ2 is 0.30, and for
λ2 is 0.10. The mean estimates for the non-truncated distribution for σ2 and λ2 are
0.40 and 0.13, respectively. The results thus suggest that accounting for truncation
may not only result in a better fit for the associated component but also better fits
for neighbouring components.
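The renormalisation that the truncated model performs can be made concrete. The sketch below (Python; the truncation bounds a = 1 and b = 6 are assumptions taken from the plotted diameter range, not values stated in the text) evaluates a truncated normal mixture density using the simulation parameters of Figure 6.3:

```python
import numpy as np
from math import erf, sqrt, pi

def trunc_normal_mixture_pdf(y, mu, sigma, lam, a, b):
    """Density of a mixture of normals truncated to the window [a, b].

    Each component is renormalised by the mass the untruncated normal
    places inside [a, b]; the density is zero outside the window.
    """
    y = np.atleast_1d(np.asarray(y, dtype=float))
    dens = np.zeros_like(y)
    for m, s, w in zip(mu, sigma, lam):
        cdf = lambda x: 0.5 * (1 + erf((x - m) / (s * sqrt(2))))
        mass = cdf(b) - cdf(a)  # normalising constant for this component
        comp = np.exp(-0.5 * ((y - m) / s) ** 2) / (s * sqrt(2 * pi))
        dens += w * comp / mass
    dens[(y < a) | (y > b)] = 0.0
    return dens

# Parameters used to simulate the data in Figure 6.3; the bounds
# a = 1, b = 6 are assumed from the plotted diameter range.
mu = [1.40, 2.30, 3.70, 5.10]
sigma = [0.30] * 4
lam = [0.10, 0.10, 0.60, 0.20]
grid = np.linspace(1.0, 6.0, 2001)
d = trunc_normal_mixture_pdf(grid, mu, sigma, lam, 1.0, 6.0)
```

Since the weights sum to one and each renormalised component integrates to one over [a, b], the whole mixture integrates to one over the window, which gives a simple numerical sanity check.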
Table 6.1: Estimated parameter values from Bayesian mixture model analysis using the RJMCMC algorithm with simulated data. Based on 200,000 iterations with a burn-in of 100,000. CI = Credible Interval
Figure 8.1 shows a plot of actual data, taken from measurements at Hyytiala, a
boreal forest site in Southern Finland (SMEAR II) (Vesala et al., 1998). This dataset
was selected as it provides a wide ranging representation of modes for particle size
distributions (Dal Masso et al., 2005). Figure 6.4 shows the results of fitting normal and truncated normal mixture models to a single time period from this dataset.
Figure 6.4: Histograms of data sampled from Hyytiala, Finland with estimated overall fit and components for non-truncated normal (left, k=4) and truncated normal (right, k=3) overlaid
The non-truncated mixture model appears to fit two components with small
variance, whereas the truncated normal mixture model fits one component with
no apparent loss in fit. In practice, a result like this may suggest that there are two sources for the smaller sized particles instead of one, or alternatively a different underlying aerosol process.
We can compare these results with an iterative least squares method (LSM) commonly used for modal fitting in the aerosol literature. Different research
groups have developed their own algorithms and most involve some degree of user
input for the number of components (Makela et al., 2000; Birmili et al., 2001; Whitby
et al., 1991). For this reason, we chose the fully automated algorithm outlined by
Hussein et al. (2005), which compares favourably to Makela et al. (2000), and to
previous versions (Hussein et al., 2004). The aim here is not to comment on this
algorithm or on LSM as a methodology but rather to offer a brief comparison of
results in the context of this case study.
Figure 6.5 shows the results of fitting using LSM and our algorithm for a sample
of data from Hyytiala. The solid line is the predicted fit, and the dotted lines display
the components. The LSM appears to underfit both the small and large sizes of the
particle size distribution. Our algorithm identified two more modes to describe these
extremes of the particle size distribution, and the resulting five component model
appears to provide an improved fit.
Figure 6.6 again shows the results of fitting using the two approaches for a sec-
ond sample of data from Hyytiala. The fit of the LSM appears to ignore either a
second component (mean of 3.6nm) or skewness of the main component. Preliminary investigation suggests that this component gradually emerges around this time period.
Our algorithm provides a better fit for this second component and hence overall.
The main difference between the results of the two algorithms in the examples that we have seen appears to be that our approach performs a much more thorough search of the parameter space, including the number of components. Because we use uninformative priors, model choice under RJMCMC is also largely driven by the data, thus avoiding subjective influences on model choice.
Figure 6.5: Histograms of data sampled from Hyytiala, Finland with estimated overall fit and components from RJMCMC (left) and LSM (right) overlaid
Figure 6.6: Histograms of data sampled from Hyytiala, Finland with estimated overall fit and components from RJMCMC (left) and LSM (right) overlaid
6.4.3 Results for mixture model estimation over multiple
time points
In this section, we apply our two stage algorithm to data collected over one day from
Hyytiala, measured at 10 minute intervals (T=144). Figure 6.7 shows the results
of the first stage of the algorithm, with a plot of the posterior mean estimates for
µjt at each time point t, with the size of the circles indicating the corresponding
weight λjt. Averaged over the day, the number of components estimated with the highest probability was four, and the largest number of components was five.
Figure 6.7: Plot of posterior mean values for µjt obtained from the first stage using the RJMCMC algorithm for one day (Hyytiala measurement station). The size of the circles indicates the weight (λjt) corresponding to µjt
Based on these results, for the second stage of the analysis we set the number of components to be five (k′ = 5) and the hyperparameters to be: ξ = (1.1, 2.0, 3.0, 4.0, 5.0); s²jt = 2.0225; vjt = 0.092025; and nj = 200. The hyperparameter values for ξ were chosen to adequately represent the parameter space, and the values for s²jt and vjt were chosen to give an uninformative prior allowing σjt to range from 0 to 1.5. The value of nj was chosen to be large enough to prevent label switching on the means, and small enough to have negligible influence on the posterior
estimates. Over the time period of interest, the average concentration of particles
was approximately 100,000 and ranged from 2,000 to 120,000. In light of this, the
value for nj is relatively small.
Figure 6.8 shows a plot of the posterior mean estimates for the parameters (µ and
λ) for each component over the course of the day, obtained from the second stage of
analysis using independent priors. The figure indicates a nucleation event occurring
around 08:00. Such events are a common feature of the data from this measurement
station (Sogacheva et al., 2005). Characteristic of such an event is a large increase in
the number of smaller sized particles (Nucleation < 20nm) , which typically grow in
size over the next few hours to either the Aitken (25-90nm) or accumulation modes
(100+nm). From the parameter estimates (Figure 6.8), we see that most of the
weight for the mixture prior to 08:00 is in the third component (bottom panel, µ ≈ 20nm, λ = 0.8); however, from 08:00 to 12:00, we see a large increase in the weight
for the first component (µ ≈ 4nm, 08:00 to 10:00), followed by a large increase in
the weight for the second component (µ ≈ 10nm, 10:00 to 12:00). Components 4
and 5 appear and disappear during the course of the day (bottom panel). This
may have a physical interpretation as the presence of an actual component source
or alternatively it may represent convenient modelling of the skewness of a single
component through multiple Gaussian distributions. In either case, the evolving
nature of these components is clearly depicted through this model and motivates
scientific interest.
Figure 6.9 shows a plot of the posterior mean estimates for the parameters (µ and λ) for each component over the course of the day, obtained from the second stage of analysis using an informative prior.

Figure 6.8: Plot of parameters (µ and λ) over one day (Hyytiala, Finland) for the independent approach; stage 2 of the analysis for the evolution of the parameters. Measurements taken every 10 minutes. Colours indicate the components to which the parameter estimates belong (black for the first component, red for the second, green for the third, etc.)

From Figure 6.9, we can see that the parameter estimates from the informed prior approach largely follow the parameter estimates from the independent approach, with some degree of smoothing on the weights. Here φ was set equal to 0.05. In analyses not shown, smaller values of φ produced a smoother pattern in the weights over time. Alternatively, φ could be treated as unknown and estimated, although care is required in this case. The results under the informed prior suggest that at times we may be able to better infer patterns in the data or, in some cases, remove anomalies.
As indicated in Section 6.3.1, a limitation of the rejection sampling algorithm for the informed prior approach as outlined by Gustafson and Walker (2003) is the specification of the candidate distribution Beta(m_jt^(r) + 1, m_kt^(r) + 1) for large sample sizes. We found that, given our large sample size of particles and the volatility in the weights over some periods, the acceptance rate of proposed parameter values was exceedingly low (< 5%). This was particularly the case during the period between 11:00 and 13:00, where the sample size increased markedly (> 75,000 particles) and with much variability. The problem arises because the candidate distribution is very narrowly defined when both m_jt^(r) and m_kt^(r) are large, while the neighbouring estimates of λjt (λj,t−1 and λj,t+1) under the independent approach are some distance away. Further research into alternative forms of the candidate distribution would be beneficial to improve computation time.
6.5 Discussion
We used a Bayesian mixture model to estimate particle size distributions for a sam-
ple of real and simulated datasets. We also proposed a modification to the standard
Gibbs sampler to handle the case of truncated data on both the lower and upper
bound. The results from using the algorithm were promising, and the method pro-
vides considerable flexibility both in estimation and inference.
By estimating the parameters of a mixture model using the RJMCMC algorithm,
we can make probabilistic statements about all unknown parameters, including the
number of components (k).

Figure 6.9: Plot of parameters (µ and λ) over one day (Hyytiala, Finland) for the informed prior approach; stage 2 of the analysis for the evolution of the parameters. Measurements taken every 10 minutes. Colours indicate the components to which the parameter estimates belong (black for the first component, red for the second, green for the third, etc.)

In the case of the number of components, this not only avoids the need to use comparative measures for fixed values of k, but also places the
assessment of model fit on a probabilistic basis which can be used for inference. For
example, we can say with some probability whether two or three modes exist at a
particular point in time. Of further interest may be the concentration of particles for
each mode. To examine this, we can estimate the probability that the concentration
of particles is above or below certain thresholds of interest.
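Such threshold probabilities drop straight out of the MCMC output: the posterior probability that a mode's concentration exceeds a threshold is simply the fraction of posterior draws in which it does. A minimal sketch, where the Beta and normal draws are hypothetical stand-ins for real posterior samples (not output from the thesis model):

```python
import numpy as np

rng = np.random.default_rng(1)
n_draws = 5000
# Hypothetical posterior draws standing in for real MCMC output: the
# weight of one mode and the total particle concentration (cm^-3).
lam_draws = rng.beta(8, 2, size=n_draws)
total_draws = rng.normal(1.0e5, 5.0e3, size=n_draws)

conc_draws = lam_draws * total_draws   # mode concentration per draw
threshold = 5.0e4                      # threshold of interest (cm^-3)
p_exceed = float(np.mean(conc_draws > threshold))
```

The same one-liner works for any posterior functional: replace the indicator by the quantity of interest and average over draws.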
In a Bayesian approach, prior knowledge or expert opinion and hierarchical de-
scriptions of both local-scale and global features can be included in the model. Although we have generally used weakly informative priors for parameter estimation, more informative priors can be used in situations where this information is available.
In the case of frequent measurements of aerosol particle size distribution over time,
including prior information may assist in estimation and inference. For estimation,
the identifiability of mixtures is an important issue (Marin et al., 2005) and we may
be able to better identify individual components in time periods where there appears
to be some degree of overlap. By obtaining smoother parameter estimates over time
we may be able to more clearly establish patterns or identify anomalies from the
data.
While the Bayesian method for mixture models offers both flexibility in estima-
tion and inference, the interpretability of the parameters estimated is an important
question. For example, particle size distribution fits with five components may not readily have a physical interpretation. From preliminary investigation of various size distribution data, we generally found that extra components were needed to account for skewness. Depending on their location in the size distribution, neighbouring components may need to be combined; the Gaussian representation of mixture densities makes this relatively straightforward. Interpretation of results is also aided by including prior or expert opinion, which may have the effect of restricting parameter estimates to known domains. Alternatively, further
investigation of the particle source for these components may be needed.
Chapter 7
Bayesian estimation of mixtures over time with
application to aerosol particle size distributions
In this chapter, we examine in some detail the use of informative priors for the estimation of mixtures at multiple time points. Two different informative priors and an independent prior are compared using simulated and actual data. Informative priors may help to better identify component parameters at each time point and, as an aid to inference, to more clearly establish patterns in the parameters over time. As this chapter is designed to be read independently of Chapter 6, Section 7.2, describing PSD data, and the first part of Section 7.3, outlining mixture models, are largely repeated from Chapter 6.
7.1 Introduction
Interest in the estimation of aerosol particle size distributions is largely directed at
In this chapter, we briefly explore two different informative priors for estimation
of mixtures where the data are highly correlated, and all parameters in the mixture
are allowed to vary. Different datasets, with features similar to actual particle size
distribution data, are used to highlight the influence of using informative priors and
identify situations where using informative priors may not be beneficial.
An outline of the chapter follows. In Section 7.2, we briefly describe particle size distributions and provide an illustration with actual data. In Section 7.3, we outline the standard mixture model setup for a single time point and then outline two approaches to estimating a mixture model over multiple time points. Section 7.4 presents results on the performance of the approaches on several simulated datasets and actual data, and we conclude in Section 7.5 with
some discussion and possibilities for further work.
7.2 Particle size distribution data
One of the most important physical properties of aerosol particles is their size, and the concentration of particles in terms of their size is referred to as the particle size distribution. Figure 7.1 shows an example of particle size distribution data for one
measurement or time period. Because aerosol particles are often charged, their size
can be determined from their electrical mobility (McMurry, 2000).
In this study we present, as an example, the aerosol particle evolution before, during, and after a new particle formation event at a boreal forest site in Southern Finland (Figure 7.2). This dataset was selected as it provides a wide-ranging representation of modes for particle size distributions (Dal Masso et al., 2005). Because aerosol particles are governed by formation and transformation processes, they tend to form well distinguishable modal features. For example, during background conditions in the boreal forest the particle number size distribution of fine aerosols (diameter < 2.5 µm) is bimodal: an Aitken mode (below 0.1 µm) and an accumulation mode (over 0.1 µm). During a new particle formation event a new particle mode, commonly known as the nucleation mode, is formed in the atmosphere
Figure 7.1: Estimated overall fit and components from RJMCMC for one time period. Concentration of particles (dN/dlog(Dp) [cm⁻³]) by particle diameter (log(Dp (nm)))
with geometric mean diameter below 0.025 µm. However, in the urban atmosphere, aerosol particles are more dynamic because of the different types and properties of sources of aerosol particles, and may show more than three lognormal modes. Typically the number concentrations of aerosol particles in the urban background can be as high as 5 × 10⁴ cm⁻³, and very close to a major road they often exceed 10⁵ cm⁻³.
7.3 Methods
In this section, we briefly describe a mixture model, outline a two stage approach to
estimation of parameters over time, and describe three types of priors for temporal
evolution of the parameters.
7.3.1 Mixture representation
The density of data (y) at a given time period is represented by a finite mixture model

p(y|θ) = Σ_{j=1}^k λj f(y|θj)     (7.1)

where k is the number of components in the mixture, λj represents the probability of membership of the jth component (Σ_{j=1}^k λj = 1), and f(y|θj) is the density function of component j, which has parameters θj.
As component membership of the data is unknown, a computationally convenient
method of estimation for mixture models is to use a hidden allocation process and
introduce a latent indicator variable zi, which is used along the lines of a missing
variable approach to allocate observations yi to each component.
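A single sweep of this allocation step assigns each observation yi to component j with probability proportional to λj f(yi|θj). The sketch below does this for Gaussian components; the vectorised inverse-CDF categorical draw is an implementation choice, not something prescribed in the text:

```python
import numpy as np

def sample_allocations(y, mu, sigma, lam, rng):
    """One Gibbs sweep over the latent indicators z_i.

    P(z_i = j) is proportional to lam_j * N(y_i | mu_j, sigma_j^2);
    the 1/sqrt(2*pi) factor cancels in the normalisation.
    """
    y = np.asarray(y)[:, None]
    dens = np.exp(-0.5 * ((y - mu) / sigma) ** 2) / sigma
    probs = lam * dens
    probs /= probs.sum(axis=1, keepdims=True)
    u = rng.random((y.shape[0], 1))
    # inverse-CDF draw: index = number of cumulative probs below u
    idx = (probs.cumsum(axis=1) < u).sum(axis=1)
    return np.minimum(idx, probs.shape[1] - 1)  # guard against rounding

rng = np.random.default_rng(0)
y = np.concatenate([rng.normal(1.4, 0.3, 100), rng.normal(5.1, 0.3, 100)])
z = sample_allocations(y, np.array([1.4, 5.1]), np.array([0.3, 0.3]),
                       np.array([0.5, 0.5]), rng)
```

With components this well separated, the allocations are almost deterministic; with overlapping components, the draws mix observations between components, which is what makes the subsequent parameter updates coherent.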
In this chapter, we adopt the common assumption of fitting log-normal distri-
butions to aerosol particle size distribution data (Whitby and McMurry, 1997). As
PSD data are often measured with a definite lower and upper bound for the size of
the particles we introduce a slight modification and assume that the data follow a
truncated normal distribution. As is commonly assumed, we take the data (y) to
be the log of particle diameters (nm), and the parameters to be estimated (θj) for
each component are the mean (µ), variance (σ2) and weight (λ). The number of
Figure 7.2: An illustration of a new particle formation event at a Boreal Forest site located in Southern Finland. (a) The temporal variation of the particle number size distribution and (b) selected particle number size distributions showing the different stages of the newly formed particle mode from its early stage. Note that this new particle formation occurred on a regional scale over the southern part of Finland.
components k was also considered to be unknown. Priors for the first stage of the
analysis were:
p(µj) ∼ N(ξ, κ⁻¹)
p(σj⁻²) ∼ Gamma(δ, β)
p(β) ∼ Gamma(g, h)
p(λ) ∼ Dirichlet(α1, α2, ..., αk)
p(k) ∼ Uniform(kmin, kmax)

where ξ, κ, δ, α, η, g, h, kmin and kmax are fixed hyperparameters.
In the first stage of the temporal analysis, for each time period we implemented
Richardson & Green's (1997) RJMCMC algorithm. Although this algorithm is easily applied at a single time point, the use of RJMCMC for mixture models with temporal data requires significant pre-processing with respect to mixing coverage and convergence, as well as post-processing to provide adequate summary statistics and between-time component mapping. As an alternative, we considered a two-stage approach. In the first stage, the number of components was estimated at each time point using RJMCMC. In the second stage, we fixed the number of components (k) to the maximum observed at any time period and independently estimated the parameters of the mixture model (µ, σ, and λ) for each time period using a Gibbs sampler. As we do not observe all of the components in every time period, we allow component weights to be effectively zero (bounded below at inf(λt) = 0.001) if required.
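In the second stage, the weight update given the allocations is conjugate: λ is drawn from a Dirichlet whose parameters are the prior values plus the component counts. The sketch below applies the small floor afterwards and renormalises; reading inf(λt) = 0.001 as a floor-then-renormalise step is an assumption:

```python
import numpy as np

def update_weights(z, k, alpha, rng, floor=0.001):
    """Conjugate Dirichlet update for the weights given allocations z.

    Components with no allocated observations are kept alive with a
    small floor, mirroring the inf(lambda_t) = 0.001 device in the text.
    """
    counts = np.bincount(z, minlength=k)
    lam = rng.dirichlet(alpha + counts)
    lam = np.maximum(lam, floor)   # floor empty components...
    return lam / lam.sum()         # ...then renormalise to sum to one

rng = np.random.default_rng(2)
z = np.array([0, 0, 1, 1, 1, 3])   # component indices; 2 and 4 empty
lam = update_weights(z, k=5, alpha=np.ones(5), rng=rng)
```

Keeping empty components alive at a tiny weight lets a component that vanishes at one time point reappear at a later one without the sampler having to change dimension.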
The Gibbs sampler is iterated until the Markov Chains for the parameters have
converged to stationary posterior distributions. For the second stage, the priors were

p(λ) ∼ Dirichlet(α1, ..., αk)
p(µj|σj²) ∼ N(ξj, σj²/nj)     (7.2)
p(σj²) ∼ IG(vj/2, sj²/2)

where αj, ξj, nj, vj and sj are fixed hyperparameters.
For the independent prior case, we use uninformative priors for µ, σ and λ. Priors
for the dependent prior are discussed below.
7.3.2 Choice of temporal prior
In the second stage, three priors were considered for linking parameter values (µ,σ,λ)
over time. The first of these was the independent prior, in which the correlated
nature of the data was ignored completely and parameters were independently esti-
mated at each time point. The second and third were termed the ‘informed prior’
and 'penalised prior', as described below.
Informed Prior
In this approach we use the information provided from previous and future time
periods as prior information for the current period. For the main results we focus on
a simple case where posterior estimates from the previous period are used as prior
information for the current period. We do this to illustrate the influence of a simple
prior specification on the posterior estimates of parameters (θ).
In the case of a mixture model using Gaussian distributions, we have three pa-
rameters (µ, σ and λ) for which we could utilise available prior information to aid in
estimation. Preliminary investigation indicates that all three parameters are likely to
show strong evidence of autocorrelation, so here we examine the effect of smoothing
on each of these parameters.
For p(λ), we allow αj in Equation 7.2 to vary and reflect prior information about λt−1,j. Thus, we set αj = θj mt−1,j, where mt−1,j is the mean number of observations allocated to component j in the previous time period, and θj is fixed at some value. An alternative is to impose a distribution on θ, say θj ∼ U(0, 1) (or N(1, 0.5)), but we do not present results for this approach in this chapter.
For the specification of prior information for µ and σ, we set ξjt = µj,t−1, vj = nj/σ²j,t−1 and sj = nj, and increase the value of nj from the value set for the independent case to reflect the degree of dependency of these parameters on the previous period.
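The mapping from period t−1 summaries to period t hyperparameters can be collected in one small function. This sketch simplifies θj to a single scalar θ, and the example numbers are illustrative only, not estimates from the thesis data:

```python
import numpy as np

def informed_hyperparams(mu_prev, sigma_prev, m_prev, theta, n):
    """Map period t-1 posterior summaries to period t hyperparameters.

    Implements alpha_j = theta * m_{t-1,j}, xi_j = mu_{j,t-1},
    v_j = n / sigma^2_{j,t-1} and s_j = n, following the text
    (theta is taken as a scalar here for simplicity).
    """
    alpha = theta * np.asarray(m_prev, dtype=float)
    xi = np.asarray(mu_prev, dtype=float)
    v = n / np.asarray(sigma_prev, dtype=float) ** 2
    s = np.full(xi.shape, float(n))
    return alpha, xi, v, s

# Illustrative numbers only (not estimates from the thesis data).
alpha, xi, v, s = informed_hyperparams(
    mu_prev=[1.5, 3.4, 5.0], sigma_prev=[0.3, 0.3, 0.3],
    m_prev=[100, 600, 300], theta=0.5, n=25)
```

Increasing n tightens the normal prior around the previous mean, so n acts as the single dial controlling how strongly period t−1 informs period t.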
Penalised Prior
An alternative to using the informed prior described in Section 7.3.2 is to use a re-
parameterisation of the prior to reflect the degree of dependency between parameters.
Gustafson and Walker (2003) proposed a prior for λ that downweights large changes
in probabilities in successive periods. Thus, at time period t (t = 2, ..., T), p(λt|zt) is defined as

p(λt|zt) ∝ Dirichlet(1, ..., 1) × exp(−Σ_{t=2}^T Σ_{j=1}^J (λt,j − λt−1,j)² / φ)     (7.3)
where smaller values of φ imply greater smoothing. The above formulation can
naturally be extended to dependence on more than a single time period.
A potential advantage of using information about estimates both forwards and backwards in time is the additional guidance this may provide for parameter estimates in the current period. This may be an advantage at times where large changes in the parameter estimates occur over single time periods.
For comparative purposes, in Section 7.4.1 we compare the results of using a similar
formulation for λ in the informed prior approach.
To sample from the posterior distribution of λ, we use a rejection sampling
approach proposed by Gustafson and Walker (2003).
Prior distributions p(µ) and p(σ) are set as for the independent approach (Equa-
tion 7.2).
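Because the penalty factor exp(−Σj (λt,j − λt−1,j)²/φ) is bounded by one, a rejection sampler can propose from the unpenalised conjugate Dirichlet and accept with probability equal to the penalty. The sketch below follows that idea; it is a simplified variant rather than a reproduction of Gustafson and Walker's Beta candidate scheme:

```python
import numpy as np

def sample_penalised_weights(counts, lam_prev, phi, rng, max_tries=10000):
    """Rejection sampler for the penalised weight posterior.

    Proposes from the unpenalised conjugate Dirichlet(counts + 1) and
    accepts with probability exp(-sum((lam - lam_prev)^2) / phi), a
    factor bounded by one, so no envelope constant is required.
    """
    counts = np.asarray(counts, dtype=float)
    for _ in range(max_tries):
        lam = rng.dirichlet(counts + 1.0)
        if np.log(rng.random()) < -np.sum((lam - lam_prev) ** 2) / phi:
            return lam
    raise RuntimeError("acceptance rate too low; consider a larger phi")

rng = np.random.default_rng(3)
lam = sample_penalised_weights([120, 300, 80],
                               lam_prev=np.array([0.25, 0.60, 0.15]),
                               phi=0.08, rng=rng)
```

The structure also makes the low-acceptance problem discussed later visible: with large counts the proposal concentrates sharply, and if it concentrates far from lam_prev, the acceptance probability collapses.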
7.4 Results
In this section we present and assess the results using simulated data and then
present the results of applying the approaches to particle size distribution data from
Hyytiala, Finland. We use the simulated data to test the impact of the different
prior representations and the degree of smoothing. We first use an informative and
penalised prior only on the weights (λ), and then assess the influence of using an
informative prior on µ and σ.
7.4.1 Simulated Data
Data Setup
We simulated datasets indicative of the type of behaviour of aerosol particle size
distribution data observed at Hyytiala, a boreal forest site in Southern Finland
(SMEAR II) (Vesala et al., 1998). A particular feature of this particle size distribu-
tion data is both a growth in the mean and weight for a component.
We simulated data from three different cases. In the first case (D1), we simulated
data with parameters which can be characterised as having medium correlation
across time. In this case the mixture is well identified and interest is in whether the
results from the informed and penalised approaches are largely the same as for the
independent approach.
In practice it is quite common to observe sudden large changes in the number
of particles measured which may persist for a number of time periods. This is more
often observed when there are relatively few particles for a particular size group,
and more so for the smaller sized particles. Thus, for the second data set (D2) we
simulate data for the first component where the weight at smaller values is quite
volatile. For this dataset the mixture is also well identified.
For the third data set (D3), we simulated data which are highly correlated across
time, a feature of particle size distribution data observed in practice for most time
periods where measurements are commonly taken at small time intervals. This
dataset was also simulated with parameter estimates where at times the mixture is
not well identified. Of interest in this setting is to see the effect of using either the
informed prior or penalised prior approach.
For the results to follow, except as specified otherwise, for the independent,
informed prior and penalised prior approaches, we set the hyperparameters to be:
Figure 7.3 shows the results of the informed prior, penalised prior and independent
approaches compared to the actual dataset D1. As expected for this well-defined dataset, the results for the informed and penalised prior approaches appear largely to follow the results obtained from the independent approach.
Figure 7.3: Plot of estimated parameters (µ (top panels), σ (middle panels) and λ (bottom panels)) for approaches using simulated data (D1): Simulated data (Black); Independent (Red); Informed Prior (Green); Penalised Prior (Blue)
Figure 7.4 shows the results of analysis using dataset D2, using the informed
prior approach with θ = 0.5 and penalised prior approach with φ = 0.08 and the
independent prior approach. The values for θ and φ were chosen to allow prior information to be influential but not to overwhelm the posterior estimates. Interestingly, it is evident that smoothing on the weights results in compensatory adjustments by both µ and σ. These adjustments appear to be more pronounced when
λ is volatile over time. In this case, the prior is imposing larger adjustments away
from the data at each time point. We see this most clearly in the results for the
first two components, but not for the third component. A possible explanation for
this observation is that for the first component, µ is able to adjust to a higher value
which is supportive of a greater value for λ, and in some sense borrow support from
the second component. However, for the third component, µ is not able to increase
or decrease in support of a lower value for λ by borrowing support from a nearby
component.
Figure 7.4: Plot of estimated parameters (µ (top panels), σ (middle panels) and λ (bottom panels)) for approaches using simulated data (D2): Simulated data (Black); Independent (Red); Informed Prior (Green); Penalised Prior (Blue)
As shown in Figure 7.5 (black line), for the third data set (D3) we simulated data
for the first component with a mean value increasing from 1.5 to 3.0, and weight
increasing from 0.1 to 0.6 and then decreasing to 0.3, over time. Often a consequence
of the growth in the first component is a decline in size and weight for the larger
sized particles, and this is reflected in the weight for the second component following
an opposite pattern to the first component. For the third component, the weight
increases from 0.1 to 0.3 over time. The parameters µ and λ are simulated with
some noise around the parameter values, and the sample size is 1000.
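The D3 construction just described can be sketched directly; the noise scales below are illustrative guesses, since the text specifies only the trajectory endpoints:

```python
import numpy as np

rng = np.random.default_rng(4)
T = 100
# Component 1: mean grows from 1.5 to 3.0; its weight rises from 0.1
# to 0.6 and then falls back to 0.3 over the second half.
mu1 = np.linspace(1.5, 3.0, T) + rng.normal(0, 0.02, T)
lam1 = np.concatenate([np.linspace(0.1, 0.6, T // 2),
                       np.linspace(0.6, 0.3, T - T // 2)])
lam3 = np.linspace(0.1, 0.3, T)   # component 3 weight grows over time
lam2 = 1.0 - lam1 - lam3          # component 2 mirrors component 1
# Add noise around the trajectories and renormalise the weights.
lam = np.vstack([lam1, lam2, lam3]) + rng.normal(0, 0.01, (3, T))
lam = np.clip(lam, 0.01, None)
lam /= lam.sum(axis=0)
```

Defining component 2 as the remainder enforces the opposite pattern to component 1 that the text describes, while keeping the weights on the simplex at every time point.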
Figure 7.5: Plot of estimated parameters (µ (top panels), σ (middle panels) and λ (bottom panels)) for approaches using simulated data (D3): Simulated data (Black); Independent (Red)
Figure 7.5 also shows the results of using the independent approach. We see that
at times the parameter estimates for the independent approach deviate away from
the actual data.
Figures 7.6 and 7.7 show the results for the informed prior and penalised prior
compared to the actual data, respectively. In Figure 7.6, the results show the effect
of varying the degree of smoothing on λ for the informed prior using θ = (0.1, 0.5, 1.3). For the results of the penalised prior, we vary the degree of smoothing on λ using φ = (0.04, 0.08, 0.12).
Figure 7.6: Plot of estimated parameters (µ (top panels), σ (middle panels) and λ (bottom panels)) for the Informed Prior approach using simulated data (D3): Simulated data (Black); θ = 0.1 (Green); θ = 0.8 (Blue); θ = 1.3 (Purple)
In Figure 7.6, we can see that the parameter estimates for λ for all three values
of θ appear to closely follow the actual data, with the closest estimates to the actual
data being for θ = 0.5 and 1.3. As we are only using an informed prior on the weights, the parameter estimates for µ and σ appear to be quite variable over time
Figure 7.7: Plot of estimated parameters (µ (top panels), σ (middle panels) and λ (bottom panels)) for the Penalised Prior approach using simulated data (D3): Simulated data (Black); φ = 0.04 (Brown); φ = 0.08 (Light Blue); φ = 0.12 (Dark Green)
compared to the actual data. However, the variability appears to be slightly less
for these variables than for the independent approach (Figure 7.5) and closer to the
actual data over time. Of interest is the closeness of the parameter estimates of µ and σ for components 1 and 2, which more clearly follow the true growth occurring in component 1 and the stability over time of component 2, compared to that observed for the independent approach.
In Figure 7.7, the parameter estimates for the penalised prior approach appear
to deviate slightly from the actual data for components 1 and 2. For the third
component, the parameter estimates for the penalised prior approach follow the
actual data with some noise. Overall, the results from the penalised prior approach
are similar to the independent approach but with less variability over time.
Smoothing on µ and σ
We turn now to an assessment of the impact of using an informative prior for µ
or σ over time. We present results for the highly correlated data set, since this is
the most sensitive of the simulated data as discussed above. Here we set nj = 25,
ξjt = µj,t−1, vj = 200/σ²j,t−1 and sj = 200.
In Figure 7.8, the parameter estimates for the informative prior for µ appear to
more closely follow the actual data than using an informative prior for σ. Although
the parameter estimates for both approaches appear to be further away from the
actual data than using an informative prior for λ, they do appear to be closer than
under the independent approach.
Figure 7.9 shows the results of using an informative prior on both µ and λ. In
this example, the results are similar to using an informative prior only on λ. Thus
depending on the objectives of the analysis, using an informative prior on both
parameters may not be needed.
In general, from the results of smoothing on µ, σ and λ it appears that large
adjustments to one parameter (e.g. from volatility in some time periods) are not
supported unless compensatory measures can be taken by the other parameters.
In analyses not shown here, we compared the results from using a three-period
centred moving average on the weight for the informed prior to the results of using the
Figure 7.8: Plot of estimated parameters (µ (top panels), σ (middle panels) and λ (bottom panels)) for Informed Prior approach using simulated data (D3): Simulated data (Black); Smoothing on µ (Orange); Smoothing on σ (Dark Green)
Figure 7.9: Plot of estimated parameters (µ (top panels), σ (middle panels) and λ (bottom panels)) for approaches using simulated data (D3): Simulated data (Black); Independent (Red); Smoothing on µ and λ (Green)
penalised prior to assess whether the form specified by the penalised prior had a
different influence on the results. We found the results from using both approaches
to be largely the same.
7.4.2 Case study
The data set studied here was taken from a measurement site at Hyytiala, Finland; a
plot of the measurements for the day selected is shown in Figure 8.2. This particular
day was selected as it shows a new particle formation event occurring, whereby a
new mode of aerosol particles appears with a significant influx of particles (as high as
10⁶ cm⁻³) with a geometric mean diameter below 10 nm, growing later into the Aitken
(25–90 nm) or accumulation (100+ nm) modes. In terms of a temporal mixture model
setting, we will be able to assess the performance of the three prior specifications
outlined previously as new components are introduced and both a growth in the
mean and weight for those components are observed.
As outlined in Section 8.3, the first stage of our approach is to apply RJMCMC
to each time period. These results are then used to guide the choice of the number
of components and initial parameter estimates for the second stage analysis, in
which temporally correlated priors are used to model the evolution of the mixture
parameters over time. Figure 8.9 shows the results of the first stage of the algorithm,
with a plot of the posterior mean estimates for µjt at each time point t, with the
size of the circles indicating the corresponding weight λjt. The average number
of components estimated with the highest probability over the day was four; the
minimum number of components was one, and the maximum number of components
was five.
For the second stage, we fixed the number of components to be five. For the inde-
pendent approach, we set the hyperparameters to be: ξ = (1.5, 2.2, 3.0, 3.8, 4.2, 5.1);
s2t,j = 2.0225; vt,j = 0.092025; and nj = 200. Figure 7.11 shows the results of
estimation using the independent approach.
From the previous results of applying the three prior specifications to simulated data
(Section 8.4.1), we generally found closer parameter estimates to the actual data
over time using an informed prior on µ or λ. For data that are quite noisy (D2), we
Figure 7.10: Plot of posterior mean estimates for µj from RJMCMC algorithm for one day (Hyytiala). Stage 1 of analysis for temporal evolution of parameters. Larger circles indicate greater weight for that component
Figure 7.11: Plot of estimated parameters (µ (top panel), λ (bottom panel)) under an independent prior over time. Stage 2 of analysis for temporal evolution of parameters.
also observed that the informative and penalised prior specifications can cause large
adjustments to other parameters. Thus, caution must be exercised when applying
the approaches to data of this type.
Figure 7.12 shows the results of estimation using the informed prior with smooth-
ing on all of the weights and only the mean for component 3. Comparing the results
to the independent approach in Figure 7.11, the parameter estimates for the informed
prior appear to show smoothly growing estimates for µ over time for components 1
and 2, and smoother parameter estimates for λ.
Figure 7.12: Plot of estimated parameters (µ (top panel) and λ (bottom panel)) under an informed prior over time. Stage 2 of analysis for temporal evolution of parameters. Informed prior specified for λ in all components and µ3
In analyses not shown here, we also considered the effect of using an informative
prior on other parameters and found the results to vary to a small degree. Our
choice in using an informative prior on µ3 and λ was guided partly by interest in
λ and partly by a prior belief that µ for the larger sized particles, which form in this
case the background concentration of particles, would be highly correlated over time. We were
also guided in our choice of parameters by the variability of the parameter estimates
from the first stage of the analysis.
7.5 Discussion
In this chapter, we explored the problem of estimating Bayesian mixture models
at multiple time points. Under different situations, approaches that employ infor-
mation about neighbouring time points compared favourably to results based on
an independent approach. By including additional temporal information about pa-
rameters for correlated time periods we may be able to better identify individual
components at each time point. As an aid for inference, we may also be able to ob-
tain smoother parameter estimates over time and from this be able to more clearly
establish patterns or identify anomalies from the data.
The results highlight a number of observations about mixture representations at
multiple time points. First, analysis of the evolution of parameters of a mixture over
multiple time points highlights the large degree of dependency that exists between
component parameters. Changes to a parameter in one component may flow on to
the parameter in a nearby component. Depending on the context of the study, we
can anticipate this dependency to be more readily apparent for the weight parameter
but we found similar dependencies to exist for other parameters. The second is the
need to be mindful that the same parameter in one component may have a different
correlation structure over time to the same parameter in another component. In
the context of particle size distribution data, we often observed greater volatility
in estimates for the smaller particles compared to the larger sized particles and
so at times the correlation structure of the parameters between these respective
components appeared to be quite different.
A possible effect of using informative priors in this context is to impose a prior
not supported by the data or to impose a temporal correlation structure where
such a structure does not exist, and thereby cause unnecessary adjustments to other
parameters. We observed this most clearly in the results from the simulated data
where at times the data was quite noisy. For this dataset, using an informative
prior for a parameter which supported large adjustments away from the actual data
resulted in large compensatory adjustments being made by other parameters, not only
within the same component, but also to parameters in neighbouring components.
The easy solution may be to use an appropriate correlation structure for components
but of course this may not always be known a priori.
A further result of the dependency that can exist between parameters of components
and within component parameters is that the inclusion of correlation information
to aid in the identifiability of the mixture may not be required for all
parameters or alternatively all components. In the context of a mixture with a small
number of components, we may only need to provide more information about one
parameter for an influential component in order to separate out the influence of com-
peting components. This result will also be invaluable if the correlation structure
for one parameter or parameters for one component are more readily known. In the
context of a mixture of Gaussians, we generally found that an informative prior was
only needed on µ or λ or possibly both. This result could well be context specific and
influenced by any reliance on the means for defining (in terms of size) and ordering
of components. The choice of which parameter to provide more information about may also
be guided by whether it is a parameter of interest for inference, as demonstrated in
analysis of the case study where most interest was in the behaviour of λ over time.
In this case, and in general, one must be careful in the analysis of just one parameter
as it can largely be a conditional analysis in view of the behaviour of other possible
cross-correlated parameters within the same component and between components.
While many of the above difficulties may seem to be avoided if smoothing ap-
proaches are applied retrospectively on parameter estimates from an independent
mixture model, this type of analysis may largely ignore the true mapping of compo-
nents or path of parameters over time. From the results of the simulated data, the
large degree of dependency that we observe between the parameters of a mixture
over time suggests that including temporal information to better identify one of the
parameters at a single time point can flow on to affect other parameters. This could
change inference about both the mixture representation at a point in time, and also
the behaviour of mixture parameters over time.
In general, one of the potential difficulties in using an informative prior approach
to smooth parameter estimates over time is the variable degree of influence the prior
may have in the posterior. If the primary objective is to obtain smoothed parameter
estimates over time, larger sample sizes and noisiness of the data at times may
warrant increasingly restrictive priors. In such cases where the objective might be
to downplay the influence of the data, a number of alternative approaches to increase
the influence of prior information can be used (Ibrahim et al., 2003). In all cases, it is
valuable to undertake a sensitivity analysis in order to assess the effect of the prior.
Such an analysis should include the independent prior as a baseline comparison.
Alternative approaches which are less sensitive to the form in which prior infor-
mation is given in the model, and/or include covariate information could also be
used to aid in estimation.
For estimation of aerosol particle size distributions, the dynamics of the aerosol
process and the complexity of the influences on particle concentration and size,
demand the use of approaches which utilise as much information from the data as
possible. To this end, the inclusion of temporal information may be helpful.
Chapter 8
Bayesian hierarchical modelling for a time series of
mixtures
In this chapter, we address some of the issues raised in Chapter 7, and explore a
hierarchical approach to estimation of mixture parameters over time in which an
informative prior is placed at two different levels. Simulated and actual data are used
to assess the performance of the approach. As this chapter is designed to be read
independently of the previous chapters, Section 8.2, describing PSD data, and the
first part of Section 8.3, outlining mixture models, are largely repeated from Chapter 7.
8.1 Introduction
In Chapter 7, we explored the problem of estimating a Bayesian mixture model at
multiple time points using an informative or penalised prior which carried informa-
tion about the correlation of parameters for neighboring time points. The analysis,
in general, highlighted a number of observations about mixture representations at
multiple time points. First, analysis of the evolution of parameters of a mixture over
multiple time points highlighted the large degree of dependency that exists between
component parameters. For example, in the case of a mixture of Gaussians, large
changes to the weight parameter over time for one component were reflected not only
in adjustments to the weights of other components but also in the mean parameter
of the associated component and neighbouring components.
The second is the observation that a parameter in one component may have a dif-
ferent correlation structure over time to the same parameter in another component.
In the context of PSD data, we often observe greater volatility in concentration lev-
els for the smaller sized particles compared to the larger sized particles and we can
expect this to be reflected in the corresponding parameter estimates for a mixture
model over time.
In light of the above observations, a possible effect of using informative priors
in this context is to impose a prior not supported by the data or to impose a
temporal correlation structure where such a structure does not exist, and thereby
cause unnecessary adjustments to other parameters.
In this chapter, we explore the problem of estimating a Bayesian mixture model
at multiple time points using a hierarchical model for the parameters in which
an informative prior is placed at two different levels. The aim of exploring this
approach is to address some of the issues raised in the previous chapter and develop
an alternative approach which is less sensitive to the form in which prior information
is given in the model.
An outline of the chapter follows. In Section 8.2, we briefly describe particle
size distributions, and provide an illustration with actual data. In Section 8.3, we
outline the standard mixture model setup for a single time point and a two stage
approach for estimation of a mixture model at multiple time points. For estimation
of a mixture model at multiple time points we introduce a hierarchical approach
to estimation. Section 8.4 presents results on the performance of the approach on
several simulated datasets and actual data, and we conclude in Section 8.5 with
some discussion and possibilities for further work.
8.2 Particle size distribution data
One of the most important physical properties of aerosol particles is their size, and
the concentration of particles in terms of their size is referred to as the particle size
distribution. Figure 8.1 shows an example of particle size distribution data for one
measurement or time period. Because aerosol particles are often charged, their size
can be determined from their electrical mobility (McMurry, 2000).
Figure 8.1: Histogram of data sampled from Hyytiala, Finland for a single time period. The x-axis is particle diameter (log(Dp) in nm) and the y-axis is density.
In this study we present, as an example, the aerosol particle evolution, before,
during, and after a new particle formation event at a Boreal Forest in Southern
Finland (Figure 8.2). This dataset was selected as it provides a wide ranging repre-
sentation of modes for particle size distributions (Dal Masso et al., 2005). Because
aerosol particles are governed by formation and transformation processes, they tend
to form well distinguished modal features. For example, during background conditions
in the Boreal Forest the particle number size distribution of fine aerosols
(diameter < 2.5 µm) is bimodal: an Aitken mode (below 0.1 µm) and an accumulation
mode (over 0.1 µm). During a new particle formation event a new particle
mode, commonly known as the nucleation mode, is formed in the atmosphere
with geometric mean diameter below 0.025 µm. However, in the urban atmosphere,
aerosol particles are more dynamic because of the different types and properties of
sources of aerosol particles and may show more modes. Typically the number
concentrations of aerosol particles in the urban background can be as high as
5×10⁴ cm⁻³ and very close to a major road they often exceed 10⁵ cm⁻³.
8.3 Mixture models
In this section, we briefly describe a mixture model, outline the independent and
informed prior representations, and outline a hierarchical approach to estimation of
parameters.
The density of data (y) at a given time period is represented by a finite mixture
model
p(y|θ) = ∑_{j=1}^{k} λj f(y|θj)    (8.1)

where k is the number of components in the mixture, λj represents the probability
of membership to the jth component (∑_{j=1}^{k} λj = 1), and f(y|θj) is the density
function of component j, which has parameters θj.
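As an illustrative sketch, the mixture density (8.1) can be evaluated as below for a plain Gaussian mixture (ignoring, for simplicity, the truncation introduced later for PSD data; the parameter values are ours, not estimates from the thesis).

```python
import numpy as np

def mixture_density(y, weights, means, sds):
    """Evaluate p(y | theta) = sum_j lambda_j N(y; mu_j, sigma_j^2)
    for a k-component Gaussian mixture, as in equation (8.1)."""
    y = np.atleast_1d(y)[:, None]                     # shape (n, 1)
    w = np.asarray(weights, dtype=float)              # shape (k,)
    mu, sd = np.asarray(means), np.asarray(sds)
    comp = np.exp(-0.5 * ((y - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))
    return comp @ w                                   # shape (n,)

# Illustrative three-component mixture on log particle diameter
dens = mixture_density([1.5, 3.4, 5.0],
                       weights=[0.3, 0.4, 0.3],
                       means=[1.5, 3.4, 5.0],
                       sds=[0.55, 0.55, 0.55])
```

Because the weights sum to one and each component is a density, the mixture itself integrates to one.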
As component membership of the data is unknown, a computationally convenient
method of estimation for mixture models is to use a hidden allocation process and
introduce a latent indicator variable zij, which is used along the lines of a missing
variable approach to allocate observations yi to each component.
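A minimal sketch of this allocation step for a Gaussian mixture follows; the function name and test values are ours, and a full sampler would alternate this step with draws of the component parameters.

```python
import numpy as np

def sample_allocations(y, weights, means, sds, rng):
    """Sketch of the latent-allocation step: draw z_i for each
    observation with P(z_i = j) proportional to lambda_j f(y_i | theta_j)."""
    y = np.asarray(y, dtype=float)[:, None]
    w, mu, sd = map(np.asarray, (weights, means, sds))
    f = np.exp(-0.5 * ((y - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))
    probs = w * f
    probs /= probs.sum(axis=1, keepdims=True)         # normalise per observation
    u = rng.random(len(y))[:, None]                   # inverse-CDF categorical draw
    return (probs.cumsum(axis=1) < u).sum(axis=1)

rng = np.random.default_rng(1)
z = sample_allocations([1.5, 5.0], [0.5, 0.5], [1.5, 5.0], [0.3, 0.3], rng)
```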
In this chapter, we adopt the common assumption of fitting log-normal distri-
butions to aerosol particle size distribution data (Whitby and McMurry, 1997). As
PSD data are often measured with a definite lower and upper bound for the size of
the particles we introduce a slight modification and assume that the (log) data follow
a truncated normal distribution. Thus, we take the data (y) to be the log of particle
diameters (nm), and the parameters to be estimated (θj) for each component are
the mean (µ), variance (σ2) and weight (λ). The number of components k is also
considered to be unknown.
For the independent, informative prior and hierarchical approaches (except where
Figure 8.2: An illustration of a new particle formation event at a Boreal Forest site located in Southern Finland. (a) The temporal variation of the particle number size distribution and (b) selected particle number size distributions showing the different stages of the newly formed particle mode from its early stage. Note that this new particle formation occurred on a regional scale over the southern part of Finland.
stated otherwise), priors were:
p(µj) ∼ N(ξ, κ−1)
p(σ−2j ) ∼ Gamma(δ, β)
p(β) ∼ Gamma(g, h)
p(λ) ∼ Dirichlet(α1, α2, · · · , αk)
p(k) ∼ Uniform(kmin, kmax)
where ξ, κ, δ, α, η, g, h, kmin and kmax are fixed hyperparameters.
In the first stage of the temporal analysis, for each time period we implemented
Richardson & Green’s (1997) RJMCMC algorithm to estimate both θt and kt (t =
1, . . . , T ). Although this algorithm is easily fit at a single time point, the use of
RJMCMC for mixture models with temporal data, where both θ and k may vary at
each time point, requires significant pre-processing with respect to mixing, coverage
and convergence, as well as post-processing to provide adequate summary statistics
and between time component mapping. As an alternative, we consider a two-stage
approach. In the first stage, the number of components is estimated at each time
point using RJMCMC. In the second stage, we fix the number of components (k)
to the maximum observed at any time period and then independently estimate the
parameters θj (j = 1, . . . , k) for each time period using a Gibbs sampler algorithm.
As we do not observe all of the components in every time period, we allow component
weights to be ‘effectively zero’ (inf(λt)=0.001) if required. The Gibbs sampler is
iterated until the Markov Chains for the parameters have converged to stationary
posterior distributions.
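The 'effectively zero' floor on the weights in the second stage can be sketched as below; the helper is ours, not code from the thesis.

```python
import numpy as np

def floor_weights(lam, floor=0.001):
    """Sketch: keep all k components active over time by flooring
    weights at an 'effectively zero' value (inf(lambda_t) = 0.001)
    and renormalising so the weights still sum to one."""
    lam = np.maximum(np.asarray(lam, dtype=float), floor)
    return lam / lam.sum()

lam = floor_weights([0.6, 0.4, 0.0])   # third component absent this period
```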
In the second stage, for estimation of parameters of a mixture at multiple time
points, independent estimation of θ at each time period does not allow for any
information about θ to be shared over time. An alternative to this independent
approach is to use an informative prior where information provided from previous
and future time periods is used as prior information for the current period. In a
previous chapter, we explored the use of this type of approach in some detail. Here, the
prior was imposed directly on elements of θ and strong sensitivity of the posterior
estimates to the specification of these priors was observed. In this chapter, we
explore an alternative representation as described in section 1.3.1 and present the
results from the previous approach as a comparison to the results from the new
hierarchical approach.
We focus, in particular, on a simple case where posterior estimates from the
previous period are used as prior information in the current period. As the weight
parameter in a mixture is often of interest in analysing PSD data, we present the
results from using an informative prior for this parameter. Thus specification of
prior information for λ can be achieved by allowing δt,j in the Dirichlet prior at time
t to depend on λt−1,j.
For the results to follow for the informative prior approach, we specify that δt,j =
θjmt−1,j where mt−1,j is the mean number of observations allocated to component j
in the previous time period. The parameter θj reflects how strongly the information
from the previous time period is used as prior information for the current period. In
this chapter, we choose to fix θ = 0.5; alternatively we could estimate this parameter
but we do not pursue this approach here.
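This prior construction can be sketched as follows; the illustrative allocation counts are ours, and θ is taken as the fixed scalar 0.5 from the text (the thesis allows it to vary by component).

```python
import numpy as np

def dirichlet_hypers(m_prev, theta=0.5):
    """Sketch: Dirichlet hyperparameters for lambda at time t,
    delta_{t,j} = theta * m_{t-1,j}, where m_{t-1,j} is the mean
    number of observations allocated to component j at time t-1."""
    return theta * np.asarray(m_prev, dtype=float)

# Illustrative mean allocation counts from the previous period
delta = dirichlet_hypers([200.0, 500.0, 300.0], theta=0.5)
```

Larger θ lets the previous period dominate the prior; θ near zero recovers an essentially uninformative Dirichlet.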
8.3.1 Hierarchical time series approach for mixture models
In this section, we outline a hierarchical approach for the estimation of parameters
of a mixture model for multiple time points.
Smoothing on µ
The hierarchical approach for µ is specified as,
µjt ∼ N(φjt, V1)
φjt ∼ N(φj,t−1, V2)    (8.2)
where V1 and V2 are fixed scalars, reflecting the variability of µjt and φjt re-
spectively. In this hierarchical formulation, the parameter µ is used to estimate the
mixture distribution at the level of the data, and φ represents the underlying corre-
lation of µ over time (assuming an AR(1) process). In this setting, we can interpret
the ratio V2/V1 as reflecting the amount of information we have about the underlying
behaviour (signal) of µ in comparison to estimates at the level of the data (noise).
For the first time period (t=1), we set φjt = µjt. For estimation of µ and φ we
use a Gibbs sampling scheme. For details see the Appendix.
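Forward-simulating the hierarchy (8.2) illustrates the roles of V1 and V2; this is our own sketch (the thesis's Gibbs updates are in its Appendix), using the V1 = 0.36, V2 = 0.04 settings reported later for dataset D1.

```python
import numpy as np

def simulate_mu_path(phi0, T, V1, V2, rng):
    """Forward-simulate hierarchy (8.2): phi follows an AR(1)-style
    random walk with variance V2, and mu is a noisy observation of
    phi with variance V1; V2/V1 is the signal-to-noise ratio."""
    phi = np.empty(T)
    phi[0] = phi0
    for t in range(1, T):
        phi[t] = phi[t - 1] + rng.normal(0.0, np.sqrt(V2))
    mu = phi + rng.normal(0.0, np.sqrt(V1), size=T)
    return mu, phi

rng = np.random.default_rng(0)
mu, phi = simulate_mu_path(phi0=1.5, T=100, V1=0.36, V2=0.04, rng=rng)
```

A small V2/V1 ratio says the underlying signal φ moves slowly relative to the observation noise, which is what produces the smoothed φ estimates seen in the results.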
Smoothing on λ
For independent data, a convenient prior for λ is to use a Dirichlet distribution.
However, it is difficult to work with a Dirichlet in a time series or hierarchical
approach, mainly due to the inflexibility of the Gamma distribution. An alternative
formulation of the Dirichlet in terms of the Beta distribution does not appear to
provide greater flexibility.
Another alternative is to use a Logistic-Normal prior for λ (LN(λ; Xt, Σd)) where,
Wt ∼ MVN(Xt, Σd)
λj,t = exp(Wj,t) / ∑_{j'=1}^{k} exp(Wj',t)    (8.4)
Using this functional form, the parameterisation of λ in terms of a multivariate
normal distribution allows for a suitably flexible form in which to explore a hierar-
chical structure for this parameter. Such flexibility, in comparison to the Dirichlet
distribution, has recently been investigated in a hierarchical approach for pooling of
estimates across different sampling units (Hoff, 2003).
In a hierarchical setting and similar to the model used for µ we can further say
that,
Xt ∼ MVN(Xt−1, Σs)
γj,t = exp(Xj,t) / ∑_{j'=1}^{k} exp(Xj',t)    (8.5)
where Σd and Σs reflect the variability of Wt and Xt respectively. In this hier-
archical formulation, the parameter λ is used to estimate the mixture model at the
level of the data, and γ represents the underlying correlation of λ over time (assum-
ing an AR(1) process). For the results to follow the diagonal entries of Σd and Σs are
fixed to reflect the noisiness of the data and the degree of smoothing respectively,
and off diagonal entries are set to zero. Alternatively, we could estimate Σd and Σs
but we do not pursue this approach here.
For estimation of λ and γ we use a Gibbs sampling scheme with a Metropolis–Hastings
step. For details see the Appendix. For identifiability, both Wt and Xt are
k − 1 dimensional, and λk = 1 − ∑_{j=1}^{k−1} λj (with the same identification used for γ).
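The map from the (k−1)-dimensional W to the weights can be sketched as below; fixing the k-th coordinate at zero is one common way of imposing the identifiability constraint, assumed here for illustration.

```python
import numpy as np

def weights_from_W(W):
    """Sketch of the logistic-normal map (8.4): W is (k-1)-dimensional;
    we append a reference value W_k = 0, so the k-th weight absorbs
    the remainder lambda_k = 1 - sum_{j<k} lambda_j."""
    Wfull = np.append(np.asarray(W, dtype=float), 0.0)  # fix W_k = 0
    e = np.exp(Wfull - Wfull.max())                     # numerically stable softmax
    return e / e.sum()

lam = weights_from_W([0.0, 1.0])   # k = 3 components
```

Because the weights are a deterministic function of an unconstrained multivariate normal, a random-walk step on W (or X) stays on the simplex automatically, which is what makes the hierarchical structure tractable.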
8.4 Results
In this section we present and assess the results using simulated data and then
present the results of applying the approach to particle size distribution data from
Hyytiala, Finland.
8.4.1 Simulated Data
Data Setup
We simulated data which is indicative of the type of behaviour of aerosol particle
size distribution data observed at Hyytiala, a boreal forest site in Southern Finland
(SMEAR II) (Vesala et al., 1998). A particular feature of this particle size distribution
data is both a growth in the mean and weight for a component. Two datasets
are simulated. The first provides an illustration of a particular feature of PSD data
for some time periods and the second is representative of most time periods.
In practice it is quite common to observe sudden large changes in the number
of particles measured which may persist for a number of time periods. This is more
often observed when the number of particles for a particular size group are low, and
more so for the smaller sized particles. For the first dataset (D1) we simulate data
for the first component where the weight at smaller values is quite volatile. For this
dataset the mixture is well identified.
For the second dataset (D2), we simulated data which is highly correlated across
time, a feature of particle size distribution data observed in practice for most time
periods where measurements are commonly taken at small time intervals. This
dataset was simulated with parameter estimates where the mixture is not well iden-
tified during the second half of the time period.
Results from simulated dataset D1
As shown in Figure 8.3 (black line), for the first simulated dataset (D1) we simulated
data for the first component with a mean value increasing slowly over time from 1.5,
and weight increasing from 0.2 to 0.5. For the first half of the time period, the
weight for the first component was simulated with a large degree of noise to reflect
the observed volatility of smaller sized particles in practice at relatively low weights.
The parameter µ was simulated with some noise around the parameter values; σ was
kept constant at 0.55, and the sample size was 1000.
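The construction of D1's first component can be sketched as follows. The endpoints (mean growing from 1.5, weight rising from 0.2 to 0.5, σ = 0.55) come from the text; the growth rate and noise scales are our illustrative choices, not the thesis's actual simulation code.

```python
import numpy as np

def simulate_d1_component1(T=100, n=1000, rng=None):
    """Sketch of dataset D1's first component: mean drifting up from 1.5,
    weight rising from 0.2 to 0.5, with extra noise on the weight over
    the first half of the period; sigma is fixed at 0.55."""
    rng = rng or np.random.default_rng(0)
    t = np.arange(T)
    mu1 = 1.5 + 0.01 * t + rng.normal(0.0, 0.05, T)     # slow growth + noise
    lam1 = np.linspace(0.2, 0.5, T)
    lam1[: T // 2] += rng.normal(0.0, 0.08, T // 2)     # volatile early weights
    lam1 = np.clip(lam1, 0.01, 0.99)                    # keep weights valid
    return mu1, lam1

mu1, lam1 = simulate_d1_component1()
```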
Also shown in Figure 8.3 are the results from the independent (red) and informed
prior approaches (green). For the results from the informed prior, we can see that
the effect of using an informative prior on the weights (λ) over time results in
compensatory measures by both µ and σ. We can see this most clearly in the results
for the first component where we see large adjustments to µ1 in compensation for
a smoother estimate of λ1 over time, which is clearly in contrast to the actual data
(black) and results from the independent approach (red line). Of interest is that we
don’t see large compensatory measures for the parameters of the third component,
where for the first half of the time period the actual behaviour of the weight (λ3) is
highly variable. The difference appears to be that in the first component, the mean is
able to adjust to a higher value which is supportive of a greater weight, and in some
sense borrow support from the second component. For the third component, the
mean is not able to increase or decrease in support of a lower weight by borrowing
Figure 8.3: Plot of estimated parameters over time for simulated dataset D1: µ (top panels), σ (middle panels), λ (bottom panels). Actual data (Black), Independent (Red), Informed Prior (Green)
support from a nearby component. Both the independent approach and informed
prior approaches overestimate the weight in the second component.
Figure 8.4 provides the results of the hierarchical model for µ (green and blue)
and the actual data (black). For these results, V1 and V2 are fixed at 0.36 and 0.04
respectively. From Figure 8.4, we can see that the estimates for φ are a smooth
version of the much noisier estimates for µ.
Figure 8.4: Plot of estimated parameters over time for simulated dataset D1: µ (top panels), σ (middle panels), λ (bottom panels). Actual data (Black), Hierarchical approach for µ (Dark Green), φ (Blue)
Figure 8.5 provides the results of the hierarchical model for λ (green and blue)
and the actual data (black). For these results, V1 and V2 are fixed at 0.2 and
0.015 respectively. From Figure 8.5 the estimates for γ roughly follow the more
variable estimates for λ over time. In contrast to the results from the informed
prior (Figure 8.3), we don’t see any large compensatory adjustments being made
to parameters by using a hierarchically based informative prior for γ. Apart from
estimates for γ, the variability of the other parameter estimates is comparable to
the results from the independent approach.
Figure 8.5: Plot of estimated parameters over time for simulated dataset D1: µ (top panels), σ (middle panels), λ (bottom panels). Actual data (Black), Hierarchical approach for λ (Dark Green), γ (Blue)
Results from simulated dataset D2
As shown in Figure 8.6 (black line), for the second simulated dataset (D2) we sim-
ulated data for the first component with a mean value increasing from 1.5 to 3.0,
and weight increasing from 0.1 to 0.6 and then decreasing to 0.3, over time. Often a
consequence of the growth in the first component is a decline in size and weight for
the larger sized particles, and this is reflected in the weight for the second component
following an opposite pattern to the first component. For the third component, the
weight increases from 0.1 to 0.3 over time. The parameters µ and λ are simulated
with some noise around the parameter values, and the sample size is 1000.
Figure 8.6: Plot of estimated parameters over time for simulated dataset D2: µ (top panels), σ (middle panels), λ (bottom panels). Actual data (Black), Independent (Red), Informed Prior (Green)
Also shown in Figure 8.6 are the results from the independent (red) and informed
prior (green) approaches. For the informed prior, the parameter estimates appear
to follow the actual data closely in comparison with the independent approach. Of
particular interest is the closeness of the parameter estimates of µ and σ for components
1 and 2 over the second half of the time period, which more clearly follow the true
growth occurring in component 1 and the stability over time of component 2.
Figure 8.7 provides the results of the hierarchical model for µ (green and blue)
and the actual data (black). For these results, V1 and V2 are fixed at 0.04 and
0.0025 respectively. The parameter estimates for the mean of the first and second
components appear to be lower than the actual data for the last quarter of the time
period, which is similar to the results of the independent approach (Figure 8.6).
[Figure 8.7 appears here: three columns of panels (Components 1–3) showing µ (top), σ (middle) and λ (bottom) over time 0–100.]

Figure 8.7: Plot of estimated parameters over time for simulated dataset D2. µ (top panels), σ (middle panels), λ (bottom panels). Actual data (black), hierarchical approach for µ (dark green), φ (blue).
Figure 8.8 provides the results of the hierarchical model for λ (green and blue)
and the actual data (black). For these results, V1 and V2 are fixed at 0.025 and
0.0075 respectively. The parameter estimates for the hierarchical approach appear
to more closely follow the actual data than for the independent approach.
[Figure 8.8 appears here: three columns of panels (Components 1–3) showing µ (top), σ (middle) and λ (bottom) over time 0–100.]

Figure 8.8: Plot of estimated parameters over time for simulated dataset D2. µ (top panels), σ (middle panels), λ (bottom panels). Actual data (black), hierarchical approach for λ (dark green), γ (blue).
8.4.2 Case study
The data set studied here was taken from a measurement site at Hyytiälä, Finland,
and a plot of the measurements for the selected day is shown in Figure 8.2. This
particular day was selected because it shows a new particle formation event,
whereby a new mode of aerosol particles appears with a significant influx of particles
(as high as 10^6 cm^-3) at a small geometric mean diameter (< 10 nm), growing later into
the Aitken (25-90 nm) or accumulation (100+ nm) modes. In terms of a mixture
model setting, we are able to assess the performance of the three approaches
outlined previously as new components are introduced and both a growth in the
mean and weight for those components are observed.
As outlined in Section 8.3, the first stage of our approach is to apply RJMCMC
to each time period. These results are then used to guide the choice of the number
of components and initial parameter estimates for the second stage analysis, in
which temporally correlated priors are used to model the evolution of the mixture
parameters over time. Figure 8.9 shows the results of the first stage of the algorithm,
with a plot of the posterior mean estimates for µjt at each time point t, with the
size of the circles indicating the corresponding weight λjt. The average number
of components estimated with the highest probability over the day was four; the
minimum number of components was one, and the maximum number of components
was five.
For the second stage, we fixed the number of components to be five. Figure 8.10
shows the results of estimation using the independent approach.
Figure 8.11 shows the results of estimation using the hierarchical model for the
weights (λ). For these results, V1 and V2 are 0.05 and 0.015 respectively. Compared
to the results from the independent approach (Figure 8.10), we see a noticeable
reduction in the noise surrounding λ and a clearer picture emerging of the pattern
of λ over the course of the day.
Alternatively, we could have used a hierarchical approach for µ alone, or for both
µ and λ; in analyses not shown, both alternatives gave similar results.
[Figure 8.9 appears here: posterior mean estimates for µ plotted against time, with circle size proportional to component weight.]

Figure 8.9: Plot of posterior mean estimates for µ_j from the RJMCMC algorithm for one day (Hyytiälä). Stage 1 of the analysis for the temporal evolution of parameters. Larger circles indicate greater weight for that component.
[Figure 8.10 appears here: two panels over the times 00:00 to 23:50.]

Figure 8.10: Plot of estimated parameters over time for the actual data, independent approach. Posterior mean estimates for µ (top panel) and λ (bottom panel).
[Figure 8.11 appears here: two panels over the times 00:00 to 23:50.]

Figure 8.11: Plot of estimated parameters over time for the actual data, hierarchical approach for λ. Posterior estimates for µ (top panel) and γ (bottom panel).
8.5 Discussion
In this chapter, we explored the problem of estimating Bayesian mixture models at
multiple time points. In this setting, parameters of the mixture model at each time
point are likely to be correlated with neighbouring time points and useful information
about the parameters may be gained by incorporating this information in estimation.
We found that using a hierarchical approach to the estimation of parameters, where
an informative prior is placed at two different levels, offers considerable flexibility in
estimation for a mixture model setting.
Compared to placing an informative prior at a single level, a hierarchical ap-
proach allows for a separation of the underlying pattern of the parameter over time
(signal) from some of the noise surrounding the parameter at each time point. The
advantage of this is two fold. First, where inference is interested in the underlying
pattern of the parameters over time, we may be able to more clearly establish pat-
terns or identify anomalies from the data. Second, in light of the large degree of
dependency that exists between parameters of a mixture both within and between
components, we can impose an informative prior which may be less sensitive to
changes in the correlation structure of the data, and thereby reduce the influence of
adjustments to neighbouring parameters.
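The signal-versus-noise separation described above can be sketched generatively. The following assumes a Gaussian random walk at the high level with local noise at the low level; the random-walk form and the V1, V2 values (borrowed from the Figure 8.7 run for µ) are illustrative assumptions, not the model specification from the chapter.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 100
V1, V2 = 0.04, 0.0025  # low-level and high-level variances (values from the Figure 8.7 run)

# High level: the underlying signal phi_t evolves as a smooth random walk.
phi = 2.0 + np.cumsum(rng.normal(0.0, np.sqrt(V2), T))

# Low level: the parameter mu_t is the signal plus time-point-specific noise.
mu_t = phi + rng.normal(0.0, np.sqrt(V1), T)

# Mean squared first difference: the signal is much smoother than the raw parameter.
rough = lambda x: float(np.mean(np.diff(x) ** 2))
```

Inference on phi then targets the underlying pattern, while mu_t absorbs the time-point-level variability.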
In the hierarchical approach outlined, the influence of the informative prior at
the two levels was specified by parameters V1 (low level) and V2 (high level), and
the values assigned to these parameters are critical in carrying information about
the correlation structure of the parameter of interest. In this chapter, we decided
to choose parameter values based on prior belief in the correlation structure of the
data; alternatively these parameters could be estimated. To this effect, a number
of approaches are available for estimation (West and Harrison (1997); Fahrmeir
et al. (2004)). However, in order to estimate V1 and V2, we still face a choice as to
the degree of penalisation or smoothing of the parameter in light of the apparent
variability in the data. This is a common issue in temporal and spatial modelling
in general.
Although we have focussed on developing a hierarchical approach for the parameters
µ and λ, we could equally apply the same approach to the estimation of σ. One
option would be to consider a half-t distribution, which has previously been used
in similar hierarchical settings (Gelman (2006)). Faced with a number of parameters
to consider, the choice of parameter or parameters may depend on the objectives of
the analysis, the data context and the information available. As most interest for
PSD data is in the size and composition of particles over time, we found it useful to
concentrate on µ and λ over time; in other contexts this will change.
Although we have applied the hierarchical approach to PSD data, the approach
is generalisable to other contexts in which a mixture representation exists at multiple
time points. For example, in a disease mapping context interest may be in both the
mixture representation of the spatial surface and also in any temporal changes to
the mixture.
The hierarchical approach considered here can be readily generalised to include
covariates. Moreover, through the flexibility of assuming a logistic normal distribution
on the weights we can better explore and estimate transitory movements between
components.
There are several limitations to the hierarchical approach considered. First, the
hierarchical approach relies on estimation of parameters under a fixed number of
components. In this chapter, we sought to fix the number of components based on
a first stage analysis in which we used results from the RJ approach as a guide to
the maximum number of components and for establishing hyperparameter values.
In some situations, where reliable prior information is available this first stage may
not be necessary. However, an alternative is to use a single approach and jointly
estimate the parameters and the number of components (e.g. RJMCMC).
This single modelling approach requires reversible moves not only within time pe-
riods but also across them. In our experience this was computationally very costly
and required substantial pre-processing to ensure good mixing, labelling and conver-
gence. Moreover, further post-processing was required to obtain adequate summary
statistics and between component mapping.
A further limitation of the approach outlined is that it is computationally ex-
pensive. Most of this expense is incurred in the first stage of the analysis. For
estimation of PSD data over one day using 144 time points, the running time of
the RJ approach with 200,000 iterations was about 27 hours. In comparison, the
second stage approach using 50,000 iterations took about an hour. Such computa-
tional expense quickly becomes burdensome if analysis is required for several days
or indeed several weeks. Of course, the first stage may not be required for subsequent
days, which would considerably reduce the computational time involved.
Chapter 9
Conclusions and further work
This chapter provides a brief overview of the thesis and some suggestions for further
work.
9.1 Conclusions
The primary aim of this thesis was to develop mixture model approaches to char-
acterise complex environmental exposures and outcomes. To address this primary
aim, we focussed on a number of applied problems in characterising complex en-
vironmental exposures and outcomes, including: assessing the interaction between
environmental exposures as risk factors for health outcomes; identifying differing
environmental outcomes across a region; and establishing patterns in the size and
concentration of aerosol particles over time. In this section, we discuss the four main
methodological contributions made to address these problems, together with the
associated applied contributions.
First, we explored the use of a mixture model in a meta-analysis setting to provide
for a joint assessment of the evidence for a number of hypothesised relationships in
the data. In Chapter 3, we examined the use of a multivariate meta-analysis to
describe the relationship between exposure to asbestos and smoking on the risk of
lung cancer. In particular, from a statistical perspective, interest was in whether
the risk from exposure to both asbestos and smoking is an additive, multiplicative
or other relation of the risk from exposure to each factor alone. In this analysis, we
considered the evidence for either relation using separate tests.
In Chapter 4, we extended the analysis in Chapter 3 and explored a mixture
model approach to assess the strength of evidence for either relation. In this ap-
proach, we moved away from separate tests for either an additive or multiplicative
relation and allowed the data to choose between both models. The approach allowed
both relations to be considered at the same time, and an advantage for inference is
that we can say with some probability whether the data belongs to one relation or
another. This type of inference may be more informative than information provided
from significance tests on each relation separately.
Second, we developed a simple mixture model approach to classify cases of a
disease over time into a number of groups. In Chapter 5, we examined a mixture
model approach to characterise the risk of Ross River virus (RRv) in Queensland.
This approach built on the approach adopted by Gatton et al. (2004) and considered
that the weekly cases of RRv could be attributed to more than the two hypothesised
periods (an outbreak or non-outbreak period), and also extended the analysis
to compare the number and timing of the periods across the spatial region of QLD.
In this approach, we may be able to better identify outbreak periods when they
occur and also provide a more detailed characterisation of the data, which can be
used as a basis for association of explanatory variables.
Third, we developed and examined an informative prior approach for estimation
of mixture model parameters for multiple time points. A mixture model approach
to estimate aerosol particle size distribution (PSD) data over time was introduced
in Chapters 6, 7 and 8. In Chapter 6, we compared the results of using a Bayesian
mixture model approach to estimating PSD data with a commonly used estima-
tion method in the aerosol physics literature. In using a Bayesian mixture model
approach we were able to improve upon previous approaches by providing a better
exploration of the parameter space, and also allow the data to better choose between
alternative representations without the use of subjective decisions. As PSD data is
often measured over time at small time intervals, we also examined the use of an
informative prior for estimation of the mixture parameters which takes into account
the correlated nature of the parameters.
In Chapter 7, we examined in some detail the issue of using informative priors for
estimation of mixtures at multiple time points. In this analysis, the use of two dif-
ferent informative priors, and an independent prior were compared using simulated
and actual data. In general, we found that approaches that employ information
about neighbouring time points compared favourably to results based on an inde-
pendent approach. We found that by using informative priors about parameters for
correlated time periods we may be able to better identify individual components at
each time point. As an aid for inference, we may also be able to obtain smoother pa-
rameter estimates over time and from this be able to more clearly establish patterns
or identify anomalies from the data.
Analysis of the evolution of parameters of a mixture over multiple time points
also highlighted the large degree of dependency that exists between component pa-
rameters. A possible effect of using informative priors in this context is to impose a
prior not supported by the data or to impose a temporal correlation structure where
such a structure does not exist, and thereby cause unnecessary adjustments to other
parameters.
Fourth, we introduced a hierarchical approach to estimate mixture model pa-
rameters for multiple time points. In this approach (Chapter 8), we addressed some
of the issues associated with using an informative prior at a single level found in
Chapter 7, and allowed an informative prior to be placed at two different levels.
Compared to placing an informative prior at a single level, a hierarchical approach
allows for a separation of the underlying pattern of the parameter over time (signal)
from some of the noise surrounding the parameter at each time point. In this case,
we may be able to more clearly establish patterns or identify anomalies in the data.
We can also impose an informative prior which is less sensitive to changes in the
correlation structure of the data, and thereby reduce the influence of adjustments
to neighbouring parameters.
In summary, we have demonstrated that a mixture model approach can be used
to better understand and describe features/relationships within environmental ex-
posure data. The approach is not without significant computational and estimation
issues, and thus considerable care must be taken in using the approach for inference.
These issues, however, are likely to be outweighed by the additional information this
approach can provide to understand complex environmental exposure and outcome
data.
9.2 Future Work
The mixture models and analysis in this thesis could be extended in a number of
ways.
A mixture model approach to provide an assessment of interaction or relation-
ship between risk factors in a meta-analysis context could be extended to include
alternative relations or be used to assign preference to more than two relations. The
number and nature of the hypothesised relations would depend on the context of
the study.
We could extend the mixture model to characterise the risk of RRv over time
to formally include a spatial dimension, where mixture model parameters for each
zone are able to borrow strength from parameter estimates of neighbouring zones.
This is similar to the approach adopted in Fernandez and Green (2002) for a single
time point, in which the weight parameter is spatially related by neighbouring sites.
Further analysis of the RRv data would be needed, however, to investigate which
parameters may be spatially related, including the timing of components.
For estimation of mixture models over time (Chapters 6, 7 and 8), a number of
extensions are possible. First, improvements to parameter estimation may be gained
by reducing the influence of the truncated nature of the size data (i.e. the effect of
binning on the size of the particles). In estimation, we could take into account
the ordering of the size bins. In this case, we recognise that observations within
neighbouring size bins are more likely to be allocated to the same component. A
natural approach would be to then use a spatial prior on the allocation variable (z)
(similar to Alston et al. (2005)), and depending on the strength of prior information,
this could reduce the number of components covering only a small number of size
bins.
To reduce the influence of the truncated nature of the size data, we could also
expand the number of size bins used in estimation. In this approach, a number of
extra size bins are created between the original size bins and handled in estimation
as missing data. This is likely to lead to a smoother mixture representation of the
data. The tradeoff is more computational time, by a factor of the
number of additional size bins created, and this would need to be evaluated against
potential improvements in estimation.
Within the MCMC framework, block updating rather than sequential updating
could be used in the hierarchical approach to minimise the effect of correlation
between parameters leading to improved convergence and mixing. This is likely to be
of most benefit for dependencies which are apparent between µ and φ (Equation 8.2)
or λ and γ (Equation 8.5).
Any improvements to computational time are worthy of investigation. Analyses
covering several weeks or months will impose significant computational demands.
Population Monte Carlo (Celeux et al. (2003)) or perfect sampling (Casella et al.
(2004)) could be investigated and developed to allow for estimation of a mixture
over multiple time points.
The hierarchical approach to estimation could also be extended to include a hi-
erarchical structure for the variance (σ2), and alternative correlation structures. For
the variance, a flexible prior such as a truncated t-distribution could be investigated
(Gelman (2006)). The correlation structure could also be extended to include covari-
ates, which could provide further information to aid in identification of components
at each time point.
Further analysis could also be undertaken to associate the components (modal
structure) of the mixture model with health outcomes. Evidence on the association
of air pollution particles with a number of respiratory related diseases is growing
(Osunsanya et al. (2001); Chen et al. (2006)). Such a detailed characterisation of the
data would enable a more representative association to be obtained with either the
source of the particles or a range of particles of a particular size and concentration.
Appendix A
Appendices
A.1 Calculations for the variance of S and V (Ch.3)
Variance of S
We calculated the variance of S based on Rothman (1976). A large sample interval
estimator for S based on a log-Gaussian sampling distribution would be
$$S_L = \exp\left(\ln(S) - Z_{1-\alpha/2}\, SE(\ln(S))\right), \qquad S_U = \exp\left(\ln(S) + Z_{1-\alpha/2}\, SE(\ln(S))\right) \qquad (A\text{-}1)$$
The evaluation of SE(ln(S)) depends upon the type of study. For case-control
studies,
$$SE(\ln(S)) = \left[ \frac{\widehat{var}(\widehat{RR}_{AS})}{(\widehat{RR}_{AS} - 1)^2} + \frac{\widehat{var}(\widehat{RR}_S) + \widehat{var}(\widehat{RR}_A) + 2\,\widehat{cov}(\widehat{RR}_S, \widehat{RR}_A)}{(\widehat{RR}_S + \widehat{RR}_A - 2)^2} - \frac{2\,\widehat{cov}(\widehat{RR}_{AS}, \widehat{RR}_S + \widehat{RR}_A)}{(\widehat{RR}_{AS} - 1)(\widehat{RR}_S + \widehat{RR}_A - 2)} \right]^{1/2} \qquad (A\text{-}2)$$
where
$$\widehat{var}(\widehat{RR}_{ij}) = \widehat{RR}_{ij}^2 \left( \frac{1}{a_{ij}} + \frac{1}{c_{ij}} + \frac{1}{b} + \frac{1}{d} \right) \qquad (A\text{-}3)$$
$$\widehat{cov}(\widehat{RR}_S, \widehat{RR}_A) = \widehat{RR}_S\, \widehat{RR}_A \left( \frac{1}{b} + \frac{1}{d} \right) \qquad (A\text{-}4)$$

$$\widehat{cov}(\widehat{RR}_{AS}, \widehat{RR}_S + \widehat{RR}_A) = \widehat{RR}_{AS}\, (\widehat{RR}_S + \widehat{RR}_A) \left( \frac{1}{b} + \frac{1}{d} \right), \qquad (A\text{-}5)$$
and b and d denote the frequencies of cases and controls in the low-risk category for
both risk indicators, and aij and cij denote the frequencies of cases and controls in
(non-referent) risk category i, j.
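Equations (A-2)–(A-5) can be assembled into a small function. The following is a sketch only: the cell layout (dicts keyed by risk category, with b and d the referent-cell case and control counts) and the odds-ratio estimator for each RR are assumed encodings for illustration.

```python
import math

def se_ln_S_case_control(a, c, b, d):
    """Sketch of SE(ln(S)) for a case-control study, following (A-2)-(A-5).

    a, c: dicts of case and control counts keyed by risk category
          ('A', 'S', 'AS'); b, d: case and control counts in the
          low-risk (referent) category.
    """
    # Relative risk (odds ratio) estimate for each non-referent category.
    rr = {k: (a[k] * d) / (c[k] * b) for k in ('A', 'S', 'AS')}
    # Equation (A-3): variance of each RR estimate.
    var = {k: rr[k] ** 2 * (1 / a[k] + 1 / c[k] + 1 / b + 1 / d) for k in rr}
    # Equations (A-4) and (A-5): covariances through the shared referent cells.
    cov_SA = rr['S'] * rr['A'] * (1 / b + 1 / d)
    cov_AS_sum = rr['AS'] * (rr['S'] + rr['A']) * (1 / b + 1 / d)
    # Equation (A-2).
    return math.sqrt(
        var['AS'] / (rr['AS'] - 1) ** 2
        + (var['S'] + var['A'] + 2 * cov_SA) / (rr['S'] + rr['A'] - 2) ** 2
        - 2 * cov_AS_sum / ((rr['AS'] - 1) * (rr['S'] + rr['A'] - 2))
    )
```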
For cohort studies (with small effects), using first order Taylor series approximations,
$$SE(\ln(S)) = \left[ \frac{\widehat{var}(\hat{R}_{AS}) + \widehat{var}(R_{00})}{(\hat{R}_{AS} - R_{00})^2} + \frac{\widehat{var}(R_S) + \widehat{var}(R_A) + 4\,\widehat{var}(R_{00})}{(R_S + R_A - 2R_{00})^2} - \frac{4\,\widehat{var}(R_{00})}{(\hat{R}_{AS} - R_{00})(R_S + R_A - 2R_{00})} \right]^{1/2} \qquad (A\text{-}6)$$
where ˆvar(Rij) can be taken as Rij/Mij with Mij denoting the total number of
observations in the joint risk indicator category i, j.
Variance of V
For case-control studies, V can also be expressed as RR_{AS} relative to RR_S
(denoted X_2) divided by RR_A (denoted X_1); that is, V = X_2/X_1.
$$var(\log(X_1)) = 1/a + 1/b + 1/c + 1/d$$
$$var(\log(X_2)) = 1/e + 1/f + 1/g + 1/h$$
$$var(\log(V)) = var(\log(X_1)) + var(\log(X_2)) \qquad (A\text{-}7)$$
where a to h denote the frequency of cases and controls for each risk category, and
X1 and X2 are assumed to be independent.
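Equation (A-7) amounts to summing the reciprocal cell counts of the two component tables. A minimal sketch, with the cell ordering (a, b, c, d) and (e, f, g, h) assumed:

```python
def var_log_V_case_control(x1_cells, x2_cells):
    """Sketch of equation (A-7): var(log V) for a case-control study.

    Each argument holds the four cell counts of the 2x2 table underlying
    X1 and X2 respectively; X1 and X2 are assumed independent.
    """
    var_log = lambda cells: sum(1.0 / n for n in cells)
    return var_log(x1_cells) + var_log(x2_cells)
```

The cohort case of (A-8) follows the same pattern with four cells in total.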
For cohort studies with background risk not externally referenced,
$$var(\log(V)) = 1/a + 1/b + 1/e + 1/f \qquad (A\text{-}8)$$
For cohort studies with background risk externally referenced, we used the variance
for the ratio of two standardised ratios found in Gardner and Altman (1989).
A.2 Reversible Jump Markov Chain Monte Carlo
(RJMCMC) (Ch.6)
In this section, we outline details of the RJMCMC algorithm used in this chapter.
An important feature of the algorithm is the variable dimension of the state spaces.
The change of dimension for the mixture model using Reversible Jump Markov Chain
Monte Carlo (RJMCMC) is achieved by either splitting an existing component into
two separate components (increasing the dimension of the model by one component)
or merging two existing components into a single component, commonly known as
the split/merge step of the algorithm.
To split a component, a vector of continuous random variables (u), independent
of the current model, is drawn and applied in an invertible deterministic
function to propose a new model. The proposal is designed to be deterministic
in order that the reverse of the split move, the corresponding merge move, can be
obtained through the inverse transformation of the function.
The other dimension changing moves proposed in RJMCMC are the addition of
a new component or the removal of an empty component which is currently in the
model. These proposals are referred to as births and deaths, respectively.
The Normal mixture model is computed iteratively as follows:

1. Given (λ, µ, σ), update the allocation vector z,

2. Given (k, µ, σ), update estimates of the weights λ,

3. Given (k, λ), update the Normal component parameters µ_j and σ²_j, j ∈ {1, · · · , k},

4. Update hyperparameters as required,

5. Propose a split or merge for the components in the current model, and accept with probability given by Equation (A-14).

In this scheme, steps 1-4 do not involve changes in dimension and are updated
using standard Gibbs moves outlined below, with the following conjugate priors:
$$p(\mu_j) \sim N(\xi, \kappa^{-1}) \qquad (A\text{-}9)$$
$$p(\sigma_j^{-2}) \sim \text{Gamma}(\delta, \beta) \qquad (A\text{-}10)$$
$$p(\beta) \sim \text{Gamma}(g, h) \qquad (A\text{-}11)$$
$$p(\lambda) \sim \text{Dirichlet}(\alpha_1, \alpha_2, \cdots, \alpha_k) \qquad (A\text{-}12)$$
$$p(k) \sim \text{Uniform}(k_{\min}, k_{\max}) \qquad (A\text{-}13)$$
where ξ, κ, δ, α, η, g, h, k_min and k_max are fixed hyperparameters. Note that, in
this case, all µ_j follow a common prior.
We can construct these prior distributions to be weakly informative and use their
conjugacy to obtain proper posterior distributions for the unknown mixture model
parameters.
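Step 1, the allocation update, can be sketched as follows. This is an illustrative implementation of the standard full conditional for z (each z_i drawn with probability proportional to λ_j times the Normal density of y_i under component j), not the code used in the thesis.

```python
import numpy as np

def update_allocations(y, lam, mu, sigma, rng):
    """Draw each allocation z_i from its full conditional:
    p(z_i = j) proportional to lam_j * N(y_i | mu_j, sigma_j^2)."""
    y = np.asarray(y)[:, None]                                    # shape (n, 1)
    dens = lam * np.exp(-0.5 * ((y - mu) / sigma) ** 2) / sigma   # (n, k), up to a constant
    probs = dens / dens.sum(axis=1, keepdims=True)
    # One component index per observation via inverse-CDF sampling.
    u = rng.random((len(y), 1))
    idx = (probs.cumsum(axis=1) < u).sum(axis=1)
    return np.minimum(idx, lam.size - 1)  # guard against rounding at the top cell

rng = np.random.default_rng(1)
y = np.concatenate([rng.normal(0.0, 1.0, 200), rng.normal(6.0, 1.0, 200)])
z = update_allocations(y, np.array([0.5, 0.5]),
                       np.array([0.0, 6.0]), np.array([1.0, 1.0]), rng)
```

With well-separated components, as here, almost all observations are allocated back to the component that generated them.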
Step 5 requires the reversible jump mechanism of the algorithm. The choice
between a split or merge move is made with equal probability, with the only exception
being at the extremes of the allowable range for k (if k = kmin, the probability of
proposing a split move is 1, and if k = kmax, the probability of a split move is 0).
To propose a split move, Richardson and Green (1997) generate a three-dimensional
random vector u from Beta distributions,

$$u_1 \sim \text{Beta}(2, 2), \quad u_2 \sim \text{Beta}(2, 2), \quad u_3 \sim \text{Beta}(1, 1),$$
and randomly choose one of the current k components to be split. For example, we
will assume component j is chosen for a split move. The proposed transformation
of variables, $(\theta^{(n)}, u_{\theta_n}) = T_{m \to n}(\theta^{(m)}, u_{\theta_m})$, is
$$\lambda_{j_1} = u_1 \lambda_j \quad \text{and} \quad \lambda_{j_2} = (1 - u_1)\lambda_j$$

$$\mu_{j_1} = \mu_j - u_2 \sigma_j \sqrt{\frac{\lambda_{j_2}}{\lambda_{j_1}}} \quad \text{and} \quad \mu_{j_2} = \mu_j + u_2 \sigma_j \sqrt{\frac{\lambda_{j_1}}{\lambda_{j_2}}}$$

$$\sigma^2_{j_1} = u_3 \left(1 - u_2^2\right) \sigma^2_j \left(\frac{\lambda_j}{\lambda_{j_1}}\right) \quad \text{and} \quad \sigma^2_{j_2} = (1 - u_3)\left(1 - u_2^2\right) \sigma^2_j \left(\frac{\lambda_j}{\lambda_{j_2}}\right)$$
where dim(n) > dim(m). The allocation vector zi, where zij = 1, is redrawn so that
the data which is currently allocated to component j is now reallocated to either
component j1 or j2.
The split proposal is the reverse of the merge proposal for components j1 and
j2. To propose the merger of 2 components, the parameters of the mixture model
for these components are reassigned by matching the 0th, 1st and 2nd moments of
the distribution:
$$\lambda_j = \lambda_{j_1} + \lambda_{j_2}$$
$$\lambda_j \mu_j = \lambda_{j_1}\mu_{j_1} + \lambda_{j_2}\mu_{j_2}$$
$$\lambda_j \left(\mu_j^2 + \sigma_j^2\right) = \lambda_{j_1}\left(\mu_{j_1}^2 + \sigma_{j_1}^2\right) + \lambda_{j_2}\left(\mu_{j_2}^2 + \sigma_{j_2}^2\right)$$
The allocation vector zi, where zij1 = 1 or zij2 = 1 is amalgamated so that the
allocation becomes zij = 1.
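As a check on the construction, the merge move applied to the output of a split move recovers the original component exactly, because the split transformation is built to preserve the weight and the first two moments. The following sketch (illustrative only, with an arbitrary starting component) implements the two transformations for a single component:

```python
import numpy as np

rng = np.random.default_rng(7)

def split(lam, mu, var, u1, u2, u3):
    """The split transformation above, applied to one Normal component."""
    lam1, lam2 = u1 * lam, (1 - u1) * lam
    sd = np.sqrt(var)
    mu1 = mu - u2 * sd * np.sqrt(lam2 / lam1)
    mu2 = mu + u2 * sd * np.sqrt(lam1 / lam2)
    var1 = u3 * (1 - u2 ** 2) * var * lam / lam1
    var2 = (1 - u3) * (1 - u2 ** 2) * var * lam / lam2
    return (lam1, mu1, var1), (lam2, mu2, var2)

def merge(c1, c2):
    """The merge move: match the 0th, 1st and 2nd moments of the pair."""
    (l1, m1, v1), (l2, m2, v2) = c1, c2
    lam = l1 + l2
    mu = (l1 * m1 + l2 * m2) / lam
    var = (l1 * (m1 ** 2 + v1) + l2 * (m2 ** 2 + v2)) / lam - mu ** 2
    return lam, mu, var

# Merging a freshly split pair recovers the original component.
u = rng.beta([2, 2, 1], [2, 2, 1])
c1, c2 = split(0.4, 2.0, 0.25, *u)
lam, mu, var = merge(c1, c2)
```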
In the case of a split move, the probability of acceptance for the move from model
Mm to Mn is
$$\min\left( \frac{\pi(n, \theta^{(n)})}{\pi(m, \theta^{(m)})} \, \frac{\pi_{nm}}{\pi_{mn}} \, \frac{g(u_{\theta_n})}{g(u_{\theta_m})} \left| \frac{\partial T_{m \to n}(\theta^{(m)}, u_{\theta_m})}{\partial (\theta^{(m)}, u_{\theta_m})} \right|, \; 1 \right) \qquad (A\text{-}14)$$
involving the Jacobian of the transform T_{m→n}, the probability π_{mn} of choosing a
jump to M_n while in M_m, and g, the density of u. The acceptance probability for
the merge move is the inverse ratio of the split.
A.3 Penalised Prior (Ch.6)
In this section we outline the rejection sampling algorithm for λ proposed by Gustafson
and Walker (2003) for the penalised prior approach.
Prior

$$p(\lambda) \propto \text{Dirichlet}(1, \ldots, 1)\, \exp\left( - \sum_{t=2}^{T} \frac{\lVert \lambda_{t,j} - \lambda_{t-1,j} \rVert^2}{\phi} \right) \qquad (A\text{-}15)$$
Posterior

$$p(\lambda \mid \phi, m) \propto \prod_{j=1}^{k} \left\{ \prod_{t=1}^{T} f(\lambda_{jt} \mid m_{jt} + 1)\, I(\lambda_{jt}) \right\} \exp\left( - \sum_{t=2}^{T} \frac{\lVert \lambda_{t,j} - \lambda_{t-1,j} \rVert^2}{\phi} \right) \qquad (A\text{-}16)$$
Gustafson and Walker (2003) suggest sampling the λ_{jt} from a Beta(m_{jt} + 1, m_{kt} + 1)
distribution and accepting when U ≤ g_1(λ_{jt})/g_2(λ_{jt}), with U ∼ U(0, 1), where