RESEARCH ARTICLE
Bayesian inference with subset simulation in varying dimensions applied to the Karhunen–Loève expansion
Felipe Uribe*1 | Iason Papaioannou1 | Jonas Latz2 | Wolfgang Betz1 | Elisabeth Ullmann2 | Daniel Straub1
1 Engineering Risk Analysis Group, Technische Universität München, Arcisstraße 21, 80333 München, Germany
2 Chair of Numerical Analysis, Technische Universität München, Boltzmannstraße 3, 85748 Garching b.M., Germany
Correspondence
*Corresponding author. Email: [email protected]
Summary
Uncertainties associated with spatially varying parameters are modeled through random fields discretized into a finite number of random variables. Standard discretization methods, such as the Karhunen–Loève expansion, use series representations for which the truncation order is specified a priori. However, when data is used to update random fields through Bayesian inference, a different truncation order might be necessary to adequately represent the posterior random field. This is an inference problem that not only requires the determination of the often high-dimensional set of coefficients, but also their dimension. In this paper, we develop a sequential algorithm to handle such inference settings and propose a penalizing prior distribution for the dimension parameter. The method is a variable-dimensional extension of BUS (Bayesian Updating with Structural reliability methods), combined with subset simulation (SuS). The key idea is to replace the standard Markov chain Monte Carlo (MCMC) algorithm within SuS by a trans-dimensional MCMC sampler that is able to populate the discrete-continuous parameter space. To address this task, we consider two types of MCMC algorithms that operate in a fixed-dimensional saturated space. The performance of the proposed method with both MCMC variants is assessed numerically for two examples: a 1D cantilever beam with spatially varying flexibility and a 2D groundwater flow problem with uncertain permeability field.
KEYWORDS: uncertainty quantification, inverse problems, Bayesian model choice, trans-dimensional MCMC, random fields, Karhunen–Loève expansion.
1 INTRODUCTION
Numerical approximations to partial differential equations (PDEs) used in engineering and science require the specification of input parameters that are typically unknown and/or intrinsically random. Uncertainties in the values of these quantities can be reduced by incorporating observations or measurements of the physical system into the numerical model. This represents an inverse problem in which the objective is to identify the model parameters that are compatible with the available information. The complexity of inverse problems increases in model choice situations, whereby a single model needs to be selected from a predefined collection of plausible models. Each model in this set can have different parameters or can represent different
mathematical assumptions. The uncertainty in the model and its parameters can be treated in a unified manner within the Bayesian inference framework [1,2,3].
In the Bayesian approach to solve inverse problems, all uncertain parameters are modeled by random variables. The idea is to update the (prior) probability distribution of the parameters by including information about the PDE model and observed data (likelihood). Solving the Bayesian inverse problem then amounts to estimating or characterizing this updated (posterior) distribution. In case of model choice, inference involves the estimation of a conditional posterior distribution induced by the model class and determining the plausibility of the respective model class. The prior distribution is then hierarchically structured into one prior for the parameters conditioned on the model and a second prior for the model itself [4,2].
Closed-form expressions of the model and parameter posterior are oftentimes cumbersome to obtain. As a result, approximate solutions are computed in practice. One common approach in model choice situations is to use methods that are based on the likelihood function of individual models; these include the Akaike and Bayesian information criteria (see, e.g., [5,6]). Another flexible way to approach the model choice problem is via sampling methods that are based on Markov chain Monte Carlo (MCMC). In this case, the characterization of the posterior requires the exploration of a discrete-continuous space. This task can be performed by two main strategies: (i) fixing the model and solving the inverse problem for each case (see, e.g., [7]), an approach referred to as within-model simulation (several methods are described in the seminal work [8]); and (ii) performing simultaneous inference on both model and parameters, using MCMC algorithms that explore the parameter space by moving between different models; this approach is called across-model simulation, where the standard algorithm is the reversible jump MCMC algorithm [9,10] (other MCMC approaches for across-model simulation include [11,12,13,14]). Most of the disadvantages of standard MCMC, such as convergence rate deterioration with increasing dimensions and burn-in/thinning period requirements, are also present in the MCMC samplers used in across-model simulation [15]. In standard inference settings, specialized sequential algorithms that gradually approach the posterior distribution alleviate several of these issues [16,17]. Some of these algorithms have been adapted to perform across-model simulation, e.g., sequential importance sampling with reversible jump MCMC [18] and population-based reversible jump MCMC [19].
The problem of inferring both models and parameters also applies to cases involving a single mathematical model that has
parameters with variable dimension. Common examples include mixture models with an unknown number of components [20], polynomial regression where the degree of the polynomial is variable [17], or general functional representations that use series expansions for which the number of terms is unknown. The latter is of particular relevance in the context of learning spatially varying parameters represented via random fields [21]. Random fields increase the complexity of the inverse problem since the posterior distribution is defined over an infinite-dimensional space. Series representations are typically applied in order to project the random field to a finite-dimensional space. For instance, the Karhunen–Loève (KL) expansion [22,23] discretizes the field using the eigenvalues and eigenfunctions of its autocovariance operator to construct a series expansion with random coefficients [24]. It is common practice to truncate the KL expansion after a finite number of terms based on some variance-representation criterion. This heuristic is generally valid in prior situations when no information or observations about the field are available. In the inversion case, the optimal number of terms in the series expansion is unknown and it is controlled by the data [25,26].
In this paper, we propose an efficient sequential methodology that is able to perform inference in parameter spaces of different dimension. The method is an extension of the classical BUS (Bayesian updating with structural reliability methods) framework, which expresses a Bayesian inverse problem as an equivalent rare event simulation task [27]. For an efficient and sequential solution of the inverse problem, BUS is combined with subset simulation (SuS) [28] (this approach is called BUS-SuS), although other rare event estimation methods can be employed (see, e.g., [29]). The main idea is to incorporate the discrete dimension random variable into the parameter space defined by BUS, such that the resulting sequence of intermediate distributions also depends on the dimension. The associated intermediate densities become trans-dimensional and the standard MCMC algorithms used within BUS-SuS are no longer valid. Therefore, we investigate a class of trans-dimensional and dimension-independent MCMC algorithms that explore a so-called saturated or composite parameter space in an alternating manner. In this space, the dimension is fixed to a maximum upper value, which is selected conservatively based on prior information. Particularly, we discuss a Metropolis-within-Gibbs algorithm and develop a step-wise sampler as a simplified reversible jump MCMC in the saturated space. Since this space is typically high-dimensional, the core of these algorithms is the preconditioned Crank–Nicolson sampler [30]. The efficiency and accuracy of the method is tested on engineering models involving random field parameters represented with the KL expansion: one example for which a reference solution of the dimension posterior is available, and a second example that requires the estimation of the posterior at some reference dimensions using within-model simulation runs, to verify the dimension posterior estimated by our algorithm.
We also address the specification of the model/dimension prior by defining a discrete distribution that penalizes increasing dimensionality. In model choice problems, imposing a penalty on complicated models is necessary (see [1,5,31] for a discussion). A model with more parameters usually fits the data better than a model with fewer parameters. However, the actual modeling improvement might be negligible or over-fitting of the data can arise [25,26]. The proposed dimension prior is defined to avoid such situations, and we build it based on the geometry of the parameter space and prior information about the random fields.
The organization of the paper is as follows: in section 2, we present fundamental concepts of random field modeling and the KL representation; we also formulate the Bayesian inversion problem in the fixed- and variable-dimensional settings. At the end of the section, we propose a prior for the specification of the dimension parameter. The major contribution of this work is introduced in section 3, where we explain the trans-dimensional BUS algorithm. This methodology is based on trans-dimensional MCMC samplers, which are discussed in section 4. Next, the proposed method is demonstrated by means of two numerical experiments in section 5. The paper concludes with a discussion of results in section 6 and a summary of the work in section 7.
2 MATHEMATICAL FORMULATION
2.1 Random fields and the Karhunen–Loève expansion
Let (Ω, F, P) be a probability space, D ⊆ R^d an index set representing a physical domain, and L²(Ω, P) the Hilbert space of second-order random variables. A real-valued random field is a function H(x, ω): D × Ω → R, with arguments x ∈ D a spatial coordinate and ω ∈ Ω an outcome of the sample space [32,33]. Intuitively, a random field can be interpreted either as a single random variable that takes values in a function space, or as a collection of random variables indexed in space.
Random fields are represented in terms of a finite set of random variables using stochastic discretization algorithms. Popular dimensionality reduction techniques are based on finite expansions of random variables and deterministic functions. These include the Karhunen–Loève expansion [22,23], which expresses a random field as a linear combination of orthogonal functions chosen as the eigenfunctions resulting from the spectral decomposition of the covariance operator. Since all positive-definite functions have a unique spectral representation (see Bochner's Theorem [32, section 3]), one can define an orthonormal basis, unique and optimal in the mean-squared sense, consisting of the eigenfunctions of the covariance operator together with a sequence of real and non-negative eigenvalues [34, p. 248]. One can use such a basis to represent a second-order random field as
\[
H(x, \omega) \approx \hat{H}(x; k, \boldsymbol{\theta}(\omega)) := \mu(x) + \sum_{i=1}^{\infty} \mathbb{1}(i \le k)\, \sqrt{\lambda_i}\, \phi_i(x)\, \theta_i(\omega), \tag{1}
\]
where Ĥ(x; k, θ(ω)) is the approximated field, 1(·) denotes the indicator function, k is the truncation order of the expansion, θ_i(ω): Ω → R is a set of mutually uncorrelated random variables with mean zero and unit variance, λ_i ∈ [0, ∞) are the eigenvalues of the covariance operator, satisfying λ_i ≥ λ_{i+1}, lim_{i→∞} λ_i = 0, Σ_{i=1}^∞ λ_i < ∞, and φ_i(x): D → R are the eigenfunctions of the covariance operator, with φ_i(x) ∈ L²(D).
For Gaussian random fields, the variables θ_i(ω) are independent standard Gaussian. In the general case, the distribution of θ_i(ω) is cumbersome to estimate. The series expansion in (1) follows from Mercer's Theorem (details are provided in [35]) and is referred to as the Karhunen–Loève (KL) expansion. We remark that the set of eigenpairs {λ_i, φ_i} is computed through the solution of a homogeneous Fredholm integral equation of the second kind [24], which can be solved using different approaches, such as projection (collocation, Galerkin) [36] or Nyström methods [37].
The KL expansion is often employed to reduce the dimensionality and parameterize random fields. Consider the square-integrable random vector θ(ω) ∈ Θ_k ⊆ R^k resulting from truncating the KL series expansion (1) at the k-th term. This truncation yields an approximate field, which is optimal in the mean-squared-error sense as compared to any other spectral projection algorithm [24]. Since the eigenpairs associated with the covariance operator are deterministic quantities, the parameter θ(ω) characterizes the randomness of the field. Hence, the KL construction only depends on the vector of random coefficients θ and the truncation order k.
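As an illustration of the discretized KL construction in (1), the following Python sketch computes approximate eigenpairs of a 1D exponential covariance operator on a grid and assembles truncated field realizations. This is not the implementation used in the paper; the grid-based (Nystroem-type) eigendecomposition, the grid size, and the field parameters are illustrative assumptions.

```python
import numpy as np

# Minimal sketch: discrete KL expansion of a 1D Gaussian field with
# exponential covariance C(x, x') = sigma^2 * exp(-|x - x'| / ell).
n, L, sigma, ell = 200, 5.0, 1.0, 1.0          # illustrative values
x = np.linspace(0.0, L, n)
C = sigma**2 * np.exp(-np.abs(x[:, None] - x[None, :]) / ell)

# Eigenpairs of the covariance operator; scaling by the quadrature
# weight dx approximates the continuous eigenvalue problem.
dx = L / (n - 1)
lam, phi = np.linalg.eigh(C * dx)
lam = np.clip(lam[::-1], 0.0, None)            # descending, non-negative
phi = phi[:, ::-1] / np.sqrt(dx)               # L2(D)-normalized eigenfunctions

def kl_realization(k, theta, mu=0.0):
    """Truncated KL realization H_hat(x; k, theta), cf. Eq. (1)."""
    return mu + phi[:, :k] @ (np.sqrt(lam[:k]) * theta[:k])

# Example: one realization with k = 10 standard Gaussian coefficients.
rng = np.random.default_rng(0)
H = kl_realization(10, rng.standard_normal(10))
```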
2.2 Bayesian inverse problems in fixed dimension
We begin by considering the forward problem y = G(H(x, ω)), where G: L²(D) → L²(D) is a solution operator expressing the relationship between the input parameters and the model response. We are interested in models where the operator G implies the solution of a PDE that has random fields as parameters. G operates on the function space L²(D) since both input and output are random field realizations of two different quantities on the physical domain D. The dimensionality of the forward problem
can be reduced using the parameterized random field in the expansion (1), such that the input parameter space is now given by Θ_k. Since k is fixed, we write the approximated field as Ĥ(x; θ) and the parameter space as Θ ⊆ R^k.
In inverse problems, the aim is to infer the parameters θ ∈ Θ given noisy observations of the system response ỹ ∈ 𝒴 := R^m, with m denoting the number of observations and 𝒴 the data space. Assuming an additive observation error, the objective is:
\[
\text{find } \boldsymbol{\theta} \in \Theta : \; \tilde{\boldsymbol{y}} = \mathcal{G}\big(\hat{H}(x; \boldsymbol{\theta})\big) + \boldsymbol{\eta}, \tag{2}
\]
where 𝒢 = 𝒪 ∘ G: Θ → 𝒴 is the forward response operator, defined as the composition of the solution operator G: Θ → L²(D) with an observation operator 𝒪: L²(D) → 𝒴 that maps the forward solution to the data space; and η ∈ R^m is the observation noise, which is typically assumed to be Gaussian distributed with mean zero and non-singular covariance matrix Σ_obs ∈ R^{m×m}.
The inverse problem (2) is generally ill-posed. Bayesian statistical methods offer a framework that integrates the observations with prior information, providing a mechanism of regularization. In Bayesian inverse problems, the components of the parameter vector θ are modeled as random variables and are assumed to have an initial prior density π_pr(θ). The likelihood function L(θ; ỹ) = π_like(ỹ | θ) is a density on the data space and provides a link between the model and data. After including observations, the updated belief about θ is represented by the posterior density π_pos(θ | ỹ). Through Bayes' Theorem, this conditional density is
\[
\pi_{\mathrm{pos}}(\boldsymbol{\theta} \mid \tilde{\boldsymbol{y}}) = \frac{1}{Z_{\tilde{\boldsymbol{y}}}}\, \pi_{\mathrm{pr}}(\boldsymbol{\theta})\, L(\boldsymbol{\theta}; \tilde{\boldsymbol{y}}) \propto \exp\!\left( -\frac{1}{2} \left\| \boldsymbol{\Sigma}_{\mathrm{pr}}^{-1/2} \big( \boldsymbol{\theta} - \boldsymbol{\mu}_{\mathrm{pr}} \big) \right\|_2^2 + \ln L(\boldsymbol{\theta}; \tilde{\boldsymbol{y}}) \right), \tag{3}
\]
where Z_ỹ = ∫_Θ π_pr(θ) L(θ; ỹ) dθ is the normalizing constant of π_pos(θ | ỹ), called the model evidence.
Remark 1. Since we employ the KL expansion to represent random fields, the prior distribution is Gaussian, θ ∼ 𝒩(μ_pr, Σ_pr). This is reflected in the right-hand side of (3). For the KL coefficients, the prior mean and covariance are given by μ_pr = 0 and Σ_pr = I_k (I_k ∈ R^{k×k} denotes the identity matrix). The information about the second-order properties of the random field enters directly in the definition of the log-likelihood function via the forward operator.
2.3 Bayesian inverse problems in varying dimension
Consider a more general inference setting for which the set of observed data ỹ is associated not only with one, but with a finite collection of plausible models 𝓜 = {𝓜_1, …, 𝓜_k, …, 𝓜_{k_max}}, where k ∈ 𝒦 is a model indicator index, and k_max < ∞ is a prescribed limit on the collection. The resulting discrete-continuous parameter space can be written as 𝒵 = ∪_{k∈𝒦} ({k} × Θ_k). Observe that there exist different uncertain parameter vectors θ_k ∈ Θ_k ⊆ R^k for each particular model 𝓜_k, and thus the goal is to extract information from the data to infer jointly the pairs (k, θ_k) ∈ 𝒵. For the sake of simplicity in notation, we shall henceforth use the model indicator index k to denote the model 𝓜_k.
Let π_pr(θ_k | k) be a first-level prior density imposed on the parameter θ_k given the model k, and π̄_pr(k) a second-level discrete prior mass specified over the models k (we use the notation π̄ to indicate probability mass functions). The joint posterior density over both model and parameters is computed based on Bayes' Theorem as
\[
\pi_{\mathrm{pos}}(k, \boldsymbol{\theta}_k \mid \tilde{\boldsymbol{y}}) = \frac{1}{Z_{\tilde{\boldsymbol{y}}}}\, \bar{\pi}_{\mathrm{pr}}(k)\, \pi_{\mathrm{pr}}(\boldsymbol{\theta}_k \mid k)\, L(k, \boldsymbol{\theta}_k; \tilde{\boldsymbol{y}}) \propto \bar{\pi}_{\mathrm{pr}}(k) \exp\!\left( -\frac{1}{2} \left\| \boldsymbol{\Sigma}_{\mathrm{pr},k}^{-1/2} \big( \boldsymbol{\theta}_k - \boldsymbol{\mu}_{\mathrm{pr},k} \big) \right\|_2^2 + \ln L(k, \boldsymbol{\theta}_k; \tilde{\boldsymbol{y}}) \right), \tag{4}
\]
wherein the prior parameters depend on k, and the evidence is given by the law of total probability:
\[
Z_{\tilde{\boldsymbol{y}}} = \sum_{k' \in \mathcal{K}} \bar{\pi}_{\mathrm{pr}}(k')\, Z_{\tilde{\boldsymbol{y}}}(k') = \sum_{k' \in \mathcal{K}} \bar{\pi}_{\mathrm{pr}}(k') \int_{\Theta_{k'}} \pi_{\mathrm{pr}}(\boldsymbol{\theta}_{k'} \mid k')\, L(k', \boldsymbol{\theta}_{k'}; \tilde{\boldsymbol{y}})\, \mathrm{d}\boldsymbol{\theta}_{k'} \tag{5}
\]
and Z_ỹ(k) is the evidence of the individual model k. The posterior density of the models is obtained by integrating out the parameters in (4) as
\[
\bar{\pi}_{\mathrm{pos}}(k \mid \tilde{\boldsymbol{y}}) = \frac{\bar{\pi}_{\mathrm{pr}}(k) \int_{\Theta_k} \pi_{\mathrm{pr}}(\boldsymbol{\theta}_k \mid k)\, L(k, \boldsymbol{\theta}_k; \tilde{\boldsymbol{y}})\, \mathrm{d}\boldsymbol{\theta}_k}{\sum_{k' \in \mathcal{K}} \bar{\pi}_{\mathrm{pr}}(k') \int_{\Theta_{k'}} \pi_{\mathrm{pr}}(\boldsymbol{\theta}_{k'} \mid k')\, L(k', \boldsymbol{\theta}_{k'}; \tilde{\boldsymbol{y}})\, \mathrm{d}\boldsymbol{\theta}_{k'}} = \bar{\pi}_{\mathrm{pr}}(k)\, \frac{Z_{\tilde{\boldsymbol{y}}}(k)}{Z_{\tilde{\boldsymbol{y}}}}. \tag{6}
\]
The model posterior in (6) can be used to perform (i) model choice or selection, which requires the computation of the maximum a posteriori probability (MAP) estimator, k_MAP = argmax_{k∈𝒦} π̄_pos(k | ỹ), or (ii) model mixing or averaging, which requires the consideration of the whole collection of parameters weighted by π̄_pos(k | ỹ). Model choice is used as an indicator of model complexity, i.e., the model that provides the best alignment with the observed data should be preferred over unnecessarily complicated ones. The model mixing solution consists of the model posterior predictive distribution. In this case, the whole collection
of models is used for future decisions, which avoids the underestimation of uncertainty resulting from choosing only a single model. Since this process leads to a higher computational cost, only models that are sufficiently likely compared to the MAP estimator may be considered in the analysis. Occam's window and Bayes factors are used to perform such a model reduction ([2], p. 368).
The formulations in (4) and (6) are also applicable to Bayesian non-parametric settings where in fact there exists only a single mathematical model, but one with a variable-dimension parameter [38]. We are interested in the latter since this corresponds to Bayesian inverse problems involving random fields represented by a series expansion whose number of terms is not fixed. In the KL expansion, the set of models is defined by the model indicator indices 𝒦 = {1, 2, …, k, …, k_max}, where each element defines a truncation order. This truncation specifies the dimensionality of the standard Gaussian random coefficients of the KL expansion; thus, each particular model/dimension k involves a vector of uncertain parameters θ_k ∈ Θ_k. The aim is to perform simultaneous inference on the discrete random variable k (dimension) and the associated random vector θ_k (coefficients) of the KL expansion. In the following, we often use the terms model and dimension interchangeably.
2.4 Selection of the prior distribution for the KL truncation
The model evidence associated with the dimensions of the KL discretization does not reveal the classical Bayesian penalization behavior appearing, for example, in regression models, where constantly increasing the polynomial order eventually reduces the model evidence values (revealing potential over-fitting). In the KL expansion, the model evidence keeps increasing as one adds more terms [26]. This is directly related to the representation of the posterior covariance, since its approximation improves as k → ∞. Nevertheless, it has been shown in [26] that the information gained by continuously increasing KL terms becomes negligible once an optimal truncation is achieved. The model evidence keeps increasing, but very slowly after such an optimal number of terms is reached. This behavior motivates the definition of a prior for the dimension parameter k that penalizes increasing dimensionality. We employ a truncated geometric distribution (more details concerning this choice are given in the appendix)
\[
\bar{\pi}_{\mathrm{pr}}(k) = \frac{(1-p)^{k-1}\, p}{1 - (1-p)^{k_{\max}}}, \qquad k = 1, \ldots, k_{\max}, \tag{7}
\]
where k_max is the upper truncation level, and the success probability p ∈ (0, 1) marks the decay rate of the probability mass. This parameter allows us to control the shape of the distribution.
In practice, the parameter space is bounded and typically some prior knowledge about these bounds is available. We select the parameter p by regulating the behavior of the distribution at the tails, such that P[k ≤ k_min] = ε, where k_min is a prescribed threshold which is associated with the probability ε that k does not exceed k_min. Based on our experiments, k_min is chosen as the number of terms in the KL expansion that retains 50% of the variability in the prior random field, and we assign to that event a probability of ε = 0.10. Furthermore, the truncation value k_max is selected as the number of terms in the expansion that retains 99% of the variability. By doing so, approximately 90% of the probability mass is concentrated on truncation orders higher than those yielding the 50% variability and smaller than those producing the 99% variability. We found that this heuristic produces a decay p that does not excessively penalize high-order KL terms.
We remark that other prior models for the dimension parameter have been derived in the context of random fields, e.g., an exponential prior with fixed decay rate for the truncation order in the KL expansion [30], and a penalized complexity prior for different values of a re-parameterized Matérn kernel [39].
3 BAYESIAN INFERENCE WITH SUBSET SIMULATION IN SPACES OF VARYING DIMENSION
The characterization of the posterior distribution using standard MCMC algorithms can be inefficient, not only because several iterations are required to compute accurate statistics, but also because tuning and post-processing steps need to be implemented (e.g., burn-in and lag periods). The task is even more complicated when the posterior distribution is high- and trans-dimensional. Therefore, a common approach is to embed standard MCMC samplers into algorithms that start from the prior and sequentially approach the posterior distribution [40]. The idea is to explore the posterior on-the-fly by constructing a set of intermediate measures that converge to the full posterior. An approach that belongs to this class of algorithms is BUS (Bayesian Updating with Structural reliability methods), which reformulates the Bayesian inverse problem as a classical reliability analysis (rare event estimation) problem. This construction allows one to employ efficient reliability estimation algorithms to sample from the posterior. In this section, we discuss the combination of BUS with subset simulation and its extension to variable-dimensional inference problems.
3.1 The BUS formulation
When the computation of the model evidence is intractable, the posterior density is only known up to its scaling constant, that is, π_pos(θ | ỹ) ∝ π_pr(θ) L(θ; ỹ) = π(θ). The posterior density can be characterized by drawing samples from this unnormalized target density. In particular, the rejection sampling algorithm generates samples from π(θ) using a proposal density q(θ). The proposal is selected such that it dominates the target function. This means that q(θ) must have equal or heavier tails than those of π(θ). Therefore, the proposal satisfies the relation
\[
\sup_{\Theta} \left( \frac{\pi(\boldsymbol{\theta})}{q(\boldsymbol{\theta})} \right) \le \bar{c} < \infty \quad \text{for some covering constant } \bar{c} \in \mathbb{R}_{>1} \tag{8}
\]
and supp(π(θ)) ⊆ supp(q(θ)). Thereafter, samples drawn from q(θ) are rejected strategically to make the resulting accepted samples distributed according to π(θ). A simple choice for the proposal density is the prior distribution π_pr(θ). In this case, the acceptance probability α in rejection sampling [16] becomes
\[
\alpha = \frac{\pi(\boldsymbol{\theta})}{\bar{c} \cdot q(\boldsymbol{\theta})} = \frac{\pi_{\mathrm{pr}}(\boldsymbol{\theta})\, L(\boldsymbol{\theta}; \tilde{\boldsymbol{y}})}{\bar{c} \cdot \pi_{\mathrm{pr}}(\boldsymbol{\theta})} = c \cdot L(\boldsymbol{\theta}; \tilde{\boldsymbol{y}}), \tag{9}
\]
where c = 1/c̄ ∈ R_{>0} and the covering constant is selected such that c̄ ≥ L_max = max(L(θ; ỹ)). Rejection sampling then amounts to (i) drawing a standard uniform random number υ ∼ Unif[0, 1], (ii) sampling a candidate from the prior θ ∼ π_pr(θ), and (iii) accepting the candidate if υ ≤ α = c · L(θ; ỹ). This particular acceptance-rejection mechanism allows us to define the space
\[
\mathcal{A} = \left\{ (\boldsymbol{\theta}, \upsilon) \in \bar{\mathbb{X}} : h(\boldsymbol{\theta}, \upsilon) \le 0 \right\}, \quad \text{where} \quad h(\boldsymbol{\theta}, \upsilon) = \upsilon - c \cdot L(\boldsymbol{\theta}; \tilde{\boldsymbol{y}}), \tag{10}
\]
and X̄ = [Θ, Υ] is an augmented parameter space (θ ∈ Θ ⊆ R^k and υ ∈ Υ = [0, 1]). In the context of reliability analysis and rare event simulation, the space 𝒜 defines a failure domain with limit-state function (LSF) h(θ, υ). Samples drawn from the prior that fall into 𝒜 are distributed according to the posterior. This connection is the foundation of the BUS approach, since one can employ existing methods from rare event simulation to perform Bayesian inference. Indeed, the previous rejection sampling algorithm corresponds to applying standard Monte Carlo simulation for the solution of a rare event estimation problem defined by the LSF h(θ, υ) over the space X̄.
The main objective in reliability analysis is to estimate the probability of failure. When employing the BUS framework, this value is associated with the probability that the samples belong to the domain 𝒜, i.e., p_𝒜 = P[𝒜] = P[h(θ, υ) ≤ 0]. This probability, which is obtained as a by-product of BUS, is used to estimate the model evidence as [27]
\[
Z_{\tilde{\boldsymbol{y}}} = c^{-1} \cdot p_{\mathcal{A}} = \bar{c} \cdot p_{\mathcal{A}}. \tag{11}
\]
Note that the application of BUS requires knowledge of the constant c = 1/c̄. From (9), it is seen that the covering constant is optimally chosen as the maximum of the likelihood function, c̄ = L_max. If c̄ < L_max, the resulting samples will be distributed according to a truncated posterior distribution [41]. Conversely, if c̄ > L_max, the efficiency of BUS decreases because the value of p_𝒜 will be small and more samples are required for its estimation. Since in many cases L_max is not known in advance and its computation poses an additional cost, we employ the strategy introduced in [41] for which the constant c is adaptively computed at each step of the simulation. We discuss this method in the variable-dimensional context in subsection 3.3.
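For concreteness, the following Python sketch implements BUS as plain rejection sampling in log form; log_like and log_cbar are problem-specific placeholders, and the standard Gaussian prior of the KL coefficients is assumed.

```python
import numpy as np

# Minimal sketch of BUS as rejection sampling (Section 3.1): draw
# (theta, u) from prior x Unif[0,1] and keep samples with h <= 0.
# 'log_like' and 'log_cbar' (>= max log-likelihood) are placeholders.
def bus_rejection(log_like, k, log_cbar, n_draws, rng):
    accepted = []
    for _ in range(n_draws):
        theta = rng.standard_normal(k)      # prior of the KL coefficients
        u = rng.uniform()                   # auxiliary uniform variable
        # log form of h(theta, u) = u - c * L(theta; y), cf. Eq. (13)
        h_ln = np.log(u) + log_cbar - log_like(theta)
        if h_ln <= 0.0:
            accepted.append(theta)          # posterior sample
    return np.array(accepted)
```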
3.2 BUS with subset simulation in fixed dimensions
In order to efficiently compute samples from the posterior distribution, BUS is often combined with subset simulation (SuS) [28]. The combination of BUS with SuS (called BUS-SuS) performs Bayesian inversion sequentially. This is because SuS transforms the task of estimating the rare event {h(θ, υ) ≤ 0} into a sequence of problems involving more frequent events.
In BUS-SuS, the parameter space X̄ is divided into a decreasing sequence of nested subsets or intermediate levels, starting from the whole space and narrowing down to the target posterior space, i.e., X̄ = 𝒜_0 ⊃ 𝒜_1 ⊃ ⋯ ⊃ 𝒜_{N_lv} = 𝒜, such that 𝒜 = ∩_{j=0}^{N_lv} 𝒜_j, where N_lv is the number of intermediate levels. Based on the general product rule of probability, the probability that the prior samples fall into the posterior space, p_𝒜, is given by
\[
p_{\mathcal{A}} = \mathbb{P}\!\left[ \cap_{j=0}^{N_{\mathrm{lv}}} \mathcal{A}_j \right] = \mathbb{P}\!\left[ \mathcal{A}_{N_{\mathrm{lv}}} \,\middle|\, \cap_{j=0}^{N_{\mathrm{lv}}-1} \mathcal{A}_j \right] \mathbb{P}\!\left[ \cap_{j=0}^{N_{\mathrm{lv}}-1} \mathcal{A}_j \right] = \prod_{j=1}^{N_{\mathrm{lv}}} \mathbb{P}\!\left[ \mathcal{A}_j \mid \mathcal{A}_{j-1} \right], \tag{12}
\]
where P[𝒜_j | 𝒜_{j−1}] represents the conditional probability at level (j − 1). Each intermediate level is defined as the set 𝒜_j = {(θ, υ) ∈ X̄ : h(θ, υ) ≤ ξ_j}, where ∞ = ξ_0 > ξ_1 > ⋯ > ξ_j > ⋯ > ξ_{N_lv} = 0 is a decreasing sequence of threshold levels. In practice, it is not possible to make an optimal a priori selection of the sequence {ξ_j}_{j=0}^{N_lv}. Therefore, they are adaptively selected as the p_0-percentile of the LSF values of the samples simulated at intermediate level j − 1 [28]. This implies fixing the conditional probabilities to a common value p_0 = P[𝒜_j | 𝒜_{j−1}] (with p_0 ∈ [0.1, 0.3]).
At the first level, 𝒜_0, samples are generated using standard Monte Carlo simulation. Thereafter, BUS-SuS employs a modified MCMC algorithm to draw samples from each intermediate conditional density π(θ, υ | 𝒜_j). The Markov chains are initialized from N_s = N · p_0 samples conditional on 𝒜_{j−1} for which h(θ, υ) ≤ ξ_j. The process is repeated until the target posterior domain is reached (see, e.g., [42]). At the last level, the probability p_𝒜 in (12) is estimated as p̂_𝒜 = p_0^{N_lv−1} · p̂_{N_lv}, where p̂_{N_lv} represents the last conditional probability, which is estimated by Monte Carlo as the ratio of the number of samples that lie in 𝒜 and the number of samples per level N. The p̂_{N_lv} · N samples that lie in 𝒜 are used as seeds to generate the final batch of N samples conditional on 𝒜. The resulting samples are uniformly weighted but correlated samples of the posterior distribution, and the probability estimate p̂_𝒜 is used to compute the model evidence via (11).
Remark 2. It is common practice to solve reliability problems in the standard Gaussian space. Due to the BUS formulation, this also translates to the Bayesian inversion. Hence, a new standard Gaussian parameter vector ϑ = [θ, ϑ_υ]^T ∈ R^{k+1} is created by combining the KL coefficients θ and the transformed auxiliary uniform variable ϑ_υ = Φ^{−1}(υ), where Φ(·) denotes the standard Gaussian cumulative distribution. Furthermore, in order to guarantee a smooth transition between the intermediate levels, as well as for numerical stability, the LSF (10) is expressed in terms of the log-likelihood. Applying the natural logarithm to each term of h(θ, υ) in (10) yields [43]
\[
h_{\ln}(\boldsymbol{\vartheta}) = \ln\big(\Phi(\vartheta_\upsilon)\big) - \ln\!\big( c \cdot L(\boldsymbol{\theta}; \tilde{\boldsymbol{y}}) \big) = \ln\big(\Phi(\vartheta_\upsilon)\big) + \ln(\bar{c}) - \ln L(\boldsymbol{\theta}; \tilde{\boldsymbol{y}}). \tag{13}
\]
3.3 BUS with subset simulation in varying dimensions
We now extend the concepts of subsections 3.1 and 3.2 to the variable-dimensional case. The basic idea is to re-augment the parameter space by including the discrete dimension variable. This requires minor modifications of the target LSF, and the application of trans-dimensional MCMC algorithms to sample the intermediate conditional densities. We denote this trans-dimensional BUS-SuS methodology as tBUS-SuS.
Consider the general Bayesian inverse problem (4). The joint posterior distribution is characterized by a target function in a discrete-continuous space, π(k, θ) = π̄_pr(k) π_pr(θ_k | k) L(k, θ_k; ỹ) ∝ π_pos(k, θ_k | ỹ). We choose the proposal distribution to be equal to the full prior, q(k, θ_k) = π̄_pr(k) π_pr(θ_k | k). The acceptance probability in rejection sampling becomes
\[
\alpha = \frac{\pi(k, \boldsymbol{\theta})}{\bar{r} \cdot q(k, \boldsymbol{\theta})} = \frac{\bar{\pi}_{\mathrm{pr}}(k)\, \pi_{\mathrm{pr}}(\boldsymbol{\theta}_k \mid k)\, L(k, \boldsymbol{\theta}_k; \tilde{\boldsymbol{y}})}{\bar{r} \cdot \bar{\pi}_{\mathrm{pr}}(k)\, \pi_{\mathrm{pr}}(\boldsymbol{\theta}_k \mid k)} = r \cdot L(k, \boldsymbol{\theta}_k; \tilde{\boldsymbol{y}}), \tag{14}
\]
where r = 1/r̄ ∈ R_{>0}. By analogy with the fixed-dimensional setting, the covering constant r̄ can be optimally chosen as L_max,all = max(L(k, θ_k; ỹ)), i.e., as the maximum of the likelihood function across different dimensions. Thereafter, samples drawn from the priors k ∼ π̄_pr(·) and θ_k ∼ π_pr(· | k) are accepted if υ ≤ α = r · L(k, θ_k; ỹ), otherwise they are rejected. In this case, the 𝒜-space and the LSF in (10) are re-defined as
\[
\mathcal{A} = \left\{ (k, \boldsymbol{\theta}_k, \upsilon) : h(k, \boldsymbol{\theta}_k, \upsilon) \le 0 \right\}, \quad \text{where} \quad h(k, \boldsymbol{\theta}_k, \upsilon) = \upsilon - r \cdot L(k, \boldsymbol{\theta}_k; \tilde{\boldsymbol{y}}) \tag{15}
\]
and X̄ = [K, Θ_k, Υ] is the re-augmented discrete-continuous parameter space (k ∈ K ⊆ Z_{>0}, θ_k ∈ Θ_k ⊆ R^k and υ ∈ Υ = [0, 1]). As will be seen in section 4, we employ trans-dimensional MCMC algorithms that work in a saturated space for which θ ∈ Θ ⊆ R^{k_max}, and thus we write X̄ = [K, Θ, Υ].
The BUS-SuS algorithm can be extended analogously to solve the Bayesian inverse problem (15). Each intermediate domain is now defined as the set 𝒜_j = {(k, θ, υ) ∈ X̄ : h(k, θ, υ) ≤ ξ_j}, with the threshold level sequence {ξ_j}_{j=0}^{N_lv} adaptively selected as in the fixed-dimensional case. Under the LSF (15), the standard MCMC algorithms used within BUS-SuS are no longer suitable for sampling the intermediate densities π(k, θ, υ | 𝒜_j), and trans-dimensional MCMC methods are required. We modify these algorithms to sample the intermediate densities conditional on events defined by the sequence of levels {ξ_j}; this will be shown in section 4. Moreover, instead of the LSF in (13), we employ its variable-dimensional extension
\[
h_{\ln}(k, \boldsymbol{\vartheta}) = \ln\big(\Phi(\vartheta_\upsilon)\big) + \hat{r} - \ln L(k, \boldsymbol{\theta}; \tilde{\boldsymbol{y}}), \tag{16}
\]
where ϑ = [θ, ϑ_υ] ∈ R^{k_max+1}, and r̂ = ln(r̄) is optimally the maximum of the log-likelihood function across the different dimensions. We summarize the tBUS-SuS method in Algorithm 1.
Algorithm 1 tBUS-SuS.
1: Input: number of samples per level N, conditional probability p_0, covering constant r̂, maximum dimension k_max, log-likelihood function ln L(·, ·; ỹ), dimension prior π̄_pr(k)
2: Draw N samples from the dimension prior, k_0 ∼ π̄_pr(·)
3: Draw N samples from the standard Gaussian, ϑ_0 = [θ_0, ϑ_υ,0] ∼ 𝒩(0, I_{k_max+1})
4: Compute the initial log-likelihood function values, L_eval ← ln L(k_0, θ_0; ỹ)
5: Set j ← 0 and ξ_0 ← ∞
6: while ξ_j > 0 do
7:   Increase intermediate level counter, j ← j + 1
8:   Compute LSF values, h_eval ← ln(Φ(ϑ_υ,j−1)) + r̂ − L_eval
9:   Sort h_eval in ascending order and create a vector idx to store the indices of this sorting
10:  Create k_sort, ϑ_sort as the dimension and parameter samples k_{j−1}, ϑ_{j−1} sorted according to idx
11:  Set the intermediate threshold level ξ_j as the p_0-percentile of the values in h_eval
12:  Compute the number of samples in the j-th intermediate level, N_j ← Σ_{i=1}^N 1(h_eval^{(i)} ≤ max(0, ξ_j))
13:  if ξ_j > 0 then
14:    p_{j−1} ← p_0
15:  else
16:    ξ_j ← 0 and p_{j−1} ← N_j / N
17:  end if
18:  Select seeds for the MCMC step, (k_seed, ϑ_seed) ← {k_sort^{(i)}, ϑ_sort^{(i)}}_{i=1}^{N_j}
19:  Generate next level values {k_j^{(i)}, ϑ_j^{(i)}, L_eval^{(i)}}_{i=1}^N from the seeds (k_seed, ϑ_seed) and intermediate level ξ_j using a trans-dimensional MCMC algorithm. Here, each seed is used to construct a chain with N_c = floor(N/N_s) states, where N_s = N_j is the number of seeds
20: end while
21: Set the posterior samples, k_pos ← k_j and ϑ_pos ← ϑ_j
22: for k ← 1 to k_max do
23:   Find the number of posterior samples that lie in dimension k, N_k ← Σ_{i=1}^N 1(k_pos^{(i)} = k)
24:   Estimate the model posterior, π̂_pos^{(k)} ← N_k / N
25: end for
26: Output: sequence of posterior dimension samples {k_pos^{(i)}}_{i=1}^N and parameter samples {ϑ_pos^{(i)}}_{i=1}^N, and model posterior π̂_pos.
Since finding the constant r̂ = ln(r̄) poses an additional computational cost, it is convenient to introduce a tBUS-SuS algorithm for which the covering constant r̄ is not required as an input. We employ the adaptive BUS-SuS methodology proposed in [41] for the trans-dimensional setting. In this case, the covering constant is updated at each level, leading to a set of values {r̄_j}_{j=0}^{N_lv}. In order to guarantee the nestedness of the intermediate domains, the threshold levels ξ_j are corrected after updating the value r̄_j. First note that from (16), a j-th intermediate domain is defined as the set
\[
\mathcal{A}_j = \left\{ (k, \boldsymbol{\vartheta}) : \ln\big(\Phi(\vartheta_\upsilon)\big) + \hat{r}_j - \ln L(k, \boldsymbol{\theta}; \tilde{\boldsymbol{y}}) \le \xi_j \right\} \tag{17}
\]
where r̂_j = ln(r̄_j) is the maximum of the log-likelihood function observed at the j-th sampling level. The idea of [41] is that the event 𝒜_j associated with the log-scaling constant r̂_j and the threshold ξ_j can be equivalently expressed by a scaling r̂′_j and a modified threshold ξ′_j selected as ξ′_j = ξ_j − r̂_j + r̂′_j. This allows one to sequentially update the covering constant r̂_j to a new value r̂′_j without compromising the distribution of the samples: adjusting the threshold value from ξ_j to ξ′_j (after updating the scaling from r̂_j to r̂′_j) still defines the same intermediate domain, as 𝒜_j = {(k, ϑ) : ln(Φ(ϑ_υ)) ≤ ξ′_j − r̂′_j + ln L(k, θ; ỹ)} is equivalent to (17). In principle, ξ′_j corrects the level ξ_j using the residual of maximum log-likelihood values observed at different levels. At the last simulation level, when r̂′_j = r̂_j, the covering reaches a value that is close or equal to the actual maximum
log-likelihood, i.e., r̂_{N_lv} ≤ ln(L_max,all). In the limit N → ∞, the value r̂_{N_lv} converges to ln(L_max,all). Despite the fact that r̂_{N_lv} is likely to be smaller than ln(L_max,all), the samples generated by the algorithm follow the posterior distribution, as shown in [41]. The adaptive tBUS-SuS method is described in Algorithm 2.
Algorithm 2 Adaptive tBUS-SuS.
1: Input: number of samples per level N, conditional probability p_0, maximum dimension k_max, log-likelihood function ln L(·, ·; ỹ), dimension prior π̄_pr(k)
2: Repeat Lines 2-4 of Algorithm 1
3: Compute the initial maximum log-likelihood, r̂_0 ← max(L_eval)
4: Set j ← 0 and ξ_0 ← ∞
5: while ξ_j > 0 do
6:   Increase intermediate level counter, j ← j + 1
7:   Compute LSF values, h_eval ← ln(Φ(ϑ_υ,j−1)) + r̂_{j−1} − L_eval
8:   Repeat Lines 9-19 of Algorithm 1
9:   Compute a new value of the maximum log-likelihood, r̂′_j ← max(r̂_{j−1}, {L_eval^{(i)}}_{i=1}^N)
10:  Compute the modified intermediate threshold level, ξ_j ← ξ_j − r̂_{j−1} + r̂′_j, and set r̂_j ← r̂′_j
11: end while
12: Repeat Lines 21-25 and Output of Algorithm 1
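The scaling update and threshold correction of Algorithm 2 amount to a few lines; the sketch below is illustrative and uses hypothetical names.

```python
import numpy as np

# Sketch of the covering-constant update in adaptive tBUS-SuS
# (Algorithm 2): rhat tracks the maximum observed log-likelihood,
# and the threshold is shifted so that the same intermediate
# domain A_j is preserved.
def update_scaling(xi_j, rhat, log_like_evals):
    rhat_new = max(rhat, float(np.max(log_like_evals)))  # new max log-likelihood
    xi_new = xi_j - rhat + rhat_new                      # xi'_j = xi_j - rhat_j + rhat'_j
    return xi_new, rhat_new
```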
4 MCMC ALGORITHMS IN SPACES OF VARYING DIMENSION
In variable-dimensional problems, MCMC methods must explore a discrete-continuous parameter space. In this section, we present an overview of such algorithms and discuss two special MCMC samplers that are used in combination with the proposed tBUS-SuS algorithm. We note that the methods discussed here are applicable to general model updating problems whenever the variables of the different models have a nested structure (see, e.g., [13,14]).
4.1 General remarks
In across-model simulation, the standard algorithm to sample from the joint posterior in (4) is the reversible jump MCMC (RJMCMC) method [9]. The idea is to generate a Markov chain that is able to jump between models with parameter spaces of different dimension. If the current and proposed states have the same dimension, the proposal move explores different locations within the same parameter space. In this case, the so-called detailed balance condition is guaranteed by a standard MCMC sampler [9]. If the current and proposed dimensions are different, the detailed balance holds by defining a proposal move that satisfies a dimension matching condition. This is achieved by constructing a one-to-one deterministic transformation (jumping function) ensuring that the image and the domain of the transformation have the same dimension. The acceptance probability in RJMCMC resembles the one of the classical Metropolis–Hastings algorithm, where the proposal distribution is decomposed into a discrete density for the dimension and a continuous density for the parameters, and the Jacobian of the jumping transformation is also taken into account (see [38] for further details). The RJMCMC can suffer from poor sampling performance associated with the definition of the jumping function and the proposal distribution. The potential inefficiency of the method has motivated several tuning step procedures (see, e.g., [44,45]).
Another class of algorithms are the saturated space approaches [11,13,14,44] (also referred to as product or composite space). The main characteristic is that the parameter space is not particularized to a given dimension k; instead, the parameters lie in a space whose dimension contains all dimensions of interest, say k_max. The joint posterior in the saturated space is [14]
\[
\pi_{\mathrm{pos}}(k, \boldsymbol{\theta} \mid \tilde{\boldsymbol{y}}) = \frac{1}{Z_{\tilde{\boldsymbol{y}}}}\, \bar{\pi}_{\mathrm{pr}}(k)\, \pi_{\mathrm{pr}}(\boldsymbol{\theta}_k \mid k)\, \pi_{\mathrm{pr}}(\boldsymbol{\theta}_{\sim k} \mid k, \boldsymbol{\theta}_k)\, L(k, \boldsymbol{\theta}_k; \tilde{\boldsymbol{y}}), \tag{18}
\]
where θ = [θ_k, θ_{∼k}] and the additional component is the so-called linking density or pseudo-prior π_pr(θ_{∼k} | k, θ_k), where θ_{∼k} denotes the parameters that are not used by the model k. This formulation allows us to apply standard MCMC procedures to variable-dimensional problems. We now motivate these techniques from the viewpoint of nested models.
4.2 Nested models
For problems involving nested models, the dimension change is related to the addition or deletion of a component in the parameter vector, i.e., exclusion of a component is equivalent to setting a parameter to zero. This is the case of the KL expansion, where the variable k has the effect of switching on and off coefficients in the series. Under the KL expansion (1), the likelihood in (4) is independent of θ_i when i > k, and the prior of the parameter vector θ in the saturated space becomes independent of k. Therefore, the (saturated) joint posterior in (18) can be written as
\[
\pi_{\mathrm{pos}}(k, \boldsymbol{\theta} \mid \tilde{\boldsymbol{y}}) \propto \bar{\pi}_{\mathrm{pr}}(k)\, \pi_{\mathrm{pr}}(\boldsymbol{\theta})\, L(k, \boldsymbol{\theta}; \tilde{\boldsymbol{y}}), \tag{19}
\]
with π_pr(θ) denoting the prior of the parameters in the saturated space Θ ⊆ R^{k_max}. In the context of tBUS-SuS, the posterior distribution (19) can be re-written by conditioning on the region 𝒜 in (15) and marginalizing over the auxiliary uniform random variable υ,
\[
\pi_{\mathrm{pos}}(k, \boldsymbol{\theta} \mid \tilde{\boldsymbol{y}}) \propto \bar{\pi}_{\mathrm{pr}}(k)\, \pi_{\mathrm{pr}}(\boldsymbol{\theta}) \int_0^1 \mathbb{1}_{\mathcal{A}}(k, \boldsymbol{\theta}, \upsilon)\, \mathrm{d}\upsilon, \tag{20}
\]
where 1_𝒜 denotes the indicator function, which is equal to one if (k, θ, υ) ∈ 𝒜, and zero otherwise. Due to the sequential structure of tBUS-SuS, we require MCMC algorithms that sample the conditional densities on each intermediate domain 𝒜_j, i.e., π(k, θ, υ | 𝒜_j) = π̄_pr(k) π_pr(θ) 1_{𝒜_j}(k, θ, υ). In particular, we work in a saturated standard Gaussian space in which the KL coefficients and auxiliary variable can be grouped to define the parameter vector ϑ = [θ, ϑ_υ] (cf. Remark 2). As a result, the intermediate densities are defined as π(k, ϑ | 𝒜_j), the discrete dimension space is K ⊆ Z_{[1,k_max]}, the saturated parameter space is Θ ⊆ R^{k_max+1}, and the full discrete-continuous space becomes X̄ = [K, Θ].
The saturated space is oftentimes high-dimensional when dealing with random field applications. In order to avoid convergence deterioration with increasing k, dimension-independent MCMC algorithms are applied. These samplers are based on numerical discretizations of stochastic differential equations (SDEs) that preserve the reference prior or posterior measures. A main requirement for an MCMC algorithm to be dimension-independent is that of being well-defined in function spaces. For instance, the preconditioned Crank–Nicolson (pCN) algorithm is derived in [30] by discretizing a prior-preconditioned overdamped Langevin SDE using a Crank–Nicolson scheme. Given the high-dimensional nature of random fields and the structure of the KL expansion, we focus on saturated space approaches for which the pCN algorithm can be utilized, namely: the step-wise and Metropolis-within-Gibbs algorithms.
4.2.1 Step-wise sampler
We construct a step-wise algorithm based on the pCN proposal that only requires one acceptance probability step for both dimension and parameters. The foundations and convergence properties of this sampler follow from [13,14]. Consider a proposal density q(k, ϑ) = q_1(k) q_2(ϑ) across the full state space X̄. This density takes into account the proposal for the dimension q_1(k) and the proposal for the parameters in the saturated space q_2(ϑ). Under these assumptions, the acceptance probability of the standard Metropolis–Hastings algorithm becomes [14]
\[
\alpha(k, \boldsymbol{\vartheta}; k^\star, \boldsymbol{\vartheta}^\star) = \min\!\left\{ 1, \frac{q_1(k \mid k^\star)\, q_2(\boldsymbol{\vartheta} \mid \boldsymbol{\vartheta}^\star)\, \pi(k^\star, \boldsymbol{\vartheta}^\star \mid \tilde{\boldsymbol{y}})}{q_1(k^\star \mid k)\, q_2(\boldsymbol{\vartheta}^\star \mid \boldsymbol{\vartheta})\, \pi(k, \boldsymbol{\vartheta} \mid \tilde{\boldsymbol{y}})} \right\}. \tag{21}
\]
We employ the pCN proposal for the parameter vector ϑ in the saturated space. In this case, the proposal q_2(ϑ) cancels out with the saturated parameter prior in the target posterior (see, e.g., [30]). Moreover, since k is a discrete variable, the proposal distribution for the dimension q_1(k) can be represented as a proposal matrix Q ∈ R^{k_max×k_max}. This is a right-stochastic matrix containing the probabilities of the moves. Such probabilities can be assigned using a discrete probability law controlled by a spread parameter δ ∈ [1, k_max], defining the width or jump lengths of the proposal. The resulting acceptance probability simplifies to
\[
\alpha(k, \boldsymbol{\vartheta}; k^\star, \boldsymbol{\vartheta}^\star) = \min\!\left\{ 1, \frac{L(k^\star, \boldsymbol{\vartheta}^\star; \tilde{\boldsymbol{y}})}{L(k, \boldsymbol{\vartheta}; \tilde{\boldsymbol{y}})} \cdot \frac{\bar{\pi}_{\mathrm{pr}}(k^\star)\, Q(k^\star, k)}{\bar{\pi}_{\mathrm{pr}}(k)\, Q(k, k^\star)} \right\}, \tag{22}
\]
which in the context of tBUS-SuS is equivalent to
\[
\alpha(k, \boldsymbol{\vartheta}; k^\star, \boldsymbol{\vartheta}^\star) = \min\!\left\{ 1, \mathbb{1}_{\mathcal{A}_j}(k^\star, \boldsymbol{\vartheta}^\star)\, \frac{\bar{\pi}_{\mathrm{pr}}(k^\star)\, Q(k^\star, k)}{\bar{\pi}_{\mathrm{pr}}(k)\, Q(k, k^\star)} \right\} = \mathbb{1}_{\mathcal{A}_j}(k^\star, \boldsymbol{\vartheta}^\star) \underbrace{\min\!\left\{ 1, \frac{\bar{\pi}_{\mathrm{pr}}(k^\star)\, Q(k^\star, k)}{\bar{\pi}_{\mathrm{pr}}(k)\, Q(k, k^\star)} \right\}}_{(\ast)}. \tag{23}
\]
This Metropolis–Hastings implementation on the saturated space proceeds in a step-wise fashion as follows: in the first step, a candidate dimension k⋆ is proposed according to the matrix Q. In the second step, a candidate parameter ϑ⋆ is proposed using the pCN proposal [30]. Afterwards, the candidate pair (k⋆, ϑ⋆) is rejected or accepted jointly according to the probability (23). Algorithm 3 describes this procedure in detail. Note that we can alternatively implement the right term of (23), such that: (i) a model k⋆ is proposed and accepted with probability (∗) in (23), (ii) ϑ⋆ is drawn from a pCN proposal, and (iii) the pair (k⋆, ϑ⋆) is accepted if it lies in the domain 𝒜_j (using the indicator function).
Remark 3. From the RJMCMC viewpoint, the jumps in the step-wise sampler take place between nested models differing in dimension according to the proposal matrix Q. Because of the nested structure of the KL expansion, a natural jumping function linking the parameter spaces is the identity; this makes the determinant of the Jacobian of the jumping function in RJMCMC equal to one [44].
Algorithm 3 State update in the step-wise sampler for tBUS-SuS in the standard Gaussian space.
1: Input: Let (k, ϑ) be the current state of the Markov chain and β the pCN proposal scaling
2: /* Step 1: sample the dimension */
3: Draw a candidate dimension k⋆ ∼ Q(k, :)
4: /* Step 2: sample the coefficients using pCN */
5: Draw candidate parameters, ϑ⋆ ← √(1 − β²) ϑ + β ε, where ε ∼ 𝒩(0, I_{k_max+1})
6: Compute the acceptance probability α_{k,ϑ} as per Eq. (23)
7: Sample U_{k,ϑ} ∼ Unif(0, 1)
8: if U_{k,ϑ} < α_{k,ϑ} then
9:   k_next ← k⋆ and ϑ_next ← ϑ⋆
10: else
11:  k_next ← k and ϑ_next ← ϑ
12: end if
13: Output: (k_next, ϑ_next)
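A Python sketch of the step-wise update of Algorithm 3 is given below. Q is the right-stochastic dimension proposal matrix, prior_k the dimension prior pmf, and in_domain(k, v) the indicator of the current intermediate domain 𝒜_j; all names are placeholders, and dimensions are indexed from zero here rather than from one as in the paper.

```python
import numpy as np

# Sketch of the step-wise state update (Algorithm 3): joint proposal
# for (dimension, parameters) with a single accept/reject step.
def stepwise_update(k, v, Q, prior_k, in_domain, beta, rng):
    k_star = rng.choice(len(prior_k), p=Q[k])   # Step 1: propose dimension
    eps = rng.standard_normal(v.size)           # Step 2: pCN move in saturated space
    v_star = np.sqrt(1.0 - beta**2) * v + beta * eps
    # Joint accept/reject with probability (23)
    ratio = prior_k[k_star] * Q[k_star, k] / (prior_k[k] * Q[k, k_star])
    alpha = in_domain(k_star, v_star) * min(1.0, ratio)
    if rng.uniform() < alpha:
        return k_star, v_star
    return k, v
```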
4.2.2 Metropolis-within-Gibbs sampler
The Metropolis-within-Gibbs (MwG) algorithm [46] updates the parameters ϑ and the dimension k in an alternating manner. In the saturated space, MwG explores the joint posterior using a Gibbs sampling version of the algorithm [11], after including Metropolis–Hastings steps (details are provided in [14]). The algorithm can also be derived by writing the posterior (20) as the product of the dimension posterior and the dimension-specific parameter posterior [9]
\[
\pi_{\mathrm{pos}}(k, \boldsymbol{\vartheta} \mid \tilde{\boldsymbol{y}}) = \bar{\pi}(k \mid \tilde{\boldsymbol{y}})\, \pi(\boldsymbol{\vartheta} \mid k, \tilde{\boldsymbol{y}}). \tag{24}
\]
The densities π(ϑ | k, ỹ) may differ abruptly for small changes in the variable k and thus, the chain might always remain in some state. However, under the KL formulation (1), the coefficients and the dimension are independent a priori. This property alleviates potential poor mixing properties in MwG [47].
The idea of MwG is to sample each conditional density in (24) by applying two different steps. Recall that for tBUS-SuS, these densities need to be defined with respect to the intermediate levels 𝒜_j. In the first step, we fix the parameter ϑ and sample the conditional distribution π̄(k | ·) to propose a candidate dimension k⋆ using a standard Metropolis–Hastings sampler. In the second step, we fix the variable k (accepted in the first step), and sample the conditional distribution π(ϑ | ·) to obtain a candidate
parameter ϑ⋆ using the pCN proposal [30]. The state update in MwG for tBUS-SuS is formally described in Algorithm 4. Observe that this approach requires two LSF (likelihood) evaluations for the generation of one state of the chain.
Algorithm 4 State update in the MwG sampler for tBUS-SuS in the standard Gaussian space.
1: Input: Let (k, ϑ) be the current state of the Markov chain and β the pCN proposal scaling
2: /* Step 1: for fixed ϑ, sample the conditional distribution π̄(k | ỹ) */
3: Sample the dimension, k⋆ ∼ Q(k, :)
4: Compute the acceptance probability α_k ← 1_{𝒜_j}(k⋆, ϑ) · min{1, [π̄_pr(k⋆) Q(k⋆, k)] / [π̄_pr(k) Q(k, k⋆)]}
5: Sample U_k ∼ Unif(0, 1)
6: if U_k < α_k then
7:   k_next ← k⋆
8: else
9:   k_next ← k
10: end if
11: /* Step 2: for fixed k_next, sample the conditional distribution π(ϑ | k_next, ỹ) */
12: Sample the parameters using pCN, ϑ⋆ ← √(1 − β²) ϑ + β ε, where ε ∼ 𝒩(0, I_{k_max+1})
13: Compute the acceptance probability α_ϑ ← 1_{𝒜_j}(k_next, ϑ⋆)
14: if α_ϑ = 1 then
15:  ϑ_next ← ϑ⋆
16: else
17:  ϑ_next ← ϑ
18: end if
19: Output: (k_next, ϑ_next)
Remark 4. We apply an adaptive version of the pCN algorithm within the trans-dimensional Algorithms 3 and 4. The idea is to control the pCN scaling β to keep the acceptance rate around a near-optimal value throughout the simulation. The optimality is defined in terms of the smallest error in the approximation of the model posterior. The adaptation procedure follows from [42,41].
5 NUMERICAL EXAMPLES
We test the proposed method on two examples. The first problem allows us to verify the approximations performed by tBUS-SuS, since a reference model posterior can be computed analytically. In the second example, a closed-form expression is not available. Thus, we compute several posterior dimension snapshots using a within-model BUS-SuS approach to verify the solution estimated by tBUS-SuS. In all cases, the intermediate conditional probabilities are fixed at p_0 = 0.1.
5.1 1D cantilever beam
The first example is an inverse problem involving an ordinary differential equation (ODE) that describes the equilibrium of a cantilever beam. In this case, the solution of the Bayesian inverse problem can be derived analytically [26]. The physical domain is the interval D = [0, L], where L = 5 m is the length of the beam. The beam is subjected to a deterministic point load P = 20 kN at its free right end. The vertical displacements are constrained at the left edge of the beam. Let F(x) = (I E(x))^{−1} denote the flexibility of the beam (with x ∈ D), where E is the elastic modulus and I the moment of inertia. The deflection response w(x), for a given flexibility and load configuration, is governed by the Euler–Bernoulli ODE:
\[
M(x) = -F^{-1}(x)\, \frac{\mathrm{d}^2 w(x)}{\mathrm{d}x^2} \quad \Longleftrightarrow \quad w(x) = -P \int_0^x \!\! \int_0^s (L - t)\, F(t)\, \mathrm{d}t\, \mathrm{d}s, \tag{25}
\]
here we use the fact that the bending moment of a cantilever beam is given by M(x) = (L − x) P.
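The forward model (25) can be evaluated with two cumulative trapezoidal integrations; the sketch below assumes a uniform grid and uses the prior-mean flexibility of this example for illustration (the discretization choices are not from the paper).

```python
import numpy as np
from scipy.integrate import cumulative_trapezoid

# Sketch of the forward model (25): cantilever deflection w(x) [m]
# from the flexibility field F(x) [1/(kN m^2)] via two cumulative
# trapezoidal integrations.
def deflection(F, x, P=20.0, L=5.0):
    inner = cumulative_trapezoid((L - x) * F, x, initial=0.0)  # integral of (L - t) F(t)
    return -P * cumulative_trapezoid(inner, x, initial=0.0)    # outer integral over s

x = np.linspace(0.0, 5.0, 200)
w = deflection(np.full_like(x, 1e-4), x)   # constant prior-mean flexibility
```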
FIGURE 1 Cantilever beam problem: model description, true values and set of deflection measurements.
The flexibility is modeled by a Gaussian random field prior 𝒩(μ_pr, Σ_pr), with constant mean μ_pr = 1 × 10⁻⁴ (kN⁻¹ m⁻²) and covariance matrix Σ_pr defined through a Matérn kernel with smoothing parameter ν = 0.5, which yields the following exponential autocovariance function, C(x, x′) = σ²_pr · exp(−|x − x′|/ℓ), for x, x′ ∈ D. We set the prior standard deviation to σ_pr = 0.35 μ_pr = 3.5 × 10⁻⁵ and perform a parameter study on the correlation length ℓ. Note that for this type of covariance operator, the KL eigenvalue problem has an analytical solution [24].
The true flexibility field is a realization from the prior random field (with correlation length ℓ_true = 2 m). Partial observations of the deflection field are generated by simulating the ODE (25) using this underlying realization. The data is collected at m = 10 equally-spaced points of the domain D (Figure 1). This generates a measurement vector ỹ ∈ R^{m×1} with additive and spatially correlated error described by a Gaussian PDF, η ∼ 𝒩(0, Σ_obs), where the covariance structure of the error is constructed from an exponential kernel with standard deviation σ_obs = 1 × 10⁻³ and correlation length ℓ_obs = 1 m.
For this example, closed-form expressions of the model evidence for each dimension k are available (see, e.g., [26]); this allows us to derive the model posterior analytically. We consider different correlation lengths to evaluate their influence on the model posterior estimation. Each correlation length also defines a different dimension prior as follows:

■ for ℓ = 0.1, the truncation parameter is k_max = 1014 and k_min = 17. This yields p = 6.166 × 10⁻³.
■ for ℓ = 0.5, the truncation parameter is k_max = 204 and k_min = 4. This yields p = 2.586 × 10⁻².
■ for ℓ = 0.9, the truncation parameter is k_max = 114 and k_min = 2. This yields p = 5.118 × 10⁻².
The priors together with the analytical model posterior and evidence are shown in Figure 2. We employ these closed-form solutions as reference to test the performance of the proposed tBUS-SuS approach.
We first evaluate the performance of tBUS-SuS for different proposal scalings; this includes the spread parameter δ of the jump proposal Q and the scaling β of the pCN proposal (the results are omitted here but are included as Supporting Information). The studies are performed by monitoring three posterior quantities of interest (QoIs), namely, the dimension parameter k, the flexibility random field at the middle of the beam F_mid, and the deflection random field at the tip of the beam w_tip. This allows us to find an appropriate tuning of the tBUS-SuS algorithm. From these studies we found that: (i) sampling from the prior instead of using the matrix Q is beneficial when the change from prior to posterior update is small (cf. Figure 2), and (ii) adapting the pCN scaling β such that a target acceptance rate of α⋆ ∈ [0.2, 0.4] is maintained through the simulation is a good choice. For this example, we found that α⋆ = 0.4 keeps a good approximation error in both the monitored random field QoIs and the dimension parameter.
5.1.1 Computational efficiency
We investigate the computational gain of using the across-model tBUS-SuS, compared to individual within-model runs of BUS-SuS. The objective is twofold: (i) identify the number of effective independent samples in the resulting set of posterior samples,
and (ii) define a metric that is equivalent in within-model and across-model simulation approaches. The efficiency metrics are defined in the appendix; they are expressed as the ratio between the effective number of independent samples and the number of model calls. The results are computed as an average over N_sim = 100 independent simulations, using N = 5 × 10³ samples per level in BUS-SuS and N = 10⁴ samples per level in tBUS-SuS.

FIGURE 2 Closed-form solutions for different correlation lengths (ℓ = 0.1, 0.5, 0.9) in the prior flexibility random field. Left: model evidence Z_ỹ(k). Center: model/dimension prior π̄_pr(k). Right: model/dimension posterior π̄_pos(k | ỹ).

For each correlation length ℓ ∈ {0.1, 0.5, 0.9}, we use the reference variances of the QoIs (computed from the closed-form solution) for the estimation of the effective number of samples in (B9):

■ σ²_k ∈ {24161.22, 1250.80, 328.56} for the dimension,
■ σ²_F_mid ∈ {1.143 × 10⁻⁹, 7.484 × 10⁻¹⁰, 5.364 × 10⁻¹⁰} for the flexibility, and
■ σ²_w_tip ∈ {8.446 × 10⁻⁷, 9.042 × 10⁻⁷, 9.159 × 10⁻⁷} for the deflection.
Standard tBUS-SuS vs. adaptive tBUS-SuS: we compare the approximation of the model posterior between standard tBUS-SuS (with pre-defined constant r̂, Algorithm 1) and adaptive tBUS-SuS (Algorithm 2). The adaptive version is a more general method, since it is not always possible to define the constant r̂ a priori. Moreover, it can produce similar or better results than the standard tBUS-SuS, depending on the accuracy of the constant r̂ in standard tBUS-SuS, as shown in Table 1.
TABLE 1 Efficiency metric (B10) of the dimension parameter k for adaptive and standard tBUS-SuS.

ℓ     adaptive: eff_tBUS(k)           standard: eff_tBUS(k)
      MwG           step-wise         MwG           step-wise
0.1   8.01 × 10⁻²   4.24 × 10⁻²       4.00 × 10⁻²   1.79 × 10⁻²
0.5   5.49 × 10⁻²   2.38 × 10⁻²       6.13 × 10⁻³   1.85 × 10⁻²
0.9   3.59 × 10⁻²   2.41 × 10⁻²       4.03 × 10⁻³   2.08 × 10⁻³
The results show that overall the efficiency of adaptive tBUS-SuS is larger than or comparable to the efficiency provided by standard tBUS-SuS. This is related to the way we estimate r̂ = ln(r̄) in standard tBUS-SuS. The constant is chosen as the maximum of 10⁵ independent log-likelihood evaluations; this value is also increased by 25% such that r̂ ≈ ln(L_max,all). However, it is possible that this way of selecting r̂ is too conservative, so that more levels in tBUS-SuS are required to estimate the solution. This is not an issue of the method, but it will of course lead to a reduced efficiency since more model evaluations are required. Therefore, we employ the adaptive tBUS-SuS algorithm for the solution of the Bayesian model choice problem in the remainder of the paper.
Within-model BUS-SuS runs vs. tBUS-SuS: we compare the efficiencies in the estimation of the QoIs related to the random fields (namely, the mean values of F_mid and w_tip) produced by within-model runs of adaptive BUS-SuS and adaptive tBUS-SuS. Figure 3 shows the efficiencies computed with individual adaptive BUS-SuS runs (1st column) and adaptive tBUS-SuS using the two trans-dimensional MCMC algorithms (2nd and 3rd columns).
FIGURE 3 Comparison between within-model BUS-SuS and across-model tBUS-SuS: number of model calls and efficiency metrics (B10) for different correlation lengths (ℓ = 0.1, 0.5, 0.9) and random field QoIs μ_F_mid and μ_w_tip. Adaptive BUS-SuS (1st col), adaptive tBUS with MwG sampler (2nd col), and adaptive tBUS with step-wise sampler (3rd col).
In the first row, we plot the total number of model calls per dimension. In the fixed-dimensional BUS-SuS, the cost increases with the dimension, since more intermediate levels are required to reach the posterior. Conversely, the cost is a single value for all dimensions in across-model tBUS-SuS, and thus we need to distribute it according to the model posterior. Note in the first row of Figure 3 that the total number of calls in within-model BUS-SuS is larger than in tBUS-SuS, even when using a larger number of samples per level in tBUS-SuS; also, tBUS-SuS with MwG almost doubles the cost compared to tBUS-SuS with the step-wise sampler. Moreover, in the second and third rows of Figure 3, the efficiencies of the random field QoIs are shown per dimension, since different KL truncation orders yield different random field approximations. We remark that the effective number of samples obtained with MwG is larger than that obtained with the step-wise method. Nevertheless, the efficiencies of both approaches are similar. This is because the computational cost normalizes the effective number of samples in the efficiency metric (B10), and MwG has almost twice the cost of the step-wise sampler. Finally, we clearly see the advantage of employing across-model simulation algorithms for the solution of Bayesian model choice problems, as compared to single-model runs.
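The per-dimension cost attribution used in the first row of Figure 3 can be reproduced in a few lines; the sketch below (function name and example values are ours) distributes the single across-model cost proportionally to the estimated model posterior, i.e., it computes N̄call,tBUS · π̄pos(k | ỹ).

import numpy as np

def distribute_cost(n_calls_total, posterior_k):
    # Attribute the single tBUS-SuS cost to each dimension k,
    # proportionally to the estimated model posterior.
    posterior_k = np.asarray(posterior_k, dtype=float)
    return n_calls_total * posterior_k / posterior_k.sum()

# Illustrative (not the paper's) posterior over k = 1..50:
post = np.exp(-0.2 * np.arange(1, 51))
print(distribute_cost(5e5, post)[:5])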
5.1.2 Approximation of the posterior for the dimension and the random fields
We employ the adaptive version of tBUS-SuS for the estimation of the model and random field posteriors. The approximated model posteriors obtained with the MwG and step-wise samplers are shown in Figure 4. In this case, we plot the mean and standard deviation bounds of the approximation. The shape of the reference model posterior is well-captured for all investigated correlation length cases. The variability of the approximation using the MwG sampler is smaller than the one computed by the step-wise
algorithm. The differences are larger for smaller correlation lengths. The tBUS-SuS simulations require on average Nlv = 5 intermediate levels to reach the posterior for all investigated correlation lengths.
FIGURE 4 Estimation of the model posterior using adaptive tBUS-SuS for different correlation lengths ℓ ∈ {0.1, 0.5, 0.9} in the prior flexibility random field: MwG (1st row); step-wise (2nd row).
FIGURE 5 Posterior flexibility and deflection random fields for different correlation lengths in the prior flexibility field: estimated mean and 95% CI of the best model (model choice) and the averaging of models (model mixing) using adaptive tBUS-SuS with MwG. The reference 95% CI is highlighted in gray.
We also estimate the posterior flexibility and deflection random fields for different correlation lengths. We use as reference the closed-form expressions of the posterior random fields26. Figure 5 shows the model choice and model mixing solutions in terms of the posterior mean and the posterior 95% credible intervals (CI); this CI is defined as the region between the 0.025 and 0.975 quantiles of the posterior. For the deflection response field, we compute the difference between the prior mean and the 95% posterior CIs (called the differential deflection), in order to differentiate the approximations. The model choice estimate is given by the truncation order that yields the maximum model posterior (Figure 4), in this case kbest ∈ {10, 3, 3} for the correlation lengths ℓ ∈ {0.1, 0.5, 0.9}, respectively. The model mixing estimate takes into account the whole dimension spectrum (up to kmax). Note that the reference CIs agree closely with the model mixing estimates, since we use all the KL expansions associated with the model posterior for the random field representation. For the largest correlation length, the model choice solution fails to capture the assumed true flexibility in different intervals of the domain.
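A compact sketch contrasting the two estimators, assuming field_samples[k] stores posterior realizations of the field for model dimension k and post_k the estimated model posterior π̂pos(k | ỹ); both names are hypothetical:

import numpy as np

def model_choice_mean(field_samples, post_k):
    # Mean field of the single best model: the dimension k that
    # maximizes the estimated model posterior.
    k_best = max(post_k, key=post_k.get)
    return np.mean(field_samples[k_best], axis=0)

def model_mixing_mean(field_samples, post_k):
    # Posterior-weighted average over the whole dimension spectrum.
    total = sum(post_k.values())
    return sum(post_k[k] * np.mean(field_samples[k], axis=0)
               for k in post_k) / total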
We conclude this subsection by illustrating the evolution of the samples in tBUS-SuS and its relation to the model posterior. The results are shown for the prior correlation length ℓ = 0.5. Figure 6 shows the prior, second intermediate level, and posterior samples obtained from a single simulation of adaptive tBUS-SuS with MwG. The process of sequentially approximating the posterior is shown by the distribution of the samples, starting from the prior and narrowing down to the target posterior. For k = 1, we plot the samples that contribute to the model posterior at the first dimension, i.e., the one-dimensional KL coefficient against the auxiliary standard uniform random variable. The tBUS-SuS simulation required Nlv = 5 levels to reach the posterior region (highlighted in gray), with intermediate threshold values [14.74, 5.04, 2.67, 0.31, 0]. Note that the value of the model posterior at k = 1 is almost zero (cf. Figure 4), and hence the number of samples is considerably reduced as the algorithm evolves from the prior to the posterior measure. Moreover, the maximum log-likelihood at dimension kmax = 204 is c̄204 = 68.170, and at dimension k = 1 it is c̄1 = 56.529. Due to the nested structure of the KL expansion, the constant r̄ = ln(r) in the LSF (16) is equal to the maximum log-likelihood at the largest dimension. In this case, the scaling r̄ is significantly larger than the value of the covering constant at dimension 1. Thus, we observe that the posterior samples at k = 1 are located in a small region of the two-dimensional parameter space. This occurs mainly at lower dimensions, since there exist significant differences between lower- and higher-dimensional likelihood values. We remark that there is an associated reduction of the efficiency, but this does not prevent the algorithm from computing accurate posterior samples. Moreover, with increasing k the values of c̄k come closer to r̄ and the efficiency loss becomes negligible. For k = 2, Figure 6 plots the components of the two-dimensional KL coefficients; we also show the contours of the log-likelihood function with fixed dimension k = 2. In this case, the reduction in the number of samples when updating from prior to posterior is smaller than at dimension k = 1. Note that the value of the model posterior at k = 2 is larger than zero, and the difference in probability mass between prior and posterior at k = 2 is less substantial (cf. Figure 4).
FIGURE 6 tBUS-SuS samples of the KL coefficients at dimensions k = 1 (left) and k = 2 (right). For k = 1, the posterior region is highlighted in gray. For k = 2, the contours of the log-likelihood function are also plotted.
5.2 2D groundwater flow
We consider inference of the hydraulic conductivity field of an aquifer using observations of the hydraulic head measured at specific boreholes (see, e.g.,48). We define an aquifer on the square domain D = [0, 1] × [0, 1] km² with boundary ∂D. Spatial coordinates are denoted by x = [x1, x2] ∈ D. The steady-state Fick's second law of diffusion is used to describe the spatial variation of the hydraulic head inside the aquifer. Hence, for a given hydraulic conductivity of the soil κ(x, ω) and sink or source terms J(x), the hydraulic head u(x) follows the elliptic PDE

−∇ ⋅ [κ(x, ω) ∇u(x)] = J(x),  (26)

with Dirichlet boundary condition u(x) = 0 for x ∈ ∂D. The source terms are defined as the superposition of nine weighted Gaussian plumes with standard width σJ = 1 × 10⁻³ km. The plumes have equal and unitary strengths, and are centered at the locations μJ^(i) = [0.25 ⋅ i, 0.25 ⋅ j] with i, j = 1, …, 3, that is,

J(x) = Σ_{i=1}^{9} N(x; μJ^(i), σJ² I₂).  (27)
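As a small illustration (the grid and function names are ours), the source term in (27) can be evaluated on a grid as the sum of nine isotropic Gaussian density bumps:

import numpy as np

def source_term(x1, x2, sigma_J=1e-3):
    # Superposition of nine unit-strength Gaussian plumes centered at
    # [0.25*i, 0.25*j], i, j = 1..3, cf. equation (27).
    J = np.zeros_like(x1)
    for i in range(1, 4):
        for j in range(1, 4):
            r2 = (x1 - 0.25 * i)**2 + (x2 - 0.25 * j)**2
            J += np.exp(-0.5 * r2 / sigma_J**2) / (2 * np.pi * sigma_J**2)
    return J

# Evaluate on a coarse grid over the unit-square aquifer:
g = np.linspace(0.0, 1.0, 101)
X1, X2 = np.meshgrid(g, g)
print(source_term(X1, X2).max())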
FIGURE 7 Groundwater flow problem: true hydraulic conductivity, true hydrostatic pressure with measurement locations, source term, and measured and true (noise-free) hydraulic head.
We employ the finite element method to solve the PDE (26); 2 × 80² = 1.28 × 10⁴ three-node triangular elements are used for the discretization. The prior hydraulic conductivity field is modeled as a log-normal random field, κ(x) := exp(κ′(x)). The underlying Gaussian field κ′(x) has mean zero and standard deviation σκ′ = 1 km/day. The covariance operator of the Gaussian field is constructed from a Matérn kernel32, with smoothing parameter ν = 1.5. The solution of the eigenvalue problem is computed with the Nyström method using 100 Gauss–Legendre points in each direction37.
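The following Python sketch illustrates a Nyström discretization of this eigenvalue problem on the unit square via the symmetrized formulation B = W^(1/2) C W^(1/2); it uses a coarser quadrature than the paper's 100 points per direction and assumes unit variance, so it is an illustration under our own conventions rather than the authors' implementation.

import numpy as np
from scipy.spatial.distance import cdist

def matern32(r, ell, sigma2=1.0):
    # Matern covariance kernel with smoothness nu = 1.5.
    a = np.sqrt(3.0) * r / ell
    return sigma2 * (1.0 + a) * np.exp(-a)

def nystrom_kl(ell, n_gl=30, n_eig=50):
    # Gauss-Legendre nodes/weights mapped from [-1, 1] to [0, 1].
    z, w = np.polynomial.legendre.leggauss(n_gl)
    z, w = 0.5 * (z + 1.0), 0.5 * w
    X1, X2 = np.meshgrid(z, z)
    nodes = np.column_stack([X1.ravel(), X2.ravel()])
    W = np.outer(w, w).ravel()                  # 2D quadrature weights
    C = matern32(cdist(nodes, nodes), ell)      # covariance matrix
    sqW = np.sqrt(W)
    B = sqW[:, None] * C * sqW[None, :]         # symmetrized eigenproblem
    lam, U = np.linalg.eigh(B)
    idx = np.argsort(lam)[::-1][:n_eig]
    eigvals = lam[idx]
    eigfuns = U[:, idx] / sqW[:, None]          # nodal KL eigenfunctions
    return eigvals, eigfuns, nodes

vals, funs, pts = nystrom_kl(ell=0.3)
print(vals[:5])   # leading KL eigenvalues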
The true conductivity κ(x) is a realization of a random field with characteristics similar to the prior. In this case, we set the Matérn kernel parameters to νtrue = 2.0 and ℓtrue = 0.1. The truncation order of the KL expansion used to generate this realization is 312, which captures 99% of the prior variance. The hydraulic head observations ỹ are obtained at m = 12 sensor locations. They are computed from a PDE evaluation of the true conductivity field using a finer finite element mesh. The measurement error is modeled as additive and mutually independent from the random field. It is defined by a joint Gaussian PDF with mean zero and noise covariance matrix Σobs = σ²obs Im. The variance of the measurement noise is prescribed such that the observations have a signal-to-noise ratio V[ỹ]/σ²obs = 120. The true hydraulic conductivity and hydraulic head fields, together with the source terms and the synthetic data, are shown in Figure 7.
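Two ingredients of this data generation, the 99%-variance truncation rule and the noise level implied by the signal-to-noise ratio, can be sketched as follows; the function names are ours, and eigvals is assumed to contain all KL eigenvalues:

import numpy as np

def truncation_order(eigvals, target=0.99):
    # Smallest k whose leading eigenvalues capture the target fraction
    # of the prior variance (312 terms for the paper's true field).
    frac = np.cumsum(eigvals) / np.sum(eigvals)
    return int(np.searchsorted(frac, target) + 1)

def add_noise(y_true, snr=120.0, rng=None):
    # Additive Gaussian noise with variance set from V[y]/sigma_obs^2.
    rng = rng or np.random.default_rng()
    sigma2_obs = np.var(y_true) / snr
    return y_true + rng.normal(0.0, np.sqrt(sigma2_obs), size=y_true.shape)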
In this example, we evaluate the posterior for different correlation lengths ℓ ∈ {0.1, 0.2, 0.3}. Each correlation length defines a dimension prior as follows:
■ for ℓ = 0.1, the truncation parameters are kmax = 512 and kmin = 16. This yields p = 6.2914 × 10⁻³.
■ for ℓ = 0.2, the truncation parameters are kmax = 138 and kmin = 5. This yields p = 1.9398 × 10⁻².
■ for ℓ = 0.3, the truncation parameters are kmax = 65 and kmin = 3. This yields p = 2.9396 × 10⁻².
We employ the same proposal scaling settings investigated in the previous example. Moreover, we use a proposal Q to sample k, in addition to sampling from the prior. The jump proposal matrix is constructed from a discrete triangular distribution with jump length equal to 0.25 ⋅ kmax, which appears to be a good choice for both MCMC algorithms (cf. Supporting Information).
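Since the exact construction of Q is not spelled out in this section, the following is only one plausible realization of a discrete triangular jump proposal with the stated jump length; excluding self-moves is our assumption.

import numpy as np

def triangular_jump_matrix(k_min, k_max, jump_frac=0.25):
    # Row-stochastic proposal on {k_min, ..., k_max}: from state k,
    # a move to k' receives a weight decaying linearly with |k - k'|
    # up to the jump length (0.25 * k_max in the text).
    width = max(2, int(round(jump_frac * k_max)))
    ks = np.arange(k_min, k_max + 1)
    Q = np.maximum(0.0, width - np.abs(ks[:, None] - ks[None, :]))
    np.fill_diagonal(Q, 0.0)   # propose an actual dimension change
    return Q / Q.sum(axis=1, keepdims=True)

Q = triangular_jump_matrix(k_min=3, k_max=65)
print(Q.shape, Q[0].sum())     # (63, 63) 1.0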
5.2.1 Approximation of the posterior for the dimension and the random fields
The adaptive tBUS-SuS is used to estimate the posterior of the dimension and of the random field. The results are shown for an average of Nsim = 60 independent simulation runs using N = 1.5 × 10⁴ samples per level. To compare the tBUS-SuS approximations, we compute the reference solution from model evidences estimated by within-model runs of adaptive BUS-SuS using N = 5 × 10³ samples per level and averaged over Nsim = 90 simulations.
For this example, it is not feasible to compute the full reference solution for the posterior of the dimension by means of within-model simulation algorithms. Thus, we estimate the reference at 6 dimension snapshots for each correlation length: ksnap ∈ {40, 45, 50, 55, 60, 70} for ℓ = 0.1, ksnap ∈ {20, 30, 35, 40, 50, 60} for ℓ = 0.2, and ksnap ∈ {20, 25, 30, 32, 35, 40} for ℓ = 0.3. Since the reference solutions are given in terms of the model evidences Zỹ(ksnap), we transform them to model posteriors using (6). This requires knowledge of the evidence of all model classes, which is not available in this case. Instead, we apply the normalization

πpos(ksnap | ỹ) = [ πpr(ksnap) Zỹ(ksnap) / Σ_{k ∈ ksnap} πpr(k) Zỹ(k) ] ⋅ Σ_{k ∈ ksnap} π̂pos(k | ỹ),  (28)

such that the sum of the reference πpos(ksnap | ỹ) matches the sum of the estimated model posteriors π̂pos(ksnap | ỹ) at the given snapshots. Using this approach, the reference solution is tied to the tBUS-SuS solution. Nevertheless, it allows us to check the correct shape of the dimension posterior.
The dimension posteriors estimated by adaptive tBUS-SuS with MwG are shown in Figure 8. We plot the mean and standard deviation bounds of the approximations. The solutions are computed when the dimension is sampled from the prior (1st row) and when it is sampled from the proposal Q (2nd row). Both alternatives yield comparable results, since there are no significant differences between the posterior approximations. In general, we observe an increase in the variability around the MAP estimate. Note also that the dimension posteriors have several modes, which can be related to the nonuniform distribution of the measurement locations, together with the symmetry of the KL eigenfunctions. For instance, when using ℓ = 0.3 there is a jump in the values of the probability mass after the 15th dimension, every 5 dimensions, until the MAP estimate. Furthermore, as an indicator of the algorithm performance, tBUS-SuS requires on average Nlv = 12 intermediate levels, and the proposal scaling parameter changes from 0.75 in the first level to 0.09 in the last level (for the correlation length ℓ = 0.1 and sampling from the prior).
The approximated dimension posteriors using adaptive tBUS-SuS with the step-wise sampler are shown in Figure 9. In this example, the step-wise sampler is more sensitive to the selection of the dimension proposal than MwG. We observed that the dimension prior is not a good proposal choice to sample the dimensions (the results are omitted). The main issue is that the resulting posterior samples are highly correlated, since the values of the scaling parameter at the last level of the simulation are in the order of 10⁻⁴. Therefore, instead of showing a comparison between the dimension proposal schemes, we employ the proposal matrix Q with two different settings: using N samples per level, and using 2N samples per level. In both MCMC algorithms, increasing
FIGURE 8 Diffusion example: model posterior using adaptive tBUS-SuS with MwG sampler, sampling k from the prior (1st row) and from proposal Q (2nd row), for correlation lengths ℓ ∈ {0.1, 0.2, 0.3}.
FIGURE 9 Diffusion example: model posterior using adaptive tBUS-SuS with the step-wise sampler and sampling k from proposal Q, for correlation lengths ℓ ∈ {0.1, 0.2, 0.3}. Using N samples per level (1st row) and 2N samples per level (2nd row).
the number of samples per level considerably improves the variability of the estimates. In particular, using 2N samples per level in the step-wise sampler yields results comparable to those of MwG, since we are then evaluating the PDE model approximately the same number of times. The resulting dimension posteriors are able to capture the trend of the reference solutions; in both cases, the estimation is very close to the reference mean value.
FIGURE 10 Diffusion example: posterior mean (1st and 3rd columns) and standard deviation (2nd and 4th columns) of the hydraulic conductivity random field using adaptive tBUS-SuS with the step-wise sampler, for different prior correlation lengths (rows) and employing model choice or model mixing.
FIGURE 11 Diffusion example: posterior mean (1st and 3rd columns) and standard deviation (2nd and 4th columns) of the hydrostatic pressure random field using adaptive tBUS-SuS with the step-wise sampler, for different prior correlation lengths (rows) and employing model choice or model mixing.
Finally, we estimate the posterior hydraulic conductivity and hydraulic head random fields for the investigated correlation lengths. Figure 10 shows the model choice and model mixing solutions in terms of the posterior mean and standard deviation of the hydraulic conductivity (only for ℓ ∈ {0.1, 0.3}). The model choice estimate is given by the truncation order that yields the maximum model posterior, in this case kbest ∈ {43, 43, 33} for each investigated correlation length (Figure 9, 2nd row). The
model mixing estimate takes into account the whole dimension spectrum. Note that the values of the posterior mean are smaller than those of the assumed truth. In this case, most of the measurements are concentrated in the lower left corner of the aquifer. In this area, the values of the true hydraulic conductivity are very small, and they influence the posterior solution. Nevertheless, the statistics reveal the locations of lower and higher permeability values. The field modeled with the smallest correlation length is able to represent the small fluctuations better than those with larger correlation lengths, for which the resulting random field realizations are smoother. In contrast to the hydraulic conductivity field, the differences between the model choice and model mixing solutions for the hydraulic head random field are minimal (Figure 11). This quantity is computed by integrating the PDE model, which can be seen as an averaging operation that reduces the effect of the spatial variability, similar to Example 1.
6 DISCUSSION OF RESULTS
The proposed method is an extension of the BUS formulation for the solution of Bayesian inverse problems where the dimension of the parameter space is variable. This type of inference is common in random field updating tasks, since the optimal number of terms in the random field series expansion is unknown a priori and, hence, can be modeled probabilistically.
The main findings of this contribution in terms of the random field modeling are: (i) If one employs a uniform prior for the dimension, the values of the model evidence define the model posterior themselves. In this case, visits to models with high dimension have the same probability of occurrence; e.g., in the beam example for the correlation length ℓ = 0.5, a model with 10 terms in the KL expansion is evaluated as many times as the model with 50 terms, despite the fact that the quality of both models is essentially the same (Figure 2). Thus, the inclusion of a prior that penalizes models with an increased number of parameters is beneficial in the context of the KL expansion. We use the proposed dimension prior to achieve a trade-off between the dimensions with large model evidence and the computational cost, in order to reduce the evaluation of unnecessarily high-dimensional models. (ii) In the context of random fields, our examples show that the model choice solution is not