Received: Added at production Revised: Added at production Accepted: Added at production

    DOI: xxx/xxxx

    RESEARCH ARTICLE

Bayesian inference with subset simulation in varying dimensions applied to the Karhunen–Loève expansion

Felipe Uribe*1 | Iason Papaioannou1 | Jonas Latz2 | Wolfgang Betz1 | Elisabeth Ullmann2 | Daniel Straub1

1Engineering Risk Analysis Group, Technische Universität München, Arcisstraße 21, 80333 München, Germany

2Chair of Numerical Analysis, Technische Universität München, Boltzmannstraße 3, 85748 Garching b.M., Germany

Correspondence: *Corresponding author. Email: [email protected]

    Summary

Uncertainties associated with spatially varying parameters are modeled through random fields discretized into a finite number of random variables. Standard discretization methods, such as the Karhunen–Loève expansion, use series representations for which the truncation order is specified a priori. However, when data is used to update random fields through Bayesian inference, a different truncation order might be necessary to adequately represent the posterior random field. This is an inference problem that not only requires the determination of the often high-dimensional set of coefficients, but also their dimension. In this paper, we develop a sequential algorithm to handle such inference settings and propose a penalizing prior distribution for the dimension parameter. The method is a variable-dimensional extension of BUS (Bayesian Updating with Structural reliability methods), combined with subset simulation (SuS). The key idea is to replace the standard Markov Chain Monte Carlo (MCMC) algorithm within SuS by a trans-dimensional MCMC sampler that is able to populate the discrete-continuous parameter space. To address this task, we consider two types of MCMC algorithms that operate in a fixed-dimensional saturated space. The performance of the proposed method with both MCMC variants is assessed numerically for two examples: a 1D cantilever beam with spatially varying flexibility and a 2D groundwater flow problem with uncertain permeability field.

KEYWORDS: uncertainty quantification, inverse problems, Bayesian model choice, trans-dimensional MCMC, random fields, Karhunen–Loève expansion.

    1 INTRODUCTION

Numerical approximations to partial differential equations (PDEs) used in engineering and science require the specification of input parameters that are typically unknown and/or intrinsically random. Uncertainties in the values of these quantities can be reduced by incorporating observations or measurements of the physical system into the numerical model. This represents an inverse problem in which the objective is to identify the model parameters that are compatible with the available information. The complexity of inverse problems increases in model choice situations, whereby a single model needs to be selected from a predefined collection of plausible models. Each model in this set can have different parameters or can represent different


mathematical assumptions. The uncertainty in the model and its parameters can be treated in a unified manner within the Bayesian inference framework1,2,3.

In the Bayesian approach to solving inverse problems, all uncertain parameters are modeled by random variables. The idea is

to update the (prior) probability distribution of the parameters by including information about the PDE model and observed data (likelihood). Solving the Bayesian inverse problem then amounts to estimating or characterizing this updated (posterior) distribution. In the case of model choice, inference involves the estimation of a conditional posterior distribution induced by the model class and determining the plausibility of the respective model class. The prior distribution is then hierarchically structured into one prior for the parameters conditioned on the model and a second prior for the model itself4,2.

Closed-form expressions of the model and parameter posterior are oftentimes cumbersome to obtain. As a result, approximate solutions are computed in practice. One common approach in model choice situations is to use methods based on the likelihood function of individual models; these include the Akaike and Bayesian information criteria (see, e.g.,5,6). Another flexible way to approach the model choice problem is via sampling methods based on Markov Chain Monte Carlo (MCMC). In this case, the characterization of the posterior requires the exploration of a discrete-continuous space. This task can be performed by two main strategies: (i) fixing the model and solving the inverse problem for each case (see, e.g.,7), an approach referred to as within-model simulation (several methods are described in the seminal work8); and (ii) performing simultaneous inference on both model and parameters, using MCMC algorithms that explore the parameter space by moving between different models; this approach is called across-model simulation, where the standard algorithm is the reversible jump MCMC algorithm9,10 (other MCMC approaches for across-model simulation include11,12,13,14). Most of the disadvantages of standard MCMC, such as convergence rate deterioration with increasing dimensions and burn-in/thinning requirements, are also present in the MCMC samplers used in across-model simulation15. In standard inference settings, specialized sequential algorithms that gradually approach the posterior distribution alleviate several of these issues16,17. Some of these algorithms have been adapted to perform across-model simulation, e.g., sequential importance sampling with reversible jump MCMC18 and population-based reversible jump MCMC19.

The problem of inferring both models and parameters also applies to cases involving a single mathematical model that has

parameters with variable dimension. Common examples include mixture models with an unknown number of components20, polynomial regression where the degree of the polynomial is variable17, or general functional representations that use series expansions for which the number of terms is unknown. The latter is of particular relevance in the context of learning spatially varying parameters represented via random fields21. Random fields increase the complexity of the inverse problem since the posterior distribution is defined over an infinite-dimensional space. Series representations are typically applied in order to project the random field to a finite-dimensional space. For instance, the Karhunen–Loève (KL) expansion22,23 discretizes the field using the eigenvalues and eigenfunctions of its autocovariance operator to construct a series expansion with random coefficients24. It is common practice to truncate the KL expansion after a finite number of terms based on some variance-representation criterion. This heuristic is generally valid in prior situations when no information or observations about the field are available. In the inversion case, the optimal number of terms in the series expansion is unknown and is controlled by the data25,26.

In this paper, we propose an efficient sequential methodology that is able to perform inference in parameter spaces of different dimension. The method is an extension of the classical BUS (Bayesian updating with structural reliability methods) framework, which expresses a Bayesian inverse problem as an equivalent rare event simulation task27. For an efficient and sequential solution of the inverse problem, BUS is combined with subset simulation (SuS)28 (this approach is called BUS-SuS), although other rare event estimation methods can be employed (see, e.g.,29). The main idea is to incorporate the discrete dimension random variable into the parameter space defined by BUS, such that the resulting sequence of intermediate distributions also depends on the dimension. The associated intermediate densities become trans-dimensional and the standard MCMC algorithms used within BUS-SuS are no longer valid. Therefore, we investigate a class of trans-dimensional and dimension-independent MCMC algorithms that explore a so-called saturated or composite parameter space in an alternating manner. In this space, the dimension is fixed to a maximum upper value, which is selected conservatively based on prior information. Particularly, we discuss a Metropolis-within-Gibbs algorithm and develop a step-wise sampler as a simplified reversible jump MCMC in the saturated space. Since this space is typically high-dimensional, the core of these algorithms is the preconditioned Crank–Nicolson sampler30. The efficiency and accuracy of the method are tested on engineering models involving random field parameters represented with the KL expansion: one example for which a reference solution of the dimension posterior is available, and a second example that requires the estimation of the posterior at some reference dimensions using within-model simulation runs, to verify the dimension posterior estimated by our algorithm.


We also address the specification of the model/dimension prior by defining a discrete distribution that penalizes increasing dimensionality. In model choice problems, imposing a penalty on complicated models is necessary (see1,5,31 for a discussion). A model with more parameters usually fits the data better than a model with fewer parameters. However, the actual modeling improvement might be negligible or over-fitting of the data can arise25,26. The proposed dimension prior is defined to avoid such situations, and we build it based on the geometry of the parameter space and prior information about the random fields.

The organization of the paper is as follows: in section 2, we present fundamental concepts of random field modeling and the

KL representation; we also formulate the Bayesian inversion problem in the fixed- and variable-dimensional settings. At the end of the section, we propose a prior for the specification of the dimension parameter. The major contribution of this work is introduced in section 3, where we explain the trans-dimensional BUS algorithm. This methodology is based on trans-dimensional MCMC samplers, which are discussed in section 4. Next, the proposed method is demonstrated by means of two numerical experiments in section 5. The paper concludes with a discussion of results in section 6 and a summary of the work in section 7.

    2 MATHEMATICAL FORMULATION

2.1 Random fields and the Karhunen–Loève expansion

Let (Ω, ℱ, P) be a probability space, D ⊆ R^d an index set representing a physical domain, and L²(Ω, P) the Hilbert space of second-order random variables. A real-valued random field is a function H(x, ω) : D × Ω → R, with arguments x ∈ D a spatial coordinate and ω ∈ Ω an outcome of the sample space32,33. Intuitively, a random field can be interpreted either as a single random variable that takes values in a function space, or as a collection of random variables indexed in space.

Random fields are represented in terms of a finite set of random variables using stochastic discretization algorithms. Popular dimensionality reduction techniques are based on finite expansions of random variables and deterministic functions. These include the Karhunen–Loève expansion22,23, which expresses a random field as a linear combination of orthogonal functions chosen as the eigenfunctions resulting from the spectral decomposition of the covariance operator. Since all positive-definite functions have a unique spectral representation (see Bochner's Theorem32, section 3), one can define an orthonormal basis, unique and optimal in the mean-squared sense, consisting of the eigenfunctions of the covariance operator together with a sequence of real and non-negative eigenvalues34, p.248. One can use this basis to represent a second-order random field as

H(x, ω) ≈ Ĥ(x; k, θ(ω)) := μ(x) + ∑_{i=1}^{∞} 1(i ≤ k) √λ_i φ_i(x) θ_i(ω),   (1)

where Ĥ(x; k, θ(ω)) is the approximated field, 1(·) denotes the indicator function, k is the truncation order of the expansion, θ_i(ω) : Ω → R is a set of mutually uncorrelated random variables with mean zero and unit variance, λ_i ∈ [0, ∞) are the eigenvalues of the covariance operator, satisfying λ_i ≥ λ_{i+1}, lim_{i→∞} λ_i = 0, ∑_{i=1}^{∞} λ_i < ∞, and φ_i(x) : D → R are the eigenfunctions of the covariance operator, with φ_i(x) ∈ L²(D). For Gaussian random fields, the variables θ_i(ω) are independent standard Gaussian. In the general case, the distribution of θ_i(ω) is cumbersome to estimate. The series expansion in (1) follows from Mercer's Theorem (details are provided in35) and it is referred to as the Karhunen–Loève (KL) expansion. We remark that the set of eigenpairs {λ_i, φ_i} is computed through the solution of a homogeneous Fredholm integral equation of the second kind24, which can be solved using different approaches, such as projection (collocation, Galerkin)36 or Nyström methods37.

The KL expansion is often employed to reduce the dimensionality and parameterize random fields. Consider the square-integrable random vector θ(ω) ∈ Θ_k ⊆ R^k resulting from truncating the KL series expansion (1) at the k-th term. This truncation yields an approximate field, which is optimal in the mean-squared-error sense as compared to any other spectral projection algorithm24. Since the eigenpairs associated with the covariance operator are deterministic quantities, the parameter θ(ω) characterizes the randomness of the field. Hence, the KL construction only depends on the vector of random coefficients θ and the truncation order k.
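The following is a minimal numerical sketch of the truncated representation in (1) for a one-dimensional domain, assuming an exponential covariance kernel and a simple Nyström-type discretization of the eigenvalue problem on a uniform grid; the function and variable names are illustrative rather than taken from the paper.

```python
import numpy as np

def kl_field_1d(x, mean, sigma, corr_length, k, theta):
    """Truncated KL representation (1) of a 1D field on the grid x.

    The eigenpairs of the exponential covariance operator are approximated by
    an eigendecomposition of the covariance matrix on the grid (Nystrom-type)."""
    n = x.size
    C = sigma**2 * np.exp(-np.abs(x[:, None] - x[None, :]) / corr_length)
    w = (x[-1] - x[0]) / (n - 1)                     # uniform quadrature weight
    lam, phi = np.linalg.eigh(w * C)                 # discrete Fredholm eigenproblem
    lam, phi = lam[::-1], phi[:, ::-1] / np.sqrt(w)  # decreasing eigenvalues, L2-normalized modes
    # mean + sum_{i <= k} sqrt(lambda_i) * phi_i(x) * theta_i
    return mean + phi[:, :k] @ (np.sqrt(lam[:k]) * theta[:k])

# one prior realization with k = 10 standard Gaussian coefficients
x = np.linspace(0.0, 5.0, 200)
rng = np.random.default_rng(0)
field = kl_field_1d(x, mean=1e-4, sigma=3.5e-5, corr_length=2.0,
                    k=10, theta=rng.standard_normal(10))
```

Only the first k coefficients enter the sum, which is exactly the role of the indicator 1(i ≤ k) in (1).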

2.2 Bayesian inverse problems in fixed dimension

We begin by considering the forward problem y = G(H(x, ω)), where G : L²(D) → L²(D) is a solution operator expressing the relationship between the input parameters and the model response. We are interested in models where the operator G implies the solution of a PDE that has random fields as parameters. G operates on the function space L²(D) since both input and output are random field realizations of two different quantities on the physical domain D. The dimensionality of the forward problem can be reduced using the parameterized random field in the expansion (1), such that the input parameter space is now given by Θ_k. Since k is fixed, we write the approximated field as Ĥ(x; θ) and the parameter space as Θ ⊆ R^k.

In inverse problems, the aim is to infer the parameters θ ∈ Θ given noisy observations of the system response ỹ ∈ 𝒴 := R^m, with m denoting the number of observations and 𝒴 the data space. Assuming an additive observation error, the objective is:

find θ ∈ Θ such that ỹ = ℱ(Ĥ(x; θ)) + η,   (2)

where ℱ = 𝒪 ∘ G : Θ → 𝒴 is the forward response operator, defined as the composition of the solution operator G : Θ → L²(D) with an observation operator 𝒪 : L²(D) → 𝒴 that maps the forward solution to the data space; and η ∈ R^m is the observation noise, which is typically assumed to be Gaussian distributed with mean zero and non-singular covariance matrix Σ_obs ∈ R^{m×m}.

The inverse problem (2) is generally ill-posed. Bayesian statistical methods offer a framework that integrates the observations with prior information, providing a mechanism of regularization. In Bayesian inverse problems, the components of the parameter vector θ are modeled as random variables and are assumed to have an initial prior density π_pr(θ). The likelihood function L(θ; ỹ) = π_like(ỹ | θ) is a density on the data space and provides a link between the model and the data. After including observations, the updated belief about θ is represented by the posterior density π_pos(θ | ỹ). Through Bayes' Theorem, this conditional density is

π_pos(θ | ỹ) = (1/Z_ỹ) π_pr(θ) L(θ; ỹ) ∝ exp( −(1/2) ‖Σ_pr^{−1/2}(θ − μ_pr)‖²_2 + ln L(θ; ỹ) ),   (3)

where Z_ỹ = ∫_Θ π_pr(θ) L(θ; ỹ) dθ is the normalizing constant of π_pos(θ | ỹ), called the model evidence.

Remark 1. Since we employ the KL expansion to represent random fields, the prior distribution is Gaussian, θ ∼ 𝒩(μ_pr, Σ_pr). This is reflected in the right-hand side of (3). For the KL coefficients, the prior mean and covariance are given by μ_pr = 0 and Σ_pr = I_k (I_k ∈ R^{k×k} denotes the identity matrix). The information about the second-order properties of the random field enters directly in the definition of the log-likelihood function via the forward operator.
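As a quick illustration of how (3) is evaluated under the standard Gaussian prior of Remark 1 and a Gaussian noise model, consider the following sketch; the forward_map argument stands for the composition of solution and observation operators, and the toy linear map in the usage lines is purely an assumption made for this example.

```python
import numpy as np

def log_posterior_unnormalized(theta, y_obs, forward_map, Sigma_obs):
    """Unnormalized log-posterior of (3): log-prior + log-likelihood,
    with theta ~ N(0, I_k) as in Remark 1 and Gaussian observation noise."""
    log_prior = -0.5 * theta @ theta
    residual = y_obs - forward_map(theta)
    log_like = -0.5 * residual @ np.linalg.solve(Sigma_obs, residual)
    return log_prior + log_like

# toy usage: a linear forward map and i.i.d. noise with variance 1e-2
A = np.array([[1.0, 0.5], [0.2, 1.0], [0.0, 1.5]])
lp = log_posterior_unnormalized(np.zeros(2), np.ones(3),
                                lambda t: A @ t, 1e-2 * np.eye(3))
```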

2.3 Bayesian inverse problems in varying dimension

Consider a more general inference setting for which the set of observed data ỹ is associated not only with one, but with a finite collection of plausible models ℳ = {ℳ_1, …, ℳ_k, …, ℳ_kmax}, where k ∈ 𝒦 is a model indicator index, and kmax < ∞ is a prescribed limit on the collection. The resulting discrete-continuous parameter space can be written as 𝒵 = ∪_{k∈𝒦} ({k} × Θ_k). Observe that there exist different uncertain parameter vectors θ_k ∈ Θ_k ⊆ R^k for each particular model ℳ_k, and thus, the goal is to extract information from the data to infer jointly the pairs (k, θ_k) ∈ 𝒵. For the sake of simplicity in notation, we shall henceforth use the model indicator index k to denote the model ℳ_k.

Let π_pr(θ_k | k) be a first-level prior density imposed on the parameter θ_k given the model k, and π̄_pr(k) a second-level discrete prior mass specified over the models k (we use the notation π̄ to indicate probability mass functions). The joint posterior density over both model and parameters is computed based on Bayes' Theorem as

π_pos(k, θ_k | ỹ) = (1/Z̄_ỹ) π̄_pr(k) π_pr(θ_k | k) L(k, θ_k; ỹ) ∝ π̄_pr(k) exp( −(1/2) ‖Σ_pr,k^{−1/2}(θ_k − μ_pr,k)‖²_2 + ln L(k, θ_k; ỹ) ),   (4)

    wherein the prior parameters depend on k, and the evidence is given by the law of total probability:

Z̄_ỹ = ∑_{k′∈𝒦} π̄_pr(k′) Z_ỹ(k′) = ∑_{k′∈𝒦} π̄_pr(k′) ∫_{Θ_{k′}} π_pr(θ_{k′} | k′) L(k′, θ_{k′}; ỹ) dθ_{k′},   (5)

and Z_ỹ(k) is the evidence of the individual model k. The posterior density of the models is obtained by integrating out the parameters in (4) as

π̄_pos(k | ỹ) = [ π̄_pr(k) ∫_{Θ_k} π_pr(θ_k | k) L(k, θ_k; ỹ) dθ_k ] / [ ∑_{k′∈𝒦} π̄_pr(k′) ∫_{Θ_{k′}} π_pr(θ_{k′} | k′) L(k′, θ_{k′}; ỹ) dθ_{k′} ] = π̄_pr(k) Z_ỹ(k) / Z̄_ỹ.   (6)

The model posterior in (6) can be used to perform (i) model choice or selection, which requires the computation of the maximum a posteriori probability (MAP) estimator, k_MAP = argmax_{k∈𝒦} π̄_pos(k | ỹ), or (ii) model mixing or averaging, which requires the consideration of the whole collection of parameters weighted by π̄_pos(k | ỹ). Model choice is used as an indicator of model complexity, i.e., the model that provides the best alignment with the observed data should be preferred over unnecessarily complicated ones. The model mixing solution consists of the model posterior predictive distribution. In this case, the whole collection of models is used for future decisions, which avoids the underestimation of uncertainty resulting from choosing only a single model. Since this process leads to a higher computational cost, only models that are sufficiently likely compared to the MAP estimator may be considered in the analysis. Occam's window and Bayes factors are used to perform such a model reduction (2, p. 368).

The formulations in (4) and (6) are also applicable to Bayesian non-parametric settings where in fact there exists only a single mathematical model, but one with a variable-dimension parameter38. We are interested in the latter, since this corresponds to Bayesian inverse problems involving random fields represented by a series expansion whose number of terms is not fixed. In the KL expansion, the set of models is defined by the model indicator indices 𝒦 = {1, 2, …, k, …, kmax}, where each element defines a truncation order. This truncation specifies the dimensionality of the standard Gaussian random coefficients of the KL expansion; thus, each particular model/dimension k involves a vector of uncertain parameters θ_k ∈ Θ_k. The aim is to perform simultaneous inference on the discrete random variable k (dimension) and the associated random vector θ_k (coefficients) of the KL expansion. In the following, we often use the terms model and dimension interchangeably.

2.4 Selection of the prior distribution for the KL truncation

The model evidence associated with the dimensions of the KL discretization does not reveal the classical Bayesian penalization behavior appearing, for example, in regression models, where constantly increasing the polynomial order eventually reduces the model evidence values (revealing potential over-fitting). In the KL expansion, the model evidence keeps increasing as one adds more terms26. This is directly related to the representation of the posterior covariance, since its approximation improves as k → ∞. Nevertheless, it has been shown in26 that the information gained by continuously increasing KL terms becomes negligible once an optimal truncation is achieved. The model evidence keeps increasing, but very slowly after such optimal number of terms is reached. This behavior motivates the definition of a prior for the dimension parameter k that penalizes increasing dimensionality. We employ a truncated geometric distribution (more details concerning this choice are given in the appendix)

π̄_pr(k) = (1 − p)^{k−1} p / (1 − (1 − p)^{kmax}),   k = 1, …, kmax,   (7)

where kmax is the upper truncation level, and the success probability p ∈ (0, 1) marks the decay rate of the probability mass. This parameter allows us to control the shape of the distribution.

In practice, the parameter space is bounded and typically some prior knowledge about these bounds is available. We select the parameter p by regulating the behavior of the distribution at the tails, such that P[k ≤ k_u] = α, where k_u is a prescribed threshold associated with the probability α that k is smaller than k_u. Based on our experiments, k_u is chosen as the number of terms in the KL expansion that retains 50% of the variability in the prior random field, and we assign to that event a probability of α = 0.10. Furthermore, the truncation value kmax is selected as the number of terms in the expansion that retains 99% of the variability. By doing so, approximately 90% of the probability mass is concentrated on truncation orders higher than those yielding the 50% variability and smaller than those producing the 99% variability. We found that this heuristic produces a decay rate p that does not excessively penalize high-order KL terms.

We remark that other prior models for the dimension parameter have been derived in the context of random fields, e.g., an exponential prior with fixed decay rate for the truncation order in the KL expansion30, and a penalized complexity prior for different values of a re-parameterized Matérn kernel39.
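A small sketch of the tail-based heuristic described above, assuming the truncated geometric prior (7); the bisection scheme and function names are illustrative, and the usage line reuses the kmax and kmin values reported later for the ℓ = 0.5 case of section 5.1.

```python
import numpy as np

def truncated_geometric_pmf(p, k_max):
    """Prior mass (7): proportional to (1 - p)^(k - 1) * p for k = 1, ..., k_max."""
    k = np.arange(1, k_max + 1)
    return (1.0 - p) ** (k - 1) * p / (1.0 - (1.0 - p) ** k_max)

def decay_rate_from_tail(k_u, alpha, k_max):
    """Find p such that P[k <= k_u] = alpha, via bisection on the prior CDF."""
    cdf_at_ku = lambda p: truncated_geometric_pmf(p, k_max)[:k_u].sum()
    lo, hi = 1e-8, 1.0 - 1e-8          # the CDF at k_u is increasing in p
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if cdf_at_ku(mid) < alpha else (lo, mid)
    return 0.5 * (lo + hi)

p = decay_rate_from_tail(k_u=4, alpha=0.10, k_max=204)   # approx 2.586e-2
```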

3 BAYESIAN INFERENCE WITH SUBSET SIMULATION IN SPACES OF VARYING DIMENSION

The characterization of the posterior distribution using standard MCMC algorithms can be inefficient not only because several iterations are required to compute accurate statistics, but also because tuning and post-processing steps need to be implemented (e.g., burn-in and lag periods). The task is even more complicated when the posterior distribution is high- and trans-dimensional. Therefore, a common approach is to embed standard MCMC samplers into algorithms that start from the prior and sequentially approach the posterior distribution40. The idea is to explore the posterior on-the-fly by constructing a set of intermediate measures that converge to the full posterior. An approach that belongs to this class of algorithms is BUS (Bayesian Updating with Structural reliability methods), which reformulates the Bayesian inverse problem as a classical reliability analysis (rare event estimation) problem. This construction allows one to employ efficient reliability estimation algorithms to sample from the posterior. In this section, we discuss the combination of BUS with subset simulation and its extension to variable-dimensional inference problems.


3.1 The BUS formulation

When the computation of the model evidence is intractable, the posterior density is only known up to its scaling constant, that is, π_pos(θ | ỹ) ∝ π_pr(θ) L(θ; ỹ) = π̃(θ). The posterior density can be characterized by drawing samples from this unnormalized target density. Particularly, the rejection sampling algorithm generates samples from π̃(θ) using a proposal density q(θ). The proposal is selected such that it dominates the target function. This means that q(θ) must have equal or heavier tails than those of π̃(θ). Therefore, the proposal satisfies the relation

sup_Θ ( π̃(θ)/q(θ) ) ≤ c̄ < ∞   for some covering constant c̄ ∈ R_{>1},   (8)

and supp(π̃(θ)) ⊆ supp(q(θ)). Thereafter, samples drawn from q(θ) are rejected strategically to make the resulting accepted samples distributed according to π̃(θ). A simple choice for the proposal density is the prior distribution π_pr(θ). In this case, the acceptance probability α in rejection sampling16 becomes

α = π̃(θ) / (c̄ · q(θ)) = π_pr(θ) L(θ; ỹ) / (c̄ · π_pr(θ)) = c · L(θ; ỹ),   (9)

where c = 1/c̄ ∈ R_{>0} and the covering constant is selected such that c̄ ≥ L_max = max(L(θ; ỹ)). Rejection sampling then amounts to (i) drawing a standard uniform random number u ∼ Unif[0, 1], (ii) sampling a candidate from the prior θ ∼ π_pr(θ), and (iii) accepting the candidate if u ≤ α = c · L(θ; ỹ). This particular acceptance-rejection mechanism allows us to define the space

𝒜 = {(θ, u) ∈ 𝒳 : h(θ, u) ≤ 0},   where   h(θ, u) = u − c · L(θ; ỹ),   (10)

and 𝒳 = [Θ, Υ] is an augmented parameter space (θ ∈ Θ ⊆ R^k and u ∈ Υ := [0, 1]).

In the context of reliability analysis and rare event simulation, the space 𝒜 defines a failure domain with limit-state function (LSF) h(θ, u). Samples drawn from the prior that fall into 𝒜 are distributed according to the posterior. This connection is the foundation of the BUS approach, since one can employ existing methods from rare event simulation to perform Bayesian inference. Indeed, the previous rejection sampling algorithm corresponds to applying standard Monte Carlo simulation for the solution of a rare event estimation problem defined by the LSF h(θ, u) over the space 𝒳.

The main objective in reliability analysis is to estimate the probability of failure. When employing the BUS framework, this value is associated with the probability that the samples belong to the domain 𝒜, i.e., p_𝒜 = P[𝒜] = P[h(θ, u) ≤ 0]. This probability, which is obtained as a by-product of BUS, is used to estimate the model evidence as27

Z_ỹ = c^{−1} · p_𝒜 = c̄ · p_𝒜.   (11)

Note that the application of BUS requires the knowledge of the constant c = 1/c̄. From (9), it is seen that the covering constant is optimally chosen as the maximum of the likelihood function, c̄ = L_max. If c̄ < L_max, the resulting samples will be distributed according to a truncated posterior distribution41. Conversely, if c̄ > L_max, the efficiency of BUS decreases because the value of p_𝒜 will be small and more samples are required for its estimation. Since in many cases L_max is not known in advance and its computation poses an additional cost, we employ the strategy introduced in41, for which the constant c is adaptively computed at each step of the simulation. We discuss this method in the variable-dimensional context in subsection 3.3.
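To make the connection concrete, the sketch below implements the crude Monte Carlo version of BUS implied by (9)-(11) for a standard Gaussian prior; the toy likelihood and the variable names are assumptions made only for illustration, and this brute-force loop is exactly what BUS-SuS replaces with subset simulation.

```python
import numpy as np

def bus_monte_carlo(log_likelihood, k, n_draws, log_c_bar, rng):
    """Accept prior samples (theta, u) falling in the domain h(theta, u) <= 0
    of Eq. (10) and estimate the model evidence via Eq. (11)."""
    theta = rng.standard_normal((n_draws, k))   # prior samples of the KL coefficients
    u = rng.uniform(size=n_draws)               # auxiliary uniform variable
    # h <= 0  <=>  ln u <= ln L(theta; y) - ln c_bar
    accept = np.log(u) <= log_likelihood(theta) - log_c_bar
    p_A = accept.mean()                         # probability of the domain
    return theta[accept], np.exp(log_c_bar) * p_A

# toy usage: Gaussian likelihood centered at 1, whose maximum log-value is 0
rng = np.random.default_rng(1)
log_like = lambda th: -0.5 * np.sum((th - 1.0) ** 2, axis=1)
posterior_samples, evidence = bus_monte_carlo(log_like, k=2, n_draws=10**5,
                                              log_c_bar=0.0, rng=rng)
```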

3.2 BUS with subset simulation in fixed dimensions

In order to efficiently compute samples from the posterior distribution, BUS is often combined with subset simulation (SuS)28. The combination of BUS with SuS (called BUS-SuS) performs Bayesian inversion sequentially. This is because SuS transforms the task of estimating the rare event {h(θ, u) ≤ 0} into a sequence of problems involving more frequent events.

In BUS-SuS, the parameter space 𝒳 is divided into a decreasing sequence of nested subsets or intermediate levels, starting from the whole space and narrowing down to the target posterior space, i.e., 𝒳 = 𝒜_0 ⊃ 𝒜_1 ⊃ ⋯ ⊃ 𝒜_{N_lv} = 𝒜, such that 𝒜 = ∩_{j=0}^{N_lv} 𝒜_j, where N_lv is the number of intermediate levels. Based on the general product rule of probability, the probability that the prior samples fall into the posterior space, p_𝒜, is given by

p_𝒜 = P[ ∩_{j=0}^{N_lv} 𝒜_j ] = P[ 𝒜_{N_lv} | ∩_{j=0}^{N_lv−1} 𝒜_j ] · P[ ∩_{j=0}^{N_lv−1} 𝒜_j ] = ∏_{j=1}^{N_lv} P[ 𝒜_j | 𝒜_{j−1} ],   (12)


where P[𝒜_j | 𝒜_{j−1}] represents the conditional probability at level (j − 1). Each intermediate level is defined as the set 𝒜_j = {(θ, u) ∈ 𝒳 : h(θ, u) ≤ ξ_j}, where ∞ = ξ_0 > ξ_1 > ⋯ > ξ_j > ⋯ > ξ_{N_lv} = 0 is a decreasing sequence of threshold levels. In practice, it is not possible to make an optimal a priori selection of the sequence {ξ_j}_{j=0}^{N_lv}. Therefore, the thresholds are adaptively selected as the p_0-percentile of the LSF values of the samples simulated at intermediate level j − 1, following28. This implies fixing the conditional probabilities to a common value p_0 = P[𝒜_j | 𝒜_{j−1}] (with p_0 ∈ [0.1, 0.3]).

At the first level 𝒜_0, samples are generated using standard Monte Carlo simulation. Thereafter, BUS-SuS employs a modified MCMC algorithm to draw samples from each intermediate conditional density π(θ, u | 𝒜_j). The Markov chains are initialized from N_s = N · p_0 samples conditional on 𝒜_{j−1} for which h(θ, u) ≤ ξ_j. The process is repeated until the target posterior domain 𝒜 is reached (see, e.g.,42). At the last level, the probability p_𝒜 in (12) is estimated as p̂_𝒜 = p_0^{N_lv−1} · p̂_{N_lv}, where p̂_{N_lv} represents the last conditional probability, which is estimated by Monte Carlo as the ratio of the number of samples that lie in 𝒜 and the number of samples per level N. The p̂_{N_lv} · N samples that lie in 𝒜 are used as seeds to generate the final batch of N samples conditional on 𝒜. The resulting samples are uniformly weighted but correlated samples of the posterior distribution, and the probability estimate p̂_𝒜 is used to compute the model evidence via (11).

Remark 2. It is common practice to solve reliability problems in the standard Gaussian space. Due to the BUS formulation, this also translates to the Bayesian inversion. Hence, a new standard Gaussian parameter vector ϑ = [θ, ū]^T ∈ R^{k+1} is created by combining the KL coefficients θ and the transformed auxiliary uniform variable ū = Φ^{−1}(u), where Φ(·) denotes the standard Gaussian cumulative distribution. Furthermore, in order to guarantee a smooth transition between the intermediate levels, as well as for numerical stability, the LSF (10) is expressed in terms of the log-likelihood. Applying the natural logarithm to each term of h(θ, u) in (10) yields43

h_ln(ϑ) = ln(Φ(ū)) − ln(c · L(θ; ỹ)) = ln(Φ(ū)) + ln(c̄) − ln L(θ; ỹ).   (13)

3.3 BUS with subset simulation in varying dimensions

We now extend the concepts of subsections 3.1 and 3.2 to the variable-dimensional case. The basic idea is to re-augment the parameter space by including the discrete dimension variable. This requires minor modifications of the target LSF, and the application of trans-dimensional MCMC algorithms to sample the intermediate conditional densities. We denote this trans-dimensional BUS-SuS methodology as tBUS-SuS.

Consider the general Bayesian inverse problem (4). The joint posterior distribution is characterized by a target function in a discrete-continuous space, π̃(k, θ_k) = π̄_pr(k) π_pr(θ_k | k) L(k, θ_k; ỹ) ∝ π_pos(k, θ_k | ỹ). We choose the proposal distribution to be equal to the full prior, q(k, θ_k) = π̄_pr(k) π_pr(θ_k | k). The acceptance probability in rejection sampling becomes

α = π̃(k, θ_k) / (r̄ · q(k, θ_k)) = π̄_pr(k) π_pr(θ_k | k) L(k, θ_k; ỹ) / (r̄ · π̄_pr(k) π_pr(θ_k | k)) = r · L(k, θ_k; ỹ),   (14)

where r = 1/r̄ ∈ R_{>0}. By analogy with the fixed-dimensional setting, the covering constant r̄ can be optimally chosen as L_max,all = max(L(k, θ_k; ỹ)), i.e., as the maximum of the likelihood function across different dimensions. Thereafter, samples drawn from the priors k ∼ π̄_pr(·) and θ_k ∼ π_pr(· | k) are accepted if u ≤ α = r · L(k, θ_k; ỹ), otherwise they are rejected. In this case, the 𝒜-space and the LSF in (10) are re-defined as

𝒜 = {(k, θ_k, u) : h(k, θ_k, u) ≤ 0},   where   h(k, θ_k, u) = u − r · L(k, θ_k; ỹ),   (15)

and 𝒳 = [𝒦, Θ_k, Υ] is the re-augmented discrete-continuous parameter space (k ∈ 𝒦 ⊆ Z_{>0}, θ_k ∈ Θ_k ⊆ R^k and u ∈ Υ := [0, 1]). As will be seen in section 4, we employ trans-dimensional MCMC algorithms that work in a saturated space for which θ ∈ Θ ⊆ R^{kmax}, and thus we write 𝒳 = [𝒦, Θ, Υ].

The BUS-SuS algorithm can be extended analogously to solve the Bayesian inverse problem (15). Each intermediate domain is now defined as the set 𝒜_j = {(k, θ, u) ∈ 𝒳 : h(k, θ, u) ≤ ξ_j}, with the threshold level sequence {ξ_j}_{j=0}^{N_lv} adaptively selected as in the fixed-dimensional case. Under the LSF (15), the standard MCMC algorithms used within BUS-SuS are no longer suitable for sampling the intermediate densities π(k, θ, u | 𝒜_j) and trans-dimensional MCMC methods are required. We modify these algorithms to sample the intermediate densities conditional on events defined by the sequence of levels {ξ_j}; this will be shown in section 4. Moreover, instead of the LSF in (13), we employ its variable-dimensional extension

h_ln(k, ϑ) = ln(Φ(ū)) + ℓ̄ − ln L(k, θ; ỹ),   (16)

where ϑ = [θ, ū] ∈ R^{kmax+1}, and ℓ̄ = ln(r̄) is optimally the maximum of the log-likelihood function across the different dimensions. We summarize the tBUS-SuS method in Algorithm 1.

Algorithm 1 tBUS-SuS.
1: Input: number of samples per level N, conditional probability p_0, covering constant ℓ̄, maximum dimension kmax, log-likelihood function ln L(·, ·; ỹ), dimension prior π̄_pr(k)
2: Draw N samples from the dimension prior, k_0 ∼ π̄_pr(·)
3: Draw N samples from the standard Gaussian, ϑ_0 = [θ_0, ū_0] ∼ 𝒩(0, I_{kmax+1})
4: Compute the initial log-likelihood function values, L_eval ← ln L(k_0, θ_0; ỹ)
5: Set j ← 0 and ξ_0 ← ∞
6: while ξ_j > 0 do
7:   Increase the intermediate level counter, j ← j + 1
8:   Compute the LSF values, h_eval ← ln(Φ(ū_{j−1})) + ℓ̄ − L_eval
9:   Sort h_eval in ascending order and create a vector idx to store the indices of this sorting
10:  Create k_sort, ϑ_sort as the dimension and parameter samples k_{j−1}, ϑ_{j−1} sorted according to idx
11:  Set the intermediate threshold level ξ_j as the p_0-percentile of the values in h_eval
12:  Compute the number of samples in the j-th intermediate level, N_j ← ∑_{i=1}^{N} 1(h_eval^{(i)} ≤ max(0, ξ_j))
13:  if ξ_j > 0 then
14:    p_{j−1} ← p_0
15:  else
16:    ξ_j ← 0 and p_{j−1} ← N_j/N
17:  end if
18:  Select seeds for the MCMC step, (k_seed, ϑ_seed) ← {k_sort^{(i)}, ϑ_sort^{(i)}}_{i=1}^{N_j}
19:  Generate the next level values {k_j^{(i)}, ϑ_j^{(i)}, L_eval^{(i)}}_{i=1}^{N} from the seeds (k_seed, ϑ_seed) and intermediate level ξ_j using a trans-dimensional MCMC algorithm. Here, each seed is used to construct a chain with N_c = floor(N/N_s) states, where N_s = N_j is the number of seeds
20: end while
21: Set the posterior samples, k_pos ← k_j and ϑ_pos ← ϑ_j
22: for k ← 1 to kmax do
23:  Find the number of posterior samples that lie in dimension k, N_k ← ∑_{i=1}^{N} 1(k_pos^{(i)} = k)
24:  Estimate the model posterior, π̂_pos(k) ← N_k/N
25: end for
26: Output: sequence of posterior dimension samples {k_pos^{(i)}}_{i=1}^{N} and parameter samples {ϑ_pos^{(i)}}_{i=1}^{N}, and model posterior π̂_pos.
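The next sketch mirrors the bookkeeping of Lines 8-18 of Algorithm 1 for one intermediate level (LSF evaluation, percentile threshold, seed selection); the trans-dimensional MCMC step of Line 19 is deliberately left out, and the array layouts and names are assumptions made for illustration.

```python
import numpy as np
from scipy.stats import norm

def tbus_sus_level(k_prev, vartheta_prev, log_like_prev, log_r_bar, p0):
    """One level of tBUS-SuS: evaluate the LSF (16), set the p0-percentile
    threshold and return the seeds for the next trans-dimensional MCMC step."""
    u_bar = vartheta_prev[:, -1]                          # transformed auxiliary variable
    h = norm.logcdf(u_bar) + log_r_bar - log_like_prev    # Line 8
    order = np.argsort(h)                                 # Lines 9-10
    xi_j = max(np.percentile(h, 100.0 * p0), 0.0)         # Lines 11-17: clip at the final level
    n_seeds = int(np.sum(h <= xi_j))                      # Line 12
    seeds = order[:n_seeds]                               # Line 18
    return xi_j, k_prev[seeds], vartheta_prev[seeds]
```

Line 19 would then run Algorithm 3 or 4 from each seed until N samples conditional on the new level ξ_j are available.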

Since finding the constant ℓ̄ = ln(r̄) poses an additional computational cost, it is convenient to introduce a tBUS-SuS algorithm for which the covering constant ℓ̄ is not required as an input. We employ the adaptive BUS-SuS methodology proposed in41 for the trans-dimensional setting. In this case, the covering constant is updated at each level, leading to a set of values {ℓ̄_j}_{j=0}^{N_lv}. In order to guarantee the nestedness of the intermediate domains, the threshold levels ξ_j are corrected after updating the value ℓ̄_j. First note that from (16), a j-th intermediate domain is defined as the set

𝒜_j = {(k, ϑ) : ln(Φ(ū)) + ℓ̄_j − ln L(k, θ; ỹ) ≤ ξ_j},   (17)

where ℓ̄_j = ln(r̄_j) is the maximum of the log-likelihood function observed at the j-th sampling level. The idea of41 is that the event 𝒜_j associated with the log-scaling constant ℓ̄_j and the threshold ξ_j can be equivalently expressed by a scaling ℓ̄′_j and a modified threshold ξ′_j selected as ξ′_j = ξ_j − ℓ̄_j + ℓ̄′_j. This allows one to sequentially update the covering constant ℓ̄_j to a new value ℓ̄′_j without compromising the distribution of the samples: adjusting the threshold value from ξ_j to ξ′_j (after updating the scaling from ℓ̄_j to ℓ̄′_j) still defines the same intermediate domain, as 𝒜_j = {(k, ϑ) : ln(Φ(ū)) ≤ ξ′_j − ℓ̄′_j + ln L(k, θ; ỹ)} is equivalent to (17). In principle, ξ′_j corrects the level ξ_j using the residual of the maximum log-likelihood values observed at different levels. At the last simulation level, when ℓ̄′_j = ℓ̄_j, the covering reaches a value that is close or equal to the actual maximum log-likelihood, i.e., ℓ̄_{N_lv} ≤ ln(L_max,all). In the limit N → ∞, the value ℓ̄_{N_lv} converges to ln(L_max,all). Despite the fact that ℓ̄_{N_lv} is likely to be smaller than ln(L_max,all), the samples generated by the algorithm follow the posterior distribution, as shown in41. The adaptive tBUS-SuS method is described in Algorithm 2.

Algorithm 2 Adaptive tBUS-SuS.
1: Input: number of samples per level N, conditional probability p_0, maximum dimension kmax, log-likelihood function ln L(·, ·; ỹ), dimension prior π̄_pr(k)
2: Repeat Lines 2-4 of Algorithm 1
3: Compute the initial maximum log-likelihood, ℓ̄_0 ← max(L_eval)
4: Set j ← 0 and ξ_0 ← ∞
5: while ξ_j > 0 do
6:   Increase the intermediate level counter, j ← j + 1
7:   Compute the LSF values, h_eval ← ln(Φ(ū_{j−1})) + ℓ̄_{j−1} − L_eval
8:   Repeat Lines 9-19 of Algorithm 1
9:   Compute a new value of the maximum log-likelihood, ℓ̄′_j ← max(ℓ̄_{j−1}, {L_eval^{(i)}}_{i=1}^{N})
10:  Compute the modified intermediate threshold level, ξ_j ← ξ_j − ℓ̄_{j−1} + ℓ̄′_j, and update ℓ̄_j ← ℓ̄′_j
11: end while
12: Repeat Lines 21-25 and Output of Algorithm 1

    4 MCMC ALGORITHMS IN SPACES OF VARYING DIMENSION

In variable-dimensional problems, MCMC methods must explore a discrete-continuous parameter space. In this section, we present an overview of such algorithms and discuss two special MCMC samplers that are used in combination with the proposed tBUS-SuS algorithm. We note that the methods discussed here are applicable to general model updating problems whenever the variables of the different models have a nested structure (see, e.g.,13,14).

4.1 General remarks

In across-model simulation, the standard algorithm to sample from the joint posterior in (4) is the reversible jump MCMC (RJMCMC) method9. The idea is to generate a Markov chain that is able to jump between models with parameter spaces of different dimension. If the current and proposed states have the same dimension, the proposal move explores different locations within the same parameter space. In this case, the so-called detailed balance condition is guaranteed by a standard MCMC sampler9. If the current and proposed dimensions are different, detailed balance holds by defining a proposal move that satisfies a dimension matching condition. This is achieved by constructing a one-to-one deterministic transformation (jumping function) ensuring that the image and the domain of the transformation have the same dimension. The acceptance probability in RJMCMC resembles that of the classical Metropolis–Hastings algorithm, where the proposal distribution is decomposed into a discrete density for the dimension and a continuous density for the parameters, and the Jacobian of the jumping transformation is also taken into account (see38 for further details). RJMCMC can suffer from poor sampling performance associated with the definition of the jumping function and the proposal distribution. The potential inefficiency of the method has motivated several tuning procedures (see, e.g.,44,45).

Another class of algorithms are the saturated space approaches11,13,14,44 (also referred to as product or composite space approaches). The main characteristic is that the parameter space is not particularized to a given dimension k; instead, the parameters lie in a space whose dimension covers all dimensions of interest, say kmax. The joint posterior in the saturated space is14

π_pos(k, θ | ỹ) = (1/Z̄_ỹ) π̄_pr(k) π_pr(θ_k | k) π_pr(θ_{∼k} | k, θ_k) L(k, θ_k; ỹ),   (18)


where θ = [θ_k, θ_{∼k}] and the additional component is the so-called linking density or pseudo-prior π_pr(θ_{∼k} | k, θ_k), where θ_{∼k} denotes the parameters that are not used by the model k. This formulation allows us to apply standard MCMC procedures to variable-dimensional problems. We now motivate these techniques from the viewpoint of nested models.

4.2 Nested models

For problems involving nested models, the dimension change is related to the addition or deletion of a component in the parameter vector, i.e., exclusion of a component is equivalent to setting a parameter to zero. This is the case of the KL expansion, where the variable k has the effect of switching on and off coefficients in the series. Under the KL expansion (1), the likelihood in (4) is independent of θ_i when i > k, and the prior of the parameter vector θ in the saturated space becomes independent of k. Therefore, the (saturated) joint posterior in (18) can be written as

π_pos(k, θ | ỹ) ∝ π̄_pr(k) π_pr(θ) L(k, θ; ỹ),   (19)

with π_pr(θ) denoting the prior of the parameters in the saturated space Θ ⊆ R^{kmax}. In the context of tBUS-SuS, the posterior distribution (19) can be re-written by conditioning on the region 𝒜 in (15) and marginalizing over the auxiliary uniform random variable u,

π_pos(k, θ | ỹ) ∝ π̄_pr(k) π_pr(θ) ∫_0^1 1_𝒜(k, θ, u) du,   (20)

where 1_𝒜 denotes the indicator function, which is equal to one if (k, θ, u) ∈ 𝒜, and zero otherwise. Due to the sequential structure of tBUS-SuS, we require MCMC algorithms that sample the conditional densities on each intermediate domain 𝒜_j, i.e., π(k, θ, u | 𝒜_j) ∝ π̄_pr(k) π_pr(θ) 1_{𝒜_j}(k, θ, u). Particularly, we work in a saturated standard Gaussian space in which the KL coefficients and the auxiliary variable can be grouped to define the parameter vector ϑ = [θ, ū] (cf. Remark 2). As a result, the intermediate densities are defined as π(k, ϑ | 𝒜_j), the discrete dimension space is 𝒦 ⊆ {1, …, kmax}, the saturated parameter space is Θ ⊆ R^{kmax+1}, and the full discrete-continuous space becomes 𝒳 = [𝒦, Θ].

The saturated space is oftentimes high-dimensional when dealing with random field applications. In order to avoid convergence deterioration with increasing k, dimension-independent MCMC algorithms are applied. These samplers are based on numerical discretizations of stochastic differential equations (SDEs) that preserve the reference prior or posterior measures. A main requirement for an MCMC algorithm to be dimension-independent is that it is well-defined in function spaces. For instance, the preconditioned Crank–Nicolson (pCN) algorithm is derived in30 by discretizing a prior-preconditioned overdamped Langevin SDE using a Crank–Nicolson scheme. Given the high-dimensional nature of random fields and the structure of the KL expansion, we focus on saturated space approaches for which the pCN algorithm can be utilized, namely: the step-wise and Metropolis-within-Gibbs algorithms.

4.2.1 Step-wise sampler

We construct a step-wise algorithm based on the pCN proposal that only requires one acceptance probability step for both the dimension and the parameters. The foundations and convergence properties of this sampler follow from13,14. Consider a proposal density q(k⋆, ϑ⋆ | k, ϑ) = q_1(k⋆ | k) q_2(ϑ⋆ | ϑ) across the full state space 𝒳. This density takes into account the proposal for the dimension, q_1, and the proposal for the parameters in the saturated space, q_2. Under these assumptions, the acceptance probability of the standard Metropolis–Hastings algorithm becomes14

α(k, ϑ; k⋆, ϑ⋆) = min{ 1, [q_1(k | k⋆) q_2(ϑ | ϑ⋆) π(k⋆, ϑ⋆ | ỹ)] / [q_1(k⋆ | k) q_2(ϑ⋆ | ϑ) π(k, ϑ | ỹ)] }.   (21)

We employ the pCN proposal for the parameter vector ϑ in the saturated space. In this case, the proposal q_2 cancels out with the saturated parameter prior in the target posterior (see, e.g.,30). Moreover, since k is a discrete variable, the proposal distribution for the dimension q_1 can be represented as a proposal matrix Q ∈ R^{kmax×kmax}. This is a right-stochastic matrix containing the probabilities of the moves. Such probabilities can be assigned using a discrete probability law controlled by a spread parameter δ ∈ [1, kmax], defining the width or jump lengths of the proposal. The resulting acceptance probability simplifies to

α(k, ϑ; k⋆, ϑ⋆) = min{ 1, [L(k⋆, ϑ⋆; ỹ) / L(k, ϑ; ỹ)] · [π̄_pr(k⋆) Q(k⋆, k)] / [π̄_pr(k) Q(k, k⋆)] },   (22)


    which in the context of tBUS-SuS is equivalent to

α(k, ϑ; k⋆, ϑ⋆) = min{ 1, 1_{𝒜_j}(k⋆, ϑ⋆) · [π̄_pr(k⋆) Q(k⋆, k)] / [π̄_pr(k) Q(k, k⋆)] } = 1_{𝒜_j}(k⋆, ϑ⋆) · min{ 1, [π̄_pr(k⋆) Q(k⋆, k)] / [π̄_pr(k) Q(k, k⋆)] },   (23)

where the last min{·, ·} factor is the term denoted (∗).

This Metropolis–Hastings implementation on the saturated space proceeds in a step-wise fashion as follows: in the first step, a candidate dimension k⋆ is proposed according to the matrix Q. In the second step, a candidate parameter ϑ⋆ is proposed using the pCN proposal30. Afterwards, the candidate pair (k⋆, ϑ⋆) is rejected or accepted jointly according to the probability (23). Algorithm 3 describes this procedure in detail. Note that we can alternatively implement the right term of (23), such that: (i) a model k⋆ is proposed and accepted with probability (∗) in (23), (ii) ϑ⋆ is drawn from a pCN proposal, and (iii) the pair (k⋆, ϑ⋆) is accepted if it lies in the domain 𝒜_j (using the indicator function).

Remark 3. From the RJMCMC viewpoint, the jumps in the step-wise sampler take place between nested models differing in dimension according to the proposal matrix Q. Because of the nested structure of the KL expansion, a natural jumping function linking the parameter spaces is the identity; this makes the determinant of the Jacobian of the jumping function in RJMCMC equal to one44.

Algorithm 3 State update in the step-wise sampler for tBUS-SuS in the standard Gaussian space.
1: Input: Let (k, ϑ) be the current state of the Markov chain and β the pCN proposal scaling
2: /* Step 1: sample the dimension */
3: Draw a candidate dimension, k⋆ ∼ Q(k, :)
4: /* Step 2: sample the coefficients using pCN */
5: Draw candidate parameters, ϑ⋆ ← √(1 − β²) ϑ + β ε, where ε ∼ 𝒩(0, I_{kmax+1})
6: Compute the acceptance probability α_{k,ϑ} as per Eq. (23)
7: Sample U_{k,ϑ} ∼ Unif(0, 1)
8: if U_{k,ϑ} < α_{k,ϑ} then
9:   k_next ← k⋆ and ϑ_next ← ϑ⋆
10: else
11:  k_next ← k and ϑ_next ← ϑ
12: end if
13: Output: (k_next, ϑ_next)
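A sketch of one state update of Algorithm 3 in plain Python, assuming the current intermediate domain is available as an indicator in_level(k, vartheta) built from the LSF (16) and the threshold ξ_j; the proposal matrix Q, the dimension prior prior_k and all other names are illustrative.

```python
import numpy as np

def stepwise_update(k, vartheta, Q, prior_k, in_level, beta, rng):
    """Joint dimension/pCN move of Algorithm 3, accepted with probability (23)."""
    k_max = Q.shape[0]
    k_star = rng.choice(np.arange(1, k_max + 1), p=Q[k - 1])          # Step 1
    eps = rng.standard_normal(vartheta.size)
    vartheta_star = np.sqrt(1.0 - beta**2) * vartheta + beta * eps    # Step 2: pCN proposal
    ratio = (prior_k[k_star - 1] * Q[k_star - 1, k - 1]) / \
            (prior_k[k - 1] * Q[k - 1, k_star - 1])
    alpha = in_level(k_star, vartheta_star) * min(1.0, ratio)         # Eq. (23)
    if rng.uniform() < alpha:
        return k_star, vartheta_star
    return k, vartheta
```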

4.2.2 Metropolis-within-Gibbs sampler

The Metropolis-within-Gibbs (MwG) algorithm46 updates the parameters ϑ and the dimension k in an alternating manner. In the saturated space, MwG explores the joint posterior using a Gibbs sampling version of the algorithm11, after including Metropolis–Hastings steps (details are provided in14). The algorithm can also be derived by writing the posterior (20) as the product of the dimension posterior and the dimension-specific parameter posterior9

π_pos(k, ϑ | ỹ) = π̄(k | ỹ) π(ϑ | k, ỹ).   (24)

The densities π(ϑ | k, ỹ) may differ abruptly for small changes in the variable k; thus, the chain might remain stuck in some state. However, under the KL formulation (1), the coefficients and the dimension are independent a priori. This property alleviates potential poor mixing in MwG47.

The idea of MwG is to sample each conditional density in (24) by applying two different steps. Recall that for tBUS-SuS, these densities need to be defined with respect to the intermediate levels 𝒜_j. In the first step, we fix the parameter ϑ and sample the conditional distribution π(k | ·) to propose a candidate dimension k⋆ using a standard Metropolis–Hastings sampler. In the second step, we fix the variable k (accepted in the first step), and sample the conditional distribution π(ϑ | ·) to obtain a candidate parameter ϑ⋆ using the pCN proposal30. The state update in MwG for tBUS-SuS is formally described in Algorithm 4. Observe that this approach requires two LSF (likelihood) evaluations for the generation of one state of the chain.

Algorithm 4 State update in the MwG sampler for tBUS-SuS in the standard Gaussian space.
1: Input: Let (k, ϑ) be the current state of the Markov chain and β the pCN proposal scaling
2: /* Step 1: for fixed ϑ, sample the conditional distribution π(k | ỹ) */
3: Sample the dimension, k⋆ ∼ Q(k, :)
4: Compute the acceptance probability α_k ← 1_{𝒜_j}(k⋆, ϑ) · min{ 1, [π̄_pr(k⋆) Q(k⋆, k)] / [π̄_pr(k) Q(k, k⋆)] }
5: Sample U_k ∼ Unif(0, 1)
6: if U_k < α_k then
7:   k_next ← k⋆
8: else
9:   k_next ← k
10: end if
11: /* Step 2: for fixed k_next, sample the conditional distribution π(ϑ | k_next, ỹ) */
12: Sample the parameters using pCN, ϑ⋆ ← √(1 − β²) ϑ + β ε, where ε ∼ 𝒩(0, I_{kmax+1})
13: Compute the acceptance probability α_ϑ ← 1_{𝒜_j}(k_next, ϑ⋆)
14: if α_ϑ = 1 then
15:  ϑ_next ← ϑ⋆
16: else
17:  ϑ_next ← ϑ
18: end if
19: Output: (k_next, ϑ_next)
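For comparison, a sketch of the two-stage update of Algorithm 4 under the same assumptions as the step-wise sketch above (indicator in_level, proposal matrix Q, dimension prior prior_k); note the two separate indicator (LSF) evaluations per update mentioned in the text.

```python
import numpy as np

def mwg_update(k, vartheta, Q, prior_k, in_level, beta, rng):
    """Metropolis-within-Gibbs move of Algorithm 4: dimension first, then pCN."""
    k_max = Q.shape[0]
    # Step 1: Metropolis-Hastings move on the dimension with the parameters fixed
    k_star = rng.choice(np.arange(1, k_max + 1), p=Q[k - 1])
    ratio = (prior_k[k_star - 1] * Q[k_star - 1, k - 1]) / \
            (prior_k[k - 1] * Q[k - 1, k_star - 1])
    alpha_k = in_level(k_star, vartheta) * min(1.0, ratio)
    k_next = k_star if rng.uniform() < alpha_k else k
    # Step 2: pCN move on the parameters with the dimension fixed at k_next
    eps = rng.standard_normal(vartheta.size)
    vartheta_star = np.sqrt(1.0 - beta**2) * vartheta + beta * eps
    vartheta_next = vartheta_star if in_level(k_next, vartheta_star) else vartheta
    return k_next, vartheta_next
```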

Remark 4. We apply an adaptive version of the pCN algorithm within the trans-dimensional Algorithms 3 and 4. The idea is to control the pCN scaling β to keep the acceptance rate around a near-optimal value throughout the simulation. The optimality is defined in terms of the smallest error in the approximation of the model posterior. The adaptation procedure follows from42,41.

    5 NUMERICAL EXAMPLES

We test the proposed method on two examples. The first problem allows us to verify the approximations performed by tBUS-SuS, since a reference model posterior can be computed analytically. In the second example, a closed-form expression is not available. Thus, we compute several posterior dimension snapshots using a within-model BUS-SuS approach to verify the solution estimated by tBUS-SuS. In all cases, the intermediate conditional probabilities are fixed at p_0 = 0.1.

5.1 1D cantilever beam

The first example is an inverse problem involving an ordinary differential equation (ODE) that describes the equilibrium of a cantilever beam. In this case, the solution of the Bayesian inverse problem can be derived analytically26. The physical domain is the interval D = [0, L], where L = 5 m is the length of the beam. The beam is subjected to a deterministic point load P = 20 kN at its free right end. The vertical displacements are constrained at the left edge of the beam. Let F(x) = (IE(x))^{−1} denote the flexibility of the beam (with x ∈ D), where E is the elastic modulus, and I the moment of inertia. The deflection response w(x), for a given flexibility and load configuration, is governed by the Euler–Bernoulli ODE:

M(x) = −F^{−1}(x) d²w(x)/dx²   ⟺   w(x) = −P ∫_0^x ∫_0^s (L − t) F(t) dt ds,   (25)

here we use the fact that the bending moment of a cantilever beam is given by M(x) = (L − x) P.
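A short sketch of evaluating the double integral in (25) by cumulative trapezoidal quadrature for a flexibility field given on a grid; the grid resolution and function names are illustrative.

```python
import numpy as np
from scipy.integrate import cumulative_trapezoid

def beam_deflection(x, flexibility, P, L):
    """Eq. (25): w(x) = -P * int_0^x int_0^s (L - t) F(t) dt ds on the grid x."""
    inner = cumulative_trapezoid((L - x) * flexibility, x, initial=0.0)  # int_0^s (L - t) F(t) dt
    outer = cumulative_trapezoid(inner, x, initial=0.0)                  # int_0^x (...) ds
    return -P * outer

# constant flexibility recovers the textbook tip deflection magnitude P*L^3/(3*E*I)
x = np.linspace(0.0, 5.0, 501)
w = beam_deflection(x, flexibility=1e-4 * np.ones_like(x), P=20.0, L=5.0)
```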

    FIGURE 1 Cantilever beam problem: model description, true values and set of deflection measurements.

The flexibility is modeled by a Gaussian random field prior 𝒩(μ_pr, Σ_pr), with constant mean μ_pr = 1 × 10^{−4} (kN^{−1} m^{−2}) and covariance matrix Σ_pr defined through a Matérn kernel with smoothness parameter ν = 0.5, which yields the following exponential autocovariance function, C(x, x′) = σ²_pr · exp(−|x − x′|/ℓ), for x, x′ ∈ D. We set the prior standard deviation to σ_pr = 0.35 μ_pr = 3.5 × 10^{−5} and perform a parameter study on the correlation length ℓ. Note that for this type of covariance operator, the KL eigenvalue problem has an analytical solution24.

The true flexibility field is a realization from the prior random field (with correlation length ℓ_true = 2 m). Partial observations of the deflection field are generated by simulating the ODE (25) using this underlying realization. The data is collected at m = 10 equally-spaced points of the domain D (Figure 1). This generates a measurement vector ỹ ∈ R^{m×1} with additive and spatially correlated error described by a Gaussian PDF, η ∼ 𝒩(0, Σ_obs), where the covariance structure of the error is constructed from an exponential kernel with standard deviation σ_obs = 1 × 10^{−3} and correlation length ℓ_obs = 1 m.

For this example, closed-form expressions of the model evidence for each dimension k are available (see, e.g.,26); this allows us to derive the model posterior analytically. We consider different correlation lengths to evaluate their influence on the model posterior estimation. Each correlation length also defines a different dimension prior as follows:

■ for ℓ = 0.1, the truncation parameter is kmax = 1014 and kmin = 17. This yields p = 6.166 × 10^{−3}.

■ for ℓ = 0.5, the truncation parameter is kmax = 204 and kmin = 4. This yields p = 2.586 × 10^{−2}.

■ for ℓ = 0.9, the truncation parameter is kmax = 114 and kmin = 2. This yields p = 5.118 × 10^{−2}.

The priors, together with the analytical model posterior and evidence, are shown in Figure 2. We employ these closed-form solutions as reference to test the performance of the proposed tBUS-SuS approach.

We first evaluate the performance of tBUS-SuS for different proposal scalings; this includes the jump length of the proposal Q and the scaling parameter of the pCN proposal (the results are omitted here but are included as Supporting Information). The studies are performed by monitoring three posterior quantities of interest (QoIs), namely, the dimension parameter k, the flexibility random field at the middle of the beam F_mid, and the deflection random field at the tip of the beam w_tip. This allows us to find an appropriate tuning of the tBUS-SuS algorithm. From these studies we found that: (i) sampling from the prior instead of using the matrix Q is beneficial when the change from prior to posterior is small (cf. Figure 2), and (ii) adapting the pCN scaling such that a target acceptance rate in the range [0.2, 0.4] is maintained throughout the simulation is a good choice. For this example, we found that a target acceptance rate of 0.4 keeps a good approximation error in both the monitored random field QoIs and the dimension parameter.

5.1.1 Computational efficiency
We investigate the computational gain of using the across-model tBUS-SuS, compared to individual within-model runs of BUS-SuS. The objective is twofold: (i) to identify the effective number of independent samples in the resulting set of posterior samples,



FIGURE 2 Closed-form solutions for different correlation lengths in the prior flexibility random field. Left: model evidence. Center: model/dimension prior. Right: model/dimension posterior.

and (ii) to define a metric that is comparable between within-model and across-model simulation approaches. The efficiency metrics are defined in the appendix; they are expressed as the ratio between the effective number of independent samples and the number of model calls. The results are averaged over Nsim = 100 independent simulations, using N = 5 × 10³ samples per level in BUS-SuS and N = 10⁴ samples per level in tBUS-SuS.

For each correlation length l ∈ {0.1, 0.5, 0.9}, we use the reference variances of the QoIs (computed from the closed-form solution) for the estimation of the effective number of samples in (B9):

■ σ²_k ∈ {24161.22, 1250.80, 328.56} for the dimension,

■ σ²_{F_mid} ∈ {1.143 × 10⁻⁹, 7.484 × 10⁻¹⁰, 5.364 × 10⁻¹⁰} for the flexibility, and

■ σ²_{w_tip} ∈ {8.446 × 10⁻⁷, 9.042 × 10⁻⁷, 9.159 × 10⁻⁷} for the deflection.
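The appendix definitions (B9)–(B10) are not repeated here; the sketch below only illustrates one standard way of turning a reference variance and repeated independent runs into an effective sample size and an efficiency, and should not be read as the exact metric of the paper.

```python
import numpy as np

def efficiency_metric(qoi_mean_estimates, ref_variance, model_calls):
    """Hedged reconstruction of an efficiency metric in the spirit of
    (B9)-(B10); the exact appendix definitions may differ.

    'qoi_mean_estimates' are posterior-mean estimates of one QoI from
    independent simulation runs, 'ref_variance' is the reference posterior
    variance of that QoI, and 'model_calls' the average number of model
    evaluations per run.  If the estimator behaved like an average of
    n_eff independent posterior samples, its variance would equal
    ref_variance / n_eff, which is solved for n_eff below.
    """
    estimator_var = np.var(qoi_mean_estimates, ddof=1)
    n_eff = ref_variance / estimator_var
    return n_eff / model_calls
```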

Standard tBUS-SuS vs. adaptive tBUS-SuS: we compare the approximation of the model posterior between standard tBUS-SuS (with pre-defined constant r, Algorithm 1) and adaptive tBUS-SuS (Algorithm 2). The adaptive version is a more general method, since it is not always possible to define the constant a priori. Moreover, it can produce similar or better results than standard tBUS-SuS, depending on the accuracy of the constant used in standard tBUS-SuS, as shown in Table 1.

    TABLE 1 Efficiency metric (B10) of the dimension parameter k for adaptive and standard tBUS-SuS.

l      adaptive: eff tBUS(k)              standard: eff tBUS(k)
       MwG           step-wise            MwG           step-wise
0.1    8.01 × 10⁻²   4.24 × 10⁻²          4.00 × 10⁻²   1.79 × 10⁻²
0.5    5.49 × 10⁻²   2.38 × 10⁻²          6.13 × 10⁻³   1.85 × 10⁻²
0.9    3.59 × 10⁻²   2.41 × 10⁻²          4.03 × 10⁻³   2.08 × 10⁻³

The results show that, overall, the efficiency of adaptive tBUS-SuS is larger than or comparable to the efficiency provided by standard tBUS-SuS. This is related to the way we estimate the constant ln(r) when it must be fixed a priori: it is chosen as the maximum of 10⁵ independent log-likelihood evaluations, and this value is additionally increased by 25%, such that ln(r) ≈ L_max,all. However, it is possible that this way of selecting the constant is too conservative, so that more levels in tBUS-SuS are required to estimate the solution. This is not an issue of the method per se, but it leads to a reduced efficiency since more model evaluations are required. Therefore, we employ the adaptive tBUS-SuS algorithm for the solution of the Bayesian model choice problem in the remainder of the paper.
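The selection of the constant can be summarized in a few lines. In the hedged sketch below, log_likelihood and prior_sampler are placeholders for the user's model, and the 25% inflation is applied to the magnitude of the maximum, which matches the positive maxima reported in this example.

```python
import numpy as np

def estimate_log_constant(log_likelihood, prior_sampler, n=10**5, inflate=0.25):
    """Illustrative estimate of the log-scaling constant ln(r).

    Following the procedure described in the text, the constant is taken
    as the maximum log-likelihood over n independent prior samples and
    then increased by 25% as a safety margin.  The inflation is applied
    to the magnitude of the maximum (an assumption on our part).
    """
    log_l = np.array([log_likelihood(prior_sampler()) for _ in range(n)])
    l_max = log_l.max()
    return l_max + inflate * abs(l_max)   # 25% margin on the magnitude
```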

Within-model BUS-SuS runs vs. tBUS-SuS: we compare the efficiencies in the estimation of the QoIs related to the random fields (namely, the mean values of F_mid and w_tip) produced by within-model runs of adaptive BUS-SuS and by adaptive tBUS-SuS. Figure 3 shows the efficiencies computed with individual adaptive BUS-SuS runs (1st column) and with adaptive tBUS-SuS using the two trans-dimensional MCMC algorithms (2nd and 3rd columns).



FIGURE 3 Comparison between within-model BUS-SuS and across-model tBUS-SuS: number of model calls and efficiency metrics (B10) for different correlation lengths and random field QoIs μ_{F_mid} and μ_{w_tip}. Adaptive BUS-SuS (1st col), adaptive tBUS-SuS with MwG sampler (2nd col), and adaptive tBUS-SuS with step-wise sampler (3rd col).

In the first row, we plot the total number of model calls per dimension. In the fixed-dimensional BUS-SuS, the cost increases with the dimension, since more intermediate levels are required to reach the posterior. Conversely, the cost is a single value for all dimensions in across-model tBUS-SuS, and thus we distribute it according to the model posterior. Note in the first row of Figure 3 that the total number of calls in within-model BUS-SuS is larger than in tBUS-SuS, even though a larger number of samples per level is used in tBUS-SuS; also, tBUS-SuS with MwG almost doubles the cost compared to tBUS-SuS with the step-wise sampler. In the second and third rows of Figure 3, the efficiencies of the random field QoIs are shown per dimension, since different KL truncation orders yield different random field approximations. We remark that the effective number of samples obtained with MwG is larger than with the step-wise method. Nevertheless, the efficiencies of both approaches are similar. This is because the computational cost normalizes the effective number of samples in the efficiency metric (B10), and MwG has almost twice the cost of the step-wise sampler. Finally, we clearly see the advantage of employing across-model simulation algorithms for the solution of Bayesian model choice problems, as compared to single-model runs.

5.1.2 Approximation of the posterior for the dimension and the random fields
We employ the adaptive version of tBUS-SuS for the estimation of the model and random field posteriors. The approximated model posteriors obtained with the MwG and step-wise samplers are shown in Figure 4. In this case, we plot the mean and standard deviation bounds of the approximation. The shape of the reference model posterior is well captured for all investigated correlation length cases. The variability of the approximation using the MwG sampler is smaller than the one computed by the step-wise


algorithm. The differences are larger for smaller correlation lengths. The tBUS-SuS simulations require on average Nlv = 5 intermediate levels to reach the posterior for all the investigated correlation lengths.


FIGURE 4 Estimation of the model posterior using adaptive tBUS-SuS for different correlation lengths in the prior flexibility random field: MwG (1st row); step-wise (2nd row).


FIGURE 5 Posterior flexibility and deflection random fields for different correlation lengths in the prior flexibility field: estimated mean and 95% CI of the best model (model choice) and the averaging of models (model mixing) using adaptive tBUS-SuS with MwG. The reference 95% CI is highlighted in gray.


We also estimate the posterior flexibility and deflection random fields for different correlation lengths. We use as reference the closed-form expressions of the posterior random fields 26. Figure 5 shows the model choice and model mixing solutions in terms of the posterior mean and posterior 95% credible intervals (CI); this CI is defined as the region between the 0.025 and 0.975 quantiles of the posterior. For the deflection response field, we compute the difference between the prior mean and the 95% posterior CIs (called differential deflection), in order to differentiate the approximations. The model choice estimate is given by the truncation order that yields the maximum model posterior (Figure 4), in this case k_best ∈ {10, 3, 3} for the correlation lengths l ∈ {0.1, 0.5, 0.9}, respectively. The model mixing estimate takes into account the whole dimension spectrum (up to k_max). Note that the reference CIs agree closely with the model mixing estimates, since all the KL expansions associated with the model posterior are used for the random field representation. For the larger correlation length, the model choice solution fails to capture the assumed true flexibility in different intervals of the domain.

We conclude this subsection by illustrating the evolution of the samples in tBUS-SuS and its relation to the model posterior.

The results are shown for the prior correlation length l = 0.5. Figure 6 shows the prior, second intermediate level, and posterior samples obtained from a single simulation of adaptive tBUS-SuS with MwG. The process of sequentially approximating the posterior is shown by the distribution of the samples, starting from the prior and narrowing down to the target posterior. For k = 1, we plot the samples that contribute to the model posterior at the first dimension, i.e., the one-dimensional KL coefficient against the auxiliary standard uniform random variable. The tBUS-SuS simulation required Nlv = 5 levels to reach the posterior region (highlighted in gray), with intermediate thresholds [14.74, 5.04, 2.67, 0.31, 0]. Note that the value of the model posterior at k = 1 is almost zero (cf. Figure 4), and hence the number of samples is considerably reduced as the algorithm evolves from the prior to the posterior measure. Moreover, the maximum log-likelihood at dimension k_max = 204 is c̄_204 = 68.170, and at dimension k = 1 it is c̄_1 = 56.529. Due to the nested structure of the KL expansion, the constant ln(r) in the LSF (16) is equal to the maximum log-likelihood at the largest dimension. In this case, the scaling ln(r) is significantly larger than the value of the covering constant at dimension 1. Thus, we observe that the posterior samples at k = 1 are located in a small region of the two-dimensional parameter space. This occurs mainly at lower dimensions, since there exist significant differences between lower- and higher-dimensional likelihood values. There is an associated reduction of the efficiency, but this does not prevent the algorithm from computing accurate posterior samples. Moreover, with increasing k the values of c̄_k approach ln(r) and the efficiency loss becomes negligible. For k = 2, Figure 6 plots the components of the two-dimensional KL coefficients; we also show the contours of the log-likelihood function with fixed dimension k = 2. In this case, the reduction in the number of samples when updating from prior to posterior is smaller than at dimension k = 1. Note that the value of the model posterior at k = 2 is larger than zero, and the difference in probability mass between prior and posterior at k = 2 is less substantial (cf. Figure 4).


FIGURE 6 tBUS-SuS samples of the KL coefficients at dimensions k = 1 (left) and k = 2 (right). For k = 1, the posterior region is highlighted in gray. For k = 2, the contours of the log-likelihood function are also plotted.


5.2 2D groundwater flow
We consider inference of the hydraulic conductivity field of an aquifer using observations of the hydraulic head measured at specific boreholes (see, e.g., 48). We define an aquifer on the square domain D = [0, 1] × [0, 1] km² with boundary ∂D. Spatial coordinates are denoted by x = [x1, x2] ∈ D. The steady-state Fick's second law of diffusion is used to describe the spatial variation of the hydraulic head inside the aquifer. Hence, for a given hydraulic conductivity of the soil κ(x, ω) and sink or source terms J(x), the hydraulic head u(x) follows the elliptic PDE

−∇ ⋅ [κ(x, ω) ∇u(x)] = J(x),  (26)

with Dirichlet boundary condition u(x) = 0 for x ∈ ∂D. The source term is defined as the superposition of nine weighted Gaussian plumes with standard width σ_J = 1 × 10⁻³ km. The plumes have equal, unitary strengths and are centered at the locations μ_J^(i) = [0.25 ⋅ i, 0.25 ⋅ j] with i, j = 1, …, 3, that is

J(x) = Σ_{i=1}^{9} N(x; μ_J^(i), σ_J² I₂).  (27)
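A direct implementation of the source term (27) is straightforward. In this sketch, each plume is an isotropic bivariate Gaussian density, which is our reading of the unit-strength plumes described above; the function name is illustrative.

```python
import numpy as np

def source_term(x, sigma_j=1e-3):
    """Source term J(x): superposition of nine unit-strength Gaussian
    plumes centred at [0.25*i, 0.25*j], i, j = 1, 2, 3 (cf. Eq. (27)).

    'x' is an array of shape (n, 2) with spatial coordinates in km.
    Each plume is an isotropic bivariate Gaussian density with standard
    width sigma_j, so its integral over the plane equals one.
    """
    centres = np.array([[0.25 * i, 0.25 * j]
                        for i in range(1, 4) for j in range(1, 4)])
    x = np.atleast_2d(x)
    J = np.zeros(len(x))
    for c in centres:
        sq_dist = np.sum((x - c)**2, axis=1)
        J += np.exp(-0.5 * sq_dist / sigma_j**2) / (2.0 * np.pi * sigma_j**2)
    return J
```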


FIGURE 7 Groundwater flow problem: true hydraulic conductivity, true hydrostatic pressure with measurement locations, source term, and measured and true (noise-free) hydraulic head.

We employ the finite element method to solve the PDE (26); 2 × 80² = 1.28 × 10⁴ three-node triangular elements are used for the discretization. The prior hydraulic conductivity field is modeled as a log-normal random field, κ(x) := exp(κ′(x)). The underlying Gaussian field κ′(x) has mean zero and standard deviation σ_{κ′} = 1 km/day. The covariance operator of the Gaussian field is constructed from a Matérn kernel 32, with smoothing parameter ν = 1.5. The solution of the eigenvalue problem is computed with the Nyström method using 100 Gauss–Legendre points in each direction 37.
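To make the Nyström step concrete, the sketch below assembles the weighted covariance matrix at tensorized Gauss–Legendre points on [0, 1]² and solves the resulting symmetric eigenvalue problem; cov_fun, the reduced number of points, and the exponential stand-in kernel in the usage line are illustrative assumptions rather than the paper's implementation.

```python
import numpy as np
from numpy.polynomial.legendre import leggauss

def nystrom_kl_2d(cov_fun, n_gl=20, k=50):
    """Nystrom discretisation of the KL eigenvalue problem on [0,1]^2.

    'cov_fun(X, Y)' returns the covariance matrix between two point sets.
    Gauss-Legendre points/weights (n_gl per direction) are mapped to [0,1];
    the weighted kernel matrix is symmetrised so a standard symmetric
    eigensolver can be used.  A small n_gl is used here for illustration;
    the paper reports 100 points per direction.
    """
    xi, wi = leggauss(n_gl)
    xi, wi = 0.5 * (xi + 1.0), 0.5 * wi           # map from [-1, 1] to [0, 1]
    X1, X2 = np.meshgrid(xi, xi, indexing="ij")
    pts = np.column_stack([X1.ravel(), X2.ravel()])
    w = np.outer(wi, wi).ravel()                  # 2D quadrature weights

    C = cov_fun(pts, pts)
    w_sqrt = np.sqrt(w)
    B = w_sqrt[:, None] * C * w_sqrt[None, :]     # symmetrised weighted kernel
    eigval, eigvec = np.linalg.eigh(B)
    idx = np.argsort(eigval)[::-1][:k]
    lam = eigval[idx]
    phi = eigvec[:, idx] / w_sqrt[:, None]        # eigenfunction values at the nodes
    return lam, phi, pts

# e.g. an exponential kernel as a stand-in for the Matérn kernel used in the paper:
cov = lambda X, Y: np.exp(-np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=2) / 0.1)
lam, phi, pts = nystrom_kl_2d(cov, n_gl=20, k=50)
```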


The true conductivity κ(x) is a realization of a random field with characteristics similar to the prior. In this case, we set the Matérn kernel parameters to ν_true = 2.0 and l_true = 0.1. The truncation of the KL expansion used to generate this realization is 312, which captures 99% of the prior variance. The hydraulic head observations ỹ are obtained at m = 12 sensor locations. They are computed from a PDE evaluation of the true conductivity field using a finer finite element mesh. The measurement error is modeled as additive and mutually independent of the random field. It is defined by a joint Gaussian PDF with mean zero and noise covariance matrix Σ_obs = σ²_obs I_m. The variance of the measurement noise is prescribed such that the observations have a signal-to-noise ratio V[ỹ]/σ²_obs = 120. The true hydraulic conductivity and hydraulic head fields, together with the source term and the synthetic data, are shown in Figure 7.

In this example, we evaluate the posterior for different correlation lengths l ∈ {0.1, 0.2, 0.3}. Each correlation length defines a dimension prior as follows:

■ for l = 0.1, the truncation parameters are k_max = 512 and k_min = 16. This yields p = 6.2914 × 10⁻³.

■ for l = 0.2, the truncation parameters are k_max = 138 and k_min = 5. This yields p = 1.9398 × 10⁻².

■ for l = 0.3, the truncation parameters are k_max = 65 and k_min = 3. This yields p = 2.9396 × 10⁻².

We employ the same proposal scaling settings investigated in the previous example. Moreover, we use a proposal Q to sample k, in addition to sampling from the prior. The jump proposal matrix is constructed from a discrete triangular distribution with jump length equal to 0.25 ⋅ k_max, which appears to be a good choice for both MCMC algorithms (cf. Supporting Information).
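One possible construction of such a jump proposal matrix is sketched below; whether self-moves are excluded and how the boundary dimensions are treated are our assumptions, not details taken from the paper.

```python
import numpy as np

def triangular_jump_proposal(k_min, k_max, jump_frac=0.25):
    """Row-stochastic jump proposal matrix Q over dimensions k_min..k_max.

    Each row k contains a discrete triangular distribution centred at k,
    whose weights decay linearly to zero at a jump length of
    jump_frac * k_max and are restricted to the admissible range.
    Self-moves (k' = k) are excluded here by convention (an assumption).
    """
    ks = np.arange(k_min, k_max + 1)
    delta = jump_frac * k_max
    Q = np.maximum(0.0, 1.0 - np.abs(ks[:, None] - ks[None, :]) / delta)
    np.fill_diagonal(Q, 0.0)                       # no self-moves
    return Q / Q.sum(axis=1, keepdims=True)        # normalise each row
```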

5.2.1 Approximation of the posterior for the dimension and the random fields
The adaptive tBUS-SuS is used to estimate the posterior of the dimension and of the random field. The results are averaged over Nsim = 60 independent simulation runs using N = 1.5 × 10⁴ samples per level. To assess the tBUS-SuS approximations, we compute the reference solution from model evidences estimated by within-model runs of adaptive BUS-SuS using N = 5 × 10³ samples per level, averaged over Nsim = 90 simulations.

For this example, it is not feasible to compute the full reference solution for the posterior of the dimension by means of

within-model simulation algorithms. Thus, we estimate the reference at 6 dimension snapshots for each correlation length: k_snap ∈ {40, 45, 50, 55, 60, 70} for l = 0.1, k_snap ∈ {20, 30, 35, 40, 50, 60} for l = 0.2, and k_snap ∈ {20, 25, 30, 32, 35, 40} for l = 0.3. Since the reference solutions are given in terms of the model evidences Z_ỹ(k_snap), we transform them to model posteriors using (6). This requires knowledge of the evidence of all model classes, which is not available in this case. Instead, we apply the normalization

π_pos(k_snap | ỹ) = [π_pr(k_snap) Z_ỹ(k_snap) / Σ_{k ∈ k_snap} π_pr(k) Z_ỹ(k)] ⋅ Σ_{k ∈ k_snap} π̂_pos(k | ỹ),  (28)

such that the sum of the reference values π_pos(k_snap | ỹ) matches the sum of the estimated model posteriors π̂_pos(k_snap | ỹ) at the given snapshots. Using this approach, the reference solution is tied to the tBUS-SuS solution. Nevertheless, it allows us to check that the shape of the dimension posterior is estimated correctly.
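The normalization (28) amounts to a simple rescaling of the snapshot evidences, as in the following sketch; the argument names are illustrative.

```python
import numpy as np

def normalise_reference_snapshots(prior_snap, evidence_snap, posterior_snap_hat):
    """Apply the normalisation of Eq. (28) at the dimension snapshots.

    'prior_snap' and 'evidence_snap' hold the dimension prior and the
    within-model evidence estimates at the snapshot dimensions, and
    'posterior_snap_hat' the tBUS-SuS model-posterior estimates at the
    same dimensions.  The reference values are rescaled so that their
    sum matches the sum of the estimated posterior over the snapshots.
    """
    prior_snap = np.asarray(prior_snap, dtype=float)
    evidence_snap = np.asarray(evidence_snap, dtype=float)
    unnormalised = prior_snap * evidence_snap
    return unnormalised / unnormalised.sum() * np.sum(posterior_snap_hat)
```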

The dimension posteriors estimated by adaptive tBUS-SuS with MwG are shown in Figure 8. We plot the mean and standard deviation bounds of the approximations. The solutions are computed when the dimension is sampled from the prior (1st row) and when it is sampled from the proposal Q (2nd row). Both alternatives yield comparable results, since there are no significant differences between the posterior approximations. In general, we observe an increase in the variability around the MAP estimate. Note also that the dimension posteriors have several modes, which can be related to the nonuniform distribution of the measurement locations, together with the symmetry of the KL eigenfunctions. For instance, when using l = 0.3 there is a jump in the values of the probability mass after the 15th dimension, every 5 dimensions until the MAP estimate. Furthermore, as an indication of the algorithm's performance, tBUS-SuS requires on average Nlv = 12 intermediate levels, and the pCN scaling changes from 0.75 in the first level to 0.09 in the last level (for the correlation length l = 0.1 and sampling from the prior).

The approximated dimension posteriors using adaptive tBUS-SuS with the step-wise sampler are shown in Figure 9. In this

example, the step-wise sampler is more sensitive to the selection of the dimension proposal than MwG. We observed that the dimension prior is not a good proposal choice to sample the dimensions (the results are omitted). The main issue is that the resulting posterior samples are highly correlated, since the values of the pCN scaling at the last level of the simulation are of the order of 10⁻⁴. Therefore, instead of showing a comparison between the dimension proposal schemes, we employ the proposal matrix Q with two different settings: using N samples per level and using 2N samples per level. In both MCMC algorithms, increasing



FIGURE 8 Diffusion example: model posterior using adaptive tBUS-SuS with MwG sampler, sampling k from the prior (1st row) and from proposal Q (2nd row).


FIGURE 9 Diffusion example: model posterior using adaptive tBUS-SuS with step-wise sampler and sampling k from proposal Q. Using N samples per level (1st row) and 2N samples per level (2nd row).

the number of samples per level considerably reduces the variability of the estimates. In particular, using 2N samples per level in the step-wise sampler yields results comparable to those of MwG, since the PDE model is then evaluated approximately the same number of times. The resulting dimension posteriors are able to capture the trend of the reference solutions. In both cases, the estimation is very close to the reference mean value.



FIGURE 10 Diffusion example: posterior mean (1st and 3rd cols) and standard deviation (2nd and 4th cols) of the hydraulic conductivity random field using adaptive tBUS-SuS with the step-wise sampler, for different prior correlation lengths (rows) and employing model choice or model mixing.


FIGURE 11 Diffusion example: posterior mean (1st and 3rd cols) and standard deviation (2nd and 4th cols) of the hydrostatic pressure random field using adaptive tBUS-SuS with the step-wise sampler, for different prior correlation lengths (rows) and employing model choice or model mixing.

Finally, we estimate the posterior hydraulic conductivity and hydraulic head random fields for the investigated correlation lengths. Figure 10 shows the model choice and model mixing solutions in terms of the posterior mean and standard deviation of the hydraulic conductivity (only for l ∈ {0.1, 0.3}). The model choice estimate is given by the truncation order that yields the maximum model posterior, in this case k_best ∈ {43, 43, 33} for the investigated correlation lengths (Figure 9, 2nd row). The


model mixing estimate takes into account the whole dimension spectrum. Note that the values of the posterior mean are smaller than those of the assumed truth. In this case, most of the measurements are concentrated in the lower left corner of the aquifer. In this area, the values of the true hydraulic conductivity are very small and influence the posterior solution. Nevertheless, the statistics reveal the locations of lower and higher permeability values. The field modeled with the smallest correlation length is able to represent the small fluctuations better than those with larger correlation lengths, for which the resulting random field realizations are smoother. In contrast to the hydraulic conductivity field, the differences between the model choice and model mixing solutions for the hydraulic head random field are minimal (Figure 11). This quantity is computed by integrating the PDE model, which can be seen as an averaging operation that reduces the effect of the spatial variability, similar to Example 1.

    6 DISCUSSION OF RESULTS

The proposed method is an extension of the BUS formulation for the solution of Bayesian inverse problems in which the dimension of the parameter space is variable. This type of inference is common in random field updating tasks, since the optimal number of terms in the random field series expansion is unknown a priori and can hence be modeled probabilistically.

The main findings of this contribution in terms of the random field modeling are: (i) If one employs a uniform prior for the

dimension, the values of the model evidence define the model posterior themselves. In this case, visits to models with high dimension have the same probability of occurrence; e.g., in the beam example with correlation length l = 0.5, a model with 10 terms in the KL expansion is evaluated as many times as the model with 50 terms, despite the fact that the quality of both models is essentially the same (Figure 2). Thus, the inclusion of a prior that penalizes models with an increased number of parameters is beneficial in the context of the KL expansion. We use the proposed dimension prior to achieve a trade-off between the dimensions with large model evidence and the computational cost, in order to reduce the evaluation of unnecessarily high-dimensional models. (ii) In the context of random fields, our examples show that the model choice solution is not