Making sense of global sensitivity analyses

Haruko M. Wainwright*, Stefan Finsterle, Yoojin Jung, Quanlin Zhou, Jens T. Birkholzer

Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley 94720, CA, USA

Computers & Geosciences (2013). Contents lists available at SciVerse ScienceDirect. Journal homepage: www.elsevier.com/locate/cageo

* Corresponding author. Tel.: +1 510 495 2038. E-mail address: [email protected] (H.M. Wainwright).

0098-3004/$ - see front matter © 2013 Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.cageo.2013.06.006

Please cite this article as: Wainwright, H.M., et al., Making sense of global sensitivity analyses. Computers & Geosciences (2013), http://dx.doi.org/10.1016/j.cageo.2013.06.006

Article info

Article history: Received 13 March 2013; received in revised form 10 June 2013; accepted 13 June 2013

Keywords: Global sensitivity analysis; Morris OAT method; Sobol index; Variance-based sensitivity indices

Abstract

This study presents improved understanding of sensitivity analysis methods through a comparison of the local sensitivity and two global sensitivity analysis methods: the Morris and Sobol/Saltelli methods. We re-interpret the variance-based sensitivity indices from the Sobol/Saltelli method as difference-based measures. It suggests that the difference-based local and Morris methods provide the effect of each parameter including its interaction with others, similar to the total sensitivity index from the Sobol/Saltelli method. We also develop an alternative approximation method to efficiently compute the Sobol index, using one-dimensional fitting of system responses from a Monte-Carlo simulation. For illustration, we conduct a sensitivity analysis of pressure propagation induced by fluid injection and leakage in a reservoir–aquitard–aquifer system. The results show that the three methods provide consistent parameter importance rankings in this system. Our study also reveals that the three methods can provide additional information to improve system understanding.

1. Introduction

Hydrogeological modeling under uncertainty, from site characterization to prediction, requires not only the numerical or analytical models of flow and transport but also various statistical analyses, including parameter estimation (PE), uncertainty analysis (UA), sensitivity analysis (SA), data worth analysis, and experimental design (e.g., Van Griensven et al., 2006; Hill and Tiedeman, 2007; Tang et al., 2007; Finsterle et al., 2012). SA is a key component in this process, since there is a strong interaction between SA and other components. The objectives of SA include (1) to check whether a given dataset has sufficient information to determine a parameter given uncertainty of other parameters in PE, (2) to determine how to allocate limited resources to estimate each parameter as a part of data worth analysis, and (3) to reduce the number of parameters to be varied or estimated and hence to reduce computational burden in PE and UA.

SA started initially with the derivative-based local sensitivity method (Cacuci, 2003), but the global sensitivity analysis (GSA) methods (e.g., Morris, 1991; Sobol, 2001; Saltelli et al., 2008) have been increasingly applied in recent years. GSAs explore the parameter space so that they provide robust sensitivity measures in the presence of nonlinearity and interactions among the parameters compared to the local sensitivity analysis. The Morris one-at-a-time (OAT) method is a computationally frugal method that changes one parameter at a time from randomly generated reference parameter sets, and computes the difference in the outputs. The Sobol/Saltelli method provides the variance-based sensitivity indices that quantify the relative contribution of each parameter to the uncertainty in outputs.

GSAs, however, can be computationally intensive, since they require sampling parameter sets. Although several approximation methods have been developed to reduce the computational cost, such methods introduce additional model assumptions and response surface fittings, which are not universally applicable (Marrel et al., 2009; Oladyshkin et al., 2012). There is also an argument that local sensitivity analysis is sufficient, and GSAs do not provide additional information to justify the large computational cost (e.g., Foglia et al., 2009). Such arguments could be attributed to the fact that the value of GSAs has not been fully appreciated. The use of GSA is often limited to ranking parameter importance, even though GSAs can provide additional information for improving the system understanding.

The objective of this study is to improve the understanding of GSA through comparing the three SA methods, and to show more usages of GSAs for better system understanding. We re-interpret the Sobol/Saltelli sensitivity indices as difference-based measures so that direct comparisons with the Morris and local methods are possible. We also show additional information and interpretations obtained from each method. Second, we compare the interpretation and computational cost of the SA methods and discuss what causes the differences. Finally, we propose an alternative approximation method to efficiently compute the first-order sensitivity index (i.e., Sobol index).

We demonstrate our approach on a problem of pressure propagation induced by fluid injection and leakage in the CO2 storage system. For CO2 storage, there are potential risks associated with leakage of injected CO2 and resident brine. Since the pressure perturbations travel much faster and farther than the CO2 plume, this pressure disturbance can be used to detect leaky pathways before CO2 actually reaches the shallow aquifer. Jung et al. (in press) conducted a GSA to evaluate the sensitivity of leakage signals to model parameters, using the Morris method. In this study, we conduct a more detailed analysis, using the same semi-analytical model developed by Cihan et al. (2011).

2. Methodology

In this section, we discuss three SA methods. An additional interpretation is provided for the Sobol/Saltelli method, so that similarities and differences among the three methods can be better compared. In all three methods, we consider a set of k parameters denoted by {x_i | i = 1,…,k} and a scalar output y = f({x_i}), where f represents a hydrogeological forward model.

2.1. Local sensitivity method

The local sensitivity index for Parameter i is defined as the scaled partial derivative of y with respect to x_i. It is computed by changing each parameter by a small increment Δx_i from the reference parameter values {x_i*} and computing the difference in y:

$$S_i^{\mathrm{local}} = \frac{\sigma_{x,i}}{\sigma_y}\left.\frac{\partial y}{\partial x_i}\right|_{x_i^*} \approx \frac{\sigma_{x,i}}{\sigma_y}\,\frac{f(x_1^*,\ldots,x_i^*+\Delta x_i,\ldots,x_k^*) - f(x_1^*,\ldots,x_k^*)}{\Delta x_i} \tag{1}$$

where σ_{x,i} is the parameter-scaling factor, and σ_y is the output-scaling factor. σ_{x,i} can be the standard deviation or range of the parameter that represents the parameter variability or uncertainty. It can also be thought of as the amount by which the parameter would be changed in the sensitivity analysis, where the parameter is perturbed from its base-case value by an amount considered reasonable to examine its impact on the model output. σ_y can be viewed as a measure of the change in the output that one would consider to be significant or representative. σ_y is especially important when we combine multiple observations of different scales to create an integral measure of parameter influence or output sensitivity, such as the analysis by Finsterle and Pruess (1995) and the composite scaled sensitivity index in Hill and Tiedeman (2007). The local sensitivity method requires (k+1) simulations, i.e., a simulation for the reference parameter set plus k simulations for small-increment changes in the k parameters.

2.2. Morris sensitivity method

The Morris one-at-a-time (OAT) method (Morris, 1991) can be considered as an extension of the local sensitivity method. Each parameter range is scaled to the unit interval [0, 1] and partitioned into (p−1) equally-sized intervals. The reference value of each parameter is selected randomly from the set {0, 1/(p−1), 2/(p−1), …, 1}. The fixed increment Δ = p/{2(p−1)} is added to each parameter in random order to compute the elementary effect (EE) of x_i:

$$EE_i = \frac{1}{\sigma_y}\,\frac{f(x_1^*,\ldots,x_i^*+\Delta,\ldots,x_k^*) - f(x_1^*,\ldots,x_k^*)}{\Delta} \tag{2}$$

where {x_i*} is the randomly selected parameter set, and σ_y is the output-scaling factor. To compute EE_i for k parameters, we need (k+1) simulations (called one path) in the same way as that of the local sensitivity method. By having multiple paths, we have an ensemble of EEs for each parameter. The total number of simulations is r(k+1), where r is the number of paths.

We compute three statistics: the mean EE, standard deviation (STD) of EE, and mean of absolute EE (mean |EE|). Since the mean EE represents the average effect of each parameter over the parameter space, the mean EE can be regarded as a global sensitivity measure. As noted by Saltelli et al. (2008), the mean |EE| is used to identify the non-influential factors, and the STD of EE is used to identify nonlinear and/or interaction effects. The standard error of mean (SEM) of EE, defined as SEM = STD/r^0.5, is used to calculate the confidence interval of mean EE (Morris, 1991).
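The path construction and the three statistics can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation used in the study: `toy_model` and all function names are hypothetical, σ_y is set to one, p is assumed even, and the starting levels are restricted to values that keep x_i + Δ inside [0, 1]. Each step of a path continues from the previously perturbed point, which is one common reading of "added to each parameter in random order".

```python
import math
import random
import statistics


def morris_elementary_effects(f, k, r, p=6, sigma_y=1.0, seed=0):
    """Return k lists, each holding the r elementary effects of one parameter."""
    rng = random.Random(seed)
    delta = p / (2.0 * (p - 1))             # fixed increment from the text
    # Assumption: start only from levels that keep x + delta inside [0, 1].
    levels = [j / (p - 1) for j in range(p // 2)]
    effects = [[] for _ in range(k)]
    for _ in range(r):                      # one path = k + 1 model runs
        x = [rng.choice(levels) for _ in range(k)]
        y = f(x)
        order = list(range(k))
        rng.shuffle(order)                  # perturb parameters in random order
        for i in order:
            x_new = list(x)
            x_new[i] += delta
            y_new = f(x_new)
            effects[i].append((y_new - y) / (delta * sigma_y))   # Eq. (2)
            x, y = x_new, y_new             # continue the path from the new point
    return effects


def summarize(ee):
    """Mean EE, STD of EE, mean |EE| and SEM = STD / sqrt(r) for one parameter."""
    std = statistics.stdev(ee)
    return {
        "mean": statistics.mean(ee),
        "std": std,
        "mean_abs": statistics.mean([abs(e) for e in ee]),
        "sem": std / math.sqrt(len(ee)),
    }


def toy_model(x):
    # Hypothetical response: x[0] acts linearly, x[1] and x[2] interact,
    # and x[3] has no influence at all.
    return 2.0 * x[0] + x[1] * x[2]


stats = [summarize(ee) for ee in morris_elementary_effects(toy_model, k=4, r=200)]
```

In this toy case the linear parameter has a constant EE (zero STD), the interacting parameters have a non-zero STD, and the inert parameter has a zero mean |EE|, which mirrors how the three statistics are read in the text.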

2.3. Sobol/Saltelli sensitivity method

While the local and Morris sensitivity methods are difference-based, the Sobol/Saltelli method is variance-based (Sobol, 2001; Saltelli et al., 2008). Here we define the random variable Y and the random vector {X_i} for the system response and the parameters, respectively. The sampled response and parameters are y and {x_i}. The first-order sensitivity index (i.e., Sobol index) is defined by S_i = V[E[Y|X_i]]/V[Y], where E[·] and V[·] represent mean and variance, respectively. S_i quantifies the first-order effect, i.e., the relative contribution of X_i to the uncertainty of Y excluding the interaction effect with other parameters. In addition, the total sensitivity index of X_i is defined by S_ti = 1 − V[E[Y|X_~i]]/V[Y], where E[Y|X_~i] represents the mean of Y conditioned on all the parameters but X_i. S_ti accounts for the total effect of X_i including interaction effects, and is used to identify parameters with negligible effects and parameters that can be fixed.

To compute S_i and S_ti, we use an algorithm developed by Saltelli et al. (2008) and modified by Glen and Isaacs (2012). It first generates two sets of sample matrices A and B, each of which is an n × k matrix containing n sets of k-dimensional parameter vectors from Monte-Carlo (MC) sampling. From A and B, we create matrices C_i (i = 1,2,…,k) such that the i-th column of C_i is the same as the i-th column of A (i.e., C_i(m, i) = A(m, i) for m = 1,2,…,n), and the other columns of C_i are the same as B (i.e., C_i(m, j) = B(m, j) for m = 1,2,…,n and j ≠ i). The simulation results from the parameter sets A, B and C_i are n-dimensional vectors: {a_m}, {b_m}, and {c_{i,m}} (m = 1,…,n), respectively. The number of required simulations is n(k+2). S_i is computed as a correlation coefficient between {a_m} and {c_{i,m}}:

$$S_i = \frac{1}{s_y^2}\,\frac{1}{n-1}\sum_{m=1}^{n}(a_m-\bar{y})(c_{i,m}-\bar{y}) \tag{3}$$

where ȳ and s_y^2 are the overall sample mean and variance of Y, respectively. We can use an analytical form of the confidence interval given for the correlation coefficient (Fisher, 1921) rather than computing it with the bootstrap method (e.g., Archer et al., 1997; Tang et al., 2007; Saint-Geours et al., 2010). The 95% confidence interval of S_i is given as tanh{arctanh(S_i) ± 1.96 SE}, where SE is the standard error given by SE = (n−3)^−0.5. The width of the confidence interval increases for small S_i due to the tanh transform, which implies that more simulations are required to rank minor parameters.
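As a rough illustration of the correlation-based estimate and its analytical confidence interval, the following Python sketch builds A, B and C_i for a hypothetical additive `toy_model` on the unit cube. The function names, the pooling of the A and B results for the "overall" mean and variance, and the clamping before arctanh are assumptions made for this sketch; they are not taken from the algorithm of Glen and Isaacs (2012).

```python
import math
import random


def first_order_indices(f, k, n, seed=0):
    """Estimate S_i (i = 0..k-1) as in Eq. (3); uses n * (k + 2) model runs."""
    rng = random.Random(seed)
    A = [[rng.random() for _ in range(k)] for _ in range(n)]
    B = [[rng.random() for _ in range(k)] for _ in range(n)]
    a = [f(row) for row in A]
    b = [f(row) for row in B]
    # Assumption: the overall mean and variance pool the A and B results.
    pooled = a + b
    y_bar = sum(pooled) / len(pooled)
    s2_y = sum((y - y_bar) ** 2 for y in pooled) / (len(pooled) - 1)
    indices = []
    for i in range(k):
        # C_i takes column i from A and every other column from B.
        c = [f(B[m][:i] + [A[m][i]] + B[m][i + 1:]) for m in range(n)]
        cov = sum((a[m] - y_bar) * (c[m] - y_bar) for m in range(n)) / (n - 1)
        indices.append(cov / s2_y)
    return indices


def fisher_interval(s_i, n, z=1.96):
    """Interval tanh(arctanh(S_i) +/- z * SE) with SE = (n - 3) ** -0.5."""
    se = (n - 3) ** -0.5
    # Guard: arctanh is undefined at +/-1, which a noisy estimate could reach.
    s_clamped = max(min(s_i, 0.999999), -0.999999)
    centre = math.atanh(s_clamped)
    return math.tanh(centre - z * se), math.tanh(centre + z * se)


def toy_model(x):
    # Hypothetical additive response; analytical S = 0.8, 0.2 and 0.
    return x[0] + 0.5 * x[1]


n = 4000
S = first_order_indices(toy_model, k=3, n=n)
intervals = [fisher_interval(s, n) for s in S]
```

For this toy model the interval around the inert parameter (S close to 0) comes out wider than the interval around the dominant one, consistent with the remark that ranking minor parameters needs more simulations.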

Eq. (3) offers an intuitive way to understand S_i. The parameter sets C_i and A share the same values only for X_i. If X_i is more influential, X_i determines the results so that {a_m} and {c_{i,m}} should be similar and hence have higher correlation. In Eq. (3), the covariance term can be re-written as a semivariogram (Deutsch and Journel, 1992) such that

$$S_i = 1 - \frac{1}{s_y^2}\,\frac{1}{2(n-1)}\sum_{m=1}^{n}(a_m-c_{i,m})^2 \tag{4}$$

Note that the stationarity is warranted, since a_m and c_{i,m} (i = 1,…,k) are the same system response (the difference comes from random sampling), and the mean of them should be the same. S_i is high when the difference between {a_m} and {c_{i,m}} is small. Here, we interpret S_i as a difference-based measure between the original ({a_m}) and perturbed responses ({c_{i,m}}) caused by a variation in all parameters except X_i.

Similar to S_i, S_ti can be computed as

$$S_{ti} = 1 - \frac{1}{s_y^2}\,\frac{1}{n-1}\sum_{m=1}^{n}(c_{i,m}-\bar{y})(b_m-\bar{y}) \tag{5}$$

Using the covariance–semivariogram relationship, we can re-write S_ti as

$$S_{ti} = \frac{1}{s_y^2}\,\frac{1}{2(n-1)}\sum_{m=1}^{n}(c_{i,m}-b_m)^2 \tag{6}$$

Since C_i and B have the same values except for X_i, (c_{i,m} − b_m) is equivalent to taking a difference in Y when perturbing X_i with the other parameters fixed. This procedure is the same as the Morris method, except that the output difference is not divided by the parameter difference (Δ). This similarity could be the reason why Campolongo et al. (2007) observed the mean |EE| being a good proxy for S_ti. We would note that Sobol (2001) also included the same equation as Eq. (6) from a mathematical derivation. However, we believe that the progression from the correlation to the semivariogram provides us more intuitive insight as described above.

Eqs. (4) and (6) can also explain the difference between S_i and S_ti. When we perturb X_i with the other parameters fixed, the difference (c_{i,m} − b_m), hence S_ti, accounts not only for the impact of X_i as a single factor, but also for the interaction effects with the other parameters. S_i, on the other hand, is computed by fixing X_i and changing the other parameters. Perturbing all the other parameters except for X_i includes the total effects involving all the parameters except for X_i. In other words, the second term in Eq. (4) presents the total effect involving all the parameters except for the first-order effect of X_i. Since the sum of all the effects is 1, Eq. (4) shows that S_i represents the first-order effect of X_i excluding interaction effects.

2.4. Alternative approximation method for the Sobol index

We may recall the original definition of S_i as V[E[Y|X_i]]/V[Y]. Since E[Y|X_i] is the mean of Y as a function of X_i, E[Y|X_i] corresponds to the mean (or fitted) line of MC samples as a function of X_i. Therefore, S_i can be computed by the variance of the fitted line V[E[Y|X_i]] divided by V[Y]. In other words, S_i can be calculated by determining E[Y|X_i] in a one-dimensional space, which typically requires much less computational effort than fitting a response surface in a multi-dimensional parameter space (e.g., Marrel et al., 2009; Oladyshkin et al., 2012). Although fitting introduces additional assumptions, one-dimensional fitting allows us to use less model-dependent approaches, such as semiparametric or non-parametric regression methods.

Another advantage of this approach is that the number of simulations is not directly dependent on k, since MC samples are projected onto each parameter axis independently. Although the number of simulations to achieve overall convergence still depends on k, the minor parameters do not affect the convergence significantly. The original approach requires additional n simulations for each additional parameter, whereas this approach does not increase the number of simulations for parameters with a minor impact. The disadvantage of this method is that it does not provide S_ti. However, this approach is cost effective to identify influential parameters, especially when MC simulations are performed anyway for other purposes (e.g., UA).

3. Demonstration problem setup

For demonstration, we use a synthetic example involving fluid injection and leakage in an idealized reservoir–aquitard–aquifer system of laterally infinite extent. Three homogeneous isotropic layers (i.e., reservoir, aquitard, and aquifer) are considered as shown in Fig. 1a. The top and bottom of the system are assumed impervious except for a constant volumetric rate of injection into the reservoir, Q = 5700 m^3 d^−1, through the injection well. The leaky well, located at 2000 m away from the injection well, can bring resident brine from the reservoir to the aquifer through focused leakage. Diffuse leakage from the reservoir to the aquifer occurs through the aquitard. We also assume that the radius of injection and leaky wells is 0.15 m. We use the semi-analytical model developed by Cihan et al. (2011), which calculates both flow through aquitards (diffusive brine migration) and flow through leaky wells (focused leakage); see Cihan et al. (2011) for details.

In SA, we perturb seven parameters: hydraulic conductivity K and storativity S for the reservoir, aquitard, and aquifer, and well hydraulic conductivity. Table 1 shows the reference parameter values and parameter ranges. We assume a uniform distribution within the range for each parameter. As a performance measure, we are interested in the pressure buildup ΔP at the leaky well location in the shallow aquifer. ΔP as a function of time in the reference case is shown in Fig. 1b.

Fig. 1. (a) Conceptual model setup for the pressure leakage problem, and (b) ΔP (m) at the observation point.

Table 1
Reference parameter values and uncertainty ranges: hydraulic conductivity K and specific storativity S.

           Aquifer    Aquitard   Reservoir  Well      Range
K (m/d)    2.00E−1    2.00E−6    2.00E−1    2.00E+5   One order of magnitude
S (1/m)    1.88E−6    1.47E−6    1.88E−6    N/A       Factor of five

4. Results and discussion

4.1. Comparison of sensitivity coefficients

Fig. 2 shows the time evolution of the sensitivity indices from the three SA methods: S_i^local, mean EE, S_i and S_ti. The number of simulations is 8, 3200 (r = 400, p = 6), and 45,000 (n = 5000) for the local, Morris and Sobol/Saltelli methods, respectively. For the GSA indices, the confidence intervals are shown so that we can evaluate the parameter importance with a given number of simulations. For the local and Morris methods, σ_y is fixed to one with the same unit as y, which is acceptable because we do not combine multiple outputs. For easier comparison, Fig. 3 shows the rescaled sensitivity indices: |S_i^local|, mean |EE|, 3 s_y √S_i and 3 s_y √S_ti.

S_i and S_ti are rescaled because they are the square of difference-based measures, according to Eqs. (4) and (6). The factor three comes from the fact that the mean absolute difference of two parameters sampled from the scaled uniform distribution ([0, 1]) is 1/3. Since all the values in Fig. 3 are positive, the rescaled indices represent the magnitude of parameter effects.
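The origin of the factor three can be checked numerically. The short sketch below (function names are hypothetical) estimates the mean absolute difference of two independent uniform samples on [0, 1] by Monte Carlo and applies the rescaling 3 s_y √S described above; it is a sanity check of the arithmetic only, not part of the original analysis.

```python
import math
import random


def mean_abs_difference(n=200_000, seed=0):
    """Monte-Carlo estimate of E|U1 - U2| for U1, U2 uniform on [0, 1]."""
    rng = random.Random(seed)
    return sum(abs(rng.random() - rng.random()) for _ in range(n)) / n


def rescale_index(s, s_y):
    """Rescaled index 3 * s_y * sqrt(S); negative noisy estimates are floored at 0."""
    return 3.0 * s_y * math.sqrt(max(s, 0.0))


estimate = mean_abs_difference()      # close to 1/3
rescaled = rescale_index(0.25, 2.0)   # 3 * 2 * 0.5
```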

In Fig. 3, all the rescaled sensitivity indices provide a consistent pattern of parameter importance. The sensitivity indices increase monotonically as time increases, following the increase in ΔP (Fig. 1b). The reservoir S is most influential at early time, whereas aquifer and reservoir K have a dominant effect afterward. The sensitivity to aquitard K and S increases later, as the brine diffuses through the aquitard and reaches the shallow aquifer. There are differences among the four sensitivity indices. For |S_i^local| (Fig. 3a), the sensitivity to the aquitard K decreases between 500 and 1000 days, since S_i^local (Fig. 2a) changes from negative to positive. 3 s_y √S_i (Fig. 3c) is smaller than the mean |EE| (Fig. 3b) and 3 s_y √S_ti (Fig. 3d), since S_i does not include interaction effects. For all the methods, the leaky-well conductivity appears to be the least influential parameter. This may be because of the very high reference well conductivity (2 × 10^5 m/d) and its range of one order of magnitude for GSA; in this range of well conductivity, the leakage flux is controlled by the capacity of brine supply from the reservoir (Jung et al., 2013; Fig. 7). There are some differences (e.g., the relative importance of aquitard S compared to the other parameters varies depending on the method used), requiring further analysis (see Sections 4.2 and 4.4).

Additional information can be gained from the sensitivity indices in Fig. 2. For example, S_i^local and mean EE provide the sign of the sensitivity, which helps understand the physics of the problem. Increasing K of leakage pathways (i.e., well K and aquitard K at later times) increases ΔP (hence a positive effect), whereas increasing the aquifer and reservoir Ks dissipates pressures more quickly and thus generally decreases ΔP (hence a large negative effect). The reservoir S has a negative effect at the beginning, since the compressible matrix pores absorb the initial pressure increase. We see that the pressure perturbation due to diffusive brine migration may arrive between 500 and 1000 days, since the sensitivity to the aquitard K changes from negative to positive both in the S_i^local and mean EE.

In Fig. 2c and d, S_i and S_ti provide the relative contribution of each parameter to the uncertainty of the output. The ranking of parameter importance is more easily recognized than with S_i^local and mean EE. The large effect of reservoir S at early times can be clearly seen. The patterns of the three most influential parameters are similar in Fig. 2c and d, suggesting that a change in any of these parameters would have a significant impact on ΔP even without taking into account the interaction effects.

Fig. 2. Time evolution of sensitivity index: (a) S_i^local; (b) mean EE; (c) S_i and (d) S_ti. In (b)–(d), the thin lines represent the 95% confidence intervals.

4.2. Identification of nonlinear and interaction effects in GSA

Fig. 4 shows a cross-plot between the mean and STD of EE, following Morris (1991). Each curve corresponds to the time evolution for a parameter's sensitivity index. The two black lines represent mean EE = ±2 SEM. Due to the large number of paths (r = 400), the SEM is small, and all the parameters are below the black lines, indicating that their non-zero impact is statistically significant. All the parameters have a non-zero value of STD, indicating that they have nonlinearity and/or interaction effects. The ratio between the mean and STD of EE is larger for aquitard K and S, which have a large difference in the evaluated sensitivity indices among the three SA methods (Fig. 3).

Fig. 5 shows the difference between S_ti and S_i as a function of S_i, identifying the ratio between the first-order effect and the interaction effects. All the parameters show interaction effects, since (S_ti − S_i) is larger than zero. The aquitard K and S have a particularly large difference (S_ti − S_i) relative to S_i, suggesting that they have a large interaction effect compared to the first-order effect. Comparing Figs. 4 and 5 allows us to separate interaction from nonlinearity effects, since Morris's STD of EE includes both, but (S_ti − S_i) represents only the interaction effects. For example, the reservoir K and aquifer K would have a similar magnitude of the interaction effect (Fig. 5), but the nonlinear effect is higher for reservoir K (Fig. 4).

Fig. 3. Time evolution of re-scaled sensitivity index: (a) |S_i^local|; (b) mean |EE|; (c) 3 s_y √S_i and (d) 3 s_y √S_ti.

Fig. 4. Mean EE vs. STD from the Morris method. The circle, square and triangle on each curve represent 100, 500 and 1100 days, respectively. The end of each line corresponds to 1825 days.

4.3. Computational cost in GSA

In the Morris method, we need to determine the number of partitions (p) as well as the number of paths (r). The influence of these two parameters is investigated here. Fig. 6 shows the mean and STD of EE at 5 years for each parameter as a function of r with p = 6 and 50. The estimated mean and STD of EE appear stabilized after 200 and 300 paths, respectively. Running fewer paths, however, still provides a reasonable estimate of the relative importance of a parameter.

The number of partitions has a small impact on the convergence of both mean EE and STD of EE in this case. This is surprising since p determines the possible number of parameter combinations, p^k/2 (Morris, 1991). This is because having fewer partitions leads to individual EE values varying significantly, which is seen as a large variability of curves with fewer than 100 paths. In addition, Fig. 6 shows that the converged values are slightly different for p = 6 and p = 50. We found that increasing p tends to systematically reduce the mean EE (in magnitude) and STD of EE in this system. This is due to the presence of nonlinearity; the parameter difference Δ becomes smaller as p increases, and the difference in outputs divided by Δ (hence EE) changes depending on Δ. The impact of the different number of partitions is further discussed in Section 4.4.1.

In the Sobol/Saltelli method, we only need to determine n. Fig. 7 shows S_i, S_ti, and mean |EE| (computed from B and C_i) for each parameter at 5 years as a function of n. The mean |EE| (Fig. 7c) is computed based on EE by taking the difference (c_{i,m} − b_m) and dividing it by the parameter difference, according to Eq. (2). In other words, this mean |EE| is equivalent to Morris's mean |EE| with a variable Δ (C_i(m,i) − B(m,i)). S_i and S_ti (in Fig. 7a and b) require several thousand sets to stabilize, whereas the mean |EE| only requires several hundred simulations.

Fig. 7 shows that the large computational cost of the Sobol/Saltelli method compared to the Morris method is attributed to the fact that the Sobol/Saltelli method calculates a variance-based measure (i.e., a second-order statistic). Eq. (6) implies that Morris's mean |EE| quantifies the magnitude of each parameter effect including its interaction with others, similar to S_ti. The major difference is that Morris's mean |EE| uses a fixed parameter difference, and that the Morris mean |EE| is not normalized by the variance of the system response. If each observation is analyzed separately, the normalization does not affect the parameter importance ranking. Our analysis suggests that Morris's mean |EE| can be used instead of S_ti to rank the parameter effect including the interaction effect, considerably saving computational costs.
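The link between S_ti and the mean |EE| with a variable Δ can be made concrete with a small Python sketch that computes both from the same B and C_i runs. The model `toy_model`, the function name, σ_y = 1, and the pooling of A and B results for the variance are assumptions of this sketch rather than details given in the text.

```python
import random


def total_index_and_mean_abs_ee(f, k, n, seed=0):
    """Return (S_ti list, mean |EE| list) computed from the B and C_i runs."""
    rng = random.Random(seed)
    A = [[rng.random() for _ in range(k)] for _ in range(n)]
    B = [[rng.random() for _ in range(k)] for _ in range(n)]
    a = [f(row) for row in A]
    b = [f(row) for row in B]
    # Assumption: the response variance pools the A and B results.
    pooled = a + b
    y_bar = sum(pooled) / len(pooled)
    s2_y = sum((y - y_bar) ** 2 for y in pooled) / (len(pooled) - 1)
    s_total, mean_abs_ee = [], []
    for i in range(k):
        sum_sq, sum_abs, used = 0.0, 0.0, 0
        for m in range(n):
            row = list(B[m])
            row[i] = A[m][i]              # row m of C_i
            diff = f(row) - b[m]          # c_{i,m} - b_m
            sum_sq += diff * diff
            step = A[m][i] - B[m][i]      # variable parameter difference
            if step != 0.0:
                sum_abs += abs(diff / step)
                used += 1
        s_total.append(sum_sq / (2.0 * (n - 1)) / s2_y)   # Eq. (6)
        mean_abs_ee.append(sum_abs / max(used, 1))
    return s_total, mean_abs_ee


def toy_model(x):
    # Hypothetical response with one interaction; x[3] is inert.
    # Analytical S_t = 12/19, 4/19, 4/19 and 0.
    return x[0] + x[1] * x[2]


s_total, mean_abs_ee = total_index_and_mean_abs_ee(toy_model, k=4, n=4000)
```

Both measures rank the four toy parameters in the same order, since each is built from the same one-parameter differences (c_{i,m} − b_m); only the normalization differs.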

4.4. Additional information from GSA

4.4.1. Morris method

Figs. 8 and 9 show all the simulated ΔP at 5 years from the Morris method as a function of four parameters: reservoir K, aquifer K, aquitard K, and aquitard S. The number of partitions is different: p = 6 in Fig. 8 and p = 50 in Fig. 9. Although the blue dots are not from random sampling (except for the first set of each path), they are distributed over the entire parameter space and show the variability of possible ΔP depending on each parameter. The dependence of ΔP on the reservoir K and aquifer K is obvious, although the large scatter in the low K regions suggests an interaction effect: when reservoir K or aquifer K is low, ΔP varies significantly depending on other parameters.

The red lines in each figure represent the change in ΔP when the corresponding parameter is changed. Δ is represented by the two end points of each line in the x-axis. The slope of each red line represents individual EEs (Eq. (2)). The average slope of all the red lines (i.e., mean EE, shown by the green line) is similar to the slope of the local sensitivity (black line) calculated at the reference point (black square), which suggests a good match between the local and mean EE, although slight differences are visible for the reservoir K and aquitard K.

In addition, these red lines effectively visualize nonlinearity and/or the interaction among parameters. We may find nonlinearity when the slope is different depending on the starting parameter value. In Fig. 8c, for example, the slope is steeper when the aquitard K is higher. We may find an interaction effect when the starting parameter value (x-axis) is the same, but the slope depends on the starting output value (y-axis) due to the impact of other parameters. In Fig. 8c and d, for example, the red lines have a steeper slope when the starting ΔP is high. This means that the aquitard K and S are more influential when ΔP is high, which is equivalent to the case where the reservoir K and/or aquifer K are low. Although we do not show them here, the 2D contour plots (i.e., the system response as a function of two parameters) were also examined to confirm the observations obtained from the scatter plots.

The difference between p = 6 and p = 50 can be seen by comparing Figs. 8 and 9. The range of ΔP variability is larger in Fig. 8, since more extreme ΔP values are sampled when p is small. The slope is generally larger in Fig. 8 due to nonlinearity, which creates a difference in the converged mean EE and STD values between p = 6 and p = 50 in Fig. 6. In addition, the six discrete points along each parameter axis (Fig. 8) seem to capture the variability of the pressure reasonably well in this case. From these figures (Figs. 6, 8 and 9), we may conclude that we should use a small p as long as the discrete points can capture the variability in the output, since using a small p can capture the extremes of output values easily and lead to a larger SEM (i.e., not underestimating the uncertainty of the mean EE estimate).

Fig. 5. (S_ti − S_i) as a function of S_i. The circle, square and triangle on each curve represent 100, 500 and 1100 days, respectively. The ends of each line correspond to 0 days and 1825 days.

Fig. 6. Mean EE and STD of EE at 5 years as a function of r: (a) mean EE with p = 6; (b) mean EE with p = 50; (c) STD of EE with p = 6 and (d) STD of EE with p = 50.

Fig. 7. (a) S_i, (b) S_ti and (c) mean |EE| at 5 years as a function of n from the Sobol/Saltelli method.

Fig. 8. Scatter plot of ΔP at 5 years from the Morris method (p = 6) as a function of: (a) reservoir K; (b) aquifer K; (c) aquitard K and (d) aquitard S. The red lines represent the change in ΔP when the corresponding parameter is changed (1 in 10 paths is shown). The black square is the reference point in the local SA. The green and black lines use the slope equal to S_i^local and mean EE, respectively, passing through the local SA reference point. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Fig. 9. Scatter plot of ΔP at 5 years from the Morris method (p = 50) as a function of: (a) reservoir K; (b) aquifer K; (c) aquitard K and (d) aquitard S. The red lines represent the change in ΔP when the corresponding parameter is changed (1 in 10 paths is shown). The black square is the reference point in the local SA. The green and black lines use the slope equal to S_i^local and mean EE, respectively, passing through the local SA reference point. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)


H.M. Wainwright et al. / Computers & Geosciences () 94.4.2. Sobol/Saltelli methodFig. 10 shows the scatter plots of P (5 years) as a function of the

four parameters from the MC sampling used in the Sobol/Saltellimethod. It shows scatters similar to Fig. 9, suggesting that the samplesfrom the Morris method successfully capture the output variability inthis one-dimensional parameter space, even though the Morrissampling is not purely random. The red line in each gure representsE[Y|Xi] obtained by locally weighted scatterplot smoothing (LOESS)tting (Hastie et al., 2001). The tting is done locally using least-square tting and a second-order polynomial function. The E[Y|Xi]lines show nonlinearity; aquitard K, for example, has an impact onlywhen it is larger than 5.5 in log10, since higher aquitard K allowsbrine to arrive diffusively in the shallow aquifer before 5 years.

Fig. 10. P at 5 years from MC sampling as a function of: (a) reservoir K; (b) aquifer K; (c) aquitard K and (d) aquitard S. Each red line represents E[Y|Xi]. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Fig. 10 (along with Figs. 8 and 9) visualizes the conceptual difference between Si and mean EE, as well as between Si and Sti. Fig. 8c and d (and also Fig. 9c and d) show that the aquitard K and S sometimes have a large impact depending on the other parameters, since some of the red lines have a large slope when the starting point of the red line is high. Such an interaction effect is not represented in Si, since the E[Y|Xi] line averages out such effects. The uncertainty of E[Y|Xi] would be higher for minor parameters, since the variability from the other parameters is overwhelming. This again suggests that the Morris method and Sti (determined by changing one parameter while fixing the others) can capture interaction effects and are necessary to identify the impact of minor parameters. Sti has rarely been used in hydrogeological UQ studies, although Si has commonly been evaluated. As Saltelli et al. (2008) suggested, both have to be combined in SA, especially when the objective is to find negligible parameters that can be fixed for PE and UA.
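For reference, the two indices contrasted here can be written in their standard variance-based form (as in Saltelli et al., 2008). The notation $X_{\sim i}$ for "all parameters except $X_i$" is an assumption and may differ from the symbols used in Section 2.3; the inequality in the last line assumes independent input parameters.

```latex
S_i = \frac{V\left[E\left(Y \mid X_i\right)\right]}{V[Y]}, \qquad
S_{Ti} = \frac{E\left[V\left(Y \mid X_{\sim i}\right)\right]}{V[Y]}
       = 1 - \frac{V\left[E\left(Y \mid X_{\sim i}\right)\right]}{V[Y]}, \qquad
S_{Ti} - S_i \ge 0
```

Because Si is built from E[Y|Xi], which averages over all the other parameters, interaction terms cancel out of it. Sti retains them, so the gap between Sti and Si indicates how much a parameter acts through interactions.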

4.5. Alternative approximation method for the Sobol index

As discussed in Section 2.3, the Sobol index (Si) can be computed by fitting E[Y|Xi] in one dimension only. Fig. 10 shows the fitted line E[Y|Xi] based on the MC samples in this system. Si can be computed by taking the variance of this fitted line and dividing it by V[Y].
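The procedure just described (fit E[Y|Xi] in one dimension, take the variance of the fitted line, divide by V[Y]) can be sketched as follows. This is an illustrative stand-in rather than the authors' code: instead of a LOESS library routine it hand-rolls a tricube-weighted local quadratic regression in NumPy, and the function names, the span parameter `frac`, and the toy model are assumptions.

```python
import numpy as np

def local_quadratic_fit(x, y, frac=0.3):
    """Estimate E[Y|X=x_j] at every sample by local weighted least squares.

    Each local fit uses the nearest frac * n samples, tricube weights,
    and a second-order polynomial centred on x_j.
    """
    n = len(x)
    k = max(int(np.ceil(frac * n)), 4)           # neighbours per local fit
    fitted = np.empty(n)
    for j in range(n):
        d = np.abs(x - x[j])
        idx = np.argpartition(d, k - 1)[:k]      # indices of k nearest samples
        h = d[idx].max()
        w = (1.0 - (d[idx] / h) ** 3) ** 3 if h > 0 else np.ones(k)
        A = np.vander(x[idx] - x[j], 3, increasing=True)  # columns 1, dx, dx^2
        sw = np.sqrt(w)                          # row scaling gives weighted LS
        coef, *_ = np.linalg.lstsq(A * sw[:, None], y[idx] * sw, rcond=None)
        fitted[j] = coef[0]                      # local polynomial at dx = 0
    return fitted

def first_order_index(x, y, frac=0.3):
    """Approximate Si as V[fitted E[Y|Xi]] / V[Y] from plain MC samples."""
    return np.var(local_quadratic_fit(x, y, frac)) / np.var(y)

# Hypothetical toy model: Y = 4*X1^2 + X2 with X1, X2 ~ U(0, 1).
rng = np.random.default_rng(0)
X = rng.uniform(size=(500, 2))
Y = 4.0 * X[:, 0] ** 2 + X[:, 1]
for i in range(X.shape[1]):
    print(f"S{i + 1} ~ {first_order_index(X[:, i], Y):.3f}")
```

For this toy model the analytical first-order indices are roughly 0.94 and 0.06. The estimate for the minor parameter tends to be biased upward, because the smoothed line retains some of the scatter caused by the other parameter, which is consistent with the higher uncertainty of E[Y|Xi] for minor parameters noted in the text.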

Fig. 11 compares Si determined by this approach with the original approach. The approximated sensitivity indices (dots) fall within or near the confidence intervals computed by the original algorithm, suggesting that this approach is a valid method. The number of simulations is reduced to 1/9 (from 45,000 to 5000 simulations). This approach uses a standard MC sampling, which is likely to be available from other applications (e.g., UA). In other words, when we perform UA, we can compute Si (i.e., Sobol index) as a by-product without any additional computation.

Fig. 11. Si from the original algorithm (thick lines) compared with that from the alternative method (dots). The thin lines represent the 95% confidence intervals computed by the original algorithm.

5. Conclusions

In this study, we compared the interpretation and computational cost of local SA, and the Morris and Sobol/Saltelli GSA methods. We re-interpreted the Sobol/Saltelli sensitivity indices as difference-based measures, which enabled direct comparison to the Morris method. We also developed an alternative approximation approach to efficiently compute the first-order sensitivity index (i.e., Sobol index). We demonstrated that the three methods could provide additional information to better understand the system behavior, in addition to the traditional use of ranking parameter importance.

Re-interpreting the Sobol/Saltelli sensitivity indices as difference-based measures provided better understanding of the similarities and differences among the three methods, and also more intuitive understanding of the Sobol/Saltelli sensitivity indices and the algorithm to compute these indices. It also suggested that Morris's mean |EE| has similar information to the total sensitivity index when we are interested in the parameter importance ranking (i.e., when the normalization by the total variance does not matter).

We demonstrated the comparison of the three methods using a pressure perturbation problem with fluid injection and leakage in a reservoir-aquitard-aquifer system. The demonstration results showed that the three sensitivity methods give similar interpretations and importance rankings. We found that the local sensitivity method is sufficient to identify the influential parameters in our case. The Morris method provides many uses (e.g., identifying positive/negative effects, influential parameters, and nonlinear and/or interaction effects) with relatively small computational burden. The Sobol/Saltelli method gives a more quantitative sensitivity measure in the context of UQ, although it is computationally expensive. It is noted that a highly nonlinear system would create larger differences among the three methods. However, we believe that the insights offered in this study generally improve our understanding of SA.

We also explored the computational cost of GSA. In the Morris method, increasing the number of partitions did not change the convergence of the mean and STD of EE significantly, but changed the converged values due to nonlinearity. In the Sobol/Saltelli method, the number of simulations is much higher than that for the Morris method. Our results showed that the difference comes from the fact that the Sobol/Saltelli method uses variance-based measures, which are second-order statistics.

Examining the scatter plots of the output values as a function of each parameter from GSA provided much richer information, improving our understanding of the system. Morris sampling was especially useful to check the nonlinearity in the output and identify the individual interaction effects. In MC sampling, we can also check whether the number of partitions in the Morris method is enough to capture the variability in the response.

Based on our analysis, one may conclude that the Morris OAT method is sufficient. This study, however, does not imply that the Sobol/Saltelli method is unnecessary. Although the total sensitivity index could be substituted by the Morris mean |EE| for parameter importance ranking, the first-order sensitivity index (i.e., Sobol index) is required when we are interested in the first-order effect. In addition, the Sobol/Saltelli indices are necessary when we are interested in the relative contribution of each parameter to the uncertainty and/or variability of each output, in addition to the parameter importance ranking. However, in this study, we could offer some alternatives when the forward model is too computationally intensive to use the Sobol/Saltelli method. For example, we can use the alternative method to compute the Sobol index from UA results (if available), and the Morris mean |EE| can be used as a proxy for the total sensitivity index to choose non-influential parameters.

From this study, we make several recommendations for SA.

• A local sensitivity analysis should be done first. Even when the system is nonlinear, it still provides considerable insight into the system behavior.
• The number of partitions in the Morris method should be small as long as the discrete points capture the variability of the system response. A small set of MC simulations would be helpful to determine the minimum number of partitions.
• Examining the scatter plots of the model outputs from the Morris sampling visualizes nonlinear effects and interaction with the other parameters.
• The Sobol/Saltelli method requires a large number of simulations to converge. It is important to compute the confidence interval.
• The Sobol index does not differentiate the minor parameters well; the total sensitivity index is necessary (or alternatively the Morris mean |EE| can be used).
• When there is a set of MC simulations already available from UA, the proposed alternative method can be used to compute the Sobol index as a by-product.

Although some of these recommendations were documented elsewhere (Saltelli et al., 2008), we emphasize them here for applications in hydrogeology.

Acknowledgment

This work was funded by the Assistant Secretary for Fossil Energy, National Energy Technology Laboratory, National Risk Assessment Partnership, of the US Department of Energy under Contract no. DE-AC02-05CH11231. The authors wish to thank George Pau of Lawrence Berkeley National Laboratory for his technical review, and three anonymous reviewers for their constructive comments.

References

Archer, G.E.B., Saltelli, A., Sobol', I.M., 1997. Sensitivity measures, ANOVA-like techniques and the use of bootstrap. Journal of Statistical Computation and Simulation 58, 99–120.

Cacuci, D.G., 2003. Sensitivity and Uncertainty Analysis, Vol. 1: Theory. Chapman and Hall/CRC Press, Boca Raton, FL.

Campolongo, F., Cariboni, J., Saltelli, A., 2007. An effective screening design for sensitivity analysis of large models. Environmental Modelling & Software 22 (10), 1509–1518.

Cihan, A., Zhou, Q., Birkholzer, J.T., 2011. Analytical solutions for pressure perturbation and fluid leakage through aquitards and wells in multilayered aquifer systems. Water Resources Research 47 (10), W10504.

Deutsch, C.V., Journel, A.G., 1992. GSLIB: Geostatistical Software Library and User's Guide. Oxford University Press, New York, USA.

Fisher, R.A., 1921. On the probable error of a coefficient of correlation deduced from a small sample. Metron 1 (4), 3–32.

Finsterle, S., Pruess, K., 1995. Solving the estimation–identification problem in two-phase flow modeling. Water Resources Research 31 (4), 913–924.

Finsterle, S., Kowalsky, M.B., Pruess, K., 2012. TOUGH: model use, calibration and validation. Transactions of the American Society of Agricultural and Biological Engineers 55 (4), 1275–1290.

Foglia, L., Hill, M.C., Mehl, S.W., Burlando, P., 2009. Sensitivity analysis, calibration, and testing of a distributed hydrological model using error-based weighting and one objective function. Water Resources Research 45, W06427, http://dx.doi.org/10.1029/2008WR007255.

Glen, G., Isaacs, K., 2012. Estimating Sobol sensitivity indices using correlations. Environmental Modelling & Software 37, 157–166, http://dx.doi.org/10.1016/j.envsoft.2012.03.014.

Hastie, T., Tibshirani, R., Friedman, J.H., 2001. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, New York, USA.

Hill, M.C., Tiedeman, C.R., 2007. Effective Groundwater Model Calibration: With Analysis of Data, Sensitivities, Predictions, and Uncertainty. John Wiley & Sons, New York.

Jung, Y., Zhou, Q., Birkholzer, J.T. Early detection of brine and CO2 leakage through abandoned wells using pressure and surface-deformation monitoring data: concept and demonstration. Advances in Water Resources, in press, http://dx.doi.org/10.1016/j.advwatres.2013.06.008.

Marrel, A., Iooss, B., Laurent, B., Roustant, O., 2009. Calculations of Sobol indices for the Gaussian process metamodel. Reliability Engineering & System Safety 94 (3), 742–751.

Morris, M.D., 1991. Factorial sampling plans for preliminary computational experiments. Technometrics 33 (2), 161–174.

Oladyshkin, S., de Barros, F.P.J., Nowak, W., 2012. Global sensitivity analysis: a flexible and efficient framework with an example from stochastic hydrogeology. Advances in Water Resources 37, 10–22, http://dx.doi.org/10.1016/j.advwatres.2011.11.001, ISSN: 0309-1708.

Saint-Geours, N., Bailly, J.S., Lavergne, C., Grelot, F., 2010. Latin hypercube sampling of Gaussian random field for Sobol' global sensitivity analysis of models with spatial inputs and scalar output. In: Proceedings of the Ninth International Symposium on Spatial Accuracy Assessment in Natural Resources and Environmental Sciences, Accuracy 2010.

Saltelli, A., Ratto, M., Andres, T., Campolongo, F., Cariboni, J., Gatelli, D., Saisana, M., Tarantola, S., 2008. Global Sensitivity Analysis: The Primer. John Wiley and Sons, New York.

Sobol, I.M., 2001. Global sensitivity indices for nonlinear mathematical models and their Monte Carlo estimates. Mathematics and Computers in Simulation 55 (1–3), 271–280.

Tang, Y., Reed, P., van Werkhoven, K., Wagener, T., 2007. Advancing the identification and evaluation of distributed rainfall-runoff models using global sensitivity analysis. Water Resources Research 43, W06415, doi:10.1029/2006WR005813.

Van Griensven, A., Meixner, T., Grunwald, S., Bishop, T., Diluzio, M., Srinivasan, R., 2006. A global sensitivity analysis tool for the parameters of multi-variable catchment models. Journal of Hydrology 324 (1), 10–23.
