
Coupled simulation-optimization model for coastal aquifer management using genetic programming-based ensemble surrogate models and multiple-realization optimization

J. Sreekanth¹,² and Bithin Datta¹,²

Received 21 June 2010; revised 9 November 2010; accepted 1 February 2011; published 29 April 2011.

[1] Approximation surrogates are used to substitute the numerical simulation model within optimization algorithms in order to reduce the computational burden of the coupled simulation-optimization methodology. The practical utility of surrogate-based simulation-optimization has been limited mainly by the uncertainty in surrogate model simulations. We develop a surrogate-based coupled simulation-optimization methodology for deriving optimal extraction strategies for coastal aquifer management that considers the predictive uncertainty of the surrogate model. Optimization models with two conflicting objectives are solved using a multiobjective genetic algorithm. The objectives considered are maximizing the pumping from production wells and minimizing the barrier well pumping for hydraulic control of saltwater intrusion. The density-dependent flow and transport simulation model FEMWATER is used to generate input-output patterns of groundwater extraction rates and the resulting salinity levels. The nonparametric bootstrap method is used to generate different realizations of this data set. These realizations are used to train different surrogate models using genetic programming for predicting the salinity intrusion in coastal aquifers. The predictive uncertainty of these surrogate models is quantified, and the ensemble of surrogate models is used in the multiple-realization optimization model to derive the optimal extraction strategies. The multiple realizations refer to the salinity predictions made by the different surrogate models in the ensemble. Optimal solutions are obtained for different reliability levels of the surrogate models. The solutions are compared against those obtained using a chance-constrained optimization formulation and a single-surrogate-based model. The ensemble-based approach is found to provide reliable solutions for coastal aquifer management while retaining the advantage of surrogate models in reducing the computational burden.

Citation: Sreekanth, J., and B. Datta (2011), Coupled simulation-optimization model for coastal aquifer management using genetic programming-based ensemble surrogate models and multiple-realization optimization, Water Resour. Res., 47, W04516, doi:10.1029/2010WR009683.

1. Introduction

[2] Coupled simulation-optimization models are increasingly used as decision models to find optimal solutions to groundwater management problems such as optimal groundwater remediation design, optimal extraction of groundwater from coastal aquifers, and wetland management [Gorelick, 1983; Gorelick et al., 1984; Ahlfeld and Heidari, 1994; Hallaji and Yazicigil, 1996; Emch and Yeh, 1998; Wang and Zheng, 1998; Das and Datta, 1999a, 1999b, 2000; Cheng et al., 2000; Mantoglou, 2003; Mantoglou et al., 2004; Katsifarakis and Petala, 2006; Ayvaz and Karahan, 2008; Datta et al., 2009]. One of the major disadvantages of using the coupled simulation-optimization model is the huge

computational burden caused by the many calls to the simulation model made by the optimization algorithm. Recent studies have used nontraditional optimization techniques for solving groundwater management problems. These include genetic algorithms [Aly and Peralta, 1999; Cheng et al., 2000; Qahman et al., 2005; Bhattacharjya and Datta, 2005], evolutionary algorithms [Mantoglou et al., 2004], simulated annealing [Rao et al., 2004], and differential evolution [Karterakis et al., 2007]. Population-based optimization algorithms such as genetic algorithms can effectively solve problems with multiple objectives, because the entire nondominated front of solutions can be obtained in a single run of the optimization model. However, population-based algorithms such as genetic algorithms may require several thousand evaluations of the simulation model before an optimal solution is obtained. One possible approach to reducing the computational burden is to replace the simulation model with approximate surrogate models. In spite of the wide use of surrogate models in coupled simulation-optimization approaches, they are rarely accepted as reliable models for simulating groundwater flow and transport in practical applications. These models are most often disfavored because of the inherently uncertain nature of these "black box" models.

¹School of Engineering and Physical Sciences, James Cook University, Townsville, Queensland, Australia.

²CRC for Contamination Assessment and Remediation of the Environment, Mawson Lakes, South Australia, Australia.

Copyright 2011 by the American Geophysical Union. 0043-1397/11/2010WR009683

WATER RESOURCES RESEARCH, VOL. 47, W04516, doi:10.1029/2010WR009683, 2011

[3] The use of surrogate models adds an uncertainty component to the simulation-optimization framework. The predictive uncertainty of the surrogate models may have unpredictable effects on the optimality, or even the feasibility, of the obtained solutions. In the present study, we develop a coupled simulation-optimization model based on an ensemble of surrogate models for the optimal management of coastal aquifers under the predictive uncertainty of the surrogate models. The model determines optimal extraction strategies for management. The ensemble of surrogate models is used to quantify the predictive uncertainty and is then used with stochastic optimization models to derive optimal extraction strategies.

[4] A number of different approaches have previously been used to solve the problem of optimal and sustainable extraction of groundwater from coastal aquifers. These approaches use either sharp interface or diffuse interface modeling of saltwater intrusion processes within a simulation-optimization framework. Analytical solutions exist for the sharp interface modeling approach and are comparatively easy to use in a simulation-optimization framework [Iribar et al., 1997; Dagan and Zeitoun, 1998; Mantoglou, 2003; Park and Aral, 2004; Mantoglou and Papantoniou, 2008]. The diffuse interface approach considers the flow and transport equations, which are coupled through the density dependence and must be solved simultaneously. The coupled flow and transport equations are highly nonlinear and complex. Linking a numerical model that solves these equations with an optimization algorithm involves a huge computational burden [Das and Datta, 1999a, 1999b; Dhar and Datta, 2009].

[5] In the past few years, surrogate models have been used as substitutes for the numerical simulation model within the optimization algorithm. A wide range of approximation surrogates have been used in different studies. Artificial neural networks (ANN) have been widely used as approximation surrogates for groundwater models [Ranjithan et al., 1993; Rogers et al., 1995; Aly and Peralta, 1999]. Neural network-based approximation surrogates were developed by Bhattacharjya and Datta [2005, 2009], Yan and Minsker [2006], Kourakos and Mantoglou [2009], and Dhar and Datta [2009] for use in simulation-optimization models. McPhee and Yeh [2006] used ordinary differential equation surrogates to replace the partial differential equations of groundwater flow and transport.

[6] Most of these surrogate modeling approaches assume a fixed surrogate model structure and optimize the surrogate model parameters to obtain the best fit between the explanatory and response variables. Even the most widely used approach, neural network surrogate modeling, determines the optimal model architecture by trial and error [Bhattacharjya and Datta, 2005; Rao et al., 2004].

[7] Regardless of the method used, developing surrogate models from numerical simulation models introduces a certain amount of uncertainty in the predicted variable. This is due to the uncertainty in the structure and parameters of the surrogate model. When surrogates are used in a coupled simulation-optimization framework to derive optimal groundwater management strategies, the uncertainties in their predictions affect the optimality of the resulting solution. Thus, while the surrogate model achieves computational efficiency, it also introduces additional uncertainty, arising from its prediction residuals, into the simulation-optimization framework. Depending on the amount of uncertainty, the derived optimal solution may be rendered suboptimal or even infeasible. Hence, it is important to quantify the uncertainty in the surrogate model predictions and to reformulate the optimization problem to address this uncertainty.

[8] In our study, an ensemble-based surrogate modeling approach based on genetic programming is used to predict the salinity intrusion into coastal aquifers resulting from groundwater extraction. Genetic programming (GP) has been used in hydrological applications in a few recent studies [Dorado et al., 2002; Makkeasorn et al., 2008; Parasuraman and Elshorbagy, 2008; Wang et al., 2009]. GP has been used to develop prediction models for runoff, river stage, and real-time wave forecasting [Babovic and Keijzer, 2002; Sheta and Mahmoud, 2001; Gaur and Deo, 2008]. Zechman et al. [2005] developed a GP-based surrogate model for use in a groundwater pollutant source identification problem. An ensemble-based GP framework is able to quantify the uncertainty in both the model structure and the parameters. Parasuraman and Elshorbagy [2008] illustrated the use of an ensemble-based genetic programming framework for quantifying uncertainty in hydrological prediction. Sreekanth and Datta [2010] used genetic programming to develop surrogate models for coastal aquifer management and compared them with modular neural network-based surrogate models. Genetic programming-based surrogate models have the advantage that the surrogate model structure need not be fixed prior to model development. Instead, the optimum model structure is evolved by the self-organizing ability of the genetic programming algorithm. It was found that GP-based surrogate modeling can develop simpler and effective surrogate models with as few as 30 model parameters, compared with 1155 weights in the neural network model. It was also demonstrated that the evolution of surrogate model structures by GP, and its parsimony in identifying the input variables, make it more effective than an ANN whose structure is determined by trial and error with an arbitrary selection of input variables.

In the present work we use GP to develop an ensemble of mutually different surrogate models and use this ensemble for more reliable predictions of coastal aquifer processes in the management model.

W04516 SREEKANTH AND DATTA: ENSEMBLE SURROGATES FOR OPTIMAL COASTAL AQUIFERS W04516

[9] Different stochastic optimization techniques have been used in the past for optimal decision making under uncertainty [Wagner and Gorelick, 1987; Tiedeman and Gorelick, 1993; McPhee and Yeh, 2006]. Chance-constrained programming has been used in groundwater management by Wagner and Gorelick [1987, 1989], Morgan et al. [1993], and Datta and Dhiman [1996]. Another method for stochastic simulation-optimization is the multiple-realization approach [Wagner and Gorelick, 1989; Morgan et al., 1993; Chan, 1993; Feyen and Gorelick, 2004]. In this method, numerous realizations of uncertain model parameters are considered simultaneously in one optimization formulation. He et al. [2010] used a set of proxy simulators in a coupled simulation-optimization model for groundwater remediation design under parameter uncertainty of the proxy simulators. The proxy simulators were based on a stepwise response surface analysis. The residuals in the prediction were treated as stochastic variables, and their deterministic equivalent was incorporated into the optimization model.
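To make the multiple-realization idea concrete for an ensemble of surrogates, the following Python sketch checks one candidate pumping strategy against every ensemble member. It is an illustrative assumption, not the study's own constraint formulation: each surrogate is assumed to return salinity predictions at a set of monitoring locations, a realization is counted as satisfied only when every location is within a permissible limit `c_max`, and a strategy meets reliability level R when at least a fraction R of the realizations are satisfied. The function names are hypothetical.

```python
import numpy as np


def realization_reliability(predictions, c_max):
    """Fraction of ensemble members whose predicted salinity satisfies the limit.

    predictions: array of shape (n_surrogates, n_monitoring_locations) holding the
        salinity predicted by each surrogate for ONE candidate pumping strategy.
    c_max: permissible salinity (scalar or one value per monitoring location).
    """
    predictions = np.asarray(predictions, dtype=float)
    # A realization is feasible only if every monitoring location is within the limit.
    feasible = np.all(predictions <= c_max, axis=1)
    return feasible.mean()


def meets_reliability(predictions, c_max, reliability):
    """True if at least `reliability` (e.g. 0.9) of the realizations are feasible."""
    return realization_reliability(predictions, c_max) >= reliability
```

For example, with ensemble predictions `[[400, 500], [450, 520], [700, 300], [300, 200]]` and `c_max = 600`, three of the four realizations are feasible, so the strategy satisfies a 75% reliability level but not 80%.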

[10] Most real-world groundwater management problems are multiobjective in nature, i.e., they involve more than one objective, and these objectives conflict with each other. The solution to such problems is an entire nondominated front of solutions, which gives the trade-off between the different objectives considered. Population-based nontraditional optimization algorithms such as genetic algorithms are ideal for solving such problems, as different members of the population can converge to different parts of the nondominated front, thus deriving the entire Pareto-optimal front in a single run of the optimization algorithm. The multiobjective genetic algorithm NSGA-II [Deb, 2001] is used in this study to solve the multiobjective optimal coastal aquifer management problem.
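The Pareto-dominance test that underlies nondominated sorting in NSGA-II can be illustrated with a brute-force filter. This Python sketch is not NSGA-II itself (it omits the ranking of later fronts, crowding distance, and the genetic operators); it assumes each solution is summarized as a hypothetical pair (total production pumping, total barrier pumping), with the first maximized and the second minimized.

```python
from typing import List, Tuple

Objectives = Tuple[float, float]  # (production_pumping, barrier_pumping)


def dominates(a: Objectives, b: Objectives) -> bool:
    """a dominates b if it is no worse in both objectives and strictly better in one.

    Production pumping (index 0) is maximized; barrier pumping (index 1) is minimized.
    """
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better


def nondominated(points: List[Objectives]) -> List[Objectives]:
    """Return the first nondominated front by pairwise comparison (O(n^2)).

    A point never dominates itself, so no self-exclusion is needed.
    """
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

For the sample pairs `[(100, 20), (120, 30), (90, 25), (120, 25), (80, 10)]`, the filter keeps `(100, 20)`, `(120, 25)`, and `(80, 10)`: each offers a trade-off that no other pair beats on both objectives.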

[11] The main objective of this work is to develop an ensemble of surrogate models for predicting the saltwater intrusion process in coastal aquifers. The ensemble of surrogate models is used in a stochastic multiobjective optimization based on the multiple-realization approach to derive robust optimal extraction strategies that are less sensitive to the uncertainties in the surrogate model predictions. This study considers the uncertainties of the surrogate models alone; the numerical model is assumed to be certain. Two management objectives are considered, subject to the constraint of controlling saltwater intrusion. The first objective is to maximize the total pumping from the production wells tapping the aquifer. The second objective is to minimize the total pumping from a set of barrier wells that are used to hydraulically control saltwater intrusion. The salinity levels resulting from pumping are simulated using the surrogate simulation model. A chance-constrained optimization model is also developed for coastal aquifer management, taking into consideration the cumulative distribution function of the error residuals of the surrogate model predictions. The optimal solutions obtained using these two methods are compared with the solution obtained using only a single surrogate model in the coupled simulation-optimization model.
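One common way to turn a chance constraint on surrogate predictions into a deterministic constraint is to tighten the salinity limit by a quantile of the residual distribution. If the true salinity is modeled as c_pred + e, then (for a continuous residual distribution) requiring P(c_pred + e <= c_max) >= alpha is equivalent to c_pred <= c_max - F_e^{-1}(alpha). The Python sketch below uses the empirical alpha-quantile of the surrogate residuals in place of F_e^{-1}; the function name and the use of `np.quantile` are assumptions for illustration, not the study's formulation.

```python
import numpy as np


def chance_constraint_ok(c_pred, residuals, c_max, alpha):
    """Deterministic equivalent of P(c_pred + e <= c_max) >= alpha.

    c_pred: surrogate salinity predictions at the monitoring locations.
    residuals: surrogate errors e = (numerical model value - surrogate prediction)
        observed on testing data; their empirical CDF stands in for F_e.
    alpha: required probability (reliability level) of satisfying the limit.
    """
    # Empirical alpha-quantile of the residual distribution, i.e. F_e^{-1}(alpha).
    e_alpha = np.quantile(np.asarray(residuals, dtype=float), alpha)
    # Tighten the limit by that quantile at every monitoring location.
    return bool(np.all(np.asarray(c_pred, dtype=float) <= c_max - e_alpha))
```

With residuals `[-20, -10, 0, 10, 20]` and `alpha = 0.75`, the empirical quantile is 10, so a limit of 600 is tightened to 590 for the surrogate predictions.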

[12] The remaining part of this paper is structured as follows. Section 2 describes the framework of the coastal aquifer management model. Section 3 describes the development of the ensemble of surrogate models. Section 4 describes the formulation and implementation of optimization models using multiobjective genetic algorithms. Section 5 illustrates the application of the methodology using a case study. Section 6 summarizes and concludes the paper.

2. Outline of the Coastal Aquifer Management Methodology

[13] The proposed coastal aquifer management methodology using coupled simulation-optimization has two essential components. The first is the ensemble of surrogate models for simulating the physical process under consideration. In this work we consider the saltwater intrusion in coastal aquifers as a function of the groundwater extractions from the aquifer. The second component is an optimization model used to optimize the groundwater extraction strategies such that the resulting salinity levels are maintained within prespecified limits. The genetic programming-based surrogate models are trained using randomly generated input-output patterns of extraction rates and the resulting salinity levels. The input-output patterns are generated using a three-dimensional coupled flow and transport simulation model called FEMWATER [Lin et al., 1997]. The nonparametric bootstrap method [Efron and Tibshirani, 1993] is used together with genetic programming to construct the ensemble of surrogate models. The ensemble models are then linked to a multiobjective genetic algorithm to obtain the optimal groundwater extraction rates. The different elements of the proposed methodology for developing optimal coastal aquifer management strategies are described in detail in sections 3 and 4.

3. Ensemble of Surrogate Models

[14] The following procedure was adopted to develop the ensemble of surrogate models.

3.1. Design of Experiments

[15] The design of experiments is the first step required for training the GP-based surrogate models. Developing a surrogate model based on genetic programming involves learning from input-output patterns. In the coastal aquifer management problem, the inputs are the rates of groundwater abstraction from different potential locations within the aquifer, and the outputs are the resulting salinity concentrations. The decision space for the problem under consideration is a multidimensional space representing the combinations of groundwater abstraction rates from different locations at various time periods. For the surrogate models to perform satisfactorily, the training patterns should be representative of the entire decision space. Uniformly distributed Latin hypercube samples (LHS) of input patterns are generated from the decision space to train the genetic programming-based surrogate models.

[16] LHS, a stratified-random procedure, provides an efficient way of sampling variables from their distributions [Iman and Conover, 1982]. LHS involves sampling N values from the prescribed distribution of each of k variables X1, X2, ..., Xk. The cumulative distribution for each variable is divided into N equiprobable intervals, and a value is selected randomly from each interval. The N values obtained for each variable are then paired randomly with those of the other variables.
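The three LHS steps above (N equiprobable intervals per variable, one random value per interval, random pairing across variables) can be sketched in Python for the uniform case used here. The function name and the idea of passing per-well pumping bounds as `lower`/`upper` are hypothetical; SciPy's `scipy.stats.qmc.LatinHypercube` is an off-the-shelf alternative.

```python
import numpy as np


def latin_hypercube(n_samples, lower, upper, rng=None):
    """Uniform Latin hypercube sample of n_samples points in the box [lower, upper].

    Each variable's range is split into n_samples equiprobable intervals, one value
    is drawn at random inside each interval, and the columns are shuffled
    independently so the values are paired randomly across variables.
    """
    rng = np.random.default_rng(rng)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    k = lower.size
    # Stratified draws on [0, 1): row i lies in the interval [i/N, (i+1)/N).
    u = (np.arange(n_samples)[:, None] + rng.random((n_samples, k))) / n_samples
    # Random pairing: permute each variable's column independently.
    for j in range(k):
        u[:, j] = rng.permutation(u[:, j])
    # Map the unit interval onto each variable's bounds (uniform distribution).
    return lower + u * (upper - lower)
```

For instance, `latin_hypercube(10, [0, 0, 0], [100, 200, 50], rng=1)` returns ten abstraction patterns for three hypothetical wells, with exactly one value in each tenth of every well's range.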

3.2. Numerical Simulation Model

[17] Once the input patterns of groundwater abstraction are generated, the salinity levels resulting from each pattern are computed using the numerical simulation model FEMWATER [Lin et al., 1997]. FEMWATER is a finite element-based three-dimensional coupled flow and transport simulation model. The density-dependent flow and transport equations used in FEMWATER are as follows [Lin et al., 1997; Sreekanth and Datta, 2010]:

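$$\frac{\rho}{\rho_o} F \frac{\partial h}{\partial t} = \nabla \cdot \left[ \mathbf{K} \cdot \left( \nabla h + \frac{\rho}{\rho_o} \nabla z \right) \right] + \frac{\rho^*}{\rho_o} q \tag{1}$$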


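$$F = \alpha' \frac{\theta}{n} + \beta' \theta + n \frac{dS}{dh} \tag{2}$$

$$\mathbf{K} = \frac{\rho g}{\mu} \mathbf{k} = \frac{\rho/\rho_o}{\mu/\mu_o} \left( \frac{\rho_o g}{\mu_o} \mathbf{k}_s \right) k_r = \frac{\rho/\rho_o}{\mu/\mu_o} \mathbf{K}_{so} k_r \tag{3}$$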

$$\frac{\rho}{\rho_o} = a_1 + a_2 C \tag{4}$$

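$$
\begin{aligned}
\theta \frac{\partial C}{\partial t} + \rho_b \frac{\partial S_a}{\partial t} + \mathbf{V} \cdot \nabla C - \nabla \cdot \left( \theta \mathbf{D} \cdot \nabla C \right)
&= -\left( \alpha' \frac{\partial h}{\partial t} + \lambda \right) \left( \theta C + \rho_b S_a \right) - \left( \theta K_w C + \rho_b K_s S_a \right) \\
&\quad + m - \frac{\rho^*}{\rho} q C + \left( F \frac{\partial h}{\partial t} + \frac{\rho_o}{\rho} \mathbf{V} \cdot \nabla \left( \frac{\rho}{\rho_o} \right) - \frac{\partial \theta}{\partial t} \right) C
\end{aligned}
\tag{5}
$$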

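$$\mathbf{D} = a_T |\mathbf{V}| \boldsymbol{\delta} + (a_L - a_T) \frac{\mathbf{V}\mathbf{V}}{|\mathbf{V}|} + a_m \tau \boldsymbol{\delta} \tag{6}$$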

where $F$ is the storage coefficient, $h$ is the pressure head, $t$ is time, $\mathbf{K}$ is the hydraulic conductivity tensor, $z$ is the potential head, $q$ is the source and/or sink (source rate of water), $\rho$ is the water density at chemical concentration $C$, $\rho_o$ is the referenced water density at zero chemical concentration, $\rho^*$ is the density of either the injection fluid or the withdrawn water, $\theta$ is the moisture content, $\alpha'$ is the compressibility of the medium, $\beta'$ is the modified compressibility of water, $n$ is the porosity of the medium, $S$ is the saturation, $g$ is the gravitational acceleration, $\mu$ is the dynamic viscosity of water at chemical concentration $C$, $\mu_o$ is the referenced dynamic viscosity of water at zero chemical concentration, $\mathbf{k}$ is the permeability tensor, $\mathbf{k}_s$ is the saturated permeability tensor, $k_r$ is the relative permeability or relative hydraulic conductivity, $\mathbf{K}_{so}$ is the referenced saturated hydraulic conductivity tensor, $a_1$ and $a_2$ are the parameters used to define the concentration dependence of water density, $C$ is the material (chemical) concentration in the aqueous phase, $\rho_b$ is the bulk density of the medium, $S_a$ is the material concentration in the adsorbed phase, $\mathbf{V}$ is the discharge (Darcy velocity), $\nabla$ is the del operator, $\mathbf{D}$ is the dispersion coefficient tensor, $\lambda$ is the decay constant, $m = qC_{in}$ is the artificial mass rate, $C_{in}$ is the material concentration in the source, $K_w$ is the first-order biodegradation rate constant through the dissolved phase, $K_s$ is the first-order biodegradation rate constant through the adsorbed phase, $|\mathbf{V}|$ is the magnitude of $\mathbf{V}$, $\boldsymbol{\delta}$ is the Kronecker delta tensor, $a_T$ is the lateral dispersivity, $a_L$ is the longitudinal dispersivity, $a_m$ is the molecular diffusion coefficient, and $\tau$ is the tortuosity.
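As a small numerical check on the notation, the following Python sketch evaluates the density ratio of equation (4) and the dispersion tensor of equation (6) at a single point. It is not FEMWATER code; the function names and any parameter values supplied to them are hypothetical.

```python
import numpy as np


def density_ratio(c, a1, a2):
    """Equation (4): rho / rho_o = a1 + a2 * C."""
    return a1 + a2 * c


def dispersion_tensor(v, a_l, a_t, a_m, tau):
    """Equation (6): D = a_T |V| delta + (a_L - a_T) V V / |V| + a_m tau delta.

    v: discharge vector V at one point; a_l, a_t: longitudinal and lateral
    dispersivities; a_m: molecular diffusion coefficient; tau: tortuosity.
    """
    v = np.asarray(v, dtype=float)
    speed = np.linalg.norm(v)      # |V|
    delta = np.eye(v.size)         # Kronecker delta tensor
    D = (a_t * speed + a_m * tau) * delta
    if speed > 0.0:
        # Outer product V V / |V| adds the extra spreading along the flow direction.
        D += (a_l - a_t) * np.outer(v, v) / speed
    return D
```

For flow along the x axis with unit speed, `a_l = 10`, `a_t = 1`, and no diffusion, the tensor is diagonal with entries 10, 1, 1: longitudinal spreading along the flow and lateral spreading across it. With zero velocity, only the diffusion term $a_m \tau \boldsymbol{\delta}$ remains.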

3.3. Genetic Programming

[18] Genetic programming [Koza, 1994] is used in this study to evolve surrogate models of the salinity intrusion in coastal aquifers resulting from groundwater abstraction. Genetic programming is an evolutionary algorithm similar to the genetic algorithm in that it uses the concepts of natural selection and genetics in evolutionary computation. For a given model structure and a predefined parameter space, a genetic algorithm optimizes the parameter values. Genetic programming has an additional degree of freedom that allows an optimum model structure to evolve in parallel with the optimization of the parameter values. Thus, genetic programming identifies the best model structure for simulating the process under consideration while simultaneously estimating the optimal parameter values. Genetic programming learns from examples. The major inputs for the genetic programming model are (1) patterns for learning, (2) a fitness function (e.g., minimizing the squared error term), (3) the functional and terminal sets, and (4) parameters for the genetic operators, such as the crossover and mutation probabilities.

[19] The functional set consists of basic mathematical operators and functions such as addition, subtraction, multiplication, division, and trigonometric functions. The choice of the functional set determines the complexity of the model. For example, a functional set with only addition and subtraction results in a linear model structure, whereas a functional set that includes trigonometric functions results in highly nonlinear model structures. The terminal set consists of the constants and variables of the model. The total number of parameters can be limited to a prespecified number in order to prevent overfitting. Using the functional and terminal sets, valid, syntactically correct programs can be developed. The parse tree notation of two such programs is illustrated in Figure 1. Two parent genetic programs are shown in Figures 1a and 1b. The parent programs are crossed over at the dashed sections, and the mutation operator changes the value of the constant 2 to 6, generating the two new offspring genetic programs shown in Figures 1c and 1d.
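The parse-tree representation and the two genetic operators described above can be sketched in Python. Programs are encoded here as nested tuples `(operator, left, right)`, with strings as input variables and numbers as constants; this encoding, the helper names, and the "protected" division (returning 1.0 for a zero denominator, a common GP convention) are assumptions of the sketch, not the representation used by the GP software in the study.

```python
import operator
import random

# Functional set: binary operators. Division is protected against a zero denominator.
FUNCS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": lambda a, b: a / b if b != 0 else 1.0,
}


def evaluate(tree, env):
    """Evaluate a parse tree; env maps variable names (terminals) to values."""
    if isinstance(tree, tuple):
        op, left, right = tree
        return FUNCS[op](evaluate(left, env), evaluate(right, env))
    if isinstance(tree, str):
        return env[tree]
    return tree  # numeric constant


def subtrees(tree, path=()):
    """Yield (path, subtree) for every node; a path is a tuple of child indices."""
    yield path, tree
    if isinstance(tree, tuple):
        yield from subtrees(tree[1], path + (1,))
        yield from subtrees(tree[2], path + (2,))


def replace_at(tree, path, new):
    """Return a copy of `tree` with the node at `path` replaced by `new`."""
    if not path:
        return new
    node = list(tree)
    node[path[0]] = replace_at(tree[path[0]], path[1:], new)
    return tuple(node)


def crossover(p1, p2, rng):
    """Subtree crossover: swap one randomly chosen subtree between two parents."""
    path1, sub1 = rng.choice(list(subtrees(p1)))
    path2, sub2 = rng.choice(list(subtrees(p2)))
    return replace_at(p1, path1, sub2), replace_at(p2, path2, sub1)


def mutate_constant(tree, old, new):
    """Point mutation that changes every constant equal to `old` into `new`."""
    if isinstance(tree, tuple):
        op, left, right = tree
        return (op, mutate_constant(left, old, new), mutate_constant(right, old, new))
    if not isinstance(tree, str) and tree == old:
        return new
    return tree
```

As a usage example, the program `("+", ("*", "x1", 2), "x2")` evaluates to 7 for x1 = 3 and x2 = 1; mutating the constant 2 to 6, as in the figure, gives 19. Because crossover only swaps subtrees, the two offspring together always contain the same number of nodes as the two parents.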

[20] In the present work, the operators addition, subtraction, and multiplication are considered in the initial functional set. Other functions were then added to the functional set one by one in order of increasing complexity and nonlinearity; for example, an addition or subtraction operation is considered before multiplication. However, considering the nonlinear nature of the saltwater intrusion process, multiplication and division are included in the initial functional set itself. An additional function or operator is retained only if its addition improves the fitness measure.

[21] GP starts with a set of randomly generated, syntactically correct programs. Each program is evaluated on N instances, where N is the number of patterns in the training data set generated using Latin hypercube sampling and the numerical simulation model. The input-output data set is split into halves: one half is used to train the GP models, and the other half is used to test the developed genetic programs. Testing here refers to validation of the model. The testing data set is not used in the fitness function evaluation; instead, it is used to evaluate how the model performs on a new set of data. The evaluations on the testing data set are also used to pick the best programs from the population.

[22] The fitness value is assigned by comparing the outcome of the program on each of these patterns with the actual outcome. The fitness function is usually the root mean square error (RMSE). The programs are ranked based on their fitness values, and new programs are created using the crossover and mutation operators. This process of evolving new programs by means of genetic operators, followed by fitness evaluation, is performed for a specified number of generations to obtain the best-fit genetic program.
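The split-half evaluation and RMSE fitness described in paragraphs [21] and [22] can be sketched as follows. Candidate programs are represented as arbitrary Python callables (a simplifying assumption), fitness is the RMSE on the training half, and the final pick uses the RMSE on the testing half, mirroring the roles the text assigns to the two halves. All function names are hypothetical.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean square error, used here as the fitness measure (lower is fitter)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def split_halves(X, y, rng=None):
    """Randomly split input-output patterns into a training half and a testing half."""
    rng = np.random.default_rng(rng)
    idx = rng.permutation(len(y))
    half = len(y) // 2
    tr, te = idx[:half], idx[half:]
    return (X[tr], y[tr]), (X[te], y[te])


def rank_and_pick(programs, train, test):
    """Rank programs by training RMSE (fitness); pick the best by testing RMSE.

    programs: callables mapping an input array of shape (n, k) to n predictions.
    """
    (X_tr, y_tr), (X_te, y_te) = train, test
    ranked = sorted(programs, key=lambda p: rmse(y_tr, p(X_tr)))
    best = min(ranked, key=lambda p: rmse(y_te, p(X_te)))
    return ranked, best
```

Keeping the testing half out of the fitness computation means the final selection reflects performance on patterns the evolution never saw, which is the role the text assigns to validation.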


3.4. Nonparametric Bootstrap Method

[23] The nonparametric bootstrap method is used to generate different realizations of the actual input-output patterns of groundwater abstractions and salinity concentrations. Each realization of the data set is then used to train a separate surrogate model, so the procedure yields an ensemble of surrogate models for predicting salinity levels. Each surrogate model in the ensemble differs from the rest for two reasons: the training data sets differ, and the population-based search can converge to different optima. The differences in model structure and parameters among the surrogate models are a manifestation of the uncertainty in the model structure and parameters themselves. The methodology of Parasuraman and Elshorbagy [2008] is followed to perform the nonparametric bootstrap sampling. The data set obtained by Latin hypercube sampling and the numerical simulation model is assumed to be a representative set of input-output values from the entire population in the decision space. A training data set T of size N is generated using Latin hypercube sampling and the numerical simulation model, and different realizations of this data set are then obtained using the nonparametric bootstrap method. For this, a bootstrap size B is chosen, and B different data sets, each of size N, are obtained by repeated random sampling with replacement from T. Each bootstrap sample set T_B therefore contains input-output patterns from T, some of them repeated several times. The bootstrap sample sets T_B differ from each other only in which patterns of the original data set are repeated and which are left out. Repeating a pattern in a bootstrap sample effectively gives it more weight during training. This produces models whose predictive capability differs across regions of the decision space of the prediction model, and it also drives training toward multiple optimal solutions. Thus each surrogate model is an optimal model for the prediction, but the models differ in their predictive capability across regions of the decision space, depending on the weights assigned to patterns from each region.
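A minimal sketch of the resampling step, assuming the training set T is held as a list of (input, output) patterns. It implements the generic nonparametric bootstrap (uniform sampling with replacement) and is not the authors' code; the function name `bootstrap_realizations`, the toy data set, and the fixed seed are hypothetical.

```python
import random


def bootstrap_realizations(patterns, B, seed=None):
    """Return B bootstrap sample sets T_B, each of the same size N as the
    original set T, drawn uniformly at random with replacement from T."""
    rng = random.Random(seed)
    n = len(patterns)
    return [[patterns[rng.randrange(n)] for _ in range(n)] for _ in range(B)]


# Hypothetical toy data set T: ((pumping rates...), salinity) patterns.
T = [((100.0 * k, 50.0 * k), 0.1 * k) for k in range(10)]
realizations = bootstrap_realizations(T, B=5, seed=42)

for TB in realizations:
    # Same size as T, but some patterns repeat and others are left out,
    # so the number of distinct patterns is usually smaller than N.
    print(len(TB), len(set(TB)))
```

Each T_B would then train one surrogate model; the repeated patterns act as the differential weights described above.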

[24] The performance of each surrogate model is determined by evaluating the root mean square error (RMSE) on the testing data set. After the RMSE is computed for each surrogate model in the ensemble, the standard deviation and coefficient of variation of these errors are computed; the coefficient of variation is taken as the measure of the predictive uncertainty of the models. The number of surrogate models in the ensemble is determined by an incremental statistical analysis of ensemble performance: surrogate models are added to the ensemble one at a time, and the resulting uncertainty is evaluated after each addition. The RMSE of the resulting ensemble is also computed after each addition, using the testing data sets of all the surrogates in the ensemble pooled together. The procedure starts from an initial ensemble of 10 surrogate models, for which the RMSE of each model's salinity predictions and the coefficient of variation of these RMSEs are computed. New surrogate models are then added one at a time, and the RMSE and uncertainty are recomputed. This is repeated until further additions produce no significant change in the uncertainty of the ensemble; the number of surrogate models at that stage is taken as the optimum ensemble size.
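The incremental selection of the ensemble size can be sketched as follows. The stopping rule is an assumption: the text says only that additions stop when the uncertainty shows no significant change, so the tolerance `tol` and the number of consecutive stable additions `patience` are hypothetical parameters, and `rmses` is assumed to hold the testing-set RMSE of each surrogate in the order the models are added.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)


def choose_ensemble_size(rmses, initial=10, tol=0.01, patience=3):
    """Start from `initial` models, add one model at a time, and return the
    ensemble size once the coefficient of variation of the member RMSEs has
    changed by less than `tol` for `patience` consecutive additions."""
    prev = coefficient_of_variation(rmses[:initial])
    stable = 0
    for k in range(initial + 1, len(rmses) + 1):
        cv = coefficient_of_variation(rmses[:k])
        stable = stable + 1 if abs(cv - prev) < tol else 0
        if stable >= patience:
            return k
        prev = cv
    return len(rmses)  # never stabilized: use every available model


print(choose_ensemble_size([0.05] * 40))
```

With identical RMSEs the coefficient of variation stays at zero, so the sketch stops after three stable additions beyond the initial 10 models and `choose_ensemble_size([0.05] * 40)` returns 13. In practice the threshold would be set by inspecting plots like those of the uncertainty versus ensemble size.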

Figure 1. Crossover and mutation in genetic programming.

W04516 SREEKANTH AND DATTA: ENSEMBLE SURROGATES FOR OPTIMAL COASTAL AQUIFERS W04516


4. Optimization Models

[25] The main objective of this study is to develop a coastal aquifer management model that uses an ensemble of surrogate models to simulate the saltwater intrusion process. Two optimization approaches that address the uncertainty in surrogate model predictions are used. The first is a stochastic simulation-optimization method called the multiple-realization or stacking approach [Wagner and Gorelick, 1989; Morgan et al., 1993; Chan, 1993; Feyen and Gorelick, 2005]. The second is a chance-constrained optimization model [Morgan et al., 1993; Datta and Dhiman, 1996].

[26] The stochastic optimization accounts for the uncertainty in the surrogate model structures and parameters. In the multiple-realization approach, all the surrogate models in the ensemble are independently linked to the optimization model; that is, if the ensemble consists of 10 different surrogate models, the optimization formulation has a stack of 10 constraints representing the surrogate models. The optimal solution must therefore satisfy each of these constraints, which differ from one another because of the uncertainty in model structure and parameters.

4.1. Multiobjective Optimization Using a Multiple-Realization Approach

[27] Two conflicting objectives are considered in this study. The first is the maximization of total beneficial pumping from the aquifer, and the second is the minimization of total pumping from the barrier wells used to hydraulically control saltwater intrusion. The constraints limit the salinity concentrations resulting from groundwater extraction to specified maximum values. The mathematical formulation of this multiobjective optimization problem using the multiple-realization approach is as follows:

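$$\text{Maximize } f_1(Q) = \sum_{n=1}^{N} \sum_{t=1}^{T} Q_n^t \qquad (7)$$

$$\text{Minimize } f_2(Q) = \sum_{m=1}^{M} \sum_{t=1}^{T} q_m^t \qquad (8)$$

$$\text{s.t. } c_i^r = \xi_i^r(Q, q) \quad \forall i, r \qquad (9)$$

$$c_i^r \le c_{\max} \quad \forall i, r \qquad (10)$$

$$Q_{\min} \le Q_n^t \le Q_{\max} \qquad (11)$$

$$q_{\min} \le q_m^t \le q_{\max} \qquad (12)$$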

where $Q_n^t$ is the pumping from the nth production well during the tth time period, $q_m^t$ is the pumping from the mth barrier well during the tth time period, and $c_i^r$ is the rth realization of concentration at the ith location at the end of the management time horizon. It is obtained from the rth surrogate model for salinity at the ith location, denoted $\xi_i^r(\cdot)$. N, M, and T are, respectively, the total number of production wells, the total number of barrier wells, and the total number of time steps in the management model. Constraint (10) imposes the maximum permissible salt concentration at the monitoring well locations. Constraints (11) and (12) define the lower and upper bounds on pumping from production wells and barrier wells, respectively.
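Objectives (7) and (8) and the bounds (11) and (12) translate directly into code. A minimal sketch, assuming the rates are stored as nested lists indexed [well][period]; the function names and the uniform example rates are hypothetical.

```python
def objectives(Q, q):
    """Return (f1, f2) for production rates Q[n][t] and barrier rates q[m][t].

    f1 (eq. 7, maximized): total pumping from the N production wells.
    f2 (eq. 8, minimized): total pumping from the M barrier wells.
    Both sums run over all T management periods.
    """
    f1 = sum(sum(periods) for periods in Q)
    f2 = sum(sum(periods) for periods in q)
    return f1, f2


def within_bounds(rates, lower, upper):
    """Box constraints (11)/(12): every well-period rate lies in [lower, upper]."""
    return all(lower <= r <= upper for periods in rates for r in periods)


# Hypothetical candidate: 8 production and 3 barrier wells, 3 periods (m3/d).
Q = [[1000.0] * 3 for _ in range(8)]
q = [[200.0] * 3 for _ in range(3)]
f1, f2 = objectives(Q, q)
ok = within_bounds(Q, 0.0, 1300.0) and within_bounds(q, 0.0, 1300.0)
print(f1, f2, ok)
```

Here f1 is 8 × 3 × 1000 = 24000 and f2 is 3 × 3 × 200 = 1800; the salinity constraints (9) and (10) additionally require the surrogate predictions.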

[28] With the multiple-realization approach, optimal solutions with different reliability values can be obtained. The reliability value is the fraction of surrogate models in the ensemble whose salinity predictions satisfy the imposed maximum-salinity constraints in the optimization model. For example, if there are R different surrogate models in the ensemble, an optimal solution with a reliability of n/R can be obtained by constraining the optimization model to satisfy the constraints imposed by at least n surrogate models. The reliability of the optimal solution approaches 1 when the constraints imposed by all R surrogate models are satisfied. However, this reliability pertains only to the uncertainty in the ensemble of surrogate models.
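The reliability count can be sketched as below. This assumes that a realization r counts as satisfied only when its predictions meet the limits at every monitoring location simultaneously; the text could also be read as counting per location. The per-location limits follow the case study, while the predictions and function names are hypothetical.

```python
def satisfied_count(predictions, c_max):
    """Count realizations whose predicted salinities all meet their limits.

    predictions[r][i] is the salinity at monitoring location i predicted by
    the r-th surrogate realization; c_max[i] is the limit at location i.
    """
    return sum(
        all(c <= limit for c, limit in zip(realization, c_max))
        for realization in predictions
    )


def reliability(predictions, c_max):
    """Fraction of the R realizations that satisfy every salinity constraint."""
    return satisfied_count(predictions, c_max) / len(predictions)


def meets_reliability(predictions, c_max, n_required):
    """True if at least n_required of the R realizations are satisfied,
    i.e. the candidate attains reliability n_required / R or better."""
    return satisfied_count(predictions, c_max) >= n_required


# Hypothetical predictions (kg/m3) at C1, C2, C3 from R = 4 realizations.
limits = [0.5, 0.6, 0.6]
preds = [
    [0.48, 0.55, 0.59],
    [0.51, 0.55, 0.59],  # violates the C1 limit
    [0.49, 0.58, 0.60],
    [0.45, 0.50, 0.61],  # violates the C3 limit
]
print(reliability(preds, limits))
```

For these values two of the four realizations are satisfied, so the candidate has reliability 0.5 and would be feasible only if n is 2 or less.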

4.2. Chance-Constrained Approach

[29] The optimal solutions obtained by the multiple-realization approach for different reliabilities are compared with solutions obtained from a chance-constrained optimization formulation. The chance-constrained formulation uses the same objective functions and constraints as (7), (8), (11), and (12). Constraints (9) and (10) are replaced as follows:

$$c_i = \bar{\xi}_i(Q, q) + \varepsilon_i \qquad (13)$$

$$\mathrm{Rel}\left[c_i(Q, q, \varepsilon_i) \le c_{\max}\right] \ge \alpha \qquad (14)$$

where $c_i$ is the salinity concentration at the ith location at the end of the management time horizon, $\varepsilon_i$ is the error in the salinity prediction for the ith location, and $\bar{\xi}_i(Q, q)$ is the average of the salinities at the ith location predicted by the ensemble of surrogate models. Rel is the reliability level of the ensemble prediction, i.e., the probability that the concentration $c_i$ (the ensemble prediction plus its error) does not exceed $c_{\max}$. This reliability is based on the cumulative distribution function of the error residuals in the salinity predictions by the surrogate models and is constrained to be greater than or equal to $\alpha$. The probabilistic constraint (14) is converted into its deterministic equivalent as follows:

$$\bar{\xi}_i(Q, q) + \Phi_i^{-1}(\alpha) \le c_{\max} \qquad (15)$$

where $\Phi_i^{-1}$ is the inverse cumulative distribution function of the residuals in salinity prediction at the ith location, so that $\Phi_i^{-1}(\alpha)$ gives the prediction error corresponding to a reliability $\alpha$.
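The deterministic equivalent (15) can be checked with an empirical inverse cumulative distribution function. This sketch assumes the residuals are defined as in (13), i.e., actual minus ensemble-mean prediction, and it uses the empirical distribution of a residual sample; the text does not say whether an empirical or a fitted distribution was used. The residual values and helper names are hypothetical.

```python
import math
import statistics


def empirical_inverse_cdf(residuals, alpha):
    """Smallest residual e such that the empirical P(residual <= e) >= alpha."""
    ordered = sorted(residuals)
    k = max(math.ceil(alpha * len(ordered)) - 1, 0)
    return ordered[k]


def chance_constraint_ok(ensemble_predictions, residuals, alpha, c_max):
    """Deterministic equivalent (15) at one monitoring location:
    mean ensemble prediction + alpha-quantile of residuals <= c_max."""
    xi_bar = statistics.mean(ensemble_predictions)
    return xi_bar + empirical_inverse_cdf(residuals, alpha) <= c_max


# Hypothetical residuals (kg/m3) for one location with a 0.5 kg/m3 limit.
residuals = [-0.03, -0.02, -0.01, 0.0, 0.01, 0.02, 0.03, 0.04, -0.04, 0.0]
print(empirical_inverse_cdf(residuals, 0.99))
print(chance_constraint_ok([0.46, 0.47, 0.48], residuals, 0.99, 0.5))
```

A higher reliability picks a larger residual quantile, which tightens the constraint: an ensemble mean of 0.47 kg/m³ passes at alpha = 0.5 but fails at alpha = 0.99 with this sample.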

[30] A coupled simulation-optimization model with a single surrogate model predicting the salinity levels at each monitoring location is also developed for comparison. The same optimization formulation, (7)–(12), is used, except that the ensemble salinity prediction represented by (9) is replaced by

$$c_i = b_i(Q, q) \qquad (16)$$


where $b_i$ represents the best surrogate model for predicting the salinity at the ith location, in terms of the lowest value of the objective (fitness) function obtained in the GP model. This surrogate model is developed from the original data set instead of a bootstrap sample.

4.3. Multiobjective Genetic Algorithm

[31] The multiobjective genetic algorithm NSGA-II [Deb, 2001] is used to solve the multiobjective coastal aquifer management problem. Like a standard GA, NSGA-II uses a population of candidate solutions together with the GA operators crossover, mutation, and selection to evolve improved solutions over a number of generations. In addition, NSGA-II sorts the members of the population into nondominated fronts after each generation, based on the conflicting objectives. Thus, in a single run, NSGA-II can generate a set of solutions approximating the entire Pareto-optimal front by the end of the specified number of generations.
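The nondominated sorting step that NSGA-II applies each generation can be sketched for the two objectives here (maximize f1, minimize f2). This is a plain version of the fast nondominated sort described for NSGA-II; crowding distance, constraint handling, and the genetic operators are omitted, and the objective values in the example are hypothetical.

```python
def dominates(a, b):
    """True if objective pair a dominates b. Each pair is (f1, f2), with f1
    (production pumping) maximized and f2 (barrier pumping) minimized."""
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better


def nondominated_fronts(points):
    """Fast nondominated sorting as used in NSGA-II.

    Returns a list of fronts (lists of indices into `points`), best first.
    """
    dominated = [[] for _ in points]  # dominated[i]: indices that i dominates
    count = [0] * len(points)         # count[i]: how many points dominate i
    fronts = [[]]
    for i, p in enumerate(points):
        for j, other in enumerate(points):
            if dominates(p, other):
                dominated[i].append(j)
            elif dominates(other, p):
                count[i] += 1
        if count[i] == 0:
            fronts[0].append(i)
    while fronts[-1]:
        next_front = []
        for i in fronts[-1]:
            for j in dominated[i]:
                count[j] -= 1
                if count[j] == 0:
                    next_front.append(j)
        fronts.append(next_front)
    return fronts[:-1]  # drop the trailing empty front


# Hypothetical (f1, f2) values in m3/d for four candidate strategies.
candidates = [(24000.0, 1800.0), (20000.0, 900.0), (22000.0, 2000.0), (18000.0, 1000.0)]
print(nondominated_fronts(candidates))
```

The first two candidates trade more production pumping against more barrier pumping, so neither dominates the other and both lie on the first front; each of the remaining two is dominated by one of them.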

4.4. Ensemble-Based Coupled Simulation-Optimization Model

[32] The coastal aquifer management model uses a coupled simulation-optimization framework to derive optimal groundwater extraction strategies for coastal aquifers. The ensembles of surrogate models that simulate the aquifer responses in terms of salinity concentrations are coupled with the optimization model by linking each surrogate model separately to the optimization algorithm. The multiobjective genetic algorithm randomly generates candidate solutions, which are the groundwater extraction rates for the different time periods within the management horizon. The aquifer responses to each of these extraction patterns are obtained from the ensemble of surrogate models. All candidate solutions are evaluated for feasibility and fitness, and new candidate solutions are generated using the genetic algorithm operators. The procedure is repeated over a number of generations until the termination criteria are satisfied, so that the solutions progressively improve and converge to the final Pareto-optimal front. A schematic representation of the ensemble-based simulation-optimization model is shown in Figure 2.

4.5. Validation

[33] Once an optimal solution is obtained, its validity is checked by simulating the aquifer processes with the optimal pumping values in the numerical simulation model FEMWATER. The residual in the salinity prediction, i.e., the difference between the surrogate-predicted value and the numerically simulated value, is evaluated for five optimal solutions in different regions of the Pareto-optimal front. This is done for the optimal solutions obtained using each of the three optimization models, namely, the single-surrogate model, the ensemble-based model, and the chance-constrained model.

5. Case Study

[34] To illustrate the proposed methodology, it is applied to derive optimal extraction strategies for an illustrative coastal aquifer system. The aquifer is 2.52 km² in areal extent, with eight potential locations for groundwater extraction for beneficial use and three potential barrier well locations for hydraulic control of salinity intrusion. The aquifer is single-layered, with an average depth of 60 m. All boundaries of the study area are no-flow boundaries, except for the seaward boundary, which is a constant-head and constant-concentration boundary with a concentration of 35 kg/m³. The aquifer system is illustrated in Figure 3. The eight potential locations for beneficial groundwater extraction are shown as PW1–PW8, and the barrier well locations for hydraulic control of saltwater intrusion are shown as BW1–BW3. The salinity concentrations are monitored at three locations, C1, C2, and C3, at the end of the management time horizon.

[35] The time horizon for the management model is 3 years, with extraction rates held uniform within each 1-year management period. Groundwater recharge is specified as a constant rate of 0.00054 m/d. The lower and upper limits on groundwater abstraction for both beneficial and barrier wells are 0 and 1300 m³/d. The optimization model has 33 decision variables, corresponding to pumping from 11 wells over three time periods. The management model specifies maximum permissible salt concentrations of 0.5, 0.6, and 0.6 kg/m³ at C1, C2, and C3, respectively. The parameters used for the FEMWATER model are given in Table 1.
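The count of 33 decision variables follows from (8 + 3) wells × 3 periods. The sketch below shows how a flat decision vector from the genetic algorithm might be split into production and barrier rates; the ordering (one block per well, production wells first) is an assumption for illustration, not taken from the paper, and `decode` is a hypothetical name.

```python
N_PROD, N_BARRIER, N_PERIODS = 8, 3, 3   # PW1-PW8, BW1-BW3, three 1-year periods
RATE_MIN, RATE_MAX = 0.0, 1300.0         # m3/d, same bounds for both well types


def decode(x):
    """Split a flat decision vector into production rates Q[n][t] and
    barrier rates q[m][t]. Assumed layout: one block of N_PERIODS values per
    well, production wells PW1-PW8 first, then barrier wells BW1-BW3."""
    n_wells = N_PROD + N_BARRIER
    if len(x) != n_wells * N_PERIODS:
        raise ValueError(f"expected {n_wells * N_PERIODS} values, got {len(x)}")
    if any(not RATE_MIN <= v <= RATE_MAX for v in x):
        raise ValueError("pumping rate outside [0, 1300] m3/d")
    blocks = [list(x[k * N_PERIODS:(k + 1) * N_PERIODS]) for k in range(n_wells)]
    return blocks[:N_PROD], blocks[N_PROD:]


Q, q = decode([float(v) for v in range(33)])
print(len(Q), len(q), Q[0], q[-1])
```

Any fixed layout works as long as the optimizer and the surrogate inputs agree on it.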

[36] A three-dimensional coupled flow and transport simulation model was used to simulate the aquifer processes that cause salinity intrusion due to groundwater abstraction in this study area. Different groundwater extraction scenarios were generated using Latin hypercube sampling, and the salinity concentrations resulting from each of these pumping patterns were simulated using FEMWATER. Each simulated salinity level and the corresponding pumping rates form an input-output pattern; altogether, 230 extraction patterns are used in this study. Different realizations of this input-output data set were generated using the nonparametric bootstrap method, and each realization was used to build one surrogate model of the ensemble. Each data set was split into halves for training and testing the genetic programming (GP) surrogate models. Adaptive training [Sreekanth and Datta, 2010] was performed to reduce the number of patterns required for training.

Figure 2. Schematic representation of the ensemble-based coupled simulation-optimization method.

[37] Surrogates were developed for predicting salinity at three locations. For each location, an ensemble of 30 models was found to be sufficient to characterize the uncertainty. All the genetic programming surrogate models used a population size of 500, a mutation frequency of 95, and a crossover frequency of 50. The commercial genetic programming software Discipulus was used to develop the surrogate models, with parameter values chosen according to the software guidelines after a sensitivity analysis. The function set of the GP models contained addition, subtraction, multiplication, division, comparison, and data transfer. The maximum number of surrogate model parameters was limited to 30 to prevent overfitting. The squared deviation from the actual value was used as the fitness function. At the end of model training and testing, C source code for each model was generated using the interactive evaluator of the software and coupled with the multiobjective optimization algorithm NSGA-II.

6. Results and Discussion

6.1. Uncertainty in Surrogate Models

[38] The uncertainty in the surrogate models was quantified using the coefficient of variation of the RMSEs of the individual surrogate models. The RMSEs of the individual surrogate models' predictions of salinity C1, C2, and C3 are shown in Figures 4, 5, and 6. The RMSEs are computed over the testing data set used to evaluate the genetic programming surrogate models. For different realizations of the same data set, the RMSE differs from one surrogate model to another, reflecting the predictive uncertainty of the surrogate models. The RMSE of the ensemble predicting salinity C1 is plotted in Figure 7 against the number of surrogate models in the ensemble, starting from an initial ensemble size of 10. In this example, the RMSE of the ensemble prediction decreases as the number of models in the ensemble increases.

[39] The coefficient of variation of the RMSEs, as a measure of uncertainty in salinity prediction, is plotted against the number of surrogate models in the ensemble for each ensemble predicting C1, C2, and C3 (Figures 8, 9, and 10). The uncertainty of the ensemble shows a clear decreasing trend as the number of models increases. For each of the salinity concentrations C1, C2, and C3, the uncertainty decreases with the number of models in the ensemble and levels off at a constant value when the ensemble contains about 30 models. Hence, the optimum number of models in the ensemble for coupled simulation-optimization is chosen as 30. The optimum number of surrogate models depends on the level of uncertainty in the model structure and parameters: for more complex systems this uncertainty is expected to be larger, and more surrogate models will be required in the ensemble. The sensitivity of the derived Pareto-optimal solutions to the number of surrogate models in the ensemble is analyzed in section 6.4.

Table 1. Parameters for Aquifer Simulation

Parameter                                  Value
Hydraulic conductivity in x direction      25 m/d
Hydraulic conductivity in y direction      25 m/d
Hydraulic conductivity in z direction      0.25 m/d
Longitudinal dispersivity                  80 m
Lateral dispersivity                       35 m
Molecular diffusion coefficient            0.69 m²/d
Soil porosity                              0.2
Density reference ratio                    7.14 × 10⁻⁷

Figure 3. Three-dimensional aquifer system illustrating the well and monitoring locations.


Figure 5. RMSE for individual surrogate models simulating salinity C2.

Figure 4. RMSE for individual surrogate models simulating salinity C1.

Figure 6. RMSE for individual surrogate models simulating salinity C3.


6.2. Multiobjective Optimization

[40] The multiobjective optimization algorithm NSGA-II was used to solve the optimization formulations of both the multiple-realization and chance-constrained approaches. Like an ordinary genetic algorithm, NSGA-II derives optimal solutions using a population-based search. The population size used in this study is 200, and NSGA-II was run for 750 generations. Thus a total of 200 × 750 = 150,000 evaluations of the aquifer response to specific groundwater extraction patterns are required to obtain the solutions. The NSGA-II parameters were a crossover probability of 0.9 and a mutation probability of 0.02. The sensitivity of the optimal solution to the population size, number of generations, and other NSGA-II parameters was evaluated through numerical experiments in which NSGA-II was run with different combinations of these parameters. With fewer than 750 generations and a population size smaller than 200, convergence to the Pareto-optimal front was not achieved. Convergence was obtained with a smaller population size when a larger number of generations was used; however, reducing the population size affects the spread of solutions along the Pareto-optimal front, and some regions of the front are lost. The optimization problems have 33 variables, which are the pumping rates at 11 locations for three time periods. The multiple-realization formulation has 90 constraints, corresponding to three ensembles of 30 surrogate models each, predicting the salinity levels C1, C2, and C3.

6.3. Pareto-Optimal Front

[41] Pareto-optimal solutions form a nondominated front of solutions for the coastal aquifer management problem. On the Pareto-optimal front, any improvement in one objective function requires a corresponding deterioration in the other. Such sets of solutions are obtained using multiobjective optimization with both the multiple-realization and chance-constrained approaches. Because all the solutions on the front are nondominated, water managers can choose any one of them to implement a specific pumping pattern that maximizes benefits while limiting aquifer contamination.

[42] The Pareto-optimal solutions for different reliabilities obtained by the multiple-realization and chance-constrained methods are compared in Figures 11–14. Figure 11 shows the Pareto-optimal fronts for a reliability of 0.99. In the multiple-realization approach, this set of solutions satisfies the constraints imposed by all the surrogate models linked to the optimization model. In the chance-constrained formulation, this set of solutions accounts for the prediction error corresponding to a reliability of 0.99. Similarly, Figures 12, 13, and 14 show the fronts corresponding to reliability levels of 0.8, 0.66, and 0.5. Figure 14 also compares the fronts for a reliability level of 0.5 with the Pareto-optimal front obtained using a single surrogate model in the optimization.

[43] For the multiple-realization approach, the reliability is the fraction of surrogate models in the ensemble whose imposed constraints are satisfied in the optimization. For the chance-constrained method, the reliability is obtained from the inverse cumulative distribution function of the residuals in the salinity predictions by the ensemble of surrogate models for salinities C1, C2, and C3. The cumulative distribution functions for C1, C2, and C3 are shown in Figures 15, 16, and 17. In all three cases the residuals are roughly symmetric about zero, with a cumulative probability of about 0.5 at zero residual.

[44] Pareto-optimal solutions with a higher reliability level appear inferior to those with a lower reliability level. The likely reason is that, as reliability decreases, the probability that a solution violates the constraints increases; the apparently better solutions may therefore be infeasible. In Figure 14, the Pareto-optimal fronts obtained for a reliability level of 0.5 are compared with the front obtained using only the best surrogate model in the coupled simulation-optimization. The front obtained using the single surrogate model is very close to, and slightly better than, the fronts obtained for a reliability level of 0.5 using the multiple-realization and chance-constrained methods. Given the general trend of the Pareto-optimal front with reliability, it can be deduced that the reliability of the solutions obtained with a single surrogate model linked to the optimization algorithm is less than 0.5. Using a single best surrogate model in the coupled simulation-optimization implicitly assumes that the surrogate model prediction has zero residual, i.e., that the surrogate model simulation is equivalent to the numerical model simulation. However, the cumulative distribution functions show that the probability of a residual at or below zero is only about 0.5; that is, the true salinity is roughly as likely to exceed the surrogate prediction as to fall below it. Since most of the optimal solutions are limit-state designs, i.e., they lie on the constraint bounds, the uncertainty in the surrogate model structure often pushes the optimal solution into the infeasible region.

Figure 7. RMSE of the ensemble simulating salinity C1.

Figure 8. Uncertainty levels for increasing ensemble size for salinity C1.

Figure 9. Uncertainty levels for increasing ensemble size for salinity C2.

Figure 10. Uncertainty levels for increasing ensemble size for salinity C3.

[45] Salinity levels for five optimal solutions on the Pareto-optimal front obtained using the best surrogate model in the coupled simulation-optimization model are shown in Table 2. In these optimal solutions, salinity levels C1 and C3 converge to the maximum permissible concentrations, so the solutions lie on the constraint boundaries, and a small error in the surrogate model prediction can move them into the infeasible zone. The salinity levels corresponding to these solutions were simulated using the numerical simulation model and compared with the values predicted by the surrogate model. For each of the five solutions, at least one salinity level obtained from the numerical simulation model violates a constraint, placing the derived optimal solutions in the infeasible zone. The errors in the predicted salinity levels for the optimal solutions obtained by the multiple-realization and chance-constrained approaches are given in Tables 3 and 4, respectively. The errors are the differences between the salinity levels obtained using the numerical simulation model and those predicted by the surrogate model. In both cases, the errors are smaller when the reliability level is high.

Figure 11. Pareto-optimal fronts with reliability 0.99.

Figure 12. Pareto-optimal fronts with reliability 0.8.

[46] The ensemble-based surrogate modeling approach quantifies the uncertainties in the model structure and parameters. Reliable optimal solutions for coastal aquifer management were obtained using the ensemble surrogate models with the stochastic multiple-realization and chance-constrained optimization models.

6.4. Sensitivity Analysis

[47] A comparison of Pareto-optimal fronts for different reliabilities shows that, with 30 surrogate models in the ensemble, the multiple-realization approach identifies the same front as the chance-constrained approach for identical reliability levels. This implies that the constraints imposed by multiple-realization stochastic optimization are as stringent as the chance constraints when the ensemble is large enough to quantify the uncertainty in the model structures and parameters.

[48] To investigate the effect of ensemble size, numerical experiments were performed with 15, 10, and 5 models in the ensemble for the multiple-realization approach at each reliability level. The resulting Pareto-optimal fronts for a reliability level of 0.99 are compared in Figure 18 with the fronts obtained using 30 models and the chance-constrained model. As the ensemble size decreases, the fronts shift toward seemingly better solutions, which may actually be infeasible.

Figure 13. Pareto-optimal fronts with reliability 0.66.

Figure 14. Pareto-optimal front for single-surrogate model compared with fronts with reliability 0.5.


Figure 16. Cumulative distribution functions for the residuals in the ensemble predictions of salinity C2.

Figure 17. Cumulative distribution functions for the residuals in the ensemble predictions of salinity C3.

Figure 15. Cumulative distribution functions for the residuals in the ensemble predictions of salinity C1.


Table 2. Salinity Levels Corresponding to Five Optimal Solutions From Single-Surrogate Model-Based Optimizationᵃ

All values in 10⁻³ kg/m³.

              C1 ≤ 0.5 kg/m³         C2 ≤ 0.6 kg/m³         C3 ≤ 0.6 kg/m³
Solution      SM        NM           SM        NM           SM        NM
1         500.00    483.04       563.39    561.45       599.99    622.05
2         500.00    515.33       583.13    575.97       599.99    623.52
3         500.00    510.34       582.39    573.16       599.99    599.76
4         500.00    483.00       574.68    548.73       599.99    624.23
5         500.00    498.25       574.48    563.57       599.99    618.35

ᵃSM = surrogate model; NM = numerical model.
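The infeasibility of these single-surrogate solutions can be checked directly against the numerical-model (NM) columns of Table 2. A small sketch using only the tabulated values; the function name `violations` is hypothetical.

```python
# Permissible maxima and FEMWATER (NM) salinities from Table 2, in 10^-3 kg/m3.
LIMITS = {"C1": 500.0, "C2": 600.0, "C3": 600.0}
NM = {
    "C1": [483.04, 515.33, 510.34, 483.00, 498.25],
    "C2": [561.45, 575.97, 573.16, 548.73, 563.57],
    "C3": [622.05, 623.52, 599.76, 624.23, 618.35],
}


def violations(simulated, limits):
    """List (solution number, location) pairs where the numerically
    simulated salinity exceeds the permissible maximum."""
    return [
        (k + 1, loc)
        for loc, series in simulated.items()
        for k, c in enumerate(series)
        if c > limits[loc]
    ]


found = violations(NM, LIMITS)
infeasible_solutions = sorted({sol for sol, _ in found})
print(found)
print(infeasible_solutions)
```

With these values every one of the five solutions exceeds at least one limit: solutions 2 and 3 at C1, and solutions 1, 2, 4, and 5 at C3, while C2 stays within its limit throughout.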

Table 3. Residuals in Salinity Prediction for Five Optimal Solutions Obtained by Multiple-Realization Optimization

                Reliability 0.99         Reliability 0.8        Reliability 0.66         Reliability 0.5
Solution      C1      C2      C3      C1      C2      C3      C1      C2      C3      C1      C2      C3
1           3.16    0.61   -4.24    1.83   12.09    8.21   -9.89    1.77  -14.08   30.92   28.19    8.20
2           6.14   -0.29    5.21   -4.19    3.55   -4.12   17.09   -0.92  -21.90   -8.53  -19.43   -2.82
3           4.93    0.01    4.89   -6.00   12.66   12.57   -2.58    0.00    6.98   27.14  -22.50    6.96
4          -0.05    0.44    5.27   -4.52   -2.50  -12.39   -5.22    0.07    1.05  -18.52  -28.29   17.87
5           2.27   -0.36   -0.04   -5.60    7.70    6.02    8.76    0.39    5.73  -10.23   30.06   -9.14

Table 4. Residuals in Salinity Prediction for Five Optimal Solutions Obtained by Chance-Constrained Optimization

                Reliability 0.99         Reliability 0.8        Reliability 0.66         Reliability 0.5
Solution      C1      C2      C3      C1      C2      C3      C1      C2      C3      C1      C2      C3
1          -3.09    0.49    4.55    1.85    6.28   -1.64    9.64   -1.78  -13.98  -20.64    7.34   14.19
2          -1.13    0.27   -3.47   -6.64    9.14   -6.60  -12.30    0.89  -20.85  -27.10   -8.90   -0.63
3           2.97    0.24   -5.75    1.93   -6.43    4.79   -9.13    1.99  -12.67   21.80   30.68  -23.04
4          -0.62   -0.46    5.10   -2.79   -0.60  -10.08    6.86    2.11   -3.32  -12.16  -27.14  -26.31
5          -3.44   -0.54   -0.76    0.78    6.06   -9.87  -22.46    1.09  -17.28   28.64  -16.12    5.16
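To make the trend across reliability levels concrete, the largest absolute residual in each reliability group can be computed from Tables 3 and 4. A sketch using the tabulated values only; the helper `max_abs_residual` is hypothetical.

```python
RELIABILITIES = [0.99, 0.8, 0.66, 0.5]

# Rows: solutions 1-5. Columns: (C1, C2, C3) for each reliability, in order.
TABLE3 = [
    [3.16, 0.61, -4.24, 1.83, 12.09, 8.21, -9.89, 1.77, -14.08, 30.92, 28.19, 8.20],
    [6.14, -0.29, 5.21, -4.19, 3.55, -4.12, 17.09, -0.92, -21.90, -8.53, -19.43, -2.82],
    [4.93, 0.01, 4.89, -6.00, 12.66, 12.57, -2.58, 0.00, 6.98, 27.14, -22.50, 6.96],
    [-0.05, 0.44, 5.27, -4.52, -2.50, -12.39, -5.22, 0.07, 1.05, -18.52, -28.29, 17.87],
    [2.27, -0.36, -0.04, -5.60, 7.70, 6.02, 8.76, 0.39, 5.73, -10.23, 30.06, -9.14],
]
TABLE4 = [
    [-3.09, 0.49, 4.55, 1.85, 6.28, -1.64, 9.64, -1.78, -13.98, -20.64, 7.34, 14.19],
    [-1.13, 0.27, -3.47, -6.64, 9.14, -6.60, -12.30, 0.89, -20.85, -27.10, -8.90, -0.63],
    [2.97, 0.24, -5.75, 1.93, -6.43, 4.79, -9.13, 1.99, -12.67, 21.80, 30.68, -23.04],
    [-0.62, -0.46, 5.10, -2.79, -0.60, -10.08, 6.86, 2.11, -3.32, -12.16, -27.14, -26.31],
    [-3.44, -0.54, -0.76, 0.78, 6.06, -9.87, -22.46, 1.09, -17.28, 28.64, -16.12, 5.16],
]


def max_abs_residual(table):
    """Largest |residual| over all solutions and locations, per reliability."""
    return {
        rel: max(abs(v) for row in table for v in row[3 * g:3 * g + 3])
        for g, rel in enumerate(RELIABILITIES)
    }


print(max_abs_residual(TABLE3))
print(max_abs_residual(TABLE4))
```

For Table 3 this gives 6.14, 12.66, 21.90, and 30.92 for reliabilities 0.99, 0.8, 0.66, and 0.5; for Table 4 it gives 5.75, 10.08, 22.46, and 30.68. In both cases the worst-case residual grows steadily as the reliability level is lowered.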

Figure 18. Sensitivity of the solutions to the ensemble size.


Similar results were obtained for the other reliability levels. Hence the size of the ensemble affects stochastic optimization using multiple realizations: with a sufficiently large number of models in the ensemble, the multiple-realization approach performs similarly to the chance-constrained approach.

7. Summary and Conclusions

[49] Surrogate models are widely used in research to substitute for complex numerical simulation models when solving groundwater management problems with coupled simulation-optimization. However, their practical application has been limited, primarily by concerns about the reliability of surrogate model predictions. This reliability depends on the uncertainties in the model structure and parameters. When used in a coupled simulation-optimization framework, uncertain surrogate models affect both the quality and the reliability of the optimal solutions obtained. Because most optimal design solutions are limit-state in nature, errors in surrogate model predictions can even make the derived optimal solutions infeasible. To address these issues, this study proposed and evaluated a simulation-optimization model based on an ensemble of surrogate models. The ensemble is also used to quantify the uncertainty in the surrogate model structure and parameters: the salinity prediction of each surrogate model in the ensemble differs from the others because of this uncertainty. Two optimization formulations were used to derive the optimal abstraction rates. In the first, each surrogate model in the ensemble was independently linked to the multiobjective genetic algorithm NSGA-II using the multiple-realization formulation. In the second, the errors in salinity predictions were quantified using the ensemble of models, and their cumulative distribution function was obtained; based on this function, a chance-constrained optimization problem was formulated and solved using NSGA-II. The reliability in the chance-constrained model is analogous to the reliability obtained with the ensemble surrogate approach, since in both cases the management model is constrained by the maximum permissible salinity concentrations.
The Pareto-optimal sets of solutions obtained using the two methods for different reliability levels were compared. These fronts were also compared with the Pareto-optimal set obtained using the best surrogate model in the coupled simulation-optimization model. The front obtained using the single surrogate model in the optimization was observed to lie close to the front corresponding to a specified reliability of 0.5. It could therefore be argued that the reliability of an optimal solution obtained using a single surrogate model in a linked simulation-optimization model for coastal aquifer management roughly corresponds to 0.5. However, using an ensemble of surrogate models with stochastic optimization helps improve the reliability of the salinity predictions and of the resulting optimal solutions.
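One way to read the 0.5 observation above follows. It is an illustrative interpretation, not the authors' derivation. Assume the true salinity at a monitoring location equals the single-surrogate prediction plus an error $\varepsilon$ that is continuously distributed with CDF $F_\varepsilon$ and median zero. The symbols $C$, $\hat{C}$, $\varepsilon$, $F_\varepsilon$, $C_{\max}$ and $\alpha$ are introduced here for illustration.

$$
C(\mathbf{Q}) = \hat{C}(\mathbf{Q}) + \varepsilon, \qquad
P\big[C(\mathbf{Q}) \le C_{\max}\big] = F_\varepsilon\big(C_{\max} - \hat{C}(\mathbf{Q})\big) \ge \alpha .
$$

A single-surrogate optimization treats $\hat{C}$ as exact. Because optimal solutions are limit state in nature, it drives the binding constraint to $\hat{C}(\mathbf{Q}^*) = C_{\max}$, so the achieved reliability is

$$
P\big[C(\mathbf{Q}^*) \le C_{\max}\big] = F_\varepsilon(0) = 0.5 .
$$

A specified reliability $\alpha > 0.5$ instead requires the margin $C_{\max} - \hat{C}(\mathbf{Q}^*) \ge F_\varepsilon^{-1}(\alpha)$, which is nonnegative under the zero-median assumption.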

[50] Ensemble-based surrogate modeling in coupled simulation-optimization has significant advantages over the single-surrogate modeling approach. The single-surrogate approach does not take predictive uncertainty into consideration and assumes that the surrogate model prediction is equivalent to the numerical simulation. The ensemble-based methodology is able to quantify the predictive uncertainty and use it in a stochastic optimization model. The ensemble-based approach therefore accounts for errors in surrogate model predictions arising from predictive uncertainty, which is difficult to accomplish with a single surrogate model. It was found to derive more reliable optimal solutions while retaining the computational advantages of the surrogate modeling approach.

[51] It should be possible to use ensemble surrogate models in coupled simulation-optimization groundwater management studies that consider the uncertainty in groundwater parameters. An ensemble of surrogate models could substitute for groundwater models with different hydraulic conductivities and other uncertain parameters. For this, each member of the ensemble would have to be trained on a different data set, obtained by running the numerical simulation model with a particular realization of the uncertain groundwater parameters. The ensemble could then be used in a stochastic optimization framework to derive groundwater management strategies under groundwater parameter uncertainty.
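The parameter-uncertainty extension proposed above could be sketched in Python as follows. Everything here is hypothetical. `toy_simulator` is an invented analytical stand-in for a numerical model such as FEMWATER, and the conductivity values are arbitrary. A one-coefficient least-squares fit replaces genetic programming only to keep the example short and dependency-free. Each ensemble member is trained on data generated with one conductivity realization, and the ensemble is then used as a multiple-realization reliability check.

```python
import random
from typing import Callable, List, Sequence, Tuple

Model = Callable[[float], float]  # total pumping -> predicted salinity


def toy_simulator(total_pumping: float, conductivity: float) -> float:
    """Invented stand-in for a simulator: salinity rises with pumping, less for high K."""
    return 0.5 * total_pumping / conductivity


def make_training_set(
    conductivity: float, rng: random.Random, n: int = 50
) -> List[Tuple[float, float]]:
    """Sample pumping scenarios and 'simulate' salinity for one parameter realization."""
    pumping = [rng.uniform(0.0, 1000.0) for _ in range(n)]
    return [(q, toy_simulator(q, conductivity)) for q in pumping]


def fit_surrogate(data: Sequence[Tuple[float, float]]) -> Model:
    """Least-squares slope through the origin, b = sum(q*c) / sum(q*q)."""
    slope = sum(q * c for q, c in data) / sum(q * q for q, _ in data)
    return lambda q: slope * q


def build_ensemble(conductivities: Sequence[float], seed: int = 0) -> List[Model]:
    """Train one surrogate per realization of the uncertain conductivity."""
    rng = random.Random(seed)
    return [fit_surrogate(make_training_set(k, rng)) for k in conductivities]


def reliability(ensemble: Sequence[Model], total_pumping: float, limit: float) -> float:
    """Share of parameter realizations whose surrogate meets the salinity limit."""
    ok = sum(1 for model in ensemble if model(total_pumping) <= limit)
    return ok / len(ensemble)


# Toy usage: four hydraulic-conductivity realizations (illustrative values).
ensemble = build_ensemble([5.0, 10.0, 20.0, 40.0])
print(reliability(ensemble, total_pumping=400.0, limit=15.0))  # 0.5
```

Here the ensemble spread reflects parameter uncertainty rather than surrogate model uncertainty. The same reliability check could therefore sit inside a stochastic optimizer such as the multiple-realization formulation used in this study.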

[52] Acknowledgments. This study was funded by the CRC for Contamination Assessment and Remediation of the Environment, Australia. We are grateful to the four reviewers for their constructive comments, which helped improve the presentation of this paper.

References

Ahlfeld, D. P., and M. Heidari (1994), Applications of optimal hydraulic control to groundwater systems, J. Water Resour. Planning Manage., 120(3), 350–365.

Aly, A. H., and R. C. Peralta (1999), Optimal design of aquifer cleanup systems under uncertainty using a neural network and a genetic algorithm, Water Resour. Res., 35(8), 2523–2532, doi:10.1029/98WR02368.

Ayvaz, M. T., and H. Karahan (2008), A simulation/optimization model for the identification of unknown groundwater well locations and pumping rates, J. Hydrol., 357(1–2), 76–92.

Babovic, V., and M. Keijzer (2002), Rainfall runoff modelling based on genetic programming, Nord. Hydrol., 33(5), 331–346.

Bhattacharjya, R., and B. Datta (2005), Optimal management of coastal aquifers using linked simulation optimization approach, Water Resour. Manage., 19(3), 295–320.

Bhattacharjya, R. K., and B. Datta (2009), ANN-GA-based model for multiple objective management of coastal aquifers, J. Water Resour. Planning Manage., 135(5), 314–322.

Chan, N. (1993), Robustness of the multiple realization method for stochastic hydraulic aquifer management, Water Resour. Res., 29(9), 3159–3167, doi:10.1029/93WR01410.

Cheng, A. H. D., D. Halhal, A. Naji, and D. Ouazar (2000), Pumping optimization in saltwater-intruded coastal aquifers, Water Resour. Res., 36(8), 2155–2165, doi:10.1029/2000WR900149.

Dagan, G., and D. G. Zeitoun (1998), Seawater-freshwater interface in a stratified aquifer of random permeability distribution, J. Contam. Hydrol., 29(3), 185–203.

Das, A., and B. Datta (1999a), Development of management models for sustainable use of coastal aquifers, J. Irrig. Drain. Eng., 125(3), 112–121.

Das, A., and B. Datta (1999b), Development of multiobjective management models for coastal aquifers, J. Water Resour. Planning Manage., 125(2), 76–87.

Das, A., and B. Datta (2000), Optimization based solution of density dependent seawater intrusion in coastal aquifers, J. Hydrol. Eng., 5(1), 82–89.

Datta, B., and S. D. Dhiman (1996), Chance constrained optimal monitoring network design for pollutants in groundwater, J. Water Resour. Planning Manage., 122(3), 180–189.

Datta, B., D. Chakraborthy, and A. Dhar (2009), Simultaneous identification of unknown groundwater pollution sources and estimation of aquifer parameters, J. Hydrol., 376(1–2), 48–57.

W04516 SREEKANTH AND DATTA: ENSEMBLE SURROGATES FOR OPTIMAL COASTAL AQUIFERS W04516


Dhar, A., and B. Datta (2009), Saltwater intrusion management of coastal aquifers. I: Linked simulation-optimization, J. Hydrol. Eng., 14(12), 1263–1272.

Deb, K. (2001), Multi-objective Optimization Using Evolutionary Algorithms, John Wiley, New York.

Dorado, J., J. R. Rabunal, J. Puertas, A. Santos, and D. Rivero (2002), Prediction and modelling of the flow of a typical urban basin through genetic programming, Appl. Artif. Intell., 17(4), 329–343.

Efron, B., and R. J. Tibshirani (1993), An Introduction to the Bootstrap, Monogr. on Stat. and Appl. Prob., vol. 57, Chapman and Hall, New York.

Emch, P. G., and W. W. G. Yeh (1998), Management model for conjunctive use of coastal surface water and ground water, J. Water Resour. Planning Manage., 124(3), 129–139.

Feyen, L., and S. M. Gorelick (2005), Framework to evaluate the worth of hydraulic conductivity data for optimal groundwater resources management in ecologically sensitive areas, Water Resour. Res., 41, W03019, doi:10.1029/2003WR002901.

Gaur, S., and M. C. Deo (2008), Real-time wave forecasting using genetic programming, Ocean Eng., 35(11–12), 1166–1172.

Gorelick, S. M. (1983), A review of distributed parameter groundwater management modeling methods, Water Resour. Res., 19(2), 305–319, doi:10.1029/WR019i002p00305.

Gorelick, S. M., C. I. Voss, P. E. Gill, W. Murray, M. A. Saunders, and M. H. Wright (1984), Aquifer reclamation design: The use of contaminant transport simulation combined with non-linear programming, Water Resour. Res., 20(4), 415–427, doi:10.1029/WR020i004p00415.

Hallaji, K., and H. Yazicigil (1996), Optimal management of a coastal aquifer in southern Turkey, J. Water Resour. Planning Manage., 122(4), 233–244.

He, L., G. H. Huang, and H. W. Lu (2010), A coupled simulation-optimization approach for groundwater remediation design under uncertainty: An application to a petroleum-contaminated site, Environ. Pollution, 157(8–9), 2485–2492.

Iman, R. L., and W. J. Conover (1982), A distribution-free approach to inducing rank correlation among input variables, Commun. Stat. B, 11, 311–334.

Iribar, V., J. Carrera, E. Custodio, and A. Medina (1997), Inverse modelling of seawater intrusion in the Llobregat delta deep aquifer, J. Hydrol., 198(1–4), 226–244.

Karterakis, S. M., G. P. Karatzas, I. K. Nikolos, and M. P. Papadopoulou (2007), Application of linear programming and differential evolutionary optimization methodologies for the solution of coastal subsurface water management problems subject to environmental criteria, J. Hydrol., 342(3–4), 270–282.

Katsifarakis, K. L., and Z. Petala (2006), Combining genetic algorithms and boundary elements to optimize coastal aquifers’ management, J. Hydrol., 327(1–2), 200–207.

Kourakos, G., and A. Mantoglou (2009), Pumping optimization of coastal aquifers based on evolutionary algorithms and surrogate modular neural network models, Adv. Water Resour., 32(4), 507–521.

Koza, J. R. (1994), Genetic programming as a means for programming computers by natural-selection, Stat. Comput., 4(2), 87–112.

Lin, H.-C. J., D. R. Richards, C. A. Talbot, G.-T. Yeh, J.-R. Cheng, H.-P. Cheng, and N. L. Jones (1997), A three-dimensional finite element computer model for simulating density-dependent flow and transport in variable saturated media: Version 3.0, U.S. Army Engineer Research and Development Center, Vicksburg, Miss.

Makkeasorn, A., N. B. Chang, and X. Zhou (2008), Short-term streamflow forecasting with global climate change implications: A comparative study between genetic programming and neural network models, J. Hydrol., 352(3–4), 336–354.

Mantoglou, A. (2003), Pumping management of coastal aquifers using analytical models of saltwater intrusion, Water Resour. Res., 39(12), 1335, doi:10.1029/2002WR001891.

Mantoglou, A., and M. Papantoniou (2008), Optimal design of pumping networks in coastal aquifers using sharp interface models, J. Hydrol., 361(1–2), 52–63.

Mantoglou, A., M. Papantoniou, and P. Giannoulopoulos (2004), Management of coastal aquifers based on nonlinear optimization and evolutionary algorithms, J. Hydrol., 297(1–4), 209–228.

McPhee, J., and W. W. G. Yeh (2006), Experimental design for groundwater modeling and management, Water Resour. Res., 42, W02408, doi:10.1029/2005WR003997.

Morgan, D. R., J. W. Eheart, and A. J. Valocchi (1993), Aquifer remediation design under uncertainty using a new chance constrained programming technique, Water Resour. Res., 29(3), 551–561, doi:10.1029/92WR02130.

Parasuraman, K., and A. Elshorbagy (2008), Toward improving the reliability of hydrologic prediction: Model structure uncertainty and its quantification using ensemble-based genetic programming framework, Water Resour. Res., 44, W12406, doi:10.1029/2007WR006451.

Park, C. H., and M. M. Aral (2004), Multi-objective optimization of pumping rates and well placement in coastal aquifers, J. Hydrol., 290(1–2), 80–99.

Qahman, K., A. Larabi, D. Ouazar, A. Naji, and A. H.-D. Cheng (2005), Optimal and sustainable extraction of groundwater in coastal aquifers, Stochastic Environ. Res. Risk Assess., 19(2), 99–110.

Ranjithan, S., J. W. Eheart, and J. H. Garrett (1993), Neural network based screening for groundwater reclamation under uncertainty, Water Resour. Res., 29(3), 563–574, doi:10.1029/92WR02129.

Rao, S. V. N., S. M. Bhallamudi, B. S. Thandaveswara, and G. C. Mishra (2004), Conjunctive use of surface and groundwater for coastal and deltaic systems, J. Water Resour. Planning Manage., 130(3), 255–267.

Rogers, L. L., F. U. Dowla, and V. M. Johnson (1995), Optimal field-scale groundwater remediation using neural networks and genetic algorithm, Environ. Sci. Technol., 29(5), 1145–1155.

Sheta, A. F., and A. Mahmoud (2001), Forecasting using genetic programming, Proceedings of the 33rd Southeastern Symposium on System Theory, pp. 343–347.

Sreekanth, J., and B. Datta (2010), Multi-objective management of saltwater intrusion in coastal aquifers using genetic programming and modular neural network based surrogate models, J. Hydrol., doi:10.1016/j.jhydrol.2010.08.023.

Tiedeman, C., and S. M. Gorelick (1993), Analysis of uncertainty in optimal groundwater contaminant capture design, Water Resour. Res., 29(7), 2139–2153, doi:10.1029/93WR00546.

Wagner, B. J., and S. M. Gorelick (1987), Optimal groundwater quality management under parameter uncertainty, Water Resour. Res., 23(7), 1162–1174, doi:10.1029/WR023i007p01162.

Wagner, B., and S. Gorelick (1989), Reliable aquifer remediation in the presence of spatially variable hydraulic conductivity: From data to design, Water Resour. Res., 25(10), 2211–2225, doi:10.1029/WR025i010p02211.

Wang, M., and C. Zheng (1998), Ground water management optimization using genetic algorithms and simulated annealing: Formulation and comparison, J. Am. Water Resour. Assoc., 34(3), 519–530.

Wang, W. C., K. W. Chau, C. T. Cheng, and L. Qiu (2009), A comparison of performance of several artificial intelligence methods for forecasting monthly discharge time series, J. Hydrol., 374(3–4), 294–306.

Yan, S. Q., and B. Minsker (2006), Optimal groundwater remediation design using an adaptive neural network genetic algorithm, Water Resour. Res., 42(5), W05407, doi:10.1029/2005WR004303.

Zechman, E., M. Baha, G. Mahinthakumar, and S. R. Ranjithan (2005), A genetic programming based surrogate model development and its application to a groundwater source identification problem, ASCE Conf. Proc. 173, 341.

B. Datta and J. Sreekanth, School of Engineering and Physical Sciences, James Cook University, Douglas Campus, Townsville, Qld 4811, Australia.
