Combining Response Surface Methodology with Numerical Methods for Optimization of Markovian Models

Peter Kemper, Dennis Müller, and Axel Thümmler

IEEE Transactions on Dependable and Secure Computing, vol. 3, no. 3, July-September 2006

The authors are with the Department of Computer Science, University of Dortmund, August-Schmidt-Str. 12, 44221 Dortmund, Germany. E-mail: {peter.kemper, dennis.mueller, axel.thuemmler}@udo.edu.

Manuscript received 23 Sept. 2005; accepted 13 Dec. 2005; published online 4 Aug. 2006.

Abstract—In general, decision support is one of the main purposes of model-based analysis of systems. Response surface methodology (RSM) is an optimization technique that has been applied frequently in practice, but few automated variants are currently available. In this paper, we show how to combine RSM with numerical analysis methods to optimize continuous time Markov chain models. Among the many known numerical solution methods for large Markov chains, we consider a Gauss-Seidel solver with relaxation that relies on a hierarchical Kronecker representation as implemented in the APNN Toolbox. To effectively apply RSM for optimizing numerical models, we propose three strategies which are shown to reduce the required number of iterations of the numerical solver. With a set of experiments, we evaluate the proposed strategies with a model of a production line and apply them to optimize a class-based queueing system.

Index Terms—Constrained optimization, Markov processes, sparse, structured, and very large systems, performance analysis and design aids, communication networks.

1 INTRODUCTION

In model-based design of computer and communication systems, optimization techniques can help to identify optimal or nearly optimal configurations to support decision-making. From a conceptual point of view, an optimization procedure searches for those parameter settings that maximize or minimize a given objective function f. In many model-based designs, f depends on a stochastic model of a discrete-event system, and different opportunities for its evaluation exist. In the case of finite state Markov chains, numerical methods are known that give exact results with respect to a transient or steady state distribution. Furthermore, simulation is an approach that is widely applicable due to its relative lack of constraints. However, simulation of stochastic discrete event systems yields only estimates of the performance and dependability measures, typically accompanied by confidence intervals [16]. Markovian models are often used in dependability modeling [11]. Applications span a set of fields including software rejuvenation, client/server models, and all kinds of systems with failure and repair of components.

In this paper, we consider f to be defined on a family of continuous time Markov chains (CTMCs). Numerical analysis of CTMCs poses a number of challenges. In particular, the generator matrix Q of a CTMC is often large and very sparse. That has stimulated a lot of research on data structures that represent Q in a space-efficient way and on iterative solution methods that converge quickly to the resulting distribution. Currently, the space used for the resulting distribution is the bottleneck in terms of space if state-of-the-art representations of Q are employed, i.e., symbolic structures like multiterminal binary decision diagrams or matrix diagrams and structures based on a matrix algebra like modular and hierarchical Kronecker representations; see [5], [18] for recent overviews. An ample variety of numerical methods exists for steady state analysis (see [23] for a textbook overview); however, no technique is known yet that is clearly superior in general. A common property of all techniques is that computations to obtain exact results are relatively costly. That has a considerable impact on the selection of an optimization procedure to be applied for objective functions that require numerical analysis of a CTMC.

Optimization is a research area with a long tradition, particularly in the field of operational analysis, which has given rise to a wealth of techniques. These include direct search methods, simulated annealing, and evolutionary algorithms. A comprehensive overview of these and further techniques can be found in [22]. Genetic algorithms and evolution strategies have gained a lot of attention recently. Despite their impressive performance in many areas, it is known that no single optimization method can be superior in all areas—the no-free-lunch theorem [25]. The fact that evolutionary algorithms tend to evaluate the objective function at many parameter settings has motivated us to focus on a different optimization method, namely, the response surface methodology (RSM) [20]. RSM has been known for a long time and a well-established theory exists. It was originally developed for optimization based on real experiments, and the algorithm is usually described in such a way that several steps must be done manually. RSM requires some refinements and extensions to integrate the approach into an optimization package for stochastic models, in which the experimentation and optimization are done automatically.

Recently, Neddermeijer et al. proposed a framework for response surface methodology for optimization of simulation models [21]. Their framework shows a possible way to combine the various mathematical and statistical methods that belong to RSM. Nevertheless, their framework is still far from being an automated algorithm. Kleijnen and Sargent proposed a procedure for linear regression metamodeling in stochastic simulation [15]. Their approach distinguishes between fitting and validating a single metamodel with respect to an underlying simulation model. To the best of our knowledge, no fully automated realization of the response surface methodology that is tailored to the optimization of computationally expensive numerical models has been available until now.

In this paper, we present a novel approach for the optimization of numerical Markovian models that is based on mathematical and statistical methods that belong to RSM. Our approach iteratively uses first and second-order linear regression metamodels combined with a gradient-based method to find a direction of improvement. A key issue is the algorithmic realization of RSM such that it can run with very limited information or support from a user. Since RSM is able to tolerate an imprecise evaluation of the objective function to a certain extent, we make use of this effect to adaptively adjust the precision of an iterative numerical solution method that is used to obtain the steady state distribution of a CTMC. Furthermore, we observe and make use of the phenomenon that values of an objective function, which result from an iterative solution, may converge much faster than the iterative solution itself. That gives rise to a novel, adaptive, and heuristic approach for RSM optimization with numerical solution methods. The approach trades computation time for accuracy.

We consider a small example of a production line for a detailed assessment of how the algorithm performs. In particular, we study the robustness of the algorithm for a wide range of initial parameter settings. As a second and more complex application example, we consider the optimization of a class-based queueing system [10], [17]. Class-based queueing (CBQ) is a "per hop" packet-scheduling mechanism that provides differentiated service to traffic flows of different types and is used as part of the differentiated services architecture (DiffServ) [13]. Following [4], we develop a stochastic Petri net model of a CBQ system and apply numerical analysis based on a Gauss-Seidel solver with relaxation for a hierarchical Kronecker representation of the generator matrix. The APNN toolbox is used for modeling and analysis [2].

The paper is organized as follows: Section 2 introduces the basic concepts of the response surface methodology and Section 3 develops the fully automated RSM algorithm. Section 4 considers specific aspects of the numerical analysis of stochastic models and proposes three strategies for combining RSM with numerical models. We optimize a model of a production line in Section 5 and a model of a class-based queueing system in the Appendix, which can be found on the Computer Society Digital Library at http://computer.org/tdsc/archives.htm. While the former model is used to analyze the robustness of the proposed strategies and the configuration of RSM, the latter model is considered to illustrate the applicability of the approach. Finally, concluding remarks are given.

2 MODEL OPTIMIZATION WITH RESPONSE SURFACE METHODOLOGY

From an abstract mathematical point of view, a stochastic model can be represented by a function $\phi(w)$, which maps a vector of input parameters $w = (w_1, \ldots, w_k)$ onto a set of performance measures of interest. Since $\phi$ represents the stochastic nature of a stochastic model, the output performance measures are random variables, and different characteristics of their distributions, such as mean, variance, further moments, or quantiles, could be of interest. That relationship is expressed by $M[\phi(w)] = f(w)$, where $f$ is called the response surface function, and $M$ is a mapping from the random variables onto a real-valued algebraic combination of certain characteristics of their distributions. The most frequently considered characteristic is the expectation of those random variables; however, any other measure can be used instead, given that the analysis method employed to compute $\phi(w)$ is able to support it. Note that the vector of input parameters $w$ may have to comply with a set of constraints, and therefore not all possible combinations of input parameters are valid. Typical examples are mixture problems, where the sum of all input parameters must be constant, like the class-based queueing system presented in the Appendix, which can be found on the Computer Society Digital Library at http://computer.org/tdsc/archives.htm.

The general optimization problem discussed in this section is characterized by finding the input parameters that maximize/minimize the response surface function. Since $\phi$ is only implicitly represented as a stochastic model, or in other words as a black box, only those optimization methods that do not exploit the structure of $\phi$ can be applied. To solve that optimization problem, we use mathematical and statistical methods that belong to the Response Surface Methodology (RSM) [20]. RSM is a deterministic optimization method that is able to identify a (local) optimum of the response surface function. In contrast to RSM, genetic algorithms [22] and simulated annealing [14] incorporate some randomness in their search strategy, which makes them nondeterministic and increases the chance of finding a global optimum of a multimodal function. In RSM, the stochastic model is evaluated for certain parameter settings that give design points in the response surface. A regression model is adjusted to match the response surface at those points, and the gradient of the regression model is used to direct a stepwise search procedure toward an extreme point. Hence, the initial setting determines at which (local) optimum the procedure will terminate. Fig. 1 shows a possible course of the general RSM procedure in a two-dimensional search space. RSM begins its search with a sequence of first-order regression metamodels (those with index I in Fig. 1), combined with a steepest ascent/descent search. In that phase, four corner points of a square are simulated, and a first-order regression model is approximated to characterize the response surface around the current center point.


In the final optimization phase, RSM uses a second-order regression metamodel (index II in Fig. 1) to estimate the response surface more accurately and searches for the optimum from the resulting fit.

In RSM, the input parameters of the stochastic model are usually called factors, whereas the stochastic output is called the response of the model. Note that, in general, input factors can be continuous and/or discrete variables, and variants of RSM are known that handle mixtures of continuous and discrete variables (see, e.g., [20]). However, we focus on the continuous case in this paper and assume in the following that all input factors are continuous variables. In RSM, it is common to transform the natural input factors $w_1, \ldots, w_k$ into normalized or coded variables $x_1, \ldots, x_k$, each ranging from -1 to 1. With that coding, all input factors are defined to be dimensionless variables with a mean of zero and the same spread. Let $l_i$ and $u_i$ denote the lower and upper limit of input factor $w_i$, $i = 1, \ldots, k$, respectively. The transformation from $w_i$ to $x_i$ is performed by $x_i = (w_i - m_i)/b_i$, with the center and half-width of the considered range being $m_i = (u_i + l_i)/2$ and $b_i = (u_i - l_i)/2$, respectively.

The main steps of the proposed RSM optimization algorithm are the following:

1. Approximate the response surface function in a local region by a low-order linear regression metamodel.
2. Test the metamodel for adequate approximation.
3. Use the metamodel to predict factor values that improve the response.

Note that Steps 1 to 3 are repeated until a certain stopping criterion is reached. In fact, after Step 3 has finished and an improved response has been determined, a new regression metamodel is approximated in the local region around the improved response. The following sections describe Steps 1 to 3 in more detail.

2.1 Approximating the Response Surface Function by Regression Models

For the estimation of the regression coefficients of first-order and second-order regression metamodels, we use ordinary least-squares (OLS) estimation. The first-order regression metamodel in coded variables is given by

$$y = \beta_0 + \sum_{i=1}^{k} \beta_i x_i + \varepsilon \qquad (1)$$

with $k+1$ regression coefficients $\beta = (\beta_0, \ldots, \beta_k)$ and an additive error $\varepsilon$. Suppose that $n > k$ observations of the response variable, denoted by $y = (y_1, \ldots, y_n)$, are available. Each observed response $y_i$ is the result of an evaluation of the stochastic model with input factor values $x_i = (x_{i1}, \ldots, x_{ik})$. With OLS estimation, the regression coefficients are determined such that the sum of squares of the errors $\varepsilon_i$ for each factor/response pair $(x_i, y_i)$ in (1) is minimized. According to [20], the least-squares estimates $\hat{\beta}$ of the regression coefficients $\beta$ are computed by

$$\hat{\beta} = (X^T X)^{-1} X^T y, \qquad (2)$$

where $X$ is an $n \times (k+1)$ matrix with row vectors $(1, x_i)$. The second-order regression metamodel in the coded variables is given by

$$y = \beta_0 + \sum_{i=1}^{k} \beta_i x_i + \sum_{i=1}^{k} \beta_{i,i} x_i^2 + \sum_{i=1}^{k} \sum_{j=i+1}^{k} \beta_{i,j} x_i x_j + \varepsilon, \qquad (3)$$

where the first sum corresponds to the main effects, the second sum to the pure quadratic effects, and the third sum to the interaction effects between the variables. Note that the computation of estimates for the regression coefficients $\beta_i$ and $\beta_{i,j}$ of the second-order model can be carried out much like that for the first-order model with OLS estimation based on (2). To be precise, the regression coefficients $\beta_{i,i}$ and $\beta_{i,j}$ with $i = 1, \ldots, k$ and $j = i+1, \ldots, k$ are mapped to new regression coefficients $\beta_{k+1} := \beta_{1,1}, \ldots, \beta_{2k} := \beta_{k,k}, \beta_{2k+1} := \beta_{1,2}, \ldots$, and the quadratic and interaction effects of the variables are mapped to new variables $x_{k+1} := x_1 \cdot x_1, \ldots, x_{2k} := x_k \cdot x_k, x_{2k+1} := x_1 \cdot x_2, \ldots$, respectively.
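For illustration, here is a minimal numpy sketch of the OLS computation in (2) and of the column mapping that reduces the second-order model (3) to the linear form; `np.linalg.lstsq` replaces the explicit inverse of $X^T X$ for numerical stability, and the helper names are ours:

```python
import numpy as np
from itertools import combinations

def ols(X, y):
    """Least-squares estimates of (2); lstsq avoids forming (X^T X)^{-1}."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def first_order_matrix(points):
    """Design matrix with row vectors (1, x_i) for the first-order model (1)."""
    points = np.atleast_2d(points)
    return np.hstack([np.ones((len(points), 1)), points])

def second_order_matrix(points):
    """Augment (1, x_i) with pure quadratic and interaction columns as in (3)."""
    points = np.atleast_2d(points)
    k = points.shape[1]
    cols = [first_order_matrix(points), points ** 2]
    for i, j in combinations(range(k), 2):
        cols.append((points[:, i] * points[:, j]).reshape(-1, 1))
    return np.hstack(cols)
```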

Choosing an appropriate set of design points $x_i$, $i = 1, \ldots, n$, is called an experimental design [19]. In the proposed RSM algorithm, we use orthogonal designs with two levels (-1 and 1) for each factor for the approximation of first-order regression models, i.e., a $2^k$ factorial design (see index I in Fig. 1) or a $2^{k-p}$ fractional factorial design. To approximate a second-order model, a $2^k$ factorial design is augmented with $n_c$ center points and $2k$ axial points, yielding a central composite design (see index II in Fig. 1).

In the presence of a constrained optimization problem, it is possible that predefined designs like the $2^k$ factorial design require additional effort or cannot be applied at all. Therefore, a more general approach is required, and algorithms for computer-generated designs apply to this situation by selecting design points from a set of valid candidate points, where the selection is guided by a given optimality criterion. In our algorithm, the candidate points are chosen by selecting all valid points on a grid over the local region. We use a combination of the Dykstra and Fedorov algorithms [7], where the Dykstra algorithm creates an initial design from the candidate points, which is then optimized by the Fedorov algorithm. The optimality criterion we use is the so-called D-optimality, which depends on the maximization of the determinant of the matrix $X^T X$ [7], [20].
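The standard designs are easy to generate; the sketch below produces the $2^k$ factorial and central composite designs in coded variables (the axial distance and helper names are our choices; the D-optimal construction via the Dykstra and Fedorov algorithms is not reproduced here):

```python
import numpy as np
from itertools import product

def factorial_2k(k):
    """All 2^k corner points with levels -1 and +1 (index I in Fig. 1)."""
    return np.array(list(product([-1.0, 1.0], repeat=k)))

def central_composite(k, n_center=1, alpha=1.0):
    """2^k corners plus n_c center points and 2k axial points at
    distance alpha from the center (index II in Fig. 1)."""
    corners = factorial_2k(k)
    center = np.zeros((n_center, k))
    axial = np.vstack([alpha * np.eye(k), -alpha * np.eye(k)])
    return np.vstack([corners, center, axial])

print(central_composite(2).shape)  # 4 corners + 1 center + 4 axial -> (9, 2)
```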


Fig. 1. Illustration of the response surface methodology in two dimensions.


2.2 Testing the Regression Model for Adequate Approximation

Once we have obtained a regression model, we need to test whether it adequately describes the behavior of the response in the current region of interest. We use the coefficient of determination, denoted by $R^2$, for this purpose. It is defined as the ratio of the variation that is explained by the metamodel (SSR) to the total variation (SST), i.e.,

$$R^2 = \frac{SSR}{SST} = \frac{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}, \qquad (4)$$

where $\bar{y}$ is the mean of $y_1, \ldots, y_n$ and $\hat{y}_i$ is the response of the metamodel for input factors $x_i$. It can be shown that $R^2 \in [0, 1]$ and that the closer $R^2$ is to 1, the better the fit. However, a large value of $R^2$ does not necessarily imply that the metamodel is a good approximation, since adding a regression coefficient to the metamodel will always increase $R^2$, regardless of whether the additional variable is statistically significant or not. Thus, it is more common to use the adjusted coefficient of determination

$$R^2_{adj} = 1 - \frac{n-1}{n-k-1} \left(1 - R^2\right), \qquad (5)$$

which reflects the number of coefficients $k$ and which might also decrease if unnecessary terms are added to the metamodel. If the metamodel is found to be inadequate, we suggest reducing the size of the region of interest or using a higher-order regression metamodel. Note that it is not customary in RSM to fit a regression metamodel of order higher than two.
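A direct transcription of (4) and (5) as a small sketch (names are ours):

```python
import numpy as np

def goodness_of_fit(y, y_hat, k):
    """Coefficient of determination (4) and adjusted variant (5) for n
    observations y, metamodel predictions y_hat, and k input factors."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    n = len(y)
    sst = np.sum((y - y.mean()) ** 2)        # total variation
    ssr = np.sum((y_hat - y.mean()) ** 2)    # variation explained by metamodel
    r2 = ssr / sst
    r2_adj = 1.0 - (n - 1) / (n - k - 1) * (1.0 - r2)
    return r2, r2_adj
```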

2.3 Predicting Factor Values of Improved Response

For first-order metamodels, we use the method of steepest ascent/descent to predict a direction of improved response. The direction of steepest ascent is given by the gradient of the first-order metamodel, i.e., $(\hat{\beta}_1, \ldots, \hat{\beta}_k)$, and the direction of steepest descent is given by the negative gradient. The question arises of how to choose the step size for the line search. A common approach is to choose a "most important factor" $x_j$ according to the size of its regression coefficient, i.e., $j = \arg\max_{i=1,\ldots,k} |\hat{\beta}_i|$, and to scale the values $\hat{\beta}_i$ by a factor $\delta = 1/|\hat{\beta}_j|$. Using that approach, the first step results in a point on the boundary of the local region corresponding to factor value $x_j$, which then has a value of $\hat{\beta}_j / |\hat{\beta}_j| \in \{-1, 1\}$ in coded variables. Since the center of the local region in coded variables is $(0, \ldots, 0)$, the $m$th point in the direction of steepest ascent is given by

$$(m \delta \hat{\beta}_1, \ldots, m \delta \hat{\beta}_k). \qquad (6)$$

A stopping rule is required to end this type of line search. The most recommended rule is to stop the line search when no further improvement of the response is observed [21]. Fig. 2 illustrates the line search algorithm. For second-order metamodels, a point of maximum improvement of the response is determined via a canonical analysis, i.e., the stationary point is derived from the first derivative of the regression metamodel. The nature of the stationary point, i.e., whether it is a maximum, a minimum, or a saddle point, can be determined by inspecting the signs of the eigenvalues of a certain symmetric matrix $B$ that is composed of the regression coefficients (see [20, p. 241] for further details).
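The line search of (6) and the stationary point of the canonical analysis can be sketched as follows (assuming maximization; `evaluate` is a placeholder that maps a coded point to a response):

```python
import numpy as np

def line_search_steepest_ascent(evaluate, beta, max_steps=50):
    """Walk along (6) and stop as soon as the response no longer improves."""
    beta = np.asarray(beta, float)             # (beta_1, ..., beta_k), no intercept
    delta = 1.0 / np.max(np.abs(beta))         # scale by the most important factor
    best_x = np.zeros_like(beta)
    best_y = evaluate(best_x)
    for m in range(1, max_steps + 1):
        y = evaluate(m * delta * beta)         # m-th point along steepest ascent
        if y <= best_y:                        # stopping rule: no improvement
            break
        best_x, best_y = m * delta * beta, y
    return best_x, best_y

def stationary_point(beta, k):
    """Stationary point x_s = -B^{-1} b / 2 of the second-order model (3),
    with b the linear coefficients and B the symmetric matrix holding the
    quadratic coefficients (beta_ii on the diagonal, beta_ij / 2 off it)."""
    b = np.asarray(beta[1:k + 1], float)
    B = np.diag(np.asarray(beta[k + 1:2 * k + 1], float))
    idx = 2 * k + 1
    for i in range(k):
        for j in range(i + 1, k):
            B[i, j] = B[j, i] = beta[idx] / 2.0
            idx += 1
    return -0.5 * np.linalg.solve(B, b)  # eigenvalues of B decide its nature
```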

3 A FULLY AUTOMATED RSM ALGORITHM

This section presents an approach that combines the building blocks of the response surface methodology as previously introduced into a fully automated algorithm. Fig. 3 shows a high-level pseudocode representation of the RSM algorithm that we implemented and use for the optimization of stochastic models. Our aim is to come up with a robust, general-purpose algorithm, not one that is specially engineered for a particular case. An automated algorithm requires the determination of constants, decision rules, and the sequence of operations. Since there are trade-offs between the quality of the computed results and the required effort, we investigated the impact of constants and decision rules on a number of benchmark examples. We discuss the particular decisions we made for the algorithm of Fig. 3 in the following sections.

3.1 Selection of Parameter Ranges, Initial Center Point, and Size of Local Region

Before starting the algorithm, one has to choose the lower and upper limits of each input parameter in order to transform the whole search space into a $[-1, 1]^k$ hypercube, i.e., to transform the natural variables into coded variables (see Step 1 in Fig. 3). Furthermore, an initial center point $c_{new}$ and an initial half-width $\omega$ of the local region in the response surface must be defined or user-given. Let $c_{new} = (0, \ldots, 0)$ be the default initial center point. This choice is quite natural since $(0, \ldots, 0)$ is the center of the search space $[-1, 1]^k$. We have found in a number of empirical studies with different models and benchmark functions that $\omega \in [0.2, 0.4]$ is reasonable for the half-width of the initial local region $[-\omega, \omega]^k$. The quality of the result of the RSM algorithm is often not sensitive to small variations in the initial setting of $\omega$, so we start with regions of size either $[-0.2, 0.2]^k$ or $[-0.4, 0.4]^k$. A small region is recommended if additional constraints reduce the set of valid values within the $[-1, 1]^k$ hypercube. With a small region, it is more likely that its design points are all inside an area of valid values, so that standard experimental designs like the $2^k$ factorial design can be applied. On the other hand, if the whole $[-1, 1]^k$ hypercube is our design space, a larger value of $\omega$, i.e., $\omega = 0.4$, has often been a good choice.


Fig. 2. Line search along the path of steepest ascent/descent.


3.2 Selection of Criteria for Termination

In Fig. 3, the main steps of the RSM optimization algorithm are performed in the while-loop from Step 3 to Step 27. The algorithm terminates if the half-width of the local region decreases below the threshold $\omega_{stop}$. The choice of $\omega_{stop}$ is based on several aspects. First, the size of $\omega_{stop}$ influences the number of iterations of the RSM algorithm; hence, it influences the amount of computation. Fixing $\omega$ and $\omega_{stop}$ determines how often RSM can reduce the size of the local region. We learned from experiments that usually no more than three iterations between lines 3 and 27 take place in between two reductions of the local region, and that the number of iterations performed between lines 10 and 12 during the steepest ascent is approximately constant within one application of the RSM algorithm. Therefore, the choice of $\omega_{stop}$ influences the number of evaluations made during the optimization process, which should not be too high. On the other hand, the size of $\omega_{stop}$ influences the quality of the solution, since local optima of the response surface can be determined more precisely with small values of $\omega_{stop}$. All in all, lower values of $\omega_{stop}$ increase the number of evaluations and may improve the result of the optimization process. In this context, we choose $\omega_{stop} = 0.01$ since it happened to be a good selection in some empirical studies and since values of the response function are computed with reasonable precision due to the numerical solution method.

3.3 Selection of a Goodness-of-Fit Test

In each iteration of the algorithm, the current local region is transformed into a $[-1, 1]^k$ hypercube. Then, the response surface function $f$ is approximated by a first-order linear regression metamodel as discussed in Section 2. If the first-order model gives an adequate approximation of $f$, its direction of steepest ascent approximates the direction of steepest ascent in the response surface and can be used to guide the line-search algorithm, which follows that direction to identify a peak that becomes the next center point. The goodness-of-fit test is used to prevent a bad approximation of steepest ascent from misleading RSM and wasting computation time in a false direction. Recall from Section 2 that we use the adjusted coefficient of determination to check the adequacy of a first-order model. Based on a couple of experiments, we decided to accept the regression model if $R^2_{adj} > 0.8$. Note that this is a very mild condition, which only prevents the use of a completely misleading metamodel.

3.4 Switching from First-Order to Second-Order Models

If the line search does not yield an improved response, the algorithm reduces the half-width of the local region at the last center point and starts a new RSM iteration. That is exactly what is implemented in Steps 4 to 14 in Fig. 3. If the first-order model is found to be an inadequate approximation of the response surface function, it is likely that the true response surface has significant curvature in the local region and may be better approximated by a second-order quadratic metamodel. Nevertheless, we recommend the approximation of a second-order model only in the final steps of the optimization procedure (i.e., if $\omega$ is less than four times the stop-width $\omega_{stop}$), since a second-order model requires far more evaluations of the stochastic model than a first-order model does. Thus, a better strategy is to decrease the local region in order to better approximate a first-order model (see Steps 14 and 25 in Fig. 3).

Fig. 3. Pseudocode of the RSM optimization algorithm.

In the final optimization phase and close to an optimum,it is often worthwhile to use a second-order model. As forthe first-order model, a goodness-of-fit test is performed. Ifthe approximated model shows no significant lack-of-fit, apoint of improved response is predicted with a canonicalanalysis, i.e., the stationary point is derived from the firstderivative of the regression metamodel (see Steps 17 to 24 inFig. 3). At the end of all RSM iterations, the algorithmreturns the center point of the local region, after transfor-mation from coded to natural variables, as the optimal/bestsolution it could find.

4 SPECIALIZING RSM FOR OPTIMIZING NUMERICAL MODELS

The RSM approach of the foregoing section requires a method to evaluate the responses $y_i$ for any given set of values of the input parameters. Precisely, Steps 6 and 17 of the algorithm given in Fig. 3 require an evaluation of each individual design and center point. In this section, we discuss how iterative numerical procedures of steady state analysis for CTMCs can be combined with RSM.

Numerical analysis of a CTMC with generator matrix $Q$ computes values for a set $R$ of rate or impulse rewards by computation of a steady state distribution $\pi$. Distribution $\pi$ is a solution of $\pi Q = 0$. Due to the large dimensions of $Q$, i.e., the large cardinality of the set of states $S$, and the sparsity of $Q$, it is common practice to employ iterative fixed point algorithms to solve $\pi Q = 0$; see [23] for a comprehensive textbook explanation. Simple methods for stationary analysis are the power method, the method of Jacobi, and the method of Gauss-Seidel. More involved methods include projection methods like GMRES and the method of Arnoldi. Decompositional methods like iterative aggregation/disaggregation methods and recent multilevel methods aim to solve equations at different levels of granularity. Selection criteria for application to a particular CTMC are speed of convergence, computation time, and memory requirements (plus availability in a tool). Since, in general, there is no clear best choice known with respect to speed of convergence and required CPU time, we focus on memory requirements.

Generator matrices $Q$ are usually automatically generated from some modeling formalism and typically provide some structure, such that symbolic representations like MTBDDs, MxDs, or Kronecker representations are particularly space-efficient, which moves the bottleneck in terms of space to the iteration vectors [5], [18]. Those data structures are based on a divide-and-conquer approach that makes effective use of repeated information in $Q$, such that it is represented in space just once but potentially used more often. Numerical methods differ in the number of vectors they require; for instance, Gauss-Seidel can be implemented with a single iteration vector, while others need more; in particular, projection methods need a significant number of vectors to represent a Krylov subspace. Therefore, we focus on Gauss-Seidel with relaxation (SOR) in the following.

For any given iterative solution method, we need to decide on the initial distribution $\pi_0$ and the required precision $\varepsilon$. Those two issues give rise to three strategies that allow us to combine RSM with numerical CTMC analysis.

4.1 Selection of Initial Distribution

RSM repeatedly solves CTMCs of the same model but for modified input parameter values $w$, i.e., a set $W$ of configurations is considered with corresponding generator matrices $Q_w$ for $w \in W$. A modification of parameters may have different effects on $Q_w$:

1. Dimensions and nonzero structure of all $Q_w$ in $\{Q_w | w \in W\}$ are the same; only numerical values at certain positions change. Those changes can be significant for the performance of numerical solution methods, since one may modify rates to introduce or remove different time scales in the model. However, evaluation of a set of matrices may imply that $Q_w$ can be derived from the generator matrix $Q_{w'}$ of a previous configuration $w'$ without a costly generation from the model description.

2. Dimensions and structure of the CTMC are changed. In that case, $Q_w$ has to be generated from the model and the solution has to start from the beginning.

In the following, we focus on Case 1. It is well known that the selection of $\pi_0$ has an impact on the number of iterations needed to compute the solution $\pi$. The repetitive application of a numerical solver to CTMCs of the same dimension but with different entries allows us to start a subsequent SOR solution with some previously computed $\pi$. Clearly, the more similar those CTMCs are (as seen by the distance of their configurations in the Euclidean space of Fig. 1), the more likely it is that a previously computed $\pi$ for one CTMC will give a good initial distribution for another. Note that design points of the first experimental design are rather far apart, but that RSM reduces the width of the local region, i.e., the distance between design points, the closer it gets to its termination. This may imply similarity of CTMCs and their solutions. In consequence, we formulate a heuristic strategy for the selection of $\pi_0$.

Strategy 1. For the initial distribution $\pi_0$ of the currently considered configuration $w$, use a previously computed solution $\pi'$ of a configuration $w'$ that is close to $w$.

In RSM, we evaluate design points of first or second-order models and design points during the line search. In a first-order model, the center point is close to the design points in the Euclidean space. In a second-order model with a cyclic evaluation of design points, and during the line search, the configuration considered directly before the current one is the closest. Thus, the selection has the additional positive effect that it is not necessary to store many previously computed solutions.
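The effect of Strategy 1 is easy to see in a dense toy version of SOR for $\pi Q = 0$; the paper's solver operates on a hierarchical Kronecker representation of $Q$, which we do not reproduce here, and `pi0` carries the warm start:

```python
import numpy as np

def sor_steady_state(Q, pi0=None, relax=0.95, eps=1e-12, max_sweeps=100000):
    """Gauss-Seidel with relaxation for pi Q = 0, normalized to sum(pi) = 1.
    Passing a previously computed solution as pi0 (Strategy 1) typically
    reduces the number of sweeps for a nearby configuration."""
    n = Q.shape[0]
    pi = np.full(n, 1.0 / n) if pi0 is None else np.array(pi0, float)
    for sweep in range(1, max_sweeps + 1):
        for s in range(n):
            # solve the s-th equation of pi Q = 0 for pi[s], using the
            # already updated entries of pi (Gauss-Seidel), then relax
            off_diag = pi @ Q[:, s] - pi[s] * Q[s, s]
            pi[s] = (1.0 - relax) * pi[s] + relax * (-off_diag / Q[s, s])
        pi /= pi.sum()                     # renormalize to a distribution
        if np.max(np.abs(pi @ Q)) < eps:   # d2: maximum residual
            return pi, sweep
    return pi, max_sweeps
```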


4.2 Selection of Precision

It is common practice to use several measures of distance to decide convergence; for example, the maximum difference between consecutive iteration vectors $d_1 = \max_{s \in S} |\pi_i(s) - \pi_{i+1}(s)|$, the maximum residual $d_2 = \max_{s \in S} |v(s)|$, and the sum of residuals $d_3 = \sum_{s \in S} |v(s)|$, where $v = \pi_i Q$. Convergence is identified if $d_k \leq \varepsilon$ for $k = 1, 2, 3$ for some user-given threshold value $\varepsilon$. Note that all three measures are approximations. The response function makes use of $\pi$ with the help of rate or impulse rewards. Since it is common practice to encode impulse rewards in the state space [9], we can consider a set $X$ of reward vectors $r_x$ with nonnegative real values and dimension $|S|$. A reward value $\tilde{r}_x$ corresponding to the reward vector $r_x$ is computed by the weighted sum

$$\tilde{r}_x = \sum_{s \in S} r_x(s) \cdot \pi(s), \qquad (7)$$

where $r_x(s)$ is the element of vector $r_x$ that corresponds to state $s$.

Note that associated reward values often come from a small finite set of real values. Let $R_x$ denote the set of such values; then

$$\tilde{r}_x = \sum_{r \in R_x} r \cdot \sum_{s \in S, \, r_x(s) = r} \pi(s). \qquad (8)$$

Hence, if iteration vectors differ only in such a way that the inner sum in (8) remains mainly unchanged, the reward value $\tilde{r}_x$ will remain the same. Therefore, for iteration vectors of steady state solvers, a sequence of reward values based on intermediate vectors $\pi_i$ will converge due to the convergence of $\pi_i$, but the convergence rates may differ. Since RSM relies on a sufficiently accurate evaluation of the response function $f$, it may be possible to stop the iteration procedure at rather large values of $\varepsilon$ if the evaluation of the response function has converged, i.e., we can define $d_4 = |f(w, \pi_i) - f(w, \pi_{i+1})|$ as a further measure of convergence, where $f(w, \pi_i)$ gives the response for an intermediate vector $\pi_i$ and a model configuration $w$. Since, in general, the convergence of the response function or its reward functions is as unknown as the convergence of the $\pi_i$ vectors, it either requires intermediate evaluations or is only known a posteriori once the numerical procedure has converged. The evaluation of $d_1, \ldots, d_4$ implies different efforts. While $d_1$ is computationally inexpensive, $d_2$ and $d_3$ are inexpensive for the power method and Jacobi but require an additional matrix-vector multiplication for SOR, and $d_4$ requires the evaluation of a set of rewards and the subsequent evaluation of the algebraic expression of $f$. Hence, we look for ways to avoid the computation of solutions that are unnecessarily precise, as well as frequent evaluation of $d_4$.
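In code, the four measures read as follows (a sketch; `response` maps a distribution to $f(w, \pi)$ and stands in for the reward-based response function):

```python
import numpy as np

def convergence_measures(pi_prev, pi_next, Q, response):
    """Convergence measures d1 to d4 of Section 4.2 for two consecutive
    iteration vectors; note that d2/d3 need the extra product pi Q for SOR
    and that d4 needs two evaluations of the response function."""
    v = pi_next @ Q                                    # residual vector
    d1 = np.max(np.abs(pi_prev - pi_next))             # max vector difference
    d2 = np.max(np.abs(v))                             # maximum residual
    d3 = np.sum(np.abs(v))                             # sum of residuals
    d4 = abs(response(pi_prev) - response(pi_next))    # response difference
    return d1, d2, d3, d4
```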

We assume the following: if we need a certain precision measured by $d_1 < \varepsilon'$ (or $d_2$, $d_3$) to obtain the precision measured by $d_4$ that we want, then all matrices of $\{Q_w | w \in W\}$ are similar in that respect. In consequence, we can determine $\varepsilon'$ for one particular $Q_w$ and use it for all other matrices as well. Based on this assumption, we formulate the following heuristic strategy for RSM:

Strategy 2.

a. Inspection Step: Evaluate the points of the experimental design of the local region that RSM considers with the procedure presented in Fig. 4; each point gives a value for $\varepsilon = \varepsilon'$ at which one does not observe a significant change of the response; take the minimum of those values.

b. Main Step: Evaluate the subsequent points during the RSM run with just precision $\varepsilon'$, to save iterations at the price of a potential lack of accuracy.

Note that the inspection step implies that the procedure adapts itself at the beginning of the RSM optimization process. Clearly, there is a lot of room for variation based on the decisions at which design points $\varepsilon'$ is determined, whether the minimum, mean, or maximum of all suggestions for $\varepsilon'$ is taken, and whether it should be determined once, occasionally, or frequently during the RSM algorithm. The algorithm of the inspection step in Fig. 4 has two constants, namely, $10^{-2}$ in Step 8 and $10^{-4}$ in Step 14; we selected those values based on experience, but note that only the $10^{-4}$ in Step 14 influences what is considered as convergence of response values. The algorithm has two parts. In Steps 4 to 8 in Fig. 4, the Gauss-Seidel solver is forced to continue iterating until the relative difference between consecutive response values is below a threshold of $10^{-2}$, where response values $y(i)$, $i = 1, 2, \ldots$, are computed from intermediate distributions $\pi(i)$ only if $d_2 < 10^{-i}$, i.e., if the accuracy of the intermediate distribution has increased by one order of magnitude. In the second part of the inspection step (see Steps 10 to 14 in Fig. 4), the response function is evaluated more frequently in order to determine the value of $\varepsilon'$ more precisely, such that the relative difference between the responses is below $10^{-4}$. Note that Steps 6 and 12 should perform at least one iteration of the Gauss-Seidel solver to ensure correct functionality of the algorithm. The reason for not applying the inspection step for every evaluation of the stochastic model is to avoid the effort of repeatedly computing the response function during the iterations of the equation solver in the main step of the RSM optimization process.

Fig. 4. Inspection step of RSM Strategy 2.
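Since Fig. 4 is given as pseudocode in the paper, we only sketch its two phases here; `solve_to(eps, pi)` is a placeholder that continues the Gauss-Seidel iteration from `pi` until $d_2 <$ `eps`, and the decade-wise tightening in the second phase is a simplification of the more frequent evaluations described above:

```python
def inspection_step(solve_to, response, coarse=1e-2, fine=1e-4):
    """Determine the precision eps' used afterwards in the main step (2b)."""
    def rel_change(y_new, y_old):
        return abs(y_new - y_old) / max(abs(y_new), 1e-300)

    i = 1
    pi = solve_to(10.0 ** -i, None)          # start from the uniform vector
    y_prev = response(pi)
    while True:                              # part 1 (Steps 4 to 8 in Fig. 4)
        i += 1
        pi = solve_to(10.0 ** -i, pi)        # one decade more accuracy
        y = response(pi)
        if rel_change(y, y_prev) < coarse:
            break
        y_prev = y
    while rel_change(y, y_prev) >= fine:     # part 2 (Steps 10 to 14 in Fig. 4)
        y_prev = y
        i += 1
        pi = solve_to(10.0 ** -i, pi)
        y = response(pi)
    return 10.0 ** -i                        # eps' for subsequent evaluations
```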

Recall that RSM estimates a direction of steepest ascent/descent with the help of a first-order model that only approximates the real response surface. Hence, some lack of precision from the estimation would only add to the imprecision of the gradient estimation and need not be critical. A determination of threshold values for that trade-off is clearly model-dependent. Nevertheless, the following assumption seems natural: since RSM initially considers rather large regions and is likely to be far from the optimal solution, a very moderate precision is supposed to be sufficient to identify a promising direction. In a later phase and closer to an optimum, a higher accuracy is necessary to identify an optimum or to get close to it. That results in the third heuristic strategy that we propose.

Strategy 3. Start with an initial precision of $\varepsilon'$ and update the accuracy if RSM is forced to reduce the local region of a first-order model significantly.

In Strategy 3, the initial precision $\varepsilon'$ and the updated precision result from the inspection step of Strategy 2. For our implementation, we considered a reduction of the half-width $\omega$ of the local region below 0.06 (in coded variables, which is about 3 percent of the current range) to be significant. In Section 5 and the Appendix, which can be found on the Computer Society Digital Library at http://computer.org/tdsc/archives.htm, we evaluate the effect of the proposed RSM algorithm when applying the three strategies with the help of two example models.

5 EVALUATION OF THE RSM OPTIMIZATION ALGORITHM

This section investigates the impact of the three strategies proposed in Section 4 on the performance of the RSM optimization algorithm, where performance is measured in terms of the overall number of SOR iterations required by RSM as well as the quality of the solution returned by RSM. Furthermore, we evaluate the robustness of the RSM algorithm for a wide range of initial parameter settings, i.e., starting points.

The studies are applied to a production line of a manufacturing plant comprised of $N$ service queues arranged in a row; that is, parts leaving a queue after service are immediately transferred to the next queue [6]. Throughput and costs have an impact on the profit generated by the system; the common objective is to maximize profit. We consider a conceptually simple example with $N = 2$ to allow for an extensive evaluation of the three strategies. Let all queues have a finite capacity $K = 20$. Arrivals of parts to the first queue occur according to a Poisson process with rate $\lambda = 0.5$. For subsequent queues, we assume arriving parts get lost if the queue buffer is fully occupied. Each queue is comprised of a single server with first-come, first-served (FCFS) service discipline and exponentially distributed service time. The service rate (i.e., speed of the server) at queue $n$ is denoted by $\mu_n$. Furthermore, the vector $\mu = (\mu_1, \ldots, \mu_N)$ of service rates is subject to optimization by RSM with respect to a revenue function

$$R(\mu) = \frac{r \cdot X(\mu)}{c_0 + c^T \mu}, \qquad (9)$$

where $X(\mu)$ is the throughput of the production line (i.e., the time-averaged number of parts leaving the last queue) for service rates $\mu$, and $r$, $c_0$, and $c$ are constants representing a revenue factor, basic costs that occur independently of the service rates, and a vector of cost factors for each server, respectively. Since $X(\mu)$ decreases when any of the $\mu_n$, $n = 1, \ldots, N$, decreases, the revenue function (9) clearly quantifies the trade-off between a high throughput (i.e., a high production rate) and the costs of providing fast service. In the experiments, the revenue factor is $r = 10$, the cost factors are $c = (1, 4)$, and the basic costs are $c_0 = 1$. Furthermore, the search space is bounded to $\mu \in [0, 1]^2$, which results in values of $R(\mu)$ ranging approximately between 0 and 1.18.
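In code, the objective handed to the RSM algorithm is simply (9) wrapped around the numerical solution; `throughput` is a model-specific placeholder that computes $X(\mu)$ from the steady state distribution of the tandem-queue CTMC:

```python
import numpy as np

def revenue(mu, throughput, r=10.0, c0=1.0, c=(1.0, 4.0)):
    """Revenue function (9): reward for throughput versus the cost of
    providing fast service, with r = 10, c0 = 1, c = (1, 4) as in the text."""
    mu = np.asarray(mu, float)
    return r * throughput(mu) / (c0 + np.dot(c, mu))
```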

In a first experiment, we investigate the dependence of the number of iterations of the numerical solver on the residual measures $d_2$ and $d_3$ and on the response value. Fig. 5 and Fig. 6 show a sequence of values for the residuals and the response function over the number of iterations performed by the numerical solver. Individual measures are taken every 50 iteration steps of an SOR solver with a relaxation of 0.95 that starts with a uniform distribution for $\pi_0$. We consider two different settings of the service rates, i.e., $\mu' = (0.4, 0.4)$ and $\mu'' = (0.6, 0.5)$, respectively. The numerical solution shows a remarkable behavior with respect to convergence of the residuals and the response function. At service rates $\mu'$, the value of the response does not change significantly after only a few iterations, whereas for service rates $\mu''$ we observe a significant change of the response until about 750 iterations, or a maximum residual $d_2 \approx 10^{-6}$. In practice, solutions of $d_2 < 10^{-8}$ to $d_2 < 10^{-12}$ would be considered reasonably accurate. Considering (8), we recognize that the set of reward values $R_x$ is a set of only two values, namely, $R_x = \{0, \mu_2\}$. Both examples confirm our assumption that an approximate evaluation of the response function based on numerical solutions of low accuracy can give reasonably good approximations in significantly less computation time. This justifies the inspection step of RSM Strategy 2, which aims to determine the required precision.

Fig. 5. Response and residuals versus number of iterations for $\mu' = (0.4, 0.4)$.

To show the dependence of the response value on the maximum residual used for the numerical solver, we plot in Fig. 7 and Fig. 8 the response surface with stopping criteria $d_2 < 10^{-2}$ and $d_2 < 10^{-10}$, respectively. While Fig. 8 can be assumed to be a sufficiently accurate representation of the response surface, in Fig. 7 the optimum is moved to the boundary of the search space. Nevertheless, we observe that with $d_2 < 10^{-2}$ we still obtain good results when evaluating the model for service rates $\mu_1 < 0.5$ or $\mu_2 < 0.5$. Since the optimal service rates reside in the region with $\mu_1 > 0.5$ and $\mu_2 > 0.5$, a lower value of $d_2$ will be required to finish the optimization run with sufficiently good results. This agrees with RSM Strategy 3, which adjusts the numerical precision after some steps of RSM.

Recall that RSM Strategy 2 determines the precision $d_2$ at the beginning of the optimization process using the design points from the first experimental design. The above results lead us to the conclusion that using Strategy 2 only once at the beginning of the optimization process may result in a precision which is too low to find the correct optimum. To show this, we applied the inspection step of Strategy 2 to a set of service rates arranged on a grid with $21 \times 21$ points within the search space $[0, 1]^2$ (see Fig. 9). The figure gives an impression of which precision will be chosen at which service rates. Especially at the boundaries, with one service rate near zero, the precision found by Strategy 2 is too low to determine the correct optimum, though these values are good enough to leave the current region and move toward the optimum. We conclude that an optimization using Strategy 2 without Strategy 3 will lead to a low number of iterations of the numerical solver but will also provide poorer results.

In a final experiment, several combinations of the three RSM strategies have been compared to the case where no strategy is applied, which we refer to as the reference configuration. In the cases without Strategies 2 or 3, numerical results are computed with precision $d_2 < 10^{-12}$. Every combination of the strategies has been exercised 121 times, once for each of the 121 different initial points on a grid with step size 0.1 within the valid service rates. The results are presented in Table 1 and in Fig. 10, where the brighter gray bars represent the average response of the best point found by the RSM optimization algorithm over the 121 replications, with error bars indicating the maximum and minimum response. The darker gray bars represent the average number of iterations of the numerical solver during one replication of the RSM algorithm. Note that the number of iterations of the numerical solver during one replication is the sum of its iterations over all evaluations made during this replication.

Fig. 6. Response and residuals versus number of iterations for $\mu'' = (0.6, 0.5)$.

Fig. 7. Response surface with maximum residuals less than $10^{-2}$.

Fig. 8. Response surface with maximum residuals less than $10^{-10}$.

Fig. 9. Starting precisions determined with the inspection step of Strategy 2.

TABLE 1. Results of the RSM Strategies.

Fig. 10. Impact of the RSM strategies.

If we compare the results of the reference configuration (indicated with None) with those obtained with Strategy 1, we recognize that Strategy 1 helps to reduce the number of iterations while the response is not affected. When comparing the results of using Strategy 2 and Strategies 2 and 3 with the results using these strategies combined with Strategy 1, we observe that the number of iterations is decreased while the average and minimal response is increased (Strategy 2 versus Strategies 1 and 2) or remains almost unchanged (Strategies 2 and 3 versus Strategies 1, 2, and 3). This shows that using Strategy 1 has indisputably positive effects. When using Strategy 2, the number of iterations of the numerical solver is reduced by more than 50 percent compared to the reference configuration, but there is a trade-off in the quality of the response. While Strategy 2 reduces the average response on the grid only slightly, the minimal response is reduced from about 1.17 to 0.82. This performance of Strategy 2 was expected and confirms the conclusion drawn from Fig. 7 and Fig. 8. Recall that the additional use of Strategy 3 forces RSM to adjust the precision after some steps of the RSM algorithm (see Section 4). As expected, the combined use of Strategies 2 and 3 improves the quality of the minimum response on the grid. The minimum response is about 0.96, which is a significant improvement compared to the minimum response when using Strategy 2 alone.

There are three conclusions to be drawn from Fig. 10 and the discussion above. The first is that Strategy 1 can be used without further risk, as it provides at least equal results while reducing the time needed by the optimization. The second is that Strategy 2 can significantly reduce the number of iterations needed by the numerical solver, but it is outperformed by the combination of Strategies 2 and 3. The third and final conclusion is that the combined use of RSM Strategies 1, 2, and 3 is the best choice, since it saves about 50 percent of the iterations, i.e., about 50 percent of the CPU time, while still resulting in an almost optimal response.

6 CONCLUSION

We presented an approach for the optimization of stochastic models that is based on the response surface methodology. In contrast to previous work, our approach works in a fully automated way and is tailored to the optimization of computationally expensive numerical models. During the optimization process, we iteratively use first and second-order linear regression metamodels combined with a gradient-based method to find a direction of improvement. A high flexibility of the optimization algorithm is achieved through the use of different types of experimental designs, including computer-generated designs, which are shown to be useful for constrained optimization problems.

The numerical analysis is based on a Gauss-Seidel solver with relaxation that uses a hierarchical Kronecker representation for a given continuous time Markov chain. Since RSM implies a repeated solution of related CTMCs, we developed three strategies to reduce the number of iterations per solution. First, we use a previously computed distribution as a sophisticated guess for the initial distribution of a subsequent computation (Strategy 1). Second, we identify the minimal precision required to solve one CTMC for a reasonably precise value of the response function and infer that that precision is acceptable for the solution of other CTMCs (Strategy 2). Third, we increase the precision during the RSM search if necessary (Strategy 3). All strategies aim to reduce the amount of computation time per evaluation of the response function. A small example of a production line showed that the RSM strategies are robust for a wide range of initial parameters. The applicability of the approach to a very large example is demonstrated with a class-based queueing system (see the Appendix, which can be found on the Computer Society Digital Library at http://computer.org/tdsc/archives.htm).

Ongoing work includes further empirical analysis over a range of numerical solvers and benchmark models to provide more statistical evidence for the robustness of the proposed strategies. Furthermore, we should mention that there exists interesting work on sensitivity analysis of reliability and performability measures [3], which could serve as additional guidance during the optimization process.

REFERENCES

[1] M. Ajmone Marsan, G. Balbo, G. Conte, S. Donatelli, and G. Franceschinis, Modelling with Generalized Stochastic Petri Nets. John Wiley and Sons, 1995.

[2] F. Bause, P. Buchholz, and P. Kemper, "A Toolbox for Functional and Quantitative Analysis of DEDS," Proc. 10th Int'l Conf. Computer Performance Evaluation: Modelling Techniques and Tools (TOOLS '98), LNCS, vol. 1469, pp. 356-359, Springer, 1998.

[3] J.T. Blake, A.L. Reibman, and K.S. Trivedi, "Sensitivity Analysis of Reliability and Performability Measures for Multiprocessor Systems," Proc. ACM SIGMETRICS Conf. Measurement and Modeling of Computer Systems, pp. 177-186, 1988.


Table 1. Results of the RSM strategies.

Fig. 10. Impact of the RSM strategies.


[4] P. Buchholz and A. Panchenko, "Numerical Analysis and Optimisation of Class Based Queueing," Proc. 16th European Simulation Multiconf., pp. 543-547, 2002.

[5] P. Buchholz and P. Kemper, "Kronecker Based Matrix Representations for Large Markov Models," Validation of Stochastic Systems, LNCS, C. Baier, B.R. Haverkort, H. Hermanns, J.P. Katoen, and M. Siegle, eds., vol. 2925, pp. 256-295, Springer, 2004.

[6] P. Buchholz and A. Thümmler, "Enhancing Evolutionary Algorithms with Statistical Selection Procedures for Simulation Optimization," Proc. ACM Winter Simulation Conf. (WSC), 2005.

[7] R.D. Cook and C.J. Nachtsheim, "A Comparison of Algorithms for Constructing Exact D-Optimal Designs," Technometrics, vol. 22, pp. 315-324, 1980.

[8] A. Cumani, "On the Canonical Representation of Homogeneous Markov Processes Modeling Failure-Time Distributions," J. Microelectronics and Reliability, vol. 22, pp. 583-602, 1982.

[9] D.D. Deavours, G. Clark, T. Courtney, D. Daly, S. Derisavi, J.M. Doyle, W.H. Sanders, and P.G. Webster, "The Möbius Framework and Its Implementation," IEEE Trans. Software Eng., vol. 28, pp. 956-969, 2002.

[10] S. Floyd and V. Jacobson, "Link-Sharing and Resource Management Models for Packet Networks," IEEE/ACM Trans. Networking, vol. 3, pp. 365-386, 1995.

[11] B.R. Haverkort, "Markovian Models for Performance and Dependability Modeling," Formal Methods and Performance Analysis (FMPA '00), LNCS, E. Brinksma, H. Hermanns, and J.-P. Katoen, eds., vol. 2090, pp. 38-83, Springer, 2001.

[12] Internet Traffic Archive, ita.ee.lbl.gov/index.html, 2005.

[13] V. Jacobson, K. Nichols, and L. Zhang, "A Two-Bit Differentiated Services Architecture for the Internet," Request for Comments 2638, Internet Eng. Task Force, 1999.

[14] S. Kirkpatrick, C.D. Gelatt, and M.P. Vecchi, "Optimization by Simulated Annealing," Science, vol. 220, pp. 671-680, 1983.

[15] J.P.C. Kleijnen and R.G. Sargent, "A Methodology for Fitting and Validating Metamodels in Simulation," European J. Operational Research, vol. 120, pp. 14-29, 2000.

[16] A.M. Law and W.D. Kelton, Simulation Modeling and Analysis, third ed. McGraw-Hill, 2000.

[17] A. Michalas, M. Louta, P. Fafali, G. Karetsos, and V. Loumos, "Proportional Delay Differentiation Provision by Bandwidth Adaptation of Class-Based Queue Scheduling," Int'l J. Comm. Systems, vol. 17, pp. 743-761, 2004.

[18] A. Miner and D. Parker, "Symbolic Representations and Analysis of Large Probabilistic Systems," Validation of Stochastic Systems, LNCS, C. Baier, B.R. Haverkort, H. Hermanns, J.P. Katoen, and M. Siegle, eds., vol. 2925, pp. 296-338, Springer, 2004.

[19] D.C. Montgomery, Design and Analysis of Experiments, fifth ed. John Wiley and Sons, 2001.

[20] D.C. Montgomery and R.H. Myers, Response Surface Methodology: Process and Product Optimization Using Designed Experiments, second ed. John Wiley and Sons, 2002.

[21] H.G. Neddermeijer, G.J. van Oortmarssen, N. Piersma, and R. Dekker, "A Framework for Response Surface Methodology for Simulation Optimization," Proc. ACM Winter Simulation Conf. (WSC), pp. 129-136, 2000.

[22] H.-P. Schwefel, Evolution and Optimum Seeking. John Wiley and Sons, 1995.

[23] W.J. Stewart, Introduction to the Numerical Solution of Markov Chains. Princeton Univ. Press, 1994.

[24] A. Thümmler, P. Buchholz, and M. Telek, "A Novel Approach for Fitting Probability Distributions to Real Trace Data with the EM Algorithm," Proc. Int'l Conf. Dependable Systems and Networks (DSN), pp. 712-721, 2005.

[25] D.H. Wolpert and W.G. Macready, "No Free Lunch Theorems for Optimization," IEEE Trans. Evolutionary Computation, vol. 1, pp. 67-82, 1997.

Peter Kemper received the diploma degree in computer science (Dipl.-Inform., 1992) and a doctoral degree (Dr. rer. nat., 1996), both from Universität Dortmund, Germany, where he performs research and lectures in the Department of Computer Science. His main interests are in the quantitative evaluation of systems and formal aspects of software engineering. In addition to modeling techniques and tools for performance and dependability assessment of computer and communication systems, he also works on model-based evaluation of manufacturing systems and logistic networks. He has published more than 60 technical papers in these areas. Since 1998, Dr. Kemper has contributed to the Collaborative Research Center on Modeling Large Networks in Logistics, SFB 559, funded by Deutsche Forschungsgemeinschaft. He has contributed to several tools for functional and quantitative analysis of discrete event systems, including the QPN tool, the APNN Toolbox, and the ProC/B toolset.

Dennis Müller received the diploma degree in computer science (Dipl.-Inform., 2004) from the University of Dortmund, Germany. Currently, he is a PhD student in the modeling and simulation group at the Department of Computer Science, where he contributes to DoMuS and the Collaborative Research Center on Modeling Large Networks in Logistics, SFB 559. His research interests are focused on modeling, simulation, and optimization.

Axel Thümmler received the degree Diplom-Informatiker (MS in computer science) in April 1998 and the degree Dr. rer. nat. (PhD in computer science) in July 2003, both from the University of Dortmund. Presently, he is a research scientist in the modeling and simulation group in the Department of Computer Science at the University of Dortmund. His research interests include simulation optimization, communication networks, mobile computing, and performance evaluation techniques. He has published more than 20 technical papers in these areas.

