Approximation models in optimization functions
Alan Díaz Manríquez
Abstract
Nowadays, most real-world problems require the optimization of one or more goals, and different techniques have been used to solve such problems. Evolutionary algorithms in particular have shown flexibility, adaptability and good performance on this class of problems. Their main disadvantage is that they require too many evaluations of the fitness function to achieve acceptable results. Therefore, when the fitness function is computationally expensive (e.g., simulations of engineering problems), the optimization process becomes prohibitive. To reduce the cost, surrogate models, also known as metamodels, are constructed and used instead of the real fitness function. There is a wide variety of work on this class of models; however, most existing work does not justify the choice of the metamodel. There are studies in the literature that compare different techniques for creating metamodels, but most compare only two techniques, or take only the accuracy of the metamodel as the point of comparison. A main problem of surrogate models is scalability: most perform well for a relatively low number of variables but poorly in high dimensionalities. In this work, we compare four metamodeling techniques (polynomial approximation models, Kriging, radial basis functions, and support vector regression), taking different aspects into account to measure their performance on six scalable optimization problems that represent different classes of problems. The objective of this study is to investigate the advantages and disadvantages of the metamodeling techniques on different test problems, measuring performance based on multiple aspects, including scalability.
1 Introduction
In recent years, Evolutionary Algorithms (EAs) have been applied with great success to complex optimization problems. The main advantage of EAs lies in their ability to locate solutions close to the global optimum. However, for many real-world problems, the number of calls to the objective function needed to locate a near-optimal solution may be too high. In many science and engineering problems, researchers make heavy use of computer simulation codes in order to replace expensive physical experiments and improve the quality and performance of engineered products and devices. For example, Computational Fluid Dynamics (CFD), Computational Electro-Magnetics (CEM) and Computational Structural Mechanics (CSM) solvers have been shown to be very accurate, but such simulations are often computationally very expensive: a single simulation can take several minutes, hours, days, weeks or even years. Hence, in many real-world optimization problems the number of objective function evaluations needed to obtain a good solution dominates the optimization cost; that is, most of the optimization process is taken up by runs of the computationally expensive analysis codes.
In order to obtain efficient optimization algorithms, it is crucial to use the prior information gained during the optimization process. Conceptually, a natural approach to utilizing this information is to build a model of the fitness function to assist in the selection of candidate solutions for evaluation. A variety of techniques for constructing such a model, often also referred to as a surrogate, metamodel or approximation model, have been considered for computationally expensive optimization problems.
There is a variety of work on such problems; however, most existing work does not justify the choice of the metamodel or does not compare different surrogate models. In this work, we present a study of surrogate models that evaluates different aspects, in order to choose the right metamodel depending on the interests of the algorithm.
The remainder of this work is organized as follows. We begin with a brief overview of surrogate modeling techniques commonly used in the literature. Section 3 presents an overview of the state of the art of evolutionary algorithms with surrogate models. Section 4 presents the proposed methodology for the comparison of the metamodeling techniques. Experimental results obtained on synthetic test problems, together with a discussion of those results, are presented in Section 5. Finally, Section 6 summarizes our main conclusions.
2 Background
A surrogate model is a mathematical model that mimics the behavior of a computationally expensive simulation code over the complete parameter space as accurately as possible, using as few data points as possible. There is a variety of techniques for creating metamodels: rational functions, radial basis functions, artificial neural networks, Kriging models, support vector machines, splines, and polynomial approximation models. The following are the most common approaches to constructing approximate models based on learning and interpolation from the known fitness values of a small population, also known as metamodeling techniques.
2.1 Polynomial approximation models
The response surface methodology (RSM) approximation is one of the most well-established metamodeling techniques. This methodology employs the statistical techniques of regression analysis and analysis of variance in order to obtain minimum variances of the responses.

For most response surfaces, the functions used for approximation are polynomials because of their simplicity, although other types of function are, of course, possible.
In general, a polynomial in the coded inputs $x_1, x_2, \ldots, x_k$ is a function which is a linear aggregate (or combination) of powers and products of the $x$'s. A term in the polynomial is said to be of order $j$ (or degree $j$) if it contains the product of $j$ of the $x$'s. A polynomial is said to be of order $d$, or degree $d$, if the term(s) of highest order in it is (are) of order or degree $d$.

The response surface for $d = 2$ and $k = 2$, where $x_1$ and $x_2$ denote the two coded inputs, is described as follows:
$$y^{(p)} = \beta_0 + \left(\beta_1 x_1^{(p)} + \beta_2 x_2^{(p)}\right) + \left(\beta_{11} x_1^{(p)} x_1^{(p)} + \beta_{22} x_2^{(p)} x_2^{(p)} + \beta_{12} x_1^{(p)} x_2^{(p)}\right) \quad (1)$$

where $y^{(p)}$ is the response to $x_1^{(p)}$ and $x_2^{(p)}$. In Expression (1), the $\beta$'s are the (empirical) coefficients which, in practice, have to be estimated from the data.
The polynomial model is written in matrix notation as:
$$y^{(p)} = \beta^T x^{(p)} \quad (2)$$

where $\beta$ is the vector of coefficients to be estimated, and $x^{(p)}$ is the vector corresponding to the form of the $x_1^{(p)}$ and $x_2^{(p)}$ terms in the polynomial model.
As seen from Table 1, the number of parameters increases rapidly as the number $k$ of input variables and the degree $d$ of the polynomial increase.
Table 1: Number of coefficients in polynomials of degree d involving k inputs

                        Degree of polynomial, d
Number of inputs, k   1 (Planar)  2 (Quadratic)  3 (Cubic)  4 (Quartic)
        2                 3            6            10          15
        3                 4           10            20          35
        4                 5           15            35          70
        5                 6           21            56         126
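The counts in Table 1 are the binomial coefficients $\binom{k+d}{d}$; a quick sketch (ours, not part of the original study) reproduces the table:

```python
from math import comb

def n_coefficients(k: int, d: int) -> int:
    """Number of coefficients of a full polynomial of degree d in k inputs."""
    return comb(k + d, d)

# Reproduces Table 1: k=2 gives 3, 6, 10, 15; k=5 gives 6, 21, 56, 126.
for k in range(2, 6):
    print(k, [n_coefficients(k, d) for d in (1, 2, 3, 4)])
```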
To estimate the unknown coefficients of the polynomial model, both the least squares method (LSM) and the gradient method can be used; either of them requires at least as many samples of the real objective function as there are coefficients in order to obtain good results.
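As an illustration of the least squares route, a full second-degree PRS can be fitted with ordinary least squares; the sketch below is our own (it is not the SURROGATES toolbox code of [23]):

```python
import numpy as np

def quadratic_design_matrix(X: np.ndarray) -> np.ndarray:
    """Columns: 1, the x_i, and all products x_i * x_j with i <= j (degree-2 terms)."""
    m, k = X.shape
    cols = [np.ones(m)]
    cols += [X[:, i] for i in range(k)]
    cols += [X[:, i] * X[:, j] for i in range(k) for j in range(i, k)]
    return np.column_stack(cols)

def fit_prs2(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Estimate the coefficient vector beta of Equation (2) by least squares."""
    beta, *_ = np.linalg.lstsq(quadratic_design_matrix(X), y, rcond=None)
    return beta

def predict_prs2(beta: np.ndarray, X: np.ndarray) -> np.ndarray:
    """Evaluate the fitted polynomial response surface at the rows of X."""
    return quadratic_design_matrix(X) @ beta
```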
The PRS can be constructed by a Full Regression or by a Stepwise Regression. The basic procedure for Stepwise Regression involves (1) identifying an initial model, (2) iteratively stepping, that is, repeatedly altering the model of the previous step by adding or removing a predictor variable in accordance with the stepping criteria, and (3) terminating the search when stepping is no longer possible given the stepping criteria, or when a specified maximum number of steps has been reached.
The principal advantages of PRS are that the fitness of the approximated response surface can be evaluated using powerful statistical tools, and that minimum variances of the response surfaces can be obtained using design of experiments with a small number of experiments.
In practice, we can often proceed by supposing that, over limited regions of the factor space, a polynomial of only first or second degree might adequately represent the true function. Higher-order polynomials can be used; however, instabilities may arise [1], or it may be too difficult to take sufficient sample data to estimate all of the coefficients in the polynomial equation, particularly in large dimensions. In this work, second-degree PRS models are considered.
In this work, the PRS code used is the one reported in [23].
2.2 Kriging
Kriging is a spatial prediction method based on minimizing the mean squared error. It belongs to the group of geostatistical methods and describes the spatial and temporal correlation between the values of an attribute. It is named in honor of D. G. Krige, a South African engineer who developed an empirical method to determine the distribution of gold deposits based on samples of them.
The DACE model (Design and Analysis of Computer Experiments) is a parametric regression model developed by Sacks et al. [20] using the Kriging approach. Because Kriging has usually been applied in only two- or three-dimensional spaces in geostatistical settings, there is no obvious way to estimate the semivariogram for high-dimensional inputs.
The DACE model can be expressed as a combination of a known function $a(x)$ (e.g., a polynomial function, trigonometric series, etc.) and a Gaussian random process $b(x)$:
y(x) = a(x) + b(x) (3)
The Gaussian random process $b(x)$ is assumed to have mean zero and covariance:

$$E\big(b(x^{(i)}), b(x^{(j)})\big) = \mathrm{Cov}\big(b(x^{(i)}), b(x^{(j)})\big) = \sigma^2 R\big(\theta, x^{(i)}, x^{(j)}\big) \quad (4)$$

where $\sigma^2$ is the process variance of the response and $R(\theta, x^{(i)}, x^{(j)})$ is the correlation model with parameters $\theta$. Table 2 shows different correlation models.
Table 2: Correlation functions of the form $R(\theta, w, x) = \prod_{j=1}^{n} R_j(\theta, d_j)$, with $d_j = w_j - x_j$

Name         $R_j(\theta, d_j)$
Exponential  $\exp(-\theta_j |d_j|)$
Gaussian     $\exp(-\theta_j d_j^2)$
Linear       $\max\{0,\ 1 - \theta_j |d_j|\}$
Spherical    $1 - 1.5\xi_j + 0.5\xi_j^3$, with $\xi_j = \min\{1, \theta_j |d_j|\}$
Cubic        $1 - 3\xi_j^2 + 2\xi_j^3$, with $\xi_j = \min\{1, \theta_j |d_j|\}$
Spline       $1 - 15\xi_j^2 + 30\xi_j^3$ for $0 \le \xi_j \le 0.2$; $1.25(1-\xi_j)^3$ for $0.2 < \xi_j < 1$; $0$ for $\xi_j \ge 1$, with $\xi_j = \theta_j |d_j|$
It is common to choose the correlation function as a decreasing function of the distance between two points: two points close together will have a small distance and high correlation. Figure 1 shows that in all cases the correlation decreases with $|d_j|$, and a larger $\theta_j$ leads to a faster decrease.
For the set $S$ of design sites (training set) we have:

$$F = [f(s_1), f(s_2), \ldots, f(s_m)]^T \quad (5)$$

Further, define $R$ as the matrix of stochastic-process correlations between the $z$'s at the design sites,

$$R_{ij} = R(\theta, s_i, s_j), \quad i, j = 1, \ldots, m \quad (6)$$

At an untried point $x$, let

$$r(x) = [R(\theta, s_1, x), \ldots, R(\theta, s_m, x)]^T \quad (7)$$

be the vector of correlations between the $z$'s at the design sites and $x$. Now, for the sake of convenience, consider the linear predictor:
$$\hat{y}(x) = c^T Y \quad (8)$$

The error is:

$$\hat{y}(x) - y(x) = c^T Y - y(x) = c^T (F\beta + Z) - (f(x)^T \beta + z) = c^T Z - z + (F^T c - f(x))^T \beta \quad (9)$$

where $Z = [z_1, \ldots, z_m]$ are the errors at the design sites. To keep the predictor unbiased we demand that $F^T c - f(x) = 0$, or

$$F^T c = f(x) \quad (10)$$
Under this condition, the mean squared error (MSE) of the predictor (8) is:

$$\mathrm{MSE}(x) = E\big[(\hat{y}(x) - y(x))^2\big] \quad (11)$$

Using Lagrange multipliers, the MSE can be minimized, and the predictor model of DACE can be written as:

$$\hat{y}(x) = \hat{\beta} + r^T(x) R^{-1} (Y - F\hat{\beta}), \qquad \hat{\beta} = (F^T R^{-1} F)^{-1} F^T R^{-1} Y \quad (12)$$

$$\hat{\sigma}^2 = \frac{1}{m} (Y - F\hat{\beta})^T R^{-1} (Y - F\hat{\beta})$$

where $r(x)$ is the vector of correlations between the $z$'s at the design sites and $x$, and $R$ is the matrix of stochastic-process correlations between the $z$'s at the design sites.
The values of $\hat{\beta}$ and $\hat{\sigma}^2$ depend on the value of $\theta_j$. The parameters $\theta_j$ can be found with the maximum likelihood method, i.e., by maximizing the expression:

$$-\frac{1}{2}\big(m \ln \hat{\sigma}^2 + \ln |R|\big) \quad (13)$$

The principal disadvantage of Kriging is that the model construction can be very time-consuming; moreover, estimating the parameters $\theta$ is an n-dimensional optimization problem (n is the number of variables in the design space), which can be computationally expensive to solve.
In this work, the Kriging (KRG) code used is the one reported in [15].
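For concreteness, here is a minimal ordinary-Kriging sketch (our own, not the DACE toolbox of [15]) with the Gaussian correlation of Table 2 and a fixed, given $\theta$; in practice $\theta$ would be chosen by maximizing Equation (13):

```python
import numpy as np

def gauss_corr(A, B, theta):
    """Gaussian correlation of Table 2: R_ij = exp(-sum_k theta_k (a_ik - b_jk)^2)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2 * theta).sum(axis=2)
    return np.exp(-d2)

class OrdinaryKriging:
    """Constant-regression DACE predictor (Equation 12) for a fixed theta."""

    def fit(self, S, Y, theta, nugget=1e-10):
        self.S, self.theta = S, theta
        m = len(S)
        R = gauss_corr(S, S, theta) + nugget * np.eye(m)  # Equation (6)
        Rinv_Y = np.linalg.solve(R, Y)
        Rinv_1 = np.linalg.solve(R, np.ones(m))
        self.beta = (np.ones(m) @ Rinv_Y) / (np.ones(m) @ Rinv_1)
        # R^{-1}(Y - F beta) with constant regression F = 1:
        self.Rinv_res = Rinv_Y - Rinv_1 * self.beta
        return self

    def predict(self, x):
        r = gauss_corr(np.atleast_2d(x), self.S, self.theta)[0]  # Equation (7)
        return self.beta + r @ self.Rinv_res                     # Equation (12)
```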
[Figure 1: Correlation functions; panels (a) Exponential, (b) Gaussian, (c) Linear, (d) Spherical, (e) Cubic, (f) Spline, each plotting $R_j(\theta, d_j)$ against $d_j$ for $\theta = 0.2, 1, 5$.]
2.3 Radial Basis Function Network
A radial basis function network (RBFN) is an artificial neural network that uses radial basis functions (RBFs) as activation functions. Its output is a linear combination of radial basis functions. RBFNs are used in function approximation, time series prediction, and control.
RBFs were first introduced by R. Hardy in 1971 [10]. An RBF is a real-valued function whose value depends only on the distance from the origin, so that $\phi(x) = \phi(\|x\|)$; or, alternatively, on the distance from some other point $c$, called a center, so that $\phi(x, c) = \phi(\|x - c\|)$. Any function that satisfies the property $\phi(x) = \phi(\|x\|)$ is a radial function. The norm is usually the Euclidean distance, although other distance functions are also possible:

$$\|x\| = \sqrt{\sum_{i=1}^{d} x_i^2} = \text{distance of } x \text{ to the origin} \quad (14)$$
Typical choices for the RBF include linear splines, cubic splines, multiquadric splines, thin-plate splines and Gaussian functions, as shown in Table 3.

Table 3: Radial basis functions, $r = \|x - c_i\|$

Type of RBF           Function
Linear splines        $|r|$
Cubic splines         $|r|^3$
Multiquadric splines  $\sqrt{1 + (\epsilon r)^2}$
Thin-plate splines    $|r|^2 \ln |r|$
Gaussian              $\exp(-(\epsilon r)^2)$
An RBFN typically has three layers: an input layer, a hidden layer with a non-linear RBF activation function, and a linear output layer (Figure 2). The output $\phi: \mathbb{R}^n \to \mathbb{R}$ of the network is thus:

$$\phi(x) = \sum_{i=1}^{N} w_i \, \phi(\|x - c_i\|) \quad (15)$$

where $N$ is the number of neurons in the hidden layer, $c_i$ is the center vector for neuron $i$, and $w_i$ are the weights of the linear output neuron. In the basic form, all inputs are connected to each hidden neuron. The norm is typically taken to be the Euclidean distance and the basis function is taken to be Gaussian.
RBF networks are universal approximators on a compact subset of $\mathbb{R}^n$: an RBF network with enough hidden neurons can approximate any continuous function with arbitrary precision.
In an RBF network there are three types of parameters that need to be chosen to adapt the network to a particular task: the weights $w_i$, the center vectors $c_i$, and the RBF width parameters. In sequential training, the weights are updated at each time step as data streams in.
For some tasks it makes sense to define an objective function and select the parameter values that minimize its value. The most common objective function is the least squares function:

$$K(w) = \sum_{t} K_t(w) \quad (16)$$

where

$$K_t(w) = \big[y(t) - \phi(x(t), w)\big]^2 \quad (17)$$

Radial basis function networks have been shown to produce good fits to arbitrary contours of both deterministic and stochastic response functions. In this work, the RBF code is our own implementation, based on [].
[Figure 2: Architecture of a radial basis function network. An input vector x is fed to all radial basis functions, each with different parameters. The output of the network is a linear combination of the outputs of the radial basis functions.]
2.4 Support Vector Regression
Support Vector Machines (SVMs) are mainly inspired by statistical learning theory [22]. SVMs are a set of related supervised learning methods that analyze data and recognize patterns. A support vector machine constructs a hyperplane or set of hyperplanes in a high- or infinite-dimensional space, which can be used for classification, regression or other tasks. Intuitively, a good separation is achieved by the hyperplane that has the largest distance to the nearest training data points of any class (the so-called functional margin), since in general the larger the margin, the lower the generalization error of the classifier.
SVM schemes use a mapping into a larger space so that cross products may be computed easily in terms of the variables in the original space, keeping the computational load reasonable. The cross products in the larger space are defined in terms of a kernel function $K(x, x')$, which can be selected to suit the problem. Table 4 shows different types of kernel functions.
Table 4: Kernel functions for SVM

Type of kernel                     Kernel function
Polynomial                         $K(x, x') = \langle x, x' \rangle^d$
Gaussian radial basis function     $K(x, x') = \exp\big(-\|x - x'\|^2 / (2\sigma^2)\big)$
Exponential radial basis function  $K(x, x') = \exp\big(-\|x - x'\| / (2\sigma^2)\big)$
Multilayer perceptron              $K(x, x') = \tanh(\rho \langle x, x' \rangle + \varrho)$
SVMs can also be applied to regression problems¹ by introducing an alternative loss function, modified to include a distance measure.

Consider the problem of approximating the set of data

¹ SVMs for regression problems are known as Support Vector Regression (SVR).
$$D = \{(x_1, y_1), \ldots, (x_l, y_l)\}, \quad x \in \mathbb{R}^n, \; y \in \mathbb{R} \quad (18)$$

with a linear function,

$$f(x) = \langle w, x \rangle + b \quad (19)$$

The optimal regression function is given by the minimum of the functional

$$\Phi(w, \xi) = \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{l} (\xi_i^{-} + \xi_i^{+}) \quad (20)$$

where $C$ is a pre-specified value, and $\xi^{+}, \xi^{-}$ are slack variables representing the upper and lower constraints on the outputs of the system.
Different types of loss function exist (quadratic, Laplace, Huber, $\epsilon$-insensitive); here we only describe the problem with the $\epsilon$-insensitive loss.
Using an $\epsilon$-insensitive loss function,

$$L_\epsilon(y) = \begin{cases} 0 & \text{for } |f(x) - y| < \epsilon \\ |f(x) - y| - \epsilon & \text{otherwise} \end{cases} \quad (21)$$

the solution is given by

$$\max_{\alpha, \alpha^*} W(\alpha, \alpha^*) = \max_{\alpha, \alpha^*} -\frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} (\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*) \langle x_i, x_j \rangle + \sum_{i=1}^{l} \big(\alpha_i (y_i - \epsilon) - \alpha_i^* (y_i + \epsilon)\big) \quad (22)$$

with constraints

$$0 \le \alpha_i, \alpha_i^* \le C, \quad i = 1, \ldots, l, \qquad \sum_{i=1}^{l} (\alpha_i - \alpha_i^*) = 0 \quad (23)$$

Solving Equation (22) with the constraints of Equation (23) determines the Lagrange multipliers $\alpha, \alpha^*$, and the regression function is given by Equation (19), where

$$w = \sum_{i=1}^{l} (\alpha_i - \alpha_i^*) x_i, \qquad b = -\frac{1}{2} \langle w, x_r + x_s \rangle \quad (24)$$

with $x_r$ and $x_s$ support vectors.
Note that the above solution is for a regression with a linear kernel function. In this work, the SVR code used is the one reported in [5].
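The code used in this work is LIBSVM [5]; as an illustration only, the following sketch uses scikit-learn's SVR, which exposes the same $\epsilon$-insensitive formulation with a Gaussian RBF kernel (the data and parameter values here are made up for the example):

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.uniform(-5.0, 5.0, size=(100, 2))   # training sites
y = (X ** 2).sum(axis=1)                    # e.g. the Sphere function

# epsilon-insensitive SVR with a Gaussian RBF kernel (Table 4).
model = SVR(kernel="rbf", C=100.0, epsilon=0.2, gamma=0.5)
model.fit(X, y)
y_hat = model.predict(rng.uniform(-5.0, 5.0, size=(10, 2)))
```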
3 State of the art
This section describes some approaches that have successfully used metamodeling techniques.

Ratle [17] proposed a hybrid algorithm that uses a Genetic Algorithm (GA) with Kriging. The first generation is randomly initialized as in a basic GA, and fitness is evaluated using the true fitness function. The solutions found in the first generation are used to build a metamodel with Kriging. The metamodel is then exploited for several generations until a convergence criterion (proposed by the authors) is reached. The next generation is evaluated using the real fitness function and the metamodel is updated with the new data points. The proposed algorithm was applied to a test function with two variables and to a test function with 20 variables. The authors state that the algorithm seems appropriate for rapidly obtaining moderately good solutions.
Ratle [18] combined a genetic algorithm with an approximate model created by the Kriging method. The metamodel is updated every k generations. Six different update strategies were proposed and evaluated on two test problems. Finally, the best strategy was used in the design of a simple mechanical structure with a noise or vibration level reduction criterion. The authors do not justify the selection of the parameters of the Kriging method.
El-Beltagy et al. [7] proposed an algorithm with a Gaussian regression model. The algorithm creates a metamodel of the original function using the Gaussian regression model, and the metamodel is updated every time a generation-delay criterion is satisfied; the update is performed taking into account the fitness and the minimum distance of the population with respect to the vectors that built the metamodel. The individuals taken into account for the creation of the model must satisfy a minimum distance with respect to the vectors with which the metamodel was created, which helps the diversity of points in the metamodel and decreases computation time. The metamodeling approach seems to work best with smooth objective functions; it stalls in situations where the global optimum has strong local features.
Bull [3] proposed a GA with a neural network. The neural network is trained using example individuals with their explicit fitness, and the resulting model is then used by the GA to find a solution. The model is updated every R generations: the current fittest individual of the evolving population is evaluated with the real function, replaces the one with the lowest fitness in the training set, and the neural network is re-trained. The approach was applied to 20 fitness landscapes created with the NK model [13].
Pierret [16] designed an algorithm for turbomachinery blade design. Improving the machine performance requires detailed knowledge that can be provided by Navier-Stokes solvers (these solvers are time-consuming). The algorithm keeps a database containing the inputs and outputs of previous Navier-Stokes solutions. One starts by scanning the database to select the sample whose performance is closest to the required one; this sample is then adapted to the required performance by an optimization procedure. The optimization algorithm used is simulated annealing, with an approximate model for the performance evaluation. The approximate model is obtained from the database with a neural network. The new solution obtained from the optimization procedure is evaluated by the Navier-Stokes solver and added to the database. Finally, if the target performance has not been reached, a new iteration is started. The method requires only a few Navier-Stokes computations to define an optimized blade.
El-Beltagy et al. [6] used a Gaussian Process (GP) with an Evolutionary Algorithm (EA), and presented the advantages of using GPs over other neural-net, biologically inspired approaches. The metamodel is updated with an online model expansion: the model can expand to include new data points at minimal computational cost. In the algorithm, the metamodel is not updated every generation, but rather according to the predicted standard deviation of the metamodel. Results are presented for a real-world engineering problem involving the structural optimization of a satellite boom.
Jin et al. [12] mentioned evolution control for the first time. Evolution control helps to avoid false optima. They propose two methods of evolution control. The first is individual-based control, in which part of the individuals in the population are chosen and evaluated with the real function; if the individuals are chosen randomly they call it a random strategy, and if the chosen individuals are the best ones they call it a best strategy. The second is generation-based control, in which the whole population is evaluated with the real function every k generations. They proposed a framework for managing approximate models in generation-based evolution control, as well as an algorithm that combines an evolution strategy (ES) with a neural network and uses the proposed framework. They evaluated the new approach on two theoretical functions and on a real blade-design optimization problem.
Emmerich et al. [8] proposed the Metamodel-Assisted Evolution Strategies (MAES), combining an ES with Kriging. The approach takes into account the error associated with each prediction: the estimated value and the predicted error are used to select the individuals to be evaluated with the real function. They evaluated the approach on artificial landscapes and on an airfoil shape optimization problem. The principal advantage of this method is the use of the error associated with the predictions.
Regis and Shoemaker [19] proposed two algorithms: (1) an ES with local quadratic approximation, and (2) an ES with local cubic radial basis functions. The main feature of these algorithms is that the objective function value (or fitness value) of an offspring solution is estimated by fitting a model using its k-nearest neighbors among the previously evaluated points. The algorithms were applied to a twelve-dimensional (12-D) groundwater bioremediation problem involving a complex nonlinear finite-element simulation model.
Bueche et al. [2] proposed an algorithm denominated the Gaussian Process Optimization Procedure (GPOP). They use a Gaussian process as an inexpensive function that replaces the original function; the Gaussian process is built with individuals in the neighborhood of the current best solution and with the most recently evaluated individuals. GPOP was applied to a real-world problem: the optimization of stationary gas turbine compressor profiles. The authors mention that GPOP converged much faster than a range of alternative evolution strategies, and to significantly better results.
Most of the previous work does not justify the choice of metamodel; the authors probably chose the technique based on their own familiarity with it. In some works the metamodeling technique is chosen for its characteristics [8]. Some other works compare metamodeling techniques, but the comparison methodology is inefficient [19].
There are studies in the literature that compare different techniques for creating metamodels. However, most compare only two techniques [4, 21, 9, 24], or take only the accuracy of the metamodel as the point of comparison. Another work that takes several metamodeling techniques into account is reported in [11]; the authors compared four metamodeling techniques (polynomial regression, Kriging, radial basis functions, and multivariate adaptive regression splines), using multiple criteria to decide the best technique for different problems. However, one disadvantage of this work is that dimensionality is not treated as an important factor. Another is that it does not take into account the fitness landscape of the metamodel: it is natural to think that a metamodel can be very accurate and yet more difficult to optimize.
4 Methodology
The principal challenge for approximation models is to be as accurate as possible over the complete domain of interest while minimizing the simulation cost (efficiency). Most approaches that use metamodeling techniques take into account only the accuracy of the technique [4, 21, 9, 24]. However, other approaches suggest the use of multiple criteria for assessing the quality of a metamodel [11], for example robustness, efficiency, and simplicity. In this work we take these aspects into account, and add others such as scalability and ease of optimization.
The following aspects were taken into account to measure the performance of the metamodeling techniques:
Accuracy

Accuracy is the capability to produce predictions close to the real values of the system. To measure it, two data sets are used: a training data set, used to train the metamodeling technique, and a validation data set, used to validate the accuracy of the technique. Accuracy is measured with the G-metric (a sketch of this computation appears after this list):

$$G = 1 - \frac{\sum_{i=1}^{N} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{N} (y_i - \bar{y})^2} = 1 - \frac{\mathrm{MSE}}{\mathrm{Variance}} \quad (25)$$

where $N$ is the size of the validation data set, $\hat{y}_i$ is the predicted value for input $i$, $y_i$ is the real value, and $\bar{y}$ is the mean of the real values. The MSE (mean squared error) measures the difference between the estimator and the real values; the variance describes how far the values lie from their mean, i.e., it captures how irregular the problem is. The larger the value of G, the more accurate the metamodel.
Robustness

Robustness is the capability of the technique to achieve good accuracy across different test problems. Six problems were used, described in the next subsection. Three of the six are unimodal and the other three are multimodal; they have different features and are common benchmark problems for optimization algorithms.
Scalability

Scalability is the capability of the technique to achieve good accuracy for different numbers of decision variables. The six test problems can be scaled in the number of decision variables. The numbers of variables used are divided into 9 levels, v = [2, 4, 6, 8, 10, 15, 20, 25, 50].
Efficiency

Efficiency refers to the computational effort the technique requires to construct the metamodel and to predict the response for a new input. The efficiency of each metamodeling technique is measured by the time used for metamodel construction and for new predictions.
Ease of optimization

Ease of optimization refers to how easy it is to optimize a metamodel created by a technique. It is natural to think that a more accurate metamodel is more complicated to optimize, because its fitness landscape may be more rugged. To measure ease of optimization, Differential Evolution (DE) is used to optimize a metamodel; the best value found by DE is evaluated with the real function, and its distance from the optimum is measured.
Simplicity

Simplicity refers to the ease of use of each technique: the number of parameters, the size of the parameters, the implementation, and the knowledge necessary to understand the technique.
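As referenced under the Accuracy item above, the G-metric of Equation (25) is straightforward to compute; a minimal sketch:

```python
import numpy as np

def g_metric(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """G = 1 - MSE/Variance (Equation 25); the common factor 1/N cancels.
    G is at most 1; larger values indicate a more accurate metamodel."""
    mse = np.mean((y_true - y_pred) ** 2)
    var = np.mean((y_true - np.mean(y_true)) ** 2)
    return 1.0 - mse / var
```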
4.1 Test problems
To test the metamodeling techniques on different classes of problems, six test problems for unconstrained global optimization are selected. The six problems are scalable in the design space. The test problems were selected based on the shape of the search space and the number of local minima.

A summary of the features of the six problems is given in Table 5; the test problems are described in more detail in Appendix A.
Table 5: Features of the test problems

Problem      Search space shape                    # of local minima                # of variables   Global minimum
Step         Unimodal                              none except the global one       n                x* = (0, ..., 0), f(x*) = 0
Sphere       Unimodal                              none except the global one       n                x* = (0, ..., 0), f(x*) = 0
Rosenbrock   Unimodal for n <= 3, else multimodal  several local minima for n > 3   n                x* = (1, ..., 1), f(x*) = 0
Ackley       Multimodal                            several local minima             n                x* = (0, ..., 0), f(x*) = 0
Rastrigin    Multimodal                            large number of local minima     n                x* = (0, ..., 0), f(x*) = 0
Schwefel     Multimodal                            several local minima             n                x* = (420.9687, ..., 420.9687), f(x*) = 0
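For reference, the six scalable test problems (defined in Appendix A) can be written compactly; a sketch assuming numpy, with each function taking a 1-D array x of length D:

```python
import numpy as np

def step(x):                                                   # Equation (26)
    return np.sum(np.floor(np.abs(x) + 0.5) ** 2)

def sphere(x):                                                 # Equation (27)
    return np.sum(x ** 2)

def rosenbrock(x):                                             # Equation (28)
    return np.sum(100.0 * (x[1:] - x[:-1] ** 2) ** 2 + (x[:-1] - 1) ** 2)

def ackley(x):                                                 # Equation (29)
    d = x.size
    return (-20.0 * np.exp(-0.2 * np.sqrt(np.sum(x ** 2) / d))
            - np.exp(np.sum(np.cos(2 * np.pi * x)) / d) + 20.0 + np.e)

def rastrigin(x):                                              # Equation (30)
    return np.sum(x ** 2 - 10.0 * np.cos(2 * np.pi * x) + 10.0)

def schwefel(x):                                               # Equation (31)
    return 418.9809 * x.size - np.sum(x * np.sin(np.sqrt(np.abs(x))))
```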
4.2 Scheme for metamodeling techniques comparison
The scheme proposed for the comparative study is as follows (a sketch of the procedure is given after the list):

1. Create a training data set of size 100 with Latin hypercube sampling [14].

2. Train each technique (PRS, KRG, RBF, SVR) with the training set.

3. Create a validation data set of size 200 with Latin hypercube sampling.

4. Predict the validation data set with the metamodel.

5. Measure the accuracy with the G-metric.

6. Repeat steps 1-5 for 31 different data sets.
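A sketch of this scheme for one technique on one problem, using scipy's Latin hypercube sampler and reusing the g_metric and test-function sketches above (the sampler choice and the variable bounds are our assumptions):

```python
import numpy as np
from scipy.stats import qmc

def run_trial(problem, fit, predict, dim, lo, hi, seed):
    """One repetition of steps 1-5 for a single technique on one problem."""
    sampler = qmc.LatinHypercube(d=dim, seed=seed)
    X_train = qmc.scale(sampler.random(100), [lo] * dim, [hi] * dim)  # step 1
    y_train = np.apply_along_axis(problem, 1, X_train)
    model = fit(X_train, y_train)                                     # step 2
    X_val = qmc.scale(sampler.random(200), [lo] * dim, [hi] * dim)    # step 3
    y_val = np.apply_along_axis(problem, 1, X_val)
    return g_metric(y_val, predict(model, X_val))                     # steps 4-5

# Step 6: repeat with 31 independent data sets, e.g.
# scores = [run_trial(sphere, fit, predict, 10, -5.0, 5.0, s) for s in range(31)]
```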
The procedure is applied to the six problems with the nine levels of variables (6 problems x 9 levels = 54 different problems). The objective of this experiment is to measure the accuracy, robustness, scalability and efficiency of the techniques.
For each technique, the parameters were discretized and a full factorial design was created, in order to avoid penalizing a technique through poor parameter tuning. The parameters used for each technique are the following:

Polynomial Regression: degree of the polynomial = {2}; technique used to construct the polynomial = {Full, Stepwise}

Kriging: correlation function = {Gaussian, Exponential, Cubic, Linear, Spherical, Spline}

Radial basis function: number of neurons in the hidden layer = {3-100}

Support Vector Regression: $C = \{2^{-5} : 2^{15}\}$, $\epsilon = \{0.1 : 2\}$, $\gamma = \{2^{-10} : 2^{5}\}$
5 Discussion of results
5.1 Accuracy, robustness and scalability
For each metamodeling technique, the set of parameters with the best accuracy over all the problems was chosen (one setting per metamodeling technique); this is named the Best overall settings. In the same way, the set of parameters with the best accuracy for each problem was chosen (54 settings per metamodeling technique); this is named the Best local settings.

The Best overall settings found are the following:

Polynomial Regression: degree of the polynomial = {2}; technique used to construct the polynomial = {Stepwise}

Kriging: correlation function = {Exponential}

Radial basis function: number of neurons in the hidden layer = {6}

Support Vector Regression: $C = 2^{10.5}$, $\epsilon = 0.2$, $\gamma = 2^{-2.5}$
To illustrate the performance of the metamodeling techniques we use boxplots. The median is shown as a straight line inside the box, indicating that half of the problems are above it and the other half below it. The mean is shown as a circle and indicates the average accuracy of a technique. The size of the box indicates the variability of the technique: the smaller the box, the more robust the technique.
Figure 3 shows the accuracy results for all the problems and the different problem sizes.
[Figure 3: Accuracy (G-metric) for the metamodeling techniques in all the problems; boxplots for PRS, KRG, RBF and SVR under Best local settings and Best overall settings.]
It shows that for KRG, RBF and SVR the settings found for all the problems and problem sizes (Best overall settings) perform comparably to the Best local settings. For PRS, in contrast, the Best overall settings worsen the performance of the technique.
Figure 3 also shows that the accuracies of RBF and KRG are among the best of the techniques; their values are very close to each other (the median is close to 0.9), with RBF slightly better than KRG. However, their results are not conclusive with respect to SVR. The worst performance is that of PRS. Figure 4 shows the average G-metric over all the problems; it confirms that RBF and KRG have similar results and perform better than SVR and PRS.
In terms of the robustness of the accuracy over all the problems, RBF is the best for both Best local settings and Best overall settings, although its results are only slightly better than those of KRG. Overall, RBF is the best in average accuracy and robustness when handling different types of problems.
The problems were divided into two types: unimodal and multimodal. For each type, the performance of each metamodeling technique is illustrated with a boxplot. Figure 5 shows that RBF is slightly better than KRG, while PRS and SVR perform worse. However, the results show that there are certain problems for which PRS and SVR perform well, because the top of the box is close to 1. Moreover, RBF is more robust than KRG because its box is smaller.
Figure 6 shows that the best performance is that of RBF, although its results are only slightly better than those of KRG. As with the unimodal problems, the worst performance is that of PRS and SVR. In terms of robustness the best is RBF, but its results show that it is not very robust.
Given this lack of robustness, we need to know whether the problem lies in the increase in the number of variables or in the type of problem. Next, we present an analysis of the techniques for each test problem:
Step
Figure 7 shows the behavior of the metamodeling techniques on the Step function. With very few variables (2 or 4) SVR behaves well, but as the number of variables increases, SVR worsens considerably. PRS behaves badly even with few variables. KRG outperforms RBF up to 10 variables; for a greater number of variables, RBF achieves better results. As a special mention, RBF maintains a steady performance despite the increase in variables; we can say that it is robust to the increase in the number of variables.
[Figure 4: Accuracy (G-metric) for the metamodeling techniques: average over all the problems, under Best local settings and Best overall settings.]
[Figure 5: Accuracy for the metamodeling techniques in unimodal problems and the nine levels of variables.]
[Figure 6: Accuracy for the metamodeling techniques in multimodal problems and the nine levels of variables.]
Finally, we note that the results for the Best local settings and the Best overall settings are similar, except for PRS. Thus, the general parameters found (Best overall settings) can be used without prior adjustment (Best local settings).
Sphere
Figure 8 shows the behavior of the metamodeling techniques on the Sphere function. The best-performing technique is RBF; furthermore, its behavior is very consistent as the number of variables increases. Under the Best local settings, PRS achieves good performance up to 10 variables, after which its performance decreases significantly. In addition, KRG and SVR achieve comparable results under both the Best local settings and the Best overall settings. As with the previous problem, the Best overall settings and Best local settings obtain similar results, at least for RBF, KRG and SVR.
Rosenbrock
Figure 9 shows the behavior of the metamodeling techniques on the Rosenbrock function. PRS is the worst-performing technique: with fewer than five variables it achieves moderately good results, but the increase in the number of variables decreases its performance significantly. KRG and SVR have comparable results; with a relatively low number of variables (v < 10) both are better than RBF, but with a greater number of variables (v >= 10) RBF achieves better results. A main feature of RBF is its constant behavior even as the number of variables increases.
Ackley
Figure 10 shows the behavior of the metamodeling techniques on the Ackley function. KRG and RBF have similar results under Best local settings and Best overall settings. PRS is the worst-performing technique; even with few variables it cannot approximate the function. KRG performs best with few variables (v < 20), while RBF performs best for a greater number of variables (v >= 20). Again, RBF remains constant as the number of variables increases.
Rastrigin
[Figure 7: Mean accuracy (G-metric) by number of variables, problem: Step; (a) Best local settings, (b) Best overall settings.]
[Figure 8: Mean accuracy (G-metric) by number of variables, problem: Sphere; (a) Best local settings, (b) Best overall settings.]
[Figure 9: Mean accuracy (G-metric) by number of variables, problem: Rosenbrock; (a) Best local settings, (b) Best overall settings.]
[Figure 10: Mean accuracy (G-metric) by number of variables, problem: Ackley; (a) Best local settings, (b) Best overall settings.]
Figure 11 shows the behavior of the metamodeling techniques on the Rastrigin function. RBF is the best-performing technique. PRS performs well up to 15 variables. SVR and KRG behave similarly, and their performance decreases as the number of variables increases. Finally, as with the previous problems, the Best overall settings and the Best local settings are comparable for RBF, SVR and KRG.
[Figure 11: Mean accuracy (G-metric) by number of variables, problem: Rastrigin; (a) Best local settings, (b) Best overall settings.]
Schwefel
Figure 12 shows the behavior of the metamodeling techniques on the Schwefel function. On this problem all four techniques are well behaved; however, the techniques with the best performance under the Best local settings are RBF and KRG, and under the Best overall settings it is RBF. PRS's performance decreases as the number of variables increases. For this problem, KRG and RBF remain constant as the number of variables increases.
Figure 13 shows the performance of the techniques over the six test functions. While KRG is the technique with the best performance for few variables (v < 15), RBF is the technique with the best performance for high-dimensional functions. The technique with the worst overall performance is PRS. SVR achieves performance similar to KRG, although the latter is slightly better. In general, over all problems, RBF is very robust to the increase in the number of variables, since its results hold constant.
6 Conclusions
The study presented in this paper has provided interesting results about the performance of different metamodeling techniques (PRS, KRG, RBF, SVR). The metamodeling techniques were evaluated using multiple aspects on several test problems with different features.
We define the size of a problem with respect to its dimensionality: high dimensionality for problems with v > 15, and low dimensionality otherwise. Table 6 shows that KRG is the best for low-dimensionality problems, while RBF is the best for high-dimensionality problems. Overall, the best technique with respect to accuracy and scalability is RBF.
[Figure 12: Mean accuracy (G-metric) by number of variables, problem: Schwefel; (a) Best local settings, (b) Best overall settings.]
[Figure 13: Mean accuracy (G-metric) by number of variables over all the problems; (a) Best local settings, (b) Best overall settings.]
Table 6: Best techniques with respect to accuracy and dimensionality: all problems

                     Unimodal   Multimodal   Overall
Low dimensionality   KRG        KRG          KRG
High dimensionality  RBF        RBF          RBF
Overall              RBF        RBF          RBF
Table 7 shows the best techniques with respect to accuracy and dimensionality for each problem. For low-dimensional problems the best technique is KRG, although other techniques achieve comparable performance. For high-dimensionality problems the best is RBF.
Table 7: Best techniques with respect to accuracy and dimensionality: by problem

                     Step   Sphere   Rosenbrock   Ackley   Rastrigin   Schwefel
Low dimensionality   KRG    ALL      KRG, SVR     KRG      ALL         KRG, RBF, SVR
High dimensionality  RBF    RBF      RBF          RBF      RBF         KRG, RBF
Overall              RBF    RBF      RBF          RBF      RBF         RBF
Finally, to complete our study, we still need to evaluate the remaining aspects (efficiency, ease of optimization, and simplicity).
A Test problems
A.1 Step
Step is representative of the problem of flat surfaces. The Step function consists of many flat plateaus with uniform steep ridges. For algorithms that require gradient information to determine a search direction, this function poses considerable difficulty. Flat surfaces are obstacles for optimization algorithms because they give no information as to which direction is favorable; unless an algorithm has variable step sizes, it can get stuck on one of the flat plateaus.

$$f_1(x) = \sum_{i=1}^{D} \lfloor |x_i| + 0.5 \rfloor^2 \quad (26)$$

Its global minimum $f_1(x^*) = 0$ is obtained at $x_i = 0$, $i = 1, \ldots, D$. Figure 14 shows the Step function for D = 2.
[Figure 14: An overview of the Step function with D = 2.]
A.2 Sphere
The so-called first function of De Jong is one of the simplest test benchmarks. The function is continuous, convex and unimodal. It has the following general definition:

$$f_2(x) = \sum_{i=1}^{D} x_i^2 \quad (27)$$

Its global minimum $f_2(x^*) = 0$ is obtained at $x_i = 0$, $i = 1, \ldots, D$. Figure 15 shows the Sphere function for D = 2.
A.3 Rosenbrock
Rosenbrock's valley is a classic optimization problem, also known as the banana function or the second function of De Jong. The global optimum lies inside a long, narrow, parabolic-shaped flat valley. Finding the valley is trivial, but converging to the global optimum is difficult, and hence this problem has frequently been used to test the performance of optimization algorithms. The function is defined as:

$$f_3(x) = \sum_{i=1}^{D-1} \left(100\left(x_{i+1} - x_i^2\right)^2 + (x_i - 1)^2\right) \quad (28)$$
[Figure 15: An overview of the Sphere function with D = 2.]
Its global minimum $f_3(x^*) = 0$ is obtained at $x_i = 1$, $i = 1, \ldots, D$. Figure 16 shows the Rosenbrock function for D = 2.
[Figure 16: An overview of the Rosenbrock function with D = 2.]
A.4 Ackley
The Ackley problem is a minimization problem. Originally defined for two dimensions, it has been generalized to D dimensions. Ackley's is a widely used multimodal test function. It has the following definition:

$$f_4(x) = -20 \exp\left(-0.2\sqrt{\frac{1}{D}\sum_{i=1}^{D} x_i^2}\right) - \exp\left(\frac{1}{D}\sum_{i=1}^{D} \cos(2\pi x_i)\right) + 20 + \exp(1) \quad (29)$$

Its global minimum $f_4(x^*) = 0$ is obtained at $x_i = 0$, $i = 1, \ldots, D$. Figure 17 shows the Ackley function for D = 2.
[Figure 17: An overview of the Ackley function with D = 2.]
A.5 Rastrigin
The Rastrigin function is a typical example of a non-linear multimodal function. It is based on the first function of De Jong with the addition of cosine modulation to produce frequent local minima; thus, the test function is highly multimodal. It is a fairly difficult problem due to its large search space and its large number of local minima. However, the locations of the minima are regularly distributed. The function is defined as:

$$f_5(x) = \sum_{i=1}^{D} \left(x_i^2 - 10\cos(2\pi x_i) + 10\right) \quad (30)$$

Its global minimum $f_5(x^*) = 0$ is obtained at $x_i = 0$, $i = 1, \ldots, D$. Figure 18 shows the Rastrigin function for D = 2.
[Figure 18: An overview of the Rastrigin function with D = 2.]
A.6 Schwefel
Schwefel's function is deceptive in that the global minimum is geometrically distant, over the parameter space, from the next-best local minima; therefore, search algorithms are potentially prone to converging in the wrong direction. The function is defined as:

$$f_6(x) = 418.9809\,D - \sum_{i=1}^{D} x_i \sin\left(\sqrt{|x_i|}\right) \quad (31)$$

Its global minimum $f_6(x^*) = 0$ is obtained at $x_i = 420.9687$, $i = 1, \ldots, D$. Figure 19 shows the Schwefel function for D = 2.
[Figure 19: An overview of the Schwefel function with D = 2.]
References
[1] Russell R. Barton. Metamodels for simulation input-output relations. In Proceedings of the 24th Conference on Winter Simulation, WSC '92, pages 289-299, New York, NY, USA, 1992. ACM.

[2] D. Bueche, N.N. Schraudolph, and P. Koumoutsakos. Accelerating evolutionary algorithms with Gaussian process fitness function models. IEEE Trans. on Systems, Man, and Cybernetics: Part C, 2004. In press.

[3] L. Bull. On model-based evolutionary computation. Soft Computing, 3:76-82, 1999.

[4] W. Carpenter and J.-F. Barthelemy. A comparison of polynomial approximation and artificial neural nets as response surfaces. Technical Report 92-2247, AIAA, 1992.

[5] Chih-Chung Chang and Chih-Jen Lin. LIBSVM: a library for support vector machines, 2001. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.

[6] M.A. El-Beltagy and A.J. Keane. Evolutionary optimization for computationally expensive problems using Gaussian processes. In Proceedings of the International Conference on Artificial Intelligence, pages 708-714. CSREA, 2001.

[7] M.A. El-Beltagy, P.B. Nair, and A.J. Keane. Metamodeling techniques for evolutionary optimization of computationally expensive problems: promises and limitations. In Proceedings of the Genetic and Evolutionary Computation Conference, pages 196-203, Orlando, 1999. Morgan Kaufmann.

[8] M. Emmerich, A. Giotis, M. Ozdemir, T. Bäck, and K. Giannakoglou. Metamodel-assisted evolution strategies. In Parallel Problem Solving from Nature, number 2439 in Lecture Notes in Computer Science, pages 371-380. Springer, 2002.

[9] Anthony A. Giunta and Layne T. Watson. A comparison of approximation modeling techniques: polynomial versus interpolating models, 1998.

[10] R. L. Hardy. Multiquadric equations of topography and other irregular surfaces. J. Geophys. Res., 76:1905-1915, 1971.

[11] R. Jin, W. Chen, and T.W. Simpson. Comparative studies of metamodeling techniques under multiple modeling criteria. Technical Report 2000-4801, AIAA, 2000.

[12] Y. Jin, M. Olhofer, and B. Sendhoff. Managing approximate models in evolutionary aerodynamic design optimization. In Proceedings of the IEEE Congress on Evolutionary Computation, volume 1, pages 592-599, May 2001.

[13] Stuart A. Kauffman. The Origins of Order: Self-Organization and Selection in Evolution. Oxford University Press, USA, 1st edition, June 1993.

[14] M. D. McKay, R. J. Beckman, and W. J. Conover. A comparison of three methods for selecting values of input variables in the analysis of output from a computer code. Technometrics, 21(2):239-245, 1979.

[15] S.N. Lophaven, H.B. Nielsen, and J. Søndergaard. DACE: A Matlab Kriging toolbox, 2002.

[16] S. Pierret. Turbomachinery blade design using a Navier-Stokes solver and artificial neural network. ASME Journal of Turbomachinery, 121(3):326-332, 1999.

[17] A. Ratle. Accelerating the convergence of evolutionary algorithms by fitness landscape approximation. In A. Eiben, Th. Bäck, M. Schoenauer, and H.-P. Schwefel, editors, Parallel Problem Solving from Nature, volume V, pages 87-96, 1998.

[18] A. Ratle. Optimal sampling strategies for learning a fitness model. In Proceedings of the 1999 Congress on Evolutionary Computation, volume 3, pages 2078-2085, Washington D.C., July 1999.

[19] R. G. Regis and C. A. Shoemaker. Local function approximation in evolutionary algorithms for the optimization of costly functions. IEEE Transactions on Evolutionary Computation, 8(5):490-505, 2004.

[20] Jerome Sacks, William J. Welch, Toby J. Mitchell, and Henry P. Wynn. Design and analysis of computer experiments. Statistical Science, 4(4):409-423, November 1989.

[21] T. Simpson, T. Mauery, J. Korte, and F. Mistree. Comparison of response surface and Kriging models for multidisciplinary design optimization. Technical Report 98-4755, AIAA, 1998.

[22] Vladimir N. Vapnik. Statistical Learning Theory. Wiley-Interscience, September 1998.

[23] F.A.C. Viana. SURROGATES Toolbox User's Guide. Gainesville, FL, USA, version 2.1 edition, 2010.

[24] L. Willmes, T. Baeck, Y. Jin, and B. Sendhoff. Comparing neural networks and Kriging for fitness approximation in evolutionary optimization. In Proceedings of the IEEE Congress on Evolutionary Computation, pages 663-670, 2003.