Benchmarking Exponential Natural Evolution Strategies on the Noiseless and Noisy Black-box Optimization Testbeds

Tom Schaul
Courant Institute of Mathematical Sciences, New York University
Broadway 715, New York, USA
[email protected]
ABSTRACT
Natural Evolution Strategies (NES) are a recent member of the class of real-valued optimization algorithms that are based on adapting search distributions. Exponential NES (xNES) is the most common instantiation of NES, and particularly appropriate for the BBOB 2012 benchmarks, given that many of them are non-separable and their problem dimensions are relatively small. This report provides the most extensive empirical results on that algorithm to date, on both the noise-free and noisy BBOB testbeds.
Categories and Subject Descriptors
G.1.6 [Numerical Analysis]: Optimization—global optimization, unconstrained optimization; F.2.1 [Analysis of Algorithms and Problem Complexity]: Numerical Algorithms and Problems

General Terms
Algorithms

Keywords
Evolution Strategies, Natural Gradient, Benchmarking
1. INTRODUCTION
Evolution strategies (ES), in contrast to traditional evolutionary algorithms, aim at repeating the type of mutation that led to good individuals in the past. We can characterize those mutations by an explicitly parameterized search distribution from which new candidate samples are drawn, akin to estimation of distribution algorithms (EDA). Covariance matrix adaptation ES (CMA-ES [10]) innovated the field by introducing a parameterization that includes the full covariance matrix, allowing it to solve highly non-separable problems.
A more recent variant, natural evolution strategies (NES [16, 6, 14, 15]), aims at a higher level of generality, providing a
procedure to update the search distribution's parameters for any type of distribution, by ascending the gradient towards higher expected fitness. Further, it has been shown [12, 11] that following the natural gradient to adapt the search distribution is highly beneficial, because it appropriately normalizes the update step with respect to its uncertainty and makes the algorithm scale-invariant.
Exponential NES (xNES), the most common instantiation of NES, uses a search distribution parameterized by a mean vector and a full covariance matrix, and is thus most similar to CMA-ES (in fact, the precise relation is described in [4] and [5]). Given the relatively small problem dimensions of the BBOB benchmarks, and the fact that many are non-separable, it is also among the most appropriate NES variants for the task.
In this report, we retain the original formulation of xNES (including all parameter settings, except for an added stopping criterion) and describe its empirical performance on all 54 benchmark functions (both noise-free and noisy) of the BBOB 2012 workshop.
2. NATURAL EVOLUTION STRATEGIES
Natural evolution strategies (NES) maintain a search distribution $\pi$ and adapt the distribution parameters $\theta$ by following the natural gradient [1] of expected fitness $J$, that is, maximizing

$$J(\theta) = \mathbb{E}_{\theta}[f(z)] = \int f(z)\, \pi(z \mid \theta)\, dz$$
Just like their close relative CMA-ES [10], NES algorithms are invariant under monotone transformations of the fitness function and linear transformations of the search space. In each iteration the algorithm produces $n$ samples $z_i \sim \pi(z \mid \theta)$, $i \in \{1, \dots, n\}$, i.i.d. from its search distribution, which is parameterized by $\theta$. The gradient w.r.t. the parameters $\theta$ can be rewritten (see [16]) as

$$\nabla_{\theta} J(\theta) = \nabla_{\theta} \int f(z)\, \pi(z \mid \theta)\, dz = \mathbb{E}_{\theta}\left[ f(z)\, \nabla_{\theta} \log \pi(z \mid \theta) \right],$$

from which we obtain the Monte Carlo estimate

$$\nabla_{\theta} J(\theta) \approx \frac{1}{n} \sum_{i=1}^{n} f(z_i)\, \nabla_{\theta} \log \pi(z_i \mid \theta)$$

of the search gradient. The key step then consists in replacing this gradient by the natural gradient, defined as $F^{-1} \nabla_{\theta} J(\theta)$, where

$$F = \mathbb{E}\left[ \nabla_{\theta} \log \pi(z \mid \theta)\, \nabla_{\theta} \log \pi(z \mid \theta)^{\top} \right]$$

is the Fisher information matrix. The search distribution is iteratively
updated using natural gradient ascent
$$\theta \leftarrow \theta + \eta\, F^{-1} \nabla_{\theta} J(\theta)$$
with learning rate parameter η.
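To make the abstract update concrete, the following is a minimal, illustrative Python/NumPy sketch of one NES iteration for an isotropic Gaussian, adapting only the mean; the function name nes_step, the sample size, and the learning rate are hypothetical choices, not the reference implementation (which is xNES, Algorithm 1 below).

```python
import numpy as np

def nes_step(f, mu, sigma=1.0, eta=0.5, n=50):
    """One (hypothetical) NES iteration on N(mu, sigma^2 I), mean-only.

    Estimates the search gradient and the Fisher matrix from the same
    n samples, then takes a natural-gradient ascent step on mu
    (maximizing f, as in the text above).
    """
    d = len(mu)
    z = mu + sigma * np.random.randn(n, d)        # z_i ~ pi(.|theta)
    fit = np.array([f(zi) for zi in z])
    glog = (z - mu) / sigma**2                    # grad_theta log pi(z_i|theta)
    grad_J = (fit[:, None] * glog).mean(axis=0)   # Monte Carlo search gradient
    F = glog.T @ glog / n + 1e-10 * np.eye(d)     # empirical Fisher (+ tiny ridge)
    return mu + eta * np.linalg.solve(F, grad_J)  # theta <- theta + eta F^-1 grad J
```

For this distribution the exact Fisher matrix is simply $I/\sigma^2$, so the natural gradient only rescales the plain gradient; the estimated version above illustrates the general recipe that xNES makes exact and computationally efficient.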
2.1 Exponential NES
While the NES formulation is applicable to arbitrary parameterizable search distributions [16, 11], the most common variant employs multinormal search distributions. For that case, two helpful techniques were introduced in [6]: an exponential parameterization of the covariance matrix, which guarantees positive-definiteness, and a novel method for changing the coordinate system into a "natural" one, which makes the algorithm computationally efficient. The resulting algorithm, NES with a multivariate Gaussian search distribution using both these techniques, is called xNES; its pseudocode is given in Algorithm 1.
Algorithm 1: Exponential NES (xNES)
input: $f$, $\mu_{\mathrm{init}}$, $\eta_\sigma$, $\eta_B$, $u_k$
initialize
    $\mu \leftarrow \mu_{\mathrm{init}}$
    $\sigma \leftarrow 1$
    $B \leftarrow I$
repeat
    for $k = 1 \dots n$ do
        draw sample $s_k \sim \mathcal{N}(0, I)$
        $z_k \leftarrow \mu + \sigma B^{\top} s_k$
        evaluate the fitness $f(z_k)$
    end
    sort $\{(s_k, z_k)\}$ with respect to $f(z_k)$ and assign utilities $u_k$ to each sample
    compute gradients
        $\nabla_{\delta} J \leftarrow \sum_k u_k \cdot s_k$
        $\nabla_{M} J \leftarrow \sum_k u_k (s_k s_k^{\top} - I)$
        $\nabla_{\sigma} J \leftarrow \mathrm{tr}(\nabla_{M} J)/d$
        $\nabla_{B} J \leftarrow \nabla_{M} J - \nabla_{\sigma} J \cdot I$
    update parameters
        $\mu \leftarrow \mu + \sigma B^{\top} \cdot \nabla_{\delta} J$
        $\sigma \leftarrow \sigma \cdot \exp(\eta_\sigma/2 \cdot \nabla_{\sigma} J)$
        $B \leftarrow B \cdot \exp(\eta_B/2 \cdot \nabla_{B} J)$
until stopping criterion is met
Table 1: Default parameter values for xNES (including the utility function and adaptation sampling) as a function of problem dimension d.

parameter | default value
$n$ | $4 + \lfloor 3 \log(d) \rfloor$
$\eta_\sigma = \eta_B$ | $\frac{3(3 + \log(d))}{5 d \sqrt{d}}$
$u_k$ | $\frac{\max\left(0,\, \log(\frac{n}{2} + 1) - \log(k)\right)}{\sum_{j=1}^{n} \max\left(0,\, \log(\frac{n}{2} + 1) - \log(j)\right)} - \frac{1}{n}$
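As a worked complement to Algorithm 1 and Table 1, here is a compact NumPy sketch of xNES for minimization, using the default population size, learning rates, and utilities from Table 1; the function names (xnes_defaults, xnes_minimize) and the fixed iteration budget are illustrative assumptions, and the reference implementation is the PyBrain one [13].

```python
import numpy as np
from scipy.linalg import expm  # matrix exponential for the B-update

def xnes_defaults(d):
    """Population size, learning rates, and utilities from Table 1."""
    n = 4 + int(3 * np.log(d))
    eta = 3 * (3 + np.log(d)) / (5 * d * np.sqrt(d))      # eta_sigma = eta_B
    raw = np.maximum(0.0, np.log(n / 2 + 1) - np.log(np.arange(1, n + 1)))
    u = raw / raw.sum() - 1.0 / n                         # utilities sum to 0
    return n, eta, u

def xnes_minimize(f, mu, iterations=2000):
    """Sketch of Algorithm 1 (xNES), minimizing f; no restarts."""
    d = len(mu)
    n, eta, u = xnes_defaults(d)
    sigma, B = 1.0, np.eye(d)
    for _ in range(iterations):
        s = np.random.randn(n, d)                         # s_k ~ N(0, I)
        z = mu + sigma * s @ B                            # z_k = mu + sigma B^T s_k
        order = np.argsort([f(zk) for zk in z])           # best (smallest f) first,
        s = s[order]                                      # pairing u_k with rank k
        g_delta = u @ s                                   # sum_k u_k s_k
        g_M = (u[:, None] * s).T @ s - u.sum() * np.eye(d)  # sum_k u_k (s_k s_k^T - I)
        g_sigma = np.trace(g_M) / d
        g_B = g_M - g_sigma * np.eye(d)
        mu = mu + sigma * (B.T @ g_delta)                 # natural-gradient steps
        sigma *= np.exp(0.5 * eta * g_sigma)
        B = B @ expm(0.5 * eta * g_B)                     # exponential map keeps
    return mu                                             # the covariance pos. def.
```

A quick check on the sphere function, e.g. xnes_minimize(lambda x: float(x @ x), np.random.randn(5)), illustrates typical usage; the BBOB experiments additionally employ the stopping and restart rules of Section 3.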
3. EXPERIMENTAL SETTINGS
We use identical default hyper-parameter values for all benchmarks (both noisy and noise-free functions), which are taken from [6, 11]. Table 1 summarizes all the hyper-parameters used.
In addition, we make use of the provided target fitness $f_{\mathrm{opt}}$ to trigger independent algorithm restarts¹, using a simple ad-hoc procedure: we restart if the log-progress during the past $1000d$ evaluations is too small, i.e., if

$$\left| \log_{10} \frac{f_{\mathrm{opt}} - f_t}{f_{\mathrm{opt}} - f_{t-1000d}} \right| < \frac{(r+2)^2}{m^{3/2}} \cdot \left[ \log_{10} |f_{\mathrm{opt}} - f_t| + 8 \right]$$

where $m$ is the remaining budget of evaluations divided by $1000d$, $f_t$ is the best fitness encountered until evaluation $t$, and $r$ is the number of restarts so far. The total budget is $10^5 d^{3/2}$ evaluations.
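For concreteness, a minimal Python sketch of this restart test, under the reconstruction of the inequality given above; the function name should_restart and its argument names are hypothetical:

```python
import numpy as np

def should_restart(f_opt, f_t, f_prev, evals_left, d, r):
    """Ad-hoc restart trigger of Section 3 (sketch).

    f_prev: best fitness 1000*d evaluations ago; r: restarts so far.
    Assumes minimization with f_t, f_prev strictly above f_opt.
    """
    m = evals_left / (1000 * d)                     # remaining budget units
    progress = abs(np.log10((f_opt - f_t) / (f_opt - f_prev)))
    threshold = (r + 2) ** 2 / m ** 1.5 * (np.log10(abs(f_opt - f_t)) + 8)
    return progress < threshold
```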
Implementations of this and other NES algorithm variants are available in Python through the PyBrain machine learning library [13], as well as in other languages at www.idsia.ch/~tom/nes.html.
4. CPU TIMING
A timing experiment was performed to determine the CPU time per function evaluation, and how it depends on the problem dimension. For each dimension, the algorithm was restarted with a maximum budget of 10000/d evaluations, until at least 30 seconds had passed.
Our xNES implementation (in Python, based on the PyBrain [13] library), running on an Intel Xeon with 2.67GHz, required an average time of 1.1, 0.9, 0.7, 0.7, 0.9, and 2.7 milliseconds per function evaluation for dimensions 2, 5, 10, 20, 40, and 80 respectively (the function evaluations themselves take about 0.1ms).
5. RESULTS
Results of xNES on the noiseless testbed (from experiments according to [7] on the benchmark functions given in [2, 8]) are presented in Figures 1, 3 and 5 and in Tables 2 and 4.
Similarly, results of xNES on the testbed of noisy functions (from experiments according to [7] on the benchmark functions given in [3, 9]) are presented in Figures 2, 4 and 5 and in Tables 3 and 4.
6. DISCUSSION
The top rows in Figures 3 and 4 give a good overview picture, showing that across all benchmarks taken together, xNES performs almost as well as the best of the BBOB 2009 contestants and better than most. Beyond this high-level perspective the results largely speak for themselves; we will just highlight a few observations.
According to Tables 2 and 3, the only conditions where xNES significantly outperforms all algorithms from the BBOB 2009 competition are on functions f18, f115 and f119 in dimension 20 (during the early phase), as well as on f118 in dimension 5. We observe the worst performance on multimodal functions like f3, f4 and f15, which other algorithms tackle very easily.
¹It turns out that this use of $f_{\mathrm{opt}}$ is technically not permitted by the BBOB guidelines, so strictly speaking a different restart strategy should be employed, for example the one described in [11].
[Figure 4 panels: ECDFs for f101–f130 in D = 5 (top row) and D = 20 (bottom row).]
Figure 4: Empirical cumulative distribution functions (ECDFs) of the 30 noisy benchmark functions. Plotted is the fraction of trials versus running time (left subplots) or versus $\Delta f$ (right subplots) (see Figure 3 for details).
Comparing different types of noise, xNES appears to be least sensitive to Cauchy noise and most sensitive to uniform noise (see Figure 2).
From Figure 5 and Table 4, we observe a good loss ratio across the board on all benchmarks, with the best ones on moderate functions, ill-conditioned functions, and for all levels of noise. On the other hand, the algorithm is less competitive on (noisy or noise-free) multimodal benchmarks, which we expect to be directly related to its small default population size.
Acknowledgements
The author wants to thank the organizers of the BBOB workshop for providing such a well-designed benchmark setup, and especially such high-quality post-processing utilities.
This work was funded in part through AFR postdoc grant number 2915104 of the National Research Fund, Luxembourg.
7. REFERENCES
[1] S. I. Amari. Natural Gradient Works Efficiently in Learning. Neural Computation, 10:251–276, 1998.
[2] S. Finck, N. Hansen, R. Ros, and A. Auger. Real-parameter black-box optimization benchmarking 2009: Presentation of the noiseless functions. Technical Report 2009/20, Research Center PPE, 2009. Updated February 2010.
[3] S. Finck, N. Hansen, R. Ros, and A. Auger. Real-parameter black-box optimization benchmarking 2010: Presentation of the noisy functions. Technical Report 2009/21, Research Center PPE, 2010.
[4] N. Fukushima, Y. Nagata, S. Kobayashi, and I. Ono. Proposal of distance-weighted exponential natural evolution strategies. In 2011 IEEE Congress on Evolutionary Computation (CEC), 2011.
Table 4: ERT loss ratio compared to the respective best result from BBOB-2009 for budgets given in the first column (see also Figure 5). The last row RLUS/D gives the number of function evaluations in unsuccessful runs divided by dimension. Shown are the smallest, 10%-ile, 25%-ile, 50%-ile, 75%-ile and 90%-ile values (smaller values are better). The ERT loss ratio equals one for the respective best algorithm from BBOB-2009. Typical median values are between ten and a hundred.
[Table 4 body: f1–f24 in 5-D, maxFE/D = 164731; columns #FEs/D, best, 10%, 25%, med, 75%, 90%.]
[5] T. Glasmachers, T. Schaul, and J. Schmidhuber. A Natural Evolution Strategy for Multi-Objective Optimization. In Parallel Problem Solving from Nature (PPSN), 2010.
[6] T. Glasmachers, T. Schaul, Y. Sun, D. Wierstra, and J. Schmidhuber. Exponential Natural Evolution Strategies. In Genetic and Evolutionary Computation Conference (GECCO), Portland, OR, 2010.
[Figure 1 panels, one per noiseless function: 1 Sphere, 2 Ellipsoid separable, 3 Rastrigin separable, 4 Skew Rastrigin-Bueche separable, 5 Linear slope, 6 Attractive sector, 7 Step-ellipsoid, 8 Rosenbrock original, 9 Rosenbrock rotated, 10 Ellipsoid, 11 Discus, 12 Bent cigar, 13 Sharp ridge, 14 Sum of different powers, 15 Rastrigin, 16 Weierstrass, 17 Schaffer F7 condition 10, 18 Schaffer F7 condition 1000, 19 Griewank-Rosenbrock F8F2, 20 Schwefel x*sin(x), 21 Gallagher 101 peaks, 22 Gallagher 21 peaks, 23 Katsuuras, 24 Lunacek bi-Rastrigin.]
Figure 1: Expected number of f-evaluations (ERT, with lines, see legend) to reach $f_{\mathrm{opt}} + \Delta f$, median number of f-evaluations to reach the most difficult target that was reached at least once (+), and maximum number of f-evaluations in any trial (×), all divided by dimension and plotted as log10 values versus dimension. Shown are $\Delta f = 10^{\{1,0,-1,-2,-3,-5,-8\}}$. Numbers above ERT symbols indicate the number of successful trials. The light thick line with diamonds indicates the respective best result from BBOB-2009 for $\Delta f = 10^{-8}$. Horizontal lines mean linear scaling, slanted grid lines depict quadratic scaling.
[Figure 2 panels, one per noisy function: 101 Sphere moderate Gauss, 102 Sphere moderate unif, 103 Sphere moderate Cauchy, 104 Rosenbrock moderate Gauss, 105 Rosenbrock moderate unif, 106 Rosenbrock moderate Cauchy, 107 Sphere Gauss, 108 Sphere unif, 109 Sphere Cauchy, 110 Rosenbrock Gauss, 111 Rosenbrock unif, 112 Rosenbrock Cauchy, 113 Step-ellipsoid Gauss, 114 Step-ellipsoid unif, 115 Step-ellipsoid Cauchy, 116 Ellipsoid Gauss, 117 Ellipsoid unif, 118 Ellipsoid Cauchy, 119 Sum of diff powers Gauss, 120 Sum of diff powers unif, 121 Sum of diff powers Cauchy, 122 Schaffer F7 Gauss, 123 Schaffer F7 unif, 124 Schaffer F7 Cauchy, 125 Griewank-Rosenbrock Gauss, 126 Griewank-Rosenbrock unif, 127 Griewank-Rosenbrock Cauchy, 128 Gallagher Gauss, 129 Gallagher unif, 130 Gallagher Cauchy.]
Figure 2: Expected number of f-evaluations (ERT, with lines, see legend) to reach $f_{\mathrm{opt}} + \Delta f$, median number of f-evaluations to reach the most difficult target that was reached at least once (+), and maximum number of f-evaluations in any trial (×), all divided by dimension and plotted as log10 values versus dimension. Shown are $\Delta f = 10^{\{1,0,-1,-2,-3,-5,-8\}}$. Numbers above ERT symbols indicate the number of successful trials. The light thick line with diamonds indicates the respective best result from BBOB-2009 for $\Delta f = 10^{-8}$. Horizontal lines mean linear scaling, slanted grid lines depict quadratic scaling.
[Figure 3 panels: ECDFs for D = 5 (left column) and D = 20 (right column), by function group: all functions f1–f24, separable f1–f5, misc. moderate f6–f9, ill-conditioned f10–f14, multi-modal f15–f19, weak-structure f20–f24.]
Figure 3: Empirical cumulative distribution functions (ECDFs), plotting the fraction of trials with an outcome not larger than the respective value on the x-axis. Left subplots: ECDF of number of function evaluations (FEvals) divided by search space dimension D, to fall below $f_{\mathrm{opt}} + \Delta f$ with $\Delta f = 10^k$, where $k$ is the first value in the legend. Right subplots: ECDF of the best achieved $\Delta f$ divided by $10^{-8}$ for running times of $D, 10D, 100D, \dots$ function evaluations (from right to left cycling black-cyan-magenta). The thick red line represents the most difficult target value $f_{\mathrm{opt}} + 10^{-8}$. Legends indicate the number of functions that were solved in at least one trial. Light brown lines in the background show ECDFs for $\Delta f = 10^{-8}$ of all algorithms benchmarked during BBOB-2009.
Table 2: Expected running time (ERT in number of function evaluations) divided by the best ERT measured during BBOB-2009 (given in the respective first row) for different $\Delta f$ values for functions f1–f24. The median number of conducted function evaluations is additionally given in italics, if $\mathrm{ERT}(10^{-7}) = \infty$. #succ is the number of trials that reached the final target $f_{\mathrm{opt}} + 10^{-8}$.
[Table 2 body: results in 5-D and 20-D.]
[Figure 5 panels: ERT loss ratios for noiseless functions f1–f24 (top) and noisy functions f101–f130 (bottom), in 5-D (left) and 20-D (right); CrE = 0 in all panels.]
Figure 5: ERT loss ratio vs. a given budget FEvals. The target value $f_t$ used for a given FEvals is the smallest (best) recorded function value such that $\mathrm{ERT}(f_t) \leq$ FEvals for the presented algorithm. Shown is FEvals divided by the respective best $\mathrm{ERT}(f_t)$ from BBOB-2009 for all functions (noiseless f1–f24, left columns, and noisy f101–f130, right columns) in 5-D and 20-D. Line: geometric mean. Box-Whisker error bar: 25–75%-ile with median (box), 10–90%-ile (caps), and minimum and maximum ERT loss ratio (points). The vertical line gives the maximal number of function evaluations in a single trial in this function subset.
Table 3: ERT ratios, as in Table 2, for functions f101–f130.
[7] N. Hansen, A. Auger, S. Finck, and R. Ros. Real-parameter black-box optimization benchmarking 2012: Experimental setup. Technical report, INRIA, 2012.
[8] N. Hansen, S. Finck, R. Ros, and A. Auger. Real-parameter black-box optimization benchmarking 2009: Noiseless functions definitions. Technical Report RR-6829, INRIA, 2009. Updated February 2010.
[9] N. Hansen, S. Finck, R. Ros, and A. Auger. Real-parameter black-box optimization benchmarking 2009: Noisy functions definitions. Technical Report RR-6869, INRIA, 2009. Updated February 2010.
[10] N. Hansen and A. Ostermeier. Completely derandomized self-adaptation in evolution strategies. Evolutionary Computation, 9(2):159–195, 2001.
[11] T. Schaul. Studies in Continuous Black-box Optimization. PhD thesis, Technische Universität München, 2011.
[12] T. Schaul. Natural Evolution Strategies Converge on Sphere Functions. In Genetic and Evolutionary Computation Conference (GECCO), Philadelphia, PA, 2012.
[13] T. Schaul, J. Bayer, D. Wierstra, Y. Sun, M. Felder, F. Sehnke, T. Rückstieß, and J. Schmidhuber. PyBrain. Journal of Machine Learning Research, 11:743–746, 2010.
[14] Y. Sun, D. Wierstra, T. Schaul, and J. Schmidhuber. Stochastic search using the natural gradient. In International Conference on Machine Learning (ICML), 2009.
[15] D. Wierstra, T. Schaul, T. Glasmachers, Y. Sun, and J. Schmidhuber. Natural Evolution Strategies. Technical report, 2011.
[16] D. Wierstra, T. Schaul, J. Peters, and J. Schmidhuber. Natural Evolution Strategies. In Proceedings of the IEEE Congress on Evolutionary Computation (CEC), Hong Kong, China, 2008.