
Renewing the Theory of Continuous Evolutionary Algorithms

Alexandru Agapie*   Mircea Agapie†   Günter Rudolph‡   Gheorghita Zbaganu§

Abstract

Evolutionary algorithms (EAs) acting on continuous space require a more sophisticated modeling than their discrete counterparts. Sharing with the classical theory of evolution strategies the interest in characterizing the expected one-step progress, we propose an approach based on stochastic renewal processes. The new approach is powerful enough to provide global convergence conditions, as well as computation times for particular algorithms on particular fitness functions.

Keywords: Evolutionary algorithm, continuous optimization, convergence time, Wald's equation, renewal process, drift analysis.

1 Introduction

An objective reader of the theoretical developments of Evolutionary Algorithms (EAs) might find her/himself let down either by the excessive restrictiveness of the assumptions under which the analysis is performed, or by its extreme generality, which usually renders it of little use. The precise tradeoff is always complicated, and the difficulty of the task is inversely proportional to the algorithms' efficiency. Building on probability theory rather than statistical physics, we aim at introducing a new stochastic modeling for continuous EAs, one based on renewal processes.

A second goal of the paper consists in estimating analytically the hitting time of the (1+1)EA with uniform mutation inside the (hyper)sphere of volume 1, minimising the well-known SPHERE function¹

$$f : \mathbb{R}^n \to \mathbb{R}, \qquad f(x) = f(x_1, \ldots, x_n) = \sum_{i=1}^{n} x_i^2.$$

Although new in essence, a perseverant reader could trace the roots of our approach back to the following sources:

* Dept. Mathematics, Bucharest University of Economic Studies, Cl. Dorobantilor 15-17, Bucharest 010552, Romania, [email protected]
† Computer Science, Tarleton State University, Box T-0930, Stephenville, TX 76402, USA
‡ Computer Science XI, Technical University Dortmund, 44227 Dortmund, Germany
§ Institute of Math. Statistics and Applied Mathematics, Casa Academiei Romane, Calea 13 Septembrie 13, Bucharest 050711, Romania
¹ In order to avoid confusion, we shall use uppercase when referring to the fitness function, and lowercase when referring to the mutation operator.


• Rudolph [22, 23], for the extrapolation of the Markov chain modeling of discrete EAs onto their continuous counterparts;

• Beyer [6], for the geometrically oriented approach and the integral calculus of (one-step) progress rate on the SPHERE model;

• Jägersküpper [14, 15], for the analysis of atypical (that is, non-normal) mutation operators, the application of Wald's equation and, again, the geometrical and integral calculus of progress rate on SPHERE.

While Rudolph's stochastic analysis of the continuous case is only incipient, lacking the computation of hitting times, the theory of Evolution Strategies (ES) developed by Beyer has been criticized for using several approximations, not all of them mathematically rigorous: "A first approximation called the first order approximation amounts to considering averaged instead of random behaviors. In a second kind of approximation, fluctuations are modelled as Gaussian noises which can be a crude approximation of the actual behavior" [7, p.270]. Actually, approximations are inevitable due to the analytical intractability of the (one-step) expected progress rate

$$\varphi_{1+1} = E\left[\,d - \sqrt{(d-x)^2 + h^2}\,\right] \qquad (1)$$

in case of normal mutation. Eq. (1) defines progress as the difference between the distances to the origin of the current position ($d$) and of the next position of the algorithm ($\sqrt{(d-x)^2 + h^2}$) - see Figure 1, or Eq. (3.10) and Figure (3.1) in [6, p.53].

In addition to Beyer's work, Jägersküpper proved - mathematically rigorously, using a modification of Wald's equation - the following lower bound on the expected runtime of the (1+1)EA minimising the SPHERE. Notice that $|c|$ is replaced in our setting by $d$.

Theorem 1.1 ([14]) Let the (1+1)ES minimise SPHERE using isotropic² mutations, and let $c \in \mathbb{R}^n$ denote the current search point. Independently of the mutation adaptation used, the expected number of steps necessary to obtain a spatial gain of $\Theta(|c|)$ is in $\Omega(n)$.

Building on previous analyses of atypical probabilistic algorithms, like ball walk and hit-and-run [19], Jägersküpper took a very interesting turn in extrapolating the previous result to a more general class of black-box optimization algorithms, the ones using random directions [15]. Needless to say, all the (1+1)EAs with isotropic mutation - including the classical normal-mutation ES, and the uniform-mutation inside the sphere (ball walk) algorithm studied in this paper - fall into that category. Focusing on the expected value of the angle $\alpha = \angle OSC$ in Figure 1 - rather than on the expected location of the next individual - Jägersküpper derived a closed form of $E(\cos\alpha)$ and narrow upper and lower bounds for $E(\sin\alpha)$. His conclusion is the same as that of Theorem 1.1: the expected number of steps to halve the approximation error has a lower bound linear in $n$.

Section 2 underlines the theoretical resemblance between normal and uniform mutation, then section 3 introduces the new stochastic model of continuous EAs. Step by step, the classical theory of renewal processes - initially devoted to queueing systems - is adapted to our algorithmic paradigm, up to the theorem describing the asymptotics of the expected first hitting time. In section 4 we show how the stochastic characterisation of hitting time can be read in terms of drift analysis. The next two sections are devoted to the direct computation of convergence time. First, an integral calculus of hypersphere volumes and centroids is developed in section 5. The results are then used in section 6, together with the stochastic analysis, to compute lower and upper bounds on the convergence time of the (1+1)EA with uniform mutation on the SPHERE. Section 7 concludes the paper.

² See section 2 for the definition of isotropic mutation.

2 Uniform versus normal mutation

One could argue that a theoretical analysis employing only uniform mutation is simplistic and impractical. To answer such criticism we point out that the first hitting time analysis based on renewal processes performed in this paper does not rely on the particular form of the mutation operator, but only on the expected value and variance of the random variable 'one-step progress'. Depending on the complexity of the fitness function and of the mutation distribution, the moments of this random variable can be computed either analytically or numerically.

From a stochastic point of view, the two distributions - normal, and uniform inside the sphere - are very much alike. According to [10, p.28], they both belong to the same class of spherical distributions, i.e. they both can be represented as

$$x \overset{d}{=} r \cdot u$$

where $x$ is the n-dimensional spherical random vector, $u$ is the random vector distributed uniformly on the unit sphere surface in $\mathbb{R}^n$, $r = \|x\|$ is a non-negative scalar random variable independent of $u$, and $\overset{d}{=}$ denotes the fact that two random vectors have the same distribution. "Thus, a mutation according to a multivariate normal distribution can be interpreted as a step in random direction $u$ with random step length $r$" [24]. And the same holds for the uniform mutation inside the sphere, only with a different step length. For the normal mutation $x = N(0, \sigma^2 I)$, an analytical derivation of the first two moments of $r$ yields [22, p.21]

$$E(r) \approx \sigma\sqrt{n - \frac{1}{2} + \frac{1}{16n}}, \qquad \mathrm{Var}(r) \approx \frac{\sigma^2}{2}\left(1 - \frac{1}{8n}\right),$$

"indicating that the random step length actually does not vary strongly" [24].

The concept of spherical distribution has quite a history in the EA literature under a different name: isotropic mutation.

Definition 2.1 ([14]) For $m \in \mathbb{R}^n$ let $|m|$ denote its length (L2-norm) and $\bar{m} := m/|m|$ the normalized vector. The random mutation vector $m$ is isotropically distributed iff $|m|$ is independent of $\bar{m}$ and $\bar{m}$ is uniformly distributed upon the unit hyper-sphere $U := \{x \in \mathbb{R}^n \mid |x| = 1\}$.

It follows that the uniform mutation employed in our study is isotropic, as much as the normal mutation of classical EA and ES theory.


3 The Continuous EA is a Renewal Process

The search space considered in this paper is $\mathbb{R}^n$, while the EA will be of the simplest type, (1+1), with population consisting of only one individual. In each iteration a mutation is produced, and the better of the old and new individuals is selected. The mutation's effect is modeled by a probability distribution, set to be normal (Gaussian) in practice and in most theoretical studies, yet uniform inside the sphere for the purpose of this paper - see section 2 for a stochastic comparison between the two cases.
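For concreteness, the following is a minimal Python sketch of such a (1+1)EA - our own illustration, not taken from the paper. Uniform sampling inside the mutation sphere uses the standard recipe of an isotropic direction rescaled by a radius $r \cdot U^{1/n}$; the dimension, mutation radius and iteration budget are arbitrary illustrative choices.

import numpy as np

def sphere(x):
    """SPHERE fitness: sum of squared coordinates."""
    return np.dot(x, x)

def uniform_in_ball(n, r, rng):
    """Uniform sample inside the n-ball of radius r: isotropic direction
    times a radius distributed as r * U**(1/n)."""
    u = rng.standard_normal(n)
    u /= np.linalg.norm(u)
    return r * rng.random() ** (1.0 / n) * u

def one_plus_one_ea(x0, r, iterations, rng):
    """Elitist (1+1)EA: mutate uniformly inside a sphere of radius r,
    keep the better of parent and offspring."""
    x = x0.copy()
    for _ in range(iterations):
        y = x + uniform_in_ball(len(x), r, rng)
        if sphere(y) <= sphere(x):
            x = y
    return x

rng = np.random.default_rng(0)
x0 = np.full(10, 5.0)
x = one_plus_one_ea(x0, r=1.0, iterations=2000, rng=rng)
print("start:", np.linalg.norm(x0), " end:", np.linalg.norm(x))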

For each $t = 0, 1, 2, \ldots$, let $P_t$ be the random variable '(best individual from the) EA population at iteration $t$'. Then $\{P_t\}_{t\ge 0}$ is a stochastic process on $\mathbb{R}^n$. We also define a distance $d : \mathbb{R}^n \to \mathbb{R}^+_0$, accounting for the (one-dimensional) distance to the optimum, that is, to $0 := (0, \ldots, 0)$, since we are minimising. The distance $d$ will also stand for our drift function. As is generally the case with probabilistic algorithms on continuous space, we say convergence is achieved at iteration $t$ if the algorithm has entered an $\varepsilon$-vicinity of $0$ for some fixed $\varepsilon$: $0 \le d(P_t) < \varepsilon$. We also define the stochastic process $\{X_t\}_{t\ge 1}$ given by

$$X_t = d(P_{t-1}) - d(P_t), \qquad t = 1, 2, \ldots. \qquad (2)$$

In our EA framework, $X_t$ will stand for the (relative) progress of the algorithm in one step, namely from the $(t-1)$st iteration to the $t$th. Due to the EA's elitism, $\{X_t\}_{t\ge 1}$ are non-negative random variables (r.v.s), and we shall also assume they are independent. Each $X_t$ is composed of a point mass (singular, or Dirac measure) in zero, accounting for the event where there is no improvement from $P_{t-1}$ to $P_t$, and a continuous part accounting for the real progress toward the optimum - e.g., a truncated uniform or normal distribution. A second natural assumption is $P\{X_t = 0\} < 1$, or equivalently $P\{X_t > 0\} > 0$, for all $t$; otherwise convergence of the algorithm would be precluded. This does not require progress at each iteration, but only a strictly positive probability of such progress.

The conditions above are not yet sufficient for a meaningful stochastic analysis. To provide first hitting times for the algorithms we study, we also require the following natural hypothesis:

H: $\{X_t\}_{t\ge 1}$ are non-negative, independent r.v.s and there exist constants $\mu_1, \mu_2, \sigma > 0$ such that $\mu_1 \le E(X_t) \le \mu_2$ and $\mathrm{Var}(X_t) \le \sigma^2$, for all $t$.

Condition H is flexible enough to cover the (inhomogeneous) case of different mutation rates and different success probabilities at different algorithmic iterations. For example, H describes families of distributions that are all normal, or all uniform, with the parameters ranging within certain positive bounds.

It is shown below that H yields a stronger restriction on the progress probabilities than the already stated '$P\{X_t > 0\} > 0$ for all $t$'. We first need some general results from probability theory.

Lemma 3.1 If $X$ is a positive random variable and $\alpha > 0$ is such that $P\{X \ge \alpha\} = 0$, then $E(X) \le \alpha \cdot P\{X < \alpha\}$.

Proof. Let $M > \alpha$ and define the r.v.s

$$X_\alpha = \begin{cases} X & \text{if } X \ge \alpha \\ \alpha & \text{if } X < \alpha \end{cases} \qquad\quad X_M = \begin{cases} \inf\{X, M\} & \text{if } X \ge \alpha \\ \alpha & \text{if } X < \alpha. \end{cases}$$

Then $X_\alpha, X_M$ are positive and

$$E(X_M) \le M \cdot P\{X \ge \alpha\} + \alpha \cdot P\{X < \alpha\} = \alpha \cdot P\{X < \alpha\}.$$

Moreover, since $\{X_M\}_M$ is monotone increasing and $X_M \to X_\alpha$ as $M \to \infty$, Lebesgue's Monotone Convergence theorem³ - see e.g. [25, p.59] - ensures that $E(X_M) \to E(X_\alpha)$ as $M \to \infty$, thus

$$E(X_\alpha) \le \alpha \cdot P\{X < \alpha\},$$

and the conclusion obtains after taking expected values in $X \le X_\alpha$. □

³ If $0 \le X_1 \le X_2 \le \ldots$ and $X_n \to X$ with probability 1, then $E(X_n) \to E(X)$.

Lemma 3.2 H ⇒ there exist $\alpha, \beta > 0$ such that $P\{X_t \ge \alpha\} \ge \beta$ for all $t$.

Proof. We show first that H implies $P\{X_t \ge \alpha\} > 0$ for all $\alpha < \mu_1$ and all $t$. Let us fix $0 < \alpha < \mu_1$ arbitrarily. Reasoning by contradiction, suppose that there is a $t$ with $P\{X_t \ge \alpha\} = 0$. Then lemma 3.1 implies

$$E(X_t) \le \alpha \cdot P\{X_t < \alpha\} = \alpha \cdot 1 < \mu_1$$

which contradicts H. So $P\{X_t \ge \alpha\} > 0$ holds for all $t$. We show next that the same inequality holds uniformly, for some $\beta > 0$. Actually, we are going to prove that for any non-negative r.v. $X$ with $E(X) = \mu$ and $\mathrm{Var}(X) = \sigma^2$,

$$P(X > \alpha) \ge \frac{(\mu - \alpha)^2}{\sigma^2 + \mu^2} \quad \text{for any } \alpha < \mu. \qquad (3)$$

Decomposing the r.v. $X$ with respect to the indicator function $I_{X \le \alpha}$ ($= 1$ if $X \le \alpha$ and zero otherwise), we obtain

$$\mu = E(X \cdot I_{X \le \alpha}) + E(X \cdot I_{X > \alpha}) = E(X \mid X \le \alpha) \cdot P\{X \le \alpha\} + E(X \mid X > \alpha) \cdot P\{X > \alpha\}.$$

We have next

$$E(X \mid X > \alpha) \cdot P\{X > \alpha\} \ge \mu - \alpha P\{X \le \alpha\} \;\Rightarrow\; E(X \mid X > \alpha) \ge \frac{\mu - \alpha P\{X \le \alpha\}}{P\{X > \alpha\}}$$

and if we apply the same decomposition to $E(X^2)$,

$$\sigma^2 + \mu^2 = E(X^2) = E(X^2 \cdot I_{X \le \alpha}) + E(X^2 \cdot I_{X > \alpha}) = E(X^2 \mid X \le \alpha) \cdot P\{X \le \alpha\} + E(X^2 \mid X > \alpha) \cdot P\{X > \alpha\}$$
$$\ge E(X^2 \mid X > \alpha) \cdot P\{X > \alpha\} \ge E(X \mid X > \alpha)^2 \cdot P\{X > \alpha\} \ge \frac{(\mu - \alpha P\{X \le \alpha\})^2}{P\{X > \alpha\}} \ge \frac{(\mu - \alpha)^2}{P\{X > \alpha\}},$$

which proves Eq. (3). Now, if we have a family of non-negative r.v.s $\{X_t\}_{t\ge 1}$ such that $\mu_1 \le E(X_t) \le \mu_2$ and $\mathrm{Var}(X_t) \le \sigma^2$ for all $t$ (hypothesis H), the same reasoning yields, for an arbitrarily fixed $\alpha$ with $0 < \alpha < \mu_1$,

$$P(X_t > \alpha) \ge \frac{(\mu_1 - \alpha)^2}{\sigma^2 + \mu_2^2} =: \beta. \qquad \square$$
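The bound (3) is easy to probe numerically. Below is a small Monte Carlo sketch of ours (the mixture weights are arbitrary illustrative choices), using a progress variable of exactly the type described above: a point mass at zero mixed with a continuous part.

import numpy as np

rng = np.random.default_rng(1)

# X = 0 with probability 0.6 (stagnation), else uniform on [0, 2]:
# a point mass in zero mixed with a continuous part, as in the text.
x = np.where(rng.random(1_000_000) < 0.6, 0.0, rng.uniform(0.0, 2.0, 1_000_000))

mu, var = x.mean(), x.var()
alpha = 0.5 * mu                     # any alpha < mu works
print(f"P(X > alpha) = {(x > alpha).mean():.3f}"
      f" >= (mu-alpha)^2/(var+mu^2) = {(mu - alpha)**2 / (var + mu**2):.3f}")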

Let us now return to defining the renewal process in the case of continuous EA optimization. By summing up the relative progress at each iteration we obtain $S_t$, the overall progress in $t$ iterations:

$$S_t = X_1 + \ldots + X_t. \qquad (4)$$

Remark 3.3 Under hypothesis H, the stochastic process $\{S_t\}_{t\ge 1}$ is an atypical discontinuous random walk, since the r.v.s $X_t$ are independent but not identically distributed. One could speak in this case of an inhomogeneous random walk.

From Eq. (2) we obtain

$$S_t = d(P_0) - d(P_1) + d(P_1) - d(P_2) + \ldots + d(P_{t-1}) - d(P_t) = d(P_0) - d(P_t), \qquad t = 1, 2, \ldots.$$

Remark 3.4 By definition, $S_t$ is bounded within the closed interval $[0, d(P_0)]$, for all $t \ge 1$. If we fix at the start of the algorithm a positive $\delta$ to designate the 'maximal distance to optimum', then we have

$$0 \le S_t \le d(P_0) \le \delta.$$

Let us now introduce another r.v., accounting for the EA's first hitting time of the area $[0, d(P_0) - d)$, or equivalently, for the overall progress to go beyond a certain positive threshold $d$⁴:

$$T_d = \inf\{t \mid d(P_t) < d(P_0) - d\} = \inf\{t \mid S_t > d\}.$$

According to [11, 21], the process $\{T_d\}_{d>0}$ will be called a renewal process⁵, with the following interpretation: we say a renewal occurs at distance $d(P_0) - d$ from the optimum if $S_t = d$ for some iteration $t$. A renewal is actually a 'successful iteration', that is, an iteration that produced a strictly positive progress towards the optimum. After each renewal the process (the algorithm) starts over again.

⁴ In order to keep the notation simple, we shall use the same letter '$d$' for denoting the distance function $d(\cdot)$ and a scalar $d > 0$.

⁵ The continuous-time index $t$ of a classical renewal process $\{N_t\}_{t\ge 0}$ in queueing theory is replaced in our paradigm by a continuous-distance index $d$.

3.1 First Hitting Time

Proposition 3.5 Under hypothesis H, the first hitting time of the continuous EA is finite with probability 1.

Proof. Under H, the Strong Law of Large Numbers for independent, non-identically distributed r.v.s⁶ yields

$$\mu_1 \le \frac{S_t}{t} \le \mu_2 \quad \text{with probability 1, for } t \text{ large enough}. \qquad (5)$$

Using the left hand side of the inequality for a fixed $d$ yields $S_t \le d$ only finitely often, and $T_d < \infty$ with probability 1. □

⁶ If $\{X_t\}_{t\ge 1}$ are independent and such that $\sum_{i=1}^{\infty} \mathrm{Var}(X_i)/i^2 < \infty$, then $\frac{1}{t}\sum_{i=1}^{t} [X_i - E(X_i)] \to 0$ with probability 1.

Definition 3.6 An integer-valued positive random variable $T$ is called a stopping time for the sequence $\{X_t\}_{t\ge 1}$ if the event $\{T = t\}$ is independent of $X_{t+1}, X_{t+2}, \ldots$ for all $t \ge 1$.

We have the following simple result.

Lemma 3.7 $T_d$ defined as above is a stopping time for $\{X_t\}_{t\ge 1}$, for any $d > 0$.

Proof.

$$\{T_d = t\} = \{S_t > d,\ S_{t-1} \le d\} = \left\{\sum_{k=1}^{t} X_k > d,\ \sum_{k=1}^{t-1} X_k \le d\right\}$$

which is obviously independent of $X_{t+1}, X_{t+2}, \ldots$. □

We also have the relationship that the first hitting time of a distance $d$ from the starting point is greater than $t$ if and only if the $t$th iteration yields a point situated at distance less than or equal to $d$. Formally,

$$T_d > t \iff S_t \le d.$$

According to [21], $E(T_d)$, the mean (expected value) of $T_d$, is called the renewal function, and much of classical renewal theory is concerned with determining its properties. In our EA framework, if we set $d := d(P_0) - \varepsilon$ with some fixed positive $\varepsilon$ defining the target zone of the continuous space algorithm, then $T_d = \inf\{t \mid d(P_t) < \varepsilon\}$ is the first hitting time of the target zone, and $E(T_d)$ the expected (first) hitting time. So determining the properties of the renewal function seems to be the principal goal of EA theory as well.

The table below summarizes the intuitive interpretation of the random variables $X_t$, $S_t$ and $T_d$ under the continuous EA setting.

  Random variable   Interpretation
  $X_t$             (one-dimensional) progress between the $(t-1)$st and the $t$th iteration
  $S_t$             overall progress up to the $t$th iteration
  $T_d$             (number of iterations) first hitting time of a distance $d$ from the starting point

The following theorem is crucial to the stochastic analysis of continuous EAs. Note that this result was also used in [14], yet outside the context of renewal processes.

Theorem 3.8 (Wald's Equation, [21, p.38]) If $\{X_t\}_{t\ge 1}$ are independent and identically distributed random variables having finite expectation $E(X)$, and $T$ is a stopping time for $\{X_t\}_{t\ge 1}$ such that $E(T) < \infty$, then

$$E\left(\sum_{t=1}^{T} X_t\right) = E(T) \cdot E(X). \qquad \square$$


When applied to the continuous EA paradigm, Wald's equation provides only a lower bound on the expected hitting time. In order to obtain both upper and lower bounds, the application of limit theorems from renewal processes is necessary.

A reformulation of Wald's equation in terms of inequalities is first required.

Theorem 3.9 (Wald's Inequation) If $\{X_t\}_{t\ge 1}$ are independent, non-negative, $\mu_1 \le E(X_t) \le \mu_2$ for all $t$, and $T$ is a stopping time for $\{X_t\}_{t\ge 1}$, then

$$\mu_1 E(T) \le E\left(\sum_{t=1}^{T} X_t\right) \le \mu_2 E(T).$$

Proof. Letting

$$Y_t = \begin{cases} 1 & \text{if } T \ge t \\ 0 & \text{if } T < t \end{cases}$$

we have that

$$\sum_{t=1}^{T} X_t = \sum_{t=1}^{\infty} X_t Y_t.$$

Due to Lebesgue's Monotone Convergence theorem (see the proof of lemma 3.1), since all products $X_t Y_t$ are non-negative, one can interchange expectation and summation:

$$E\left(\sum_{t=1}^{T} X_t\right) = \sum_{t=1}^{\infty} E(X_t Y_t). \qquad (6)$$

Note that $Y_t = 1$ if and only if we have not stopped after successively observing $X_1, \ldots, X_{t-1}$, therefore $X_t$ is independent of $Y_t$ for all $t$. Then

$$E\left(\sum_{t=1}^{T} X_t\right) = \sum_{t=1}^{\infty} E(X_t) E(Y_t)$$
$$\mu_1 \sum_{t=1}^{\infty} E(Y_t) \le E\left(\sum_{t=1}^{T} X_t\right) \le \mu_2 \sum_{t=1}^{\infty} E(Y_t)$$
$$\mu_1 \sum_{t=1}^{\infty} P\{T \ge t\} \le E\left(\sum_{t=1}^{T} X_t\right) \le \mu_2 \sum_{t=1}^{\infty} P\{T \ge t\}$$
$$\mu_1 E(T) \le E\left(\sum_{t=1}^{T} X_t\right) \le \mu_2 E(T).$$

That $E(T) = \sum_{t=1}^{\infty} P\{T \ge t\}$ can be seen - for any non-negative integer r.v. $T$ - as follows:

$$\sum_{t=1}^{\infty} P\{T \ge t\} = \sum_{t=1}^{\infty} \sum_{k=t}^{\infty} P\{T = k\} = \sum_{k=1}^{\infty} P\{T = k\} \cdot \sum_{t=1}^{k} 1 = \sum_{k=1}^{\infty} k \cdot P\{T = k\} = E(T). \qquad \square$$
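Wald's inequation is easy to illustrate by simulation. The sketch below is our own (parameter values are arbitrary): it uses independent, non-identically distributed, non-negative progress variables with means confined to $[\mu_1, \mu_2]$, and the stopping time $T_d = \inf\{t \mid S_t > d\}$.

import numpy as np

rng = np.random.default_rng(2)
mu1, mu2, d = 0.5, 1.5, 50.0
T, S = [], []

for _ in range(10_000):
    s, t = 0.0, 0
    while s <= d:                    # stop at T_d = inf{t | S_t > d}
        t += 1
        mean_t = mu1 + (mu2 - mu1) * rng.random()   # E(X_t) wanders in [mu1, mu2]
        s += rng.exponential(mean_t)                # non-negative progress
    T.append(t)
    S.append(s)

ET, ES = np.mean(T), np.mean(S)
print(f"mu1*E(T) = {mu1*ET:.1f} <= E(S_T) = {ES:.1f} <= mu2*E(T) = {mu2*ET:.1f}")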


Note that the only restrictions on $\{X_t\}_{t\ge 1}$ required by theorem 3.9 were '$X_t \ge 0$' and '$\mu_1 \le E(X_t) \le \mu_2$' for all $t$ - hence a simplified version of H. The condition '$E(T) < \infty$', which occurred in Wald's equation, was no longer used in the inequation. Actually, if we follow the proof of theorem 3.9, we find that the only point where the condition could apply is when we interchange expectation and summation in equation (6). Instead, we used Lebesgue's Monotone Convergence theorem, which does not require a condition like '$E(T) < \infty$' but only monotonicity of the partial sums - ensured by '$X_t \ge 0$ for all $t$'.

So, apparently, one could conclude that whenever Wald's inequation is applied, H may be replaced by that simplified hypothesis. That is not the case, since $E(T)$ will designate the expected hitting time of an area at a certain distance from the starting point of the algorithm, and, if $E(T) = \infty$, there is no convergence at all. Hence we also need $E(T) < \infty$ for our analysis; that is proved under the continuous EA paradigm in proposition 3.11 below, relying strongly on lemma 3.2, which in turn does not work unless all requirements in H are fulfilled.

We show next that the result of proposition 3.5 also holds for the expected hitting time of the renewal process modelling the continuous EA. This is not trivial, since finiteness with probability 1 of a positive random variable does not imply finiteness of its expected value - see e.g. the Cauchy distribution.

First we need a simple result.

Lemma 3.10 Consider a discrete random variable $Z = \begin{pmatrix} 0 & 1 \\ 1-p & p \end{pmatrix}$ and let $Z_1, Z_2, \ldots$ be independent, identically distributed as $Z$. Consider also the stopping time $M = \inf\{m \mid Z_1 + \ldots + Z_m = 1\}$. Then $E(M) = 1/p$.

Proof. For each positive integer $k$,

$$P\{M = k\} = P\{Z_1 = 0, \ldots, Z_{k-1} = 0, Z_k = 1\} = p(1-p)^{k-1},$$

and thus the mean of the discrete r.v. $M$,

$$M = \begin{pmatrix} 1 & 2 & \ldots & k & \ldots \\ p & p(1-p) & \ldots & p(1-p)^{k-1} & \ldots \end{pmatrix},$$

is computed as

$$E(M) = \sum_{k=1}^{\infty} k\,p(1-p)^{k-1} = \frac{1}{p},$$

where convergence comes from differentiating the geometric series $\sum_{k=0}^{\infty} (1-p)^k$ term by term. □

Proposition 3.11 Under hypothesis H, the expected hitting time of the continuous EA is finite.

Proof. We need to prove that $E(T_d) < \infty$ for all $d > 0$. Under H, the existence of $\alpha$ and $\beta$ is guaranteed by H itself and lemma 3.2. Define a related renewal process $\{\bar{X}_t\}_{t\ge 1}$ by truncating each $X_t$ to

$$\bar{X}_t = \begin{pmatrix} 0 & \alpha \\ 1-\beta & \beta \end{pmatrix}. \qquad (7)$$

Note that $\bar{X}_t$ does not depend on $t$ anymore. Also, under hypothesis H, we have $\bar{X}_t \le X'_t$, where

$$X'_t = \begin{cases} 0 & \text{if } X_t < \alpha \\ \alpha & \text{if } X_t \ge \alpha. \end{cases} \qquad (8)$$

Now define $\bar{T}_d = \inf\{t \mid \bar{X}_1 + \ldots + \bar{X}_t > d\}$. For the related process, successive iterations can move the algorithm only along the lattice $d = t\alpha$, $t = 0, 1, 2, \ldots$. Also, the numbers of iterations required for a success (a real jump of length $\alpha$) are independent random variables with mean $1/\beta$. To see that, apply lemma 3.10 to the r.v. $Z_t = \bar{X}_t/\alpha$, which registers 1 for a success and 0 for a stagnation of the EA from iteration $(t-1)$ to iteration $t$. Thus,

$$E(\bar{T}_d) \le \frac{[d/\alpha] + 1}{\beta} < \infty \qquad (9)$$

and the rest follows since $\bar{X}_t \le X_t$ holds under H, which implies $\bar{T}_d \ge T_d$. □

3.2 Expected Hitting Time

The expression $1/E(X_t)$ is often called the progress rate between the $(t-1)$st and the $t$th iteration. Following the general theory of renewal processes [11, 21], we prove next the highly intuitive result that the expected average number of iterations required per distance unit converges to the progress rate. As $E(T_d)$ represents the expected hitting time of an area situated at distance $d$ from the starting point of the algorithm, the result below will provide estimates of the convergence time for continuous EAs.

We stress again that the estimates given below are meaningless without the assertion '$E(T_d) < \infty$ for all $d > 0$', ensured under H through proposition 3.11.

Theorem 3.12 Under hypothesis H we have, as $d \to \infty$,

$$\frac{1}{\mu_2} \le \frac{E(T_d)}{d} \le \frac{1}{\mu_1}.$$

Proof. Under H, $E(T_d)$ is finite due to proposition 3.11, and Wald's inequation yields

$$\mu_1 E(T_d) \le E(S_{T_d}) \le \mu_2 E(T_d). \qquad (10)$$

Since $S_{T_d} > d$ we also have $\mu_2 E(T_d) \ge E(S_{T_d}) > d$ and thus

$$\liminf_{d\to\infty} \frac{E(T_d)}{d} \ge \frac{1}{\mu_2}. \qquad (11)$$

To go the other way, let us fix a constant $M$ and define a new renewal process $\{\bar{X}_t\}_{t\ge 1}$ by letting, for each $t = 1, 2, \ldots$,

$$\bar{X}_t = \begin{cases} X_t & \text{if } X_t \le M \\ M & \text{if } X_t > M. \end{cases} \qquad (12)$$

Let $\bar{S}_t = \bar{X}_1 + \ldots + \bar{X}_t$ and $\bar{T}_d = \inf\{t \mid \bar{S}_t > d\}$. For each positive integer $M$ the process $\bar{X}_t$ is non-negative. Since $0 \le \bar{X}_t \le X_t$, we have for each $M > 0$

$$0 \le \mu_1^M \le E(\bar{X}_t) \le \mu_2 \quad \text{for all } t, \qquad (13)$$

where $\mu_1^M$ does not depend on $t$ and is defined for each $M$ by

$$\mu_1^M = \min\{\mu_1,\ \inf_t E(\bar{X}_t)\}. \qquad (14)$$

We show that $\mu_1^M \to \mu_1$ as $M \to \infty$. Reasoning by contradiction, assume non-convergence. As $\mu_1^M \le \mu_1$ always holds, for some $\varepsilon > 0$ there are infinitely many $M_k$ with $\mu_1^{M_k}$ outside the $\varepsilon$-vicinity of $\mu_1$, the interval $(\mu_1 - \varepsilon, \mu_1]$. According to the definition of $\mu_1^M$, we have for such an $\varepsilon > 0$

$$E(\bar{X}_t) < \mu_1 - \varepsilon \quad \text{for all } t \text{ and all } \{M_k\}_{k\ge 1}. \qquad (15)$$

But if we apply Lebesgue's Dominated Convergence theorem to a fixed $t_0$ - with $0 \le \bar{X}_{t_0} \le X_{t_0}$, $E(X_{t_0}) \le \mu_2 < \infty$ and $\bar{X}_{t_0} \to X_{t_0}$ as $M_k \to \infty$ - it yields

$$E(\bar{X}_{t_0}) \to E(X_{t_0}) \quad \text{as } M_k \to \infty, \qquad (16)$$

and combined with $\mu_1 \le E(X_{t_0})$ this provides an index $k_0$ such that

$$E(\bar{X}_{t_0}) \ge \mu_1 - \varepsilon \quad \text{for all } M_k \text{ with } k \ge k_0, \qquad (17)$$

which obviously contradicts Eq. (15). So $\mu_1^M \to \mu_1$ from below, and as $0 < \mu_1$, one can find a positive $M_0$ such that

$$0 < \mu_1^M \le E(\bar{X}_t) \le \mu_2 \quad \text{for all } t \text{ and all } M \ge M_0. \qquad (18)$$

It is easy to see that, for sufficiently large $M$,

$$0 \le \alpha < \mu_1^M \le E(\bar{X}_t) \le \mu_2 \quad \text{for all } t, \qquad (19)$$

which means that $\{\bar{X}_t\}_{t\ge 1}$ also satisfies an H condition, thus $E(\bar{T}_d) < \infty$ for all $d$ and one can call Wald's inequation on $\{\bar{X}_t\}_{t\ge 1}$ and $\bar{T}_d$, yielding

$$\mu_1^M E(\bar{T}_d) \le E(\bar{S}_{\bar{T}_d}) \le \mu_2 E(\bar{T}_d). \qquad (20)$$

Returning to the main proof, observe that $\bar{S}_{\bar{T}_d} \le d + M$, together with $\bar{X}_t \le X_t$ and $\bar{T}_d \ge T_d$, implies

$$\mu_1^M E(T_d) \le \mu_1^M E(\bar{T}_d) \le E(\bar{S}_{\bar{T}_d}) < d + M. \qquad (21)$$

We obtain

$$\limsup_{d\to\infty} \frac{E(T_d)}{d} \le \frac{1}{\mu_1^M} \qquad (22)$$

and letting $M \to \infty$ yields

$$\limsup_{d\to\infty} \frac{E(T_d)}{d} \le \frac{1}{\mu_1}, \qquad (23)$$

which together with Eq. (11) completes the proof. □
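A numerical illustration of theorem 3.12 (our own sketch, with arbitrary $\mu_1, \mu_2$): as $d$ grows, the ratio $E(T_d)/d$ settles inside the interval $[1/\mu_2, 1/\mu_1]$.

import numpy as np

rng = np.random.default_rng(3)
mu1, mu2 = 0.2, 0.8

def hitting_time(d):
    """First t with X_1 + ... + X_t > d, where the progress variables are
    exponential with means alternating between mu1 and mu2 (H holds, non-i.i.d.)."""
    s, t = 0.0, 0
    while s <= d:
        t += 1
        s += rng.exponential(mu1 if t % 2 else mu2)
    return t

for d in (10, 100, 500):
    ratio = np.mean([hitting_time(d) for _ in range(1000)]) / d
    print(f"d={d:4d}:  1/mu2={1/mu2:.2f} <= E(T_d)/d ~= {ratio:.2f} <= 1/mu1={1/mu1:.2f}")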


As one can see from the proof of theorem 3.12, the left hand side of the inequality - the one giving a lower bound on $E(T_d)$ - is a simple consequence of Wald's inequation. Most of the effort was concentrated on validating the upper bound on the expected hitting time - far more significant for computation time analysis.

Translated to our continuous EA paradigm, theorem 3.12 says that the expected average⁷ hitting time is bounded, under hypothesis H, by the inverse bounds of the expected progress in one step.

⁷ With respect to distance on the progress axis.

The estimates for the expected hitting time hold for a general (1+λ) EA, optimising an arbitrary fitness function defined on n-dimensional continuous space. The case of an EA with constant parameters is obviously covered, but also the more practical situation where parameters are adapted (are allowed to vary) during the evolution.

The analysis performed so far on continuous EAs regarded as renewal processes is similar to the Markov chain analysis of discrete EAs performed in [2, 3, 22] - see [23] for an accurate state of the art in stochastic convergence for discrete EAs. It closes the theoretical discussion on the convergence of the algorithm, opening the door for particular estimations of the local progress rates $\mu_1$ and $\mu_2$. As this calculus has a long history in EA theory, we shall use some of the previous results in the remaining sections of the paper.

But first let us digress and see how theorem 3.12 extrapolates the drift theorems of discrete EAs.

4 Drift Analysis

Drift analysis is relatively old within probabilistic optimisation theory [12], yet it was only recently introduced as a powerful tool in the study of the convergence of evolutionary algorithms [13]. Applied exclusively to discrete EAs, drift analysis made obsolete the highly technical proof given in [9] for the hitting time of the (1+1)EA on linear (pseudo-Boolean) functions [8].

We start by reviewing the definition and the main drift theorem for a finite space $Z$ containing all possible EA populations.

Define the distance $d : Z \to \mathbb{R}^+$ with $d(P) = 0$ if and only if population $P$ contains the optimal solution - so we are minimising. Let $T = \inf\{t \mid d(P_t) = 0\}$ be a random variable, and consider the maximum distance of an arbitrary population to the optimum,

$$M := \max\{d(P) \mid P \in Z\}.$$

That $M < \infty$ comes from the finiteness of the search space. As for each iteration $t$ the current population $P_t$ is a random variable, so will be $d(P_t)$, and also the so-called decrease in distance $X_t$ ($D_t$ in the original approach), given by

$$X_t := d(P_{t-1}) - d(P_t).$$

By definition, $E(X_t \mid T \ge t)$ is called drift. We also define

$$\Delta := \min\{E(X_t \mid T \ge t) \mid t \ge 1\}.$$

Theorem 4.1 (Drift Theorem - Upper Bound) If $\Delta > 0$ then

$$E(T) \le \frac{M}{\Delta}. \qquad \square$$


We replace $M$ in the proof of theorem 3.12 by the distance $d$ used in the continuous space model of Section 3 ($d$ may later grow to infinity). Similarly, $T$ is replaced by $T_d$ (the first hitting time of an area situated at distance $d$ from the initial population). The existence of $\Delta$ is postulated by hypothesis H: $\Delta = \mu_1$.

With these substitutions, the drift theorem is a corollary of theorem 3.12, since we can use the right hand side (upper bound) limit:

$$\frac{E(T_d)}{d} \le \frac{1}{\mu_1}.$$

A similar formulation of the drift theorem comes from [13].

Theorem 4.2 Let $\{X_t\}_{t>0}$ be a Markov process over a set of states $S$, and $g : S \to \mathbb{R}^+$ a function that assigns to every state a non-negative real number. Let the time to reach the optimum be $T := \min\{t > 0 \mid g(X_t) = 0\}$. If there exists $\delta > 0$ such that at any time step $t > 0$ and at any state $X_t$ with $g(X_t) > 0$ the following condition holds:

$$E[g(X_{t-1}) - g(X_t) \mid g(X_{t-1}) > 0] \ge \delta \qquad (24)$$

then

$$E[T \mid X_0, g(X_0) > 0] \le \frac{g(X_0)}{\delta}. \qquad (25) \qquad \square$$

Or, with a slightly different conclusion, in [20]:

$$E(T) \le \frac{E[g(X_0)]}{\delta}. \qquad (26)$$

To see the resemblance between theorems 3.12 and 4.2, notice first that, as we set 0 to be the minimum of the optimisation problem, the conditionings in both (24)-(25) vanish. Second, $g(X_0)$ can be replaced by the (constant) maximal distance to the optimum, $d$ in our renewal process setting. Denoting $\mu_1 := \delta$, and observing that the expected value of a constant is the constant itself, inequations (25) and (26) now read the same as the right hand side of theorem 3.12:

$$\frac{E(T)}{d} \le \frac{1}{\mu_1}.$$

In [8] a somewhat different condition on the drift function is imposed, namely: there is a constant $\delta > 0$ such that for all $n$ and all populations $P_t$,

$$E[d(P_t)] \le \left(1 - \frac{\delta}{n}\right) d(P_{t-1}). \qquad (27)$$

This being a one-step condition, we can assume that $P_{t-1}$ is constant and only $P_t$ is (a random) variable, which ensures $P_{t-1} = E(P_{t-1})$, and after insertion in (27) we get

$$E[d(P_t) - d(P_{t-1})] \le -\frac{\delta \cdot d(P_{t-1})}{n}.$$

After introducing $X_t$ and reversing the inequality we obtain

$$E(X_t) \ge \frac{\delta \cdot d(P_{t-1})}{n},$$

thus a more elaborate version of the lower bound in H,

$$E(X_t) \ge \mu_1,$$

accounting also for the space dimension $n$ and for the current position $P_{t-1}$.

Summing up, drift analysis provides conditions that ensure the existence of a strictly positive lower bound on the expected one-step progress of the algorithm towards the optimum, yielding finite upper bounds on the expected hitting time of the algorithm - all in the discrete case. Our renewal process analysis did the same, but for the continuous case. Nota bene, lower bounds on the hitting time are also available under the new paradigm, provided the existence of a finite upper bound on the expected one-step progress.
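The additive drift bound of theorem 4.2 can be illustrated on a toy Markov process (our own example; the constants are arbitrary): the true one-step drift of 0.3 exceeds the guaranteed $\delta = 0.25$, so $E(T)$ must stay below $g(X_0)/\delta$.

import numpy as np

rng = np.random.default_rng(4)
g0, delta = 100.0, 0.25              # initial distance and the guaranteed drift

def hitting_time():
    """Toy process: g decreases by 0.5 with probability 0.6 (else stays put),
    so the one-step drift is 0.6 * 0.5 = 0.3 >= delta."""
    g, t = g0, 0
    while g > 0:
        t += 1
        if rng.random() < 0.6:
            g -= 0.5
    return t

ET = np.mean([hitting_time() for _ in range(2000)])
print(f"E(T) ~= {ET:.0f} <= g(X_0)/delta = {g0/delta:.0f}")   # ~333 <= 400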

5 Hypersphere Volumes and Centroids

According to Li [17], the volume of an n-dimensional hypersphere (hereafter n-sphere) of radius $r$ is

$$V_n(r) = C_n r^n, \qquad (28)$$

where the coefficient can be expressed for any $n$ in terms of Gamma functions; for computational reasons, though, we prefer the following pair of formulas:

$$C_n = \frac{2^{\frac{n+1}{2}}\,\pi^{\frac{n-1}{2}}}{n!!}, \quad \text{when } n \text{ is odd, and} \qquad (29)$$

$$C_n = \frac{\pi^{\frac{n}{2}}}{\left(\frac{n}{2}\right)!}, \quad \text{when } n \text{ is even.} \qquad (30)$$

The function $n!!$ in (29) is the double factorial, defined as the product of every other number from $n$ down to either 2 or 1 (depending on $n$'s parity). One can easily express the double factorial in terms of the regular factorial:

$$(2k+1)!! = \frac{(2k+1)!}{2^k k!}, \qquad (2k-1)!! = \frac{(2k)!}{2^k k!}. \qquad (31)$$

A similar formula exists for the double factorial of an even number $2k$, but it is clear from (29) and (30) that we only need it for odd numbers.
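The pair (29)-(30) is easy to cross-check against the Gamma-function closed form mentioned in the text ($C_n = \pi^{n/2}/\Gamma(n/2+1)$, the standard expression); a short sketch of ours:

import math

def C_odd(n):
    """Eq. (29), with the double factorial taken from Eq. (31)."""
    k = (n - 1) // 2
    double_fact = math.factorial(2*k + 1) // (2**k * math.factorial(k))   # (2k+1)!!
    return 2 ** ((n + 1) / 2) * math.pi ** ((n - 1) / 2) / double_fact

def C_even(n):
    """Eq. (30)."""
    return math.pi ** (n / 2) / math.factorial(n // 2)

def C_gamma(n):
    """Closed form C_n = pi^{n/2} / Gamma(n/2 + 1), valid for every n."""
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1)

for n in range(1, 8):
    c = C_odd(n) if n % 2 else C_even(n)
    print(n, round(c, 6), round(C_gamma(n), 6))   # the two columns coincide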

To capture the limiting behavior of the factorials, we use Stirling's formula [1, p.257, entry 6.1.38]:

$$n! = \sqrt{2\pi n}\left(\frac{n}{e}\right)^n \left[1 + O\left(\frac{1}{n}\right)\right] \approx \sqrt{2\pi n}\left(\frac{n}{e}\right)^n. \qquad (32)$$

The incomplete Beta function is defined as [1, p.263, entry 6.6.1]

$$B_x(a, b) = \int_0^x t^{a-1}(1-t)^{b-1}\,dt, \qquad 0 \le x \le 1,\ a > 0,\ b > 0. \qquad (33)$$


In [18], the following asymptotic expansion in the powers of $1/a$ is derived:

$$B_x(a, b) = \frac{x^a (1-x)^{b-1}}{a} \sum_{k=0}^{\infty} (-1)^k \frac{f_k(b, x)}{a^k}, \qquad (34)$$

where the $f_k(b, x)$ are expressions that depend on $k$, $b$ and $x$, but not on $a$:

$$f_k(b, x) = \frac{(-1)^k}{(1-x)^{b-1}} \cdot \frac{d^k}{dw^k}\left(\left(1 - x e^{-w}\right)^{b-1}\right)\Big|_{w=0}. \qquad (35)$$

It is also proved in [18] that the series above converges if $a > 1$, and that the error when the series is truncated is bounded in absolute value by the first neglected term.

For geometric derivations involving two n-spheres, we take the progress axis $Ox_1$ to pass through the centres of both spheres. To simplify the notation, we refer to all other $n-1$ axes perpendicular to it ($Ox_2, \ldots, Ox_n$) as $Ox_i$. The $x_1$-coordinate of the centroid of any n-volume is by definition

$$\bar{x} = \frac{\int x_1\,dV}{\int dV} = \frac{\int x_1 A(x_1)\,dx_1}{\int A(x_1)\,dx_1}, \qquad (36)$$

where $dV$ is the infinitesimal element of volume perpendicular to $Ox_1$, and $A(x_1)$ is the intersection between the volume and the (hyper)plane projecting at $x_1$. If the volume is n-dimensional, the area $A(x_1)$ is $(n-1)$-dimensional.

Following established ES literature, we denote by $h_i$ ($i = 2, \ldots, n$) the coordinates of the centroid on the axes $Ox_i$, respectively. All $h_i$ are perpendicular to the progress axis and, due to symmetry, all $h_i$ are equal. They are best expressed also as an integral over $x_1$:

$$h_i = \frac{\int x_{iC}(x_1)\,A(x_1)\,dx_1}{\int A(x_1)\,dx_1}, \qquad \forall\, i = 2, \ldots, n, \qquad (37)$$

where $x_{iC}(x_1)$ is the coordinate $i$ of the centroid of $A(x_1)$.

5.1 Sphere "Far Away": Centroid of a Quadrant

To evaluate the expected one-step progress of our EA when the starting point $S$ is far away from the origin, we need the centroid of one (hyper)quadrant of the n-sphere centred at $S$ (Figure 1). Without loss of generality, we consider the positive quadrant - i.e. the first quadrant, the part of the sphere where all coordinates are positive.

Consider an n-sphere of radius $r = 1$, centred at the origin. Due to symmetry, its first quadrant has a centroid whose projections on all axes are equal, i.e. $\bar{x} = h_i$, $\forall\, i = 2, \ldots, n$. We shall evaluate only $\bar{x}$, the projection on $Ox_1$. Also due to symmetry, $\bar{x}$ is the same for all quadrants with positive coordinate $x_1$, so it is sufficient to calculate $\bar{x}$ for the entire positive semi-n-sphere.

Exact formulas

In (36), the intersection $A(x_1)$ is an $(n-1)$-sphere of radius $\sqrt{1 - x_1^2}$, according to Pythagoras' theorem. With the notation from (28), we have

$$A(x_1) = V_{n-1}\left(\sqrt{1 - x_1^2}\right).$$


[Figure 1: Mutation sphere far away - the starting point S lies at distance d from the origin O; the mutation sphere of radius r around S has centroid C of its positive quadrant, with progress coordinate x̄ and lateral coordinates h_i.]

The integration limits are 0 and 1. Substituting all this into (36), we have

$$\bar{x} = \frac{\int_0^1 x_1 V_{n-1}\left(\sqrt{1 - x_1^2}\right) dx_1}{\frac{1}{2} V_n(1)}.$$

In the above expression, we use (28) and factor out the constant coefficients to obtain

$$\bar{x} = \frac{2 C_{n-1}}{C_n} \int_0^1 x_1 \left(1 - x_1^2\right)^{\frac{n-1}{2}} dx_1.$$

Direct integration shows that the integral is $\frac{1}{n+1}$, which leads to

$$\bar{x} = \frac{2 C_{n-1}}{(n+1)\, C_n}. \qquad (38)$$

If we now allow an arbitrary radius $r$, the n-sphere is scaled by a factor of $r$. The centroid, being a first-order moment, also scales by $r$, so we simply multiply (38) by $r$. We change the variable on one hand to $n = 2k+1$, and on the other to $n = 2k$, and use the double factorial formulas (31). In the odd and even expressions obtained, we substitute, respectively, (29) and (30) to obtain the following exact expressions for the centroid of the semi-n-sphere:

$$\bar{x} = h_i = \frac{(2k+1)!}{2^{2k+1}(k+1)(k!)^2}\, r, \quad \text{when } n = 2k+1, \qquad (39)$$

$$\bar{x} = h_i = \frac{2^{2k+1}(k!)^2}{\pi(2k+1)(2k)!}\, r, \quad \text{when } n = 2k. \qquad (40)$$

The above formulas can easily be checked for the first three values of $n$: for $n = 1$, (39) gives $1/2$, which is indeed the centroid of the segment $[0, 1]$. For $n = 2$, (40) gives $4/(3\pi)$, which is the centroid of the positive half-disc of radius 1. For $n = 3$, (39) gives $3/8$, which is the centroid of the positive half-sphere of radius 1.


Limits for n → ∞

We apply Stirling's formula (32) in (39) and (40) and find that, for $n$ both odd and even, all coordinates of the centroid behave as

$$\bar{x} = h_i = \frac{r}{\sqrt{\pi k}}\left[1 + O\left(\frac{1}{k}\right)\right] = \frac{r\sqrt{2}}{\sqrt{\pi n}}\left[1 + O\left(\frac{1}{n}\right)\right] \approx \frac{r\sqrt{2}}{\sqrt{\pi n}}. \qquad (41)$$

Two cases are of particular importance:

A. If $r$ is kept constant as $n \to \infty$, $\bar{x}$ and all $h_i$ approach the origin as $1/\sqrt{n}$. The distance from the center $S$ of the n-sphere to the centroid $C$ of the quadrant (Figure 1) is the diagonal of an n-hypercube whose edge has this limit. Applying Pythagoras next, we obtain

Theorem 5.1 When $n \to \infty$, the distance $\|SC\|$ to the centroid of a quadrant of the semi-n-sphere of constant radius $r$ tends to a constant value:

$$\lim_{n\to\infty} \|SC\| = r\sqrt{\frac{2}{\pi}}. \qquad (42)$$

B. If, on the other hand, we keep the volume of the n-sphere constant as $n \to \infty$, the radius also changes, according to (28). If $V_n(r) = 1$, we have that

$$r = \frac{1}{(C_n)^{\frac{1}{n}}}.$$

As we did in the previous section, we use the odd-even expressions (29) and (30) for $C_n$, change the variable to $n = 2k+1$ for $n$ odd and to $n = 2k$ for $n$ even, use the Stirling approximation and take the limit $n \to \infty$. After some work, we find that, for $n$ both odd and even, the asymptotic behaviour of $r$ is

$$r = \sqrt{\frac{k}{\pi e}}\left[1 + O\left(\frac{1}{k}\right)\right] = \sqrt{\frac{n}{2\pi e}}\left[1 + O\left(\frac{1}{n}\right)\right] \approx \sqrt{\frac{n}{2\pi e}}. \qquad (43)$$

We have proved the following:

Lemma 5.2 When $n \to \infty$, the radius of an n-sphere of constant volume $V_n = 1$ also tends to $\infty$. The growth is approximately $\sqrt{\frac{n}{2\pi e}}$.

Substituting into (41), we obtain

Proposition 5.3 When $n \to \infty$, all coordinates of the centroid of the positive quadrant of the n-sphere of volume $V_n = 1$ tend to the same fixed value:

$$\lim_{n\to\infty} \bar{x} = \lim_{n\to\infty} h_i = \frac{1}{\pi\sqrt{e}} \approx 0.193, \qquad \forall\, i = 2, \ldots, n. \qquad (44)$$
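Proposition 5.3 can be verified directly; a sketch of ours, using the standard Gamma closed form for $C_n$ through logarithms to avoid overflow at large $n$:

import math

def log_C(n):
    """log of C_n = pi^{n/2} / Gamma(n/2 + 1), in logs to avoid overflow."""
    return (n / 2) * math.log(math.pi) - math.lgamma(n / 2 + 1)

def centroid(n, r):
    """x-bar of the semi-n-sphere via Eq. (38): 2 C_{n-1} r / ((n+1) C_n)."""
    return 2.0 * r / (n + 1) * math.exp(log_C(n - 1) - log_C(n))

for n in (2, 10, 100, 1000, 10000):
    r = math.exp(-log_C(n) / n)      # radius of the n-sphere of volume 1
    print(f"n={n:6d}  r={r:9.3f}  centroid={centroid(n, r):.4f}")

print("limit 1/(pi*sqrt(e)) =", round(1 / (math.pi * math.sqrt(math.e)), 4))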


The assumption $V = 1$ can now be dropped in both results 5.2 and 5.3, since

$$\lim_{n\to\infty} V^{1/n} = 1.$$

It follows that:

Theorem 5.4 When $n \to \infty$, all coordinates of the centroid of the positive quadrant of the n-sphere of constant volume $V_n = V$ tend to a fixed value which is independent of $V$:

$$\lim_{n\to\infty} \bar{x} = \lim_{n\to\infty} h_i = \frac{1}{\pi\sqrt{e}} \approx 0.193, \qquad \forall\, i = 2, \ldots, n. \qquad (45)$$

When considering only one quadrant, the limit above applies individually to the coordinate $\bar{x}$ and to each coordinate $h_i$. The distance from the center $S$ of the n-sphere to the centroid $C$ of the quadrant (Figure 1) is the diagonal of an n-hypercube whose edge tends to this limit. Applying Pythagoras we obtain

Theorem 5.5 When $n \to \infty$, the distance $\|SC\|$ to the centroid of a quadrant of the semi-n-sphere of constant volume $V_n = V$ tends to infinity:

$$\|SC\| = \sqrt{n}\,\frac{1}{\pi\sqrt{e}}\left[1 + O\left(\frac{1}{n}\right)\right] \approx \sqrt{n} \cdot 0.193. \qquad (46)$$

5.2 Sphere "Close By": Centroid of a Cap Quadrant

To evaluate the average one-step progress of our EA when the starting point $S$ is close to the origin, we consider an n-sphere centered at $S$, where $S$ is one radius away from the origin. When a successful mutation is generated, the new point is closer to the origin, therefore inside the n-sphere of the same radius $r$ centered at the origin. The intersection of the two (equal) spheres is usually called a (symmetric) spherical lens, and it consists of two mirror-image, back-to-back spherical caps. We consider again the restriction of this volume to the positive quadrant. Due to symmetry, $\bar{x} = r/2$, all $h_i$ are equal, and the two cap quadrants making up the lens quadrant have the same $h_i$, so it is sufficient to calculate $h_i$ for only one cap quadrant (Figure 2).

Exact formulas

We apply formula (37) to the right-hand cap in Figure 2, for the radius $r = 1$. The intersection area $A(x_1)$ is the positive quadrant of an $(n-1)$-sphere of radius $\sqrt{1 - x_1^2}$ - Pythagoras' theorem again. With the notation from (28), we have

$$A(x_1) = \frac{V_{n-1}\left(\sqrt{1 - x_1^2}\right)}{2^{n-1}}.$$

To find $x_{iC}(x_1)$ (the $x_i$-coordinate of the centroid of $A(x_1)$), we note that all quadrants of this $(n-1)$-sphere with positive coordinate $x_i$ have, by symmetry, the same $x_{iC}$. Since these quadrants make up precisely a semi-$(n-1)$-sphere, we apply the exact formula (38) to obtain

$$x_{iC}(x_1) = \sqrt{1 - x_1^2} \cdot \frac{2 C_{n-2}}{n C_{n-1}}, \qquad \forall\, i = 2, \ldots, n.$$


[Figure 2: Mutation sphere close by - the starting point S lies at distance r from the origin O; C is the centroid of the cap quadrant, with lateral coordinates h_i.]

Substituting all this into (37), with integration limits 1/2 and 1, we obtain

$$h_i = \frac{2 C_{n-2}}{n C_{n-1}} \cdot \frac{\int_{1/2}^{1} \left(\sqrt{1-x^2}\right)^n dx}{\int_{1/2}^{1} \left(\sqrt{1-x^2}\right)^{n-1} dx}, \qquad \forall\, i = 2, \ldots, n. \qquad (47)$$

With the change of variable $1 - x^2 = t$, each of the integrals above becomes an incomplete Beta function (33). Allowing for an arbitrary radius $r$, we obtain the following exact expression for the centroid of the cap quadrant:

$$h_i = \frac{2 r\, C_{n-2}}{n\, C_{n-1}} \cdot \frac{B_{3/4}\left(\frac{n+2}{2}, \frac{1}{2}\right)}{B_{3/4}\left(\frac{n+1}{2}, \frac{1}{2}\right)}, \qquad \forall\, i = 2, \ldots, n. \qquad (48)$$

The above formula can easily be checked for $n = 2$, where the $y$-coordinate of the centroid of half of the circular segment of radius 1 can be calculated directly in closed form:

$$h_2 = \frac{5}{2\left(4\pi - 3\sqrt{3}\right)} \approx 0.339.$$
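Formula (48) can also be evaluated with standard library routines; a sketch of ours assuming SciPy is available (scipy.special.betainc is the regularized incomplete Beta, so it is multiplied by the complete Beta to recover the $B_x(a,b)$ of Eq. (33)):

import math
from scipy.special import betainc, beta

def B(x, a, b):
    """Non-regularized incomplete Beta B_x(a,b) of Eq. (33)."""
    return betainc(a, b, x) * beta(a, b)

def log_C(m):
    """log of C_m = pi^{m/2} / Gamma(m/2 + 1), for a stable ratio at large m."""
    return (m / 2) * math.log(math.pi) - math.lgamma(m / 2 + 1)

def h_cap(n, r=1.0):
    """Eq. (48): lateral centroid coordinate h_i of the cap quadrant."""
    prefactor = 2 * r * math.exp(log_C(n - 2) - log_C(n - 1)) / n
    return prefactor * B(0.75, (n + 2) / 2, 0.5) / B(0.75, (n + 1) / 2, 0.5)

print("n=2:", round(h_cap(2), 3))    # ~0.339, matching the closed form above
for n in (10, 100, 1000):
    print(f"n={n:5d}  h={h_cap(n):.4f}")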

Limits for n → ∞

We apply the series expansion (34) to the Beta functions in (48), with $b = 1/2$ and $x = 3/4$. The first two coefficients in (35) are $f_0(b, x) = 1$ and $f_1(b, x) = 2$, which gives

$$\frac{\sqrt{3}}{2}\left[1 + O\left(\frac{1}{n}\right)\right]$$

for the ratio of the Betas. For the ratio $C_{n-2}/C_{n-1}$ we examine separately the cases $n$ even and $n$ odd, to find that, in both cases, the ratio is

$$\frac{\sqrt{n}}{\sqrt{2\pi}}\left[1 + O\left(\frac{1}{n}\right)\right].$$


Plugging these back into (48), we obtain

$$h_i = \frac{r}{\sqrt{n}}\sqrt{\frac{3}{2\pi}}\left[1 + O\left(\frac{1}{n}\right)\right], \qquad \forall\, i = 2, \ldots, n. \qquad (49)$$

As in Section 5.1, we examine two particular behaviors for the radius $r$, described in the two theorems below. The proofs are omitted for brevity, as they are parallel to those in Section 5.1. A little more care is needed when using Pythagoras' theorem, as the limits of $\bar{x}$ and of the $h_i$ ($n-1$ equal coordinates) are different.

Theorem 5.6 When $n \to \infty$ and the radius $r$ of the n-sphere is kept constant, the distance $\|SC\|$ to the centroid of the cap quadrant tends to a constant value:

$$\lim_{n\to\infty} \|SC\| = r\sqrt{\frac{3 + \pi}{2\pi}} \approx r \cdot 0.989. \qquad (50)$$

The perpendicular coordinates all tend to zero:

$$h_i = \frac{r}{\sqrt{n}}\sqrt{\frac{3}{2\pi}}\left[1 + O\left(\frac{1}{n}\right)\right], \qquad \forall\, i = 2, \ldots, n. \qquad (51)$$

Theorem 5.7 When $n \to \infty$ and the volume $V$ of the n-sphere is kept constant, the distance $\|SC\|$ to the centroid of the cap quadrant tends to infinity:

$$\|SC\| = \sqrt{n}\,\frac{\sqrt{\pi + 6}}{\pi\sqrt{8e}}\left[1 + O\left(\frac{1}{\sqrt{n}}\right)\right] \approx \sqrt{n} \cdot 0.206. \qquad (52)$$

The perpendicular coordinates are all equal, tending to a fixed value which is independent of $V$:

$$h_i = \frac{\sqrt{3}}{2\pi\sqrt{e}}\left[1 + O\left(\frac{1}{n}\right)\right] \approx 0.167, \qquad \forall\, i = 2, \ldots, n. \qquad (53)$$

The progress coordinate tends to infinity:

$$\bar{x} = \frac{r}{2} = \frac{\sqrt{n}}{\sqrt{8\pi e}}\left[1 + O\left(\frac{1}{n}\right)\right] \approx \sqrt{n} \cdot 0.121. \qquad (54)$$

It also follows that the volume of the cap quadrant is given by the following.

Theorem 5.8 If both n-dimensional spheres in Figure 2 have volume one, the volume of the cap quadrant is

$$V_{cap} = \frac{C_{n-1}}{C_n}\, B_{3/4}\left(\frac{n+1}{2}, \frac{1}{2}\right), \qquad (55)$$

which converges to zero quasi-exponentially as $n \to \infty$:

$$V_{cap} = \left(\frac{3}{4}\right)^{\frac{n+1}{2}} \frac{4}{\sqrt{2\pi n}} \approx 0.75^{\frac{n+1}{2}} \cdot \frac{1.596}{\sqrt{n}}. \qquad (56)$$
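A sketch of ours (again assuming SciPy) comparing the exact cap-quadrant volume (55) with the quasi-exponential approximation (56):

import math
from scipy.special import betainc, beta

def log_C(m):
    """log of C_m = pi^{m/2} / Gamma(m/2 + 1)."""
    return (m / 2) * math.log(math.pi) - math.lgamma(m / 2 + 1)

def v_cap_exact(n):
    """Eq. (55), with the non-regularized incomplete Beta."""
    B = betainc((n + 1) / 2, 0.5, 0.75) * beta((n + 1) / 2, 0.5)
    return math.exp(log_C(n - 1) - log_C(n)) * B

def v_cap_approx(n):
    """Eq. (56): 0.75^{(n+1)/2} * 1.596 / sqrt(n)."""
    return 0.75 ** ((n + 1) / 2) * 1.596 / math.sqrt(n)

for n in (5, 20, 100):
    print(f"n={n:4d}  exact={v_cap_exact(n):.3e}  approx={v_cap_approx(n):.3e}")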


6 Convergence time analysis

We are going to use the centroid formulas derived in section 5 for estimating the upper and lower bounds on the expected one-step progress of the (1+1)EA with spherical mutation along the 'progress axis' $Ox_1$ - namely $\mu_1$ and $\mu_2$ of theorem 3.12. Because of the symmetry of the SPHERE, we can assume without loss of generality that we rotate the axes at each iteration such that the current EA position always lies on $Ox_1$.

From a geometrical point of view, uniform mutation inside the sphere is more tractable than normal mutation. To see that, note the following simple fact.

Remark 6.1 The expected value of a uniform variable defined inside a figure of volume 1 is the centroid (center of mass) of the corresponding figure. In the case of the elitist EA on SPHERE, not all of the mutation sphere is active for the next generation, the removed volume (probability) being charged to a single point, the current position of the algorithm. That corresponds to a truncated volume, and the expected value in this case is the centroid of the truncated figure, times the corresponding volume.

If we apply this to the far-away case, it yields a scaling factor of 1/2 between the coordinates of the centroid and the coordinates of the expected progress $\varphi_{1+1}$. In the close-by case, the adjustment is more severe, by the factor from Eq. (56), which tends exponentially to zero with respect to the space dimension $n$. All these will be summed up in lemma 6.2.

On the other hand, if the mutation sphere is no longer of volume 1 - e.g., when the mutation radius $r$ is decreased - one has to divide the uniform 'variable', and consequently its expected value, by the volume of the new sphere, in order for the non-unitary sphere to define a proper random variable.

Beyond geometrical considerations, there is also an algorithmic modelling issue, which we address in the following.

When starting the analysis, the first impulse was to consider for the expected one-step progress the centroid of the whole active area - the semi-hypersphere in Figure 1, respectively the full lens in Figure 2. That would be wrong, for the centroid in that case would lie on $Ox_1$, ignoring completely the lateral components $h_i$. But in $n$ dimensions, these $n-1$ lateral components are by a factor of $\sqrt{n-1} \approx \sqrt{n}$ larger than the central component⁸, which means that we expect (informally) the mutant not to lie close to the line through the mutated individual and the optimum, but almost perpendicular to it. That is the reason for considering in section 5 only the upper hyper-quadrant of the area when computing the centroid, yielding a representation of the lateral component (theorems 5.5, 5.7) in utter agreement with established ES literature.

⁸ See Eq. (45)-(46) in section 5.1, or the 'deviation' estimate $E(\cos\alpha) = 0.8/\sqrt{n}$ computed in [14].

We are now in the position of bringing together all the main results of the previous sections.

Lemma 6.2 If $d = n$ and $r = \sqrt{n/(2\pi e)}$, the expected one-step progress on SPHERE of the (1+1)EA with uniform mutation satisfies, for large $n$,

$$\varphi_{1+1}(\text{far-away}) > \varphi_{1+1}(\text{close-by}). \qquad (57)$$

Proof.


Apply Eq. (1), then Eq. (45)-(46), and remark 6.1 to obtain

$$\varphi_{1+1}(\text{far-away}) = E\left[d - \sqrt{\left(d - \frac{\bar{x}}{2}\right)^2 + \left(\frac{h}{2}\right)^2}\right] = n - \sqrt{n^2 - 2n\frac{\bar{x}}{2} + \left(\frac{\bar{x}}{2}\right)^2 + \left(\frac{\|SC\|}{2}\right)^2 - \left(\frac{\bar{x}}{2}\right)^2}$$
$$\approx n - \sqrt{n^2 - 0.183n} = \frac{0.183n}{n + \sqrt{n^2 - 0.183n}}.$$

A similar calculation for the close-by case yields

$$\varphi_{1+1}(\text{close-by}) = E\left[r - \sqrt{\left(r - \bar{x}V_{cap}\right)^2 + \left(hV_{cap}\right)^2}\right] = \frac{2r\bar{x}V_{cap} - \left(\|SC\|V_{cap}\right)^2}{r + \sqrt{r^2 - 2r\bar{x}V_{cap} + \left(\|SC\|V_{cap}\right)^2}} \approx 7.93 \cdot 0.75^{\frac{n+1}{2}}.$$

One can easily see that, for $n$ large enough,

$$\frac{0.183n}{n + \sqrt{n^2 - 0.183n}} > 7.93 \cdot 0.75^{\frac{n+1}{2}}, \qquad (58)$$

since the left-hand side converges to 0.091, while the right-hand side converges to zero, as $n \to \infty$. □
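Evaluating the two sides of (58) numerically (a sketch of ours, using the paper's constants) shows the far-away progress stabilising near 0.091 while the close-by progress decays; with these constants the inequality takes over around n ≈ 31.

import math

def far_away(n):
    """Left-hand side of (58)."""
    return 0.183 * n / (n + math.sqrt(n * n - 0.183 * n))

def close_by(n):
    """Right-hand side of (58)."""
    return 7.93 * 0.75 ** ((n + 1) / 2)

for n in (10, 30, 50, 100):
    side = ">" if far_away(n) > close_by(n) else "<"
    print(f"n={n:3d}  far-away={far_away(n):.4f} {side} close-by={close_by(n):.6f}")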

Rigorously speaking, lemma 6.2 depends on the following conjecture:

Conjecture 6.3 The progress function (1) is monotone w.r.t. $d$ on $[r, \infty)$.

In order to prove the conjecture, one has to generalize the calculus of section 5.2 to an asymmetrical lens, since the left sphere is now larger. Finding the centroid of the new figure analytically might be cumbersome, but there is also the option of evaluating numerically the four Beta integrals involved, for different values of $n$, as $n \to \infty$. Something similar was done in [6, p.56, Fig. 3.2], while analysing the progress rate $\varphi$ versus the mutation strength $\sigma$ of the (1+1)ES on the SPHERE. Straightforward as it seems, the proof of conjecture 6.3 is still in progress.

Theorem 6.4 Let the (1+1)EA with uniform mutation inside the sphere of (constant) radius $r = \sqrt{n/(2\pi e)}$ minimise the n-dimensional SPHERE function. Assume the algorithm starts at distance $n$ from the optimum. Then the expected number of steps to reach distance $\Theta(\sqrt{n})$ from the optimum is in $\Omega(n)$, and in $O(e^{c \cdot n})$.

Proof. We check hypothesis H on the random variable 'one-step progress of the algorithm', confined to distance $[r, \infty)$ from the optimum. The variance of a bounded uniform distribution is obviously bounded, while the confinement of the expected value is obtained, due to Eq. (58) and Conjecture 6.3, as

$$7.93 \cdot 0.75^{\frac{n+1}{2}} \le E(X_t) \le \frac{0.183n}{n + \sqrt{n^2 - 0.183n}}. \qquad (59)$$

Then Theorem 3.12 ensures that, for $d = n$ large enough,

$$\frac{n + \sqrt{n^2 - 0.183n}}{0.183n} \le \frac{E(T_{d-r})}{d - r} \le 0.126 \cdot 1.33^{\frac{n+1}{2}},$$

or, after substituting $d$ and $r$, and multiplying the whole inequality by the denominator of the middle fraction,

$$\frac{\left(n - 0.242\sqrt{n}\right)\left(n + \sqrt{n^2 - 0.183n}\right)}{0.183n} \le E\left(T_{n - 0.242\sqrt{n}}\right) \le 0.126\left(n - 0.242\sqrt{n}\right) \cdot 1.33^{\frac{n+1}{2}}.$$

Recalling that $E(T_{n - 0.242\sqrt{n}})$ represents the expected number of steps required by the (1+1)EA starting at distance $n$ to reach distance $0.242\sqrt{n}$ from the optimum, the lower and upper bounds $n$ and $e^{c \cdot n}$ are computed from the left- and right-hand sides, respectively, of the above inequality. □
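The setting of theorem 6.4 can also be simulated directly; the sketch below is ours, with a small dimension and repetition count chosen only for speed: it runs the (1+1)EA with constant radius $r = \sqrt{n/(2\pi e)}$ from distance $n$ down to distance $0.242\sqrt{n} \approx r$.

import numpy as np

def hitting_time(n, rng):
    """(1+1)EA with uniform mutation inside a ball of constant radius
    r = sqrt(n/(2*pi*e)), started at distance n from the optimum; returns
    the number of steps to reach distance 0.242*sqrt(n) (~ r) from it."""
    r = np.sqrt(n / (2 * np.pi * np.e))
    target2 = (0.242 * np.sqrt(n)) ** 2
    x = np.zeros(n)
    x[0] = n                          # start on the progress axis
    steps = 0
    while np.dot(x, x) > target2:
        steps += 1
        u = rng.standard_normal(n)
        u *= r * rng.random() ** (1.0 / n) / np.linalg.norm(u)
        y = x + u
        if np.dot(y, y) <= np.dot(x, x):   # elitist selection
            x = y
    return steps

rng = np.random.default_rng(5)
n = 10
times = [hitting_time(n, rng) for _ in range(50)]
print(f"n={n}: mean hitting time ~= {np.mean(times):.0f} steps")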

As pointed out in section 2, the uniform mutation inside the sphere is isotropic, thus theorem 1.1 also applies to the algorithm of theorem 6.4, providing an expected runtime of $\Omega(n)$ to reach a spatial gain of $n - \Theta(\sqrt{n}) = \Theta(n)$. So our result is in good agreement with the established EA literature.

Compared to another result of Jägersküpper, showing that the adaptive, 1/5-rule (1+1)EA converges in $\Theta(n)$ [14], we notice that the exponential upper bound from theorem 6.4 is due to the constant mutation radius assumed in our case study.

7 Conclusions

For the theoretical analysis of EAs to be of practical relevance, it should provide, besides necessary and sufficient conditions for global convergence, hitting time estimations for specific algorithms on particular fitness landscapes. Both in discrete and continuous space, significant insight into the algorithm's behavior can be gained by mapping the successive EA populations onto the infinite sequence of random variables defining a stochastic process. So far, Markov processes and their variants seemed to be the perfect candidate. Yet there are also other options on the table, and the present paper advocates for the renewal process, a stochastic model emerging from queueing theory.

We introduced a natural bounding assumption, H, on the random variable that models the one-step progress, corresponding to the practical case where the expected advance of the EA differs from one iteration to another. Notice that this is also the case with adaptive algorithms. Our approach then followed the general lines of renewal theory, up to an estimate of the first hitting time of a distance $d$ from the starting point of the algorithm. The estimate is simple and highly intuitive: the expected average number of iterations required per distance unit enters (when $d \to \infty$) the interval of inverted one-step progress bounds.

After finding good similarities with the drift analysis of discrete EAs, we tested the efficiency of the new modeling on the (1+1)EA with constant, uniform mutation optimizing the well-known SPHERE function. With extensive use of the geometrical properties exhibited by the n-dimensional uniform distribution, we proved that the algorithm under study exhibits a convergence time in $\Omega(n)$ and in $O(e^{c \cdot n})$.

We claim that the renewal process modeling is powerful enough to allow extensions to more complex EAs and fitness landscapes. The analytical application of this theory to the algorithm with normal mutation is already in progress, while enlarging the population size is another attractive option for future work.


Acknowledgement

This work was done while the first author was visiting the Technical University of Dortmund. Financial support from DAAD - German Academic Exchange Service - under Grant A/10/05445, and scientific support from Chair Computer Science XI are gratefully acknowledged.

References

[1] Abramowitz, M., Stegun, I.A. (Eds.): Handbook of Mathematical Functions. Dover, New York (1970)

[2] Agapie, A.: Theoretical analysis of mutation-adaptive evolutionary algorithms. Evolutionary Computation 9, 127-146 (2001)

[3] Agapie, A.: Estimation of distribution algorithms on non-separable problems. International Journal of Computer Mathematics 87(3), 491-508 (2010)

[4] Agapie, A., Agapie, M., Zbaganu, G.: Evolutionary algorithms for continuous space optimization. International Journal of Systems Science, DOI:10.1080/00207721.2011.605963 (2011)

[5] Auger, A.: Convergence results for the (1, λ)-SA-ES using the theory of φ-irreducible Markov chains. Theoretical Computer Science 334, 35-69 (2005)

[6] Beyer, H.-G.: The Theory of Evolution Strategies. Springer, Heidelberg (2001)

[7] Bienvenüe, A., François, O.: Global convergence for evolution strategies in spherical problems: some simple proofs and difficulties. Theoretical Computer Science 306, 269-289 (2003)

[8] Doerr, B., Goldberg, L.A.: Adaptive drift analysis. In: Schaefer, R., Cotta, C., Kolodziej, J., Rudolph, G. (Eds.), PPSN XI, LNCS 6238, Springer, Berlin (2010)

[9] Droste, S., Jansen, T., Wegener, I.: On the analysis of the (1+1) evolutionary algorithm. Theoretical Computer Science 276(1-2), 51-81 (2002)

[10] Fang, K.-T., Kotz, S., Ng, K.-W.: Symmetric Multivariate and Related Distributions. Chapman and Hall, London (1990)

[11] Gut, A.: Stopped Random Walks: Limit Theorems and Applications. Springer, New York (2009)

[12] Hajek, B.: Hitting-time and occupation-time bounds implied by drift analysis with applications. Advances in Applied Probability 13, 502-525 (1982)

[13] He, J., Yao, X.: A study of drift analysis for estimating computation time of evolutionary algorithms. Natural Computing 3, 21-35 (2004)

[14] Jägersküpper, J.: Analysis of a simple evolutionary algorithm for minimisation in Euclidean spaces. In: Proc. of the 30th Int. Colloquium on Automata, Languages and Programming (ICALP), LNCS 2719, Springer, Berlin, 1068-1079 (2003)

[15] Jägersküpper, J.: Lower bounds for hit-and-run direct search. In: Proc. of the 4th Int. Symposium on Stochastic Algorithms: Foundations and Applications (SAGA), LNCS 4665, Springer, Berlin, 118-129 (2007)

[16] Johnson, N., Kotz, S., Balakrishnan, N.: Continuous Univariate Distributions, Vol. 1. Wiley, New York (1994)

[17] Li, S.: Concise formulas for the area and volume of a hyperspherical cap. Asian Journal of Mathematics and Statistics 4(1), 66-70 (2011)

[18] Lopez, J., Sesma, J.: Asymptotic expansion of the incomplete Beta function for large values of the first parameter. Integral Transforms and Special Functions 8(3-4), 233-236 (1999)

[19] Lovász, L., Vempala, S.: Hit-and-run from a corner. SIAM Journal on Computing 35(4), 985-1005 (2006)

[20] Oliveto, P.S., Witt, C.: Simplified drift analysis for proving lower bounds in evolutionary computation. Algorithmica, DOI:10.1007/s00453-010-9387-z (2010)

[21] Ross, S.: Applied Probability Models with Optimization Applications. Dover, New York (1992)

[22] Rudolph, G.: Convergence Properties of Evolutionary Algorithms. Kovac, Hamburg (1997)

[23] Rudolph, G.: Stochastic convergence. In: Rozenberg, G., Bäck, T., Kok, J. (Eds.), Handbook of Natural Computing, Springer, Berlin (2010)

[24] Rudolph, G.: Evolutionary strategies. In: Rozenberg, G., Bäck, T., Kok, J. (Eds.), Handbook of Natural Computing, Springer, Berlin (2010)

[25] Williams, D.: Probability with Martingales. Cambridge University Press (1991)
