Evolutionary Computation, Lecture 11: Algorithm Configuration and Theoretical Analysis

Source: Evolutionary Computation (USTC), staff.ustc.edu.cn/~ketang/PPT/ECLec11.pdf

Jun 23, 2020

Transcript
Page 1:

Evolutionary Computation

Lecture 11

Algorithm Configuration and Theoretical Analysis

Page 2:

Outline

•  Algorithm Configuration

•  Theoretical Analysis


Page 3:

Algorithm Configuration

•  Question: If an EA toolbox is available (which is indeed the case), which operators/parameters shall I use?

•  Aim: To find an appropriate EA instance (configuration of EA) for an optimization task.


Toolbox

Representation 1  Representation 2  Representation 3

Operator  1  Operator  2  Operator  3

Scheme  1  Scheme  2  Scheme  3

EA  instance

Operator  &  Parameter

Scheme  &  Parameter

Representation

Page 4:

Algorithm Configuration

•  Configuring an EA involves setting the values of:
–  categorical variables (types of operators/schemes)
–  numerical variables (e.g., crossover rate, population size)

•  We refer to both the categorical and the numerical variables as parameters of EAs.

•  More formally, configuring an EA is itself a search problem, in which we search for the parameter vector that maximizes the EA's performance.


Page 5:

Algorithm Configuration

•  Performance measures may differ in different contexts:
–  the best solution quality
–  the time required to reach a threshold of solution quality

•  Due to the stochastic nature of EAs, a performance measure can be viewed as a random variable.

•  Multiple runs are usually required to get a reliable estimate of performance.


Page 6:

Algorithm Configuration

•  The rough idea: Generate-and-Test

•  Example 1-1: A naïve approach
–  Generate p candidate parameter vectors
–  Get p EA instances, each configured with one parameter vector
–  Run each EA instance r times
–  In each run, n solution vectors are generated and tested
–  Compare the parameter vectors using the performance measure to get the best configuration
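Example 1-1 can be sketched in a few lines. Everything below is illustrative, not from the slides: the toy (1+1)-style EA, the OneMax objective, and the mutation-rate range stand in for an arbitrary toolbox, configuration space, and performance measure.

```python
import random

def one_max(x):                      # toy objective: count 1-bits
    return sum(x)

def run_ea(mutation_rate, n=20, generations=50, rng=None):
    """A toy EA configured by a single parameter (the mutation rate).
    Returns the best fitness found, used as the performance measure."""
    rng = rng or random.Random()
    s = [rng.randint(0, 1) for _ in range(n)]
    for _ in range(generations):
        s2 = [1 - b if rng.random() < mutation_rate else b for b in s]
        if one_max(s2) >= one_max(s):
            s = s2
    return one_max(s)

def naive_configuration(p=5, r=10, seed=0):
    """Example 1-1: generate p parameter vectors, run each r times,
    and return the one with the best average performance."""
    rng = random.Random(seed)
    candidates = [rng.uniform(0.01, 0.5) for _ in range(p)]   # p parameter vectors
    scores = {c: sum(run_ea(c, rng=rng) for _ in range(r)) / r
              for c in candidates}                            # r runs each
    return max(scores, key=scores.get)                        # best configuration

best = naive_configuration()
print(round(best, 3))
```

The total cost is exactly the p·r·n evaluations discussed on the next slide.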


Page 7:

Algorithm Configuration

•  The naïve approach requires evaluating the objective function f(x) p·r·n times, which sounds too costly.

•  A smarter approach is needed to reduce the search effort:
–  reduce p
–  reduce r
–  reduce n


Page 8:

Algorithm Configuration

•  Racing procedures: An approach for reducing r.

•  Basic idea: Stop running (testing) an EA instance if there is indication that it is not promising.

•  By “not promising”, we mean that an EA instance performs significantly worse than the best EA instance identified so far.

•  EA instances are compared after each run rather than after running all of them r times.


Page 9:

Algorithm Configuration

•  Example 1-2:

1.  Generate p candidate parameter vectors
2.  Set E as the set of p EA instances
3.  REPEAT:
•  Run each EA instance once and update its (average) performance
•  Identify the EA instance with the best performance
•  Remove from E the EA instances whose performance is significantly worse than that of the best EA instance
4.  UNTIL size(E) = 1 or the number of runs reaches r

•  Many parameter vectors may be removed without being run r times.
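A minimal sketch of such a racing loop, assuming higher performance is better. The z-score elimination rule below is a deliberately simplified stand-in for the statistical tests the slides list (ANOVA, t-test, etc.); the instance labels and the threshold z are illustrative assumptions.

```python
import random, statistics

def race(instances, r=20, z=2.0):
    """Example 1-2 sketch: run each surviving EA instance once per round and
    drop those significantly worse than the current best.  `instances` maps a
    label to a zero-argument function returning one run's performance."""
    results = {k: [] for k in instances}
    for _ in range(r):
        for k in results:
            results[k].append(instances[k]())
        if len(next(iter(results.values()))) < 2:
            continue                              # need >= 2 runs for a stdev
        means = {k: statistics.mean(v) for k, v in results.items()}
        best = max(means, key=means.get)
        survivors = {}
        for k, v in results.items():
            se = statistics.stdev(v) / len(v) ** 0.5   # std. error of the mean
            if k == best or means[k] + z * se >= means[best]:
                survivors[k] = v                  # keep: not significantly worse
        results = survivors
        if len(results) == 1:
            break
    return set(results)

rng = random.Random(1)
demo = {"good": lambda: rng.gauss(1.0, 0.1),
        "bad":  lambda: rng.gauss(0.0, 0.1)}
print(race(demo))
```

The clearly worse instance is typically eliminated after a handful of runs instead of all r.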

Page 10:

Algorithm Configuration

•  Several statistical tests can be used to indicate significant differences in performance:
–  Analysis of Variance (ANOVA)
–  Kruskal-Wallis test (rank-based analysis of variance)
–  Hoeffding’s bound
–  Unpaired Student’s t-test


Page 11:

Algorithm Configuration

•  A racing procedure alone may not be sufficient when:
–  p is very large
–  the set of candidate parameter vectors is infinite

•  Basic idea: apply an iterative search procedure to the parameter space.
–  Meta-EA: using an EA to search for the best configuration
–  Iterated local search

•  Hopefully, the best configuration can be found with fewer trial parameter vectors.


Page 12:

Algorithm Configuration

•  Example 1-3:

1.  Generate p’ candidate parameter vectors (p’ < p)
2.  Get p’ EA instances
3.  REPEAT
•  Run each EA instance r times and assess its performance
•  Generate another p’ parameter vectors
4.  UNTIL a halting criterion is satisfied
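Example 1-3 as a sketch, using one simple instantiation of the iterative-search idea: each round proposes p’ new parameter vectors around the incumbent and keeps the best. The toy performance landscape, the Gaussian proposal step, and the parameter range are illustrative assumptions.

```python
import random

def meta_search(assess, lo=0.01, hi=0.5, p2=4, rounds=25, seed=0):
    """Example 1-3 sketch: an iterative (meta-)search in parameter space.
    `assess(theta)` stands for the average performance of the EA instance
    configured with theta over r runs (higher is better)."""
    rng = random.Random(seed)
    best = rng.uniform(lo, hi)
    best_perf = assess(best)
    for _ in range(rounds):
        # generate another p' parameter vectors near the current best
        proposals = [min(hi, max(lo, best + rng.gauss(0, 0.05)))
                     for _ in range(p2)]
        for theta in proposals:
            perf = assess(theta)
            if perf > best_perf:              # keep the incumbent otherwise
                best, best_perf = theta, perf
    return best

# toy performance landscape with its optimum at theta = 0.1 (an assumption)
best = meta_search(lambda t: -(t - 0.1) ** 2)
print(round(best, 3))
```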

Page 13:

Algorithm Configuration

•  Iterative search can be combined with a racing procedure to reduce both p and r.

•  Example 1-4:

1.  Generate p’ candidate parameter vectors (p’ < p)
2.  Get p’ EA instances
3.  REPEAT
•  Apply a racing procedure to compare the performance of the EA instances
•  Generate another p’ parameter vectors
4.  UNTIL a halting criterion is satisfied

Page 14:

Algorithm Configuration

•  Further enhancement: modeling the landscape of the performance measure
–  to identify promising parameter vectors
–  to filter out unpromising parameter vectors

•  The modeling task is a supervised learning problem
–  Input data: parameter vectors
–  Target: performance


Page 15:

Algorithm Configuration

•  Example 1-5 (Sequential Parameter Optimization):

1.  Generate p’ candidate parameter vectors (p’ < p)
2.  Get p’ EA instances
3.  Run each EA instance r times and assess its performance
4.  Build a model M to approximate the performance landscape
5.  REPEAT
•  Generate another p’ parameter vectors
•  Identify the most promising parameter vectors with M
•  Run only the promising parameter vectors r times
•  Update M with the newly assessed parameter vectors
6.  UNTIL a halting criterion is satisfied
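A sketch of the Sequential Parameter Optimization loop. The surrogate model M here is a deliberately crude 1-nearest-neighbour predictor standing in for the regression models used in practice; the function names, the toy landscape, and all numeric settings are illustrative assumptions.

```python
import random

def spo_sketch(assess, lo=0.01, hi=0.5, p2=6, rounds=8, seed=0):
    """Example 1-5 sketch: screen new proposals with a cheap model M and
    actually run only the one M predicts to be most promising."""
    rng = random.Random(seed)
    history = []                                   # (theta, performance) pairs
    for theta in [rng.uniform(lo, hi) for _ in range(p2)]:
        history.append((theta, assess(theta)))     # initial design, run for real

    def model(theta):                              # M: 1-nearest-neighbour predictor
        return min(history, key=lambda tp: abs(tp[0] - theta))[1]

    for _ in range(rounds):
        proposals = [rng.uniform(lo, hi) for _ in range(p2)]
        promising = max(proposals, key=model)      # screen with M (no real runs)
        history.append((promising, assess(promising)))  # update M with new data
    return max(history, key=lambda tp: tp[1])[0]

best = spo_sketch(lambda t: -(t - 0.2) ** 2)       # toy landscape, optimum at 0.2
print(round(best, 3))
```

Note the saving: only p’ + rounds real assessments are made, instead of p’ · (rounds + 1).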

Page 16:

Algorithm Configuration

•  Remark on reducing n
–  The key question: will the “relative order” of two EAs remain unchanged during the evolution?
–  The lesson learned: the answer is problem- and algorithm-dependent, and thus few systematic approaches have been investigated for this issue.

[Figure omitted] Fig. 2. The self-adaptation curves of p, fp and CRm for f5, f9, fcec5, and fcec9. The vertical axes show their values (between 0 and 1); the horizontal axes show the number of generations.

[Figure omitted] Fig. 3. The evolution curves for f5, f9, fcec5 and fcec9 (legend: SaNSDE, SaDE, NSDE). The vertical axes show the distance to the optimum; the horizontal axes show the number of generations.

2008 IEEE Congress on Evolutionary Computation (CEC 2008), p. 1114

Yes: blue curve vs. green curve.  No: blue curve vs. red curve.

Page 17:

Algorithm Configuration

•  So far, we have mainly touched the case of configuring an EA for a single optimization task (problem instance).

•  What if we want to find a more general configuration that can be applied to multiple problem instances?
•  The previously introduced ideas can be adapted to this scenario.
•  Instead of looking at the results of r runs, comparisons are made mainly based on the performance on multiple problem instances:
–  F-RACE
–  ParamILS


Page 18:

Algorithm Configuration

•  Additional remarks
–  Operators/parameters can also be adjusted on the fly; EAs of this type are usually called adaptive/self-adaptive EAs.
–  With a unified “representation” of EA behaviors, there will be huge room for involving machine learning techniques.
–  Algorithm configuration is also closely related to the emerging term “hyper-heuristic”.


Page 19:

Outline

•  Algorithm Configuration

•  Theoretical Analysis

This part is taken from the slides of Dr. Yang Yu's (Nanjing University) IJCAI 2013 tutorial, with sincere thanks.


Page 20:

Theoretical Analysis


[Tutorial: “An Introduction to Evolutionary Optimization: Recent Theoretical and Practical Advances”. Outline: intro. to theory, Markov chain, problem dependency, RTA, analysis tools, on parameters, on comparison with classics, on real-world situations, summary.]

Conventional algorithm analysis (Problem / Algorithm / Measurement):
–  Sorting: Quick Sort, average time complexity O(n log n)
–  Shortest Path: Dijkstra’s algorithm, average time complexity O(|V|²)
–  Linear Programming: Simplex, worst-case time complexity exponential; smoothed complexity polynomial

Page 21:

Theoretical Analysis


Time complexity

What about an algorithm that sorts (5,4,2,8,9) in 3 steps?
–  Complexity is measured over a class of problem instances, e.g. all possible arrays of 5 numbers (average complexity, worst-case complexity).
–  It measures the growth rate as the problem size increases, e.g. 2n² is written in asymptotic notation as O(n²).

Page 22:

Theoretical Analysis


But for EAs... (Problem / Algorithm / Measurement):
–  EAs are inspired by natural phenomena;
–  the problem is unknown: EAs are not designed with knowledge of the problems.
Therefore theoretical understanding is even more important.

Page 23:

Theoretical Analysis


Markov chain modeling

A general procedure: initialization → population → reproduction → offspring → evaluation & selection → (new) population → ...

Expanded along time: population 0 → population 1 → population 2 → population 3 → ...

Markov chain: state 0 → state 1 → state 2 → state 3 → ...

Page 24:

Theoretical Analysis


Markov chain: state 0 → state 1 → state 2 → state 3 → ...

Markov property: for the chain $\xi_0, \xi_1, \xi_2, \xi_3, \ldots$,
$P(\xi_3 \mid \xi_2, \xi_1, \xi_0) = P(\xi_3 \mid \xi_2)$.

Solution space $S$ with optimal solutions $S^* \subseteq S$; population space $X = S^m$ with optimal populations $X^* \subseteq X$, where an optimal population involves at least one optimal solution.

Page 25:

Theoretical Analysis


Convergence

Does an EA converge to the global optimal solutions? I.e., does $\lim_{t \to +\infty} P(\xi_t \in X^*) = 1$ hold ($X^*$ considered as closed)?

Theorem (discrete version derived from [He & Yu, 01]). Let $\xi$ be a Markov chain. Define
$\alpha_t = \sum_{x \notin X^*} P(\xi_{t+1} \in X^* \mid \xi_t = x)\, P(\xi_t = x) - \sum_{x \in X^*} P(\xi_{t+1} \notin X^* \mid \xi_t = x)\, P(\xi_t = x),$
i.e., the gain of optimality in one step minus the loss of optimality in one step. Then $\xi$ converges to $X^*$ if and only if
$P(\xi_0 \in X^*) + \sum_{t=0}^{+\infty} \alpha_t = 1.$

Page 26:

Theoretical Analysis


Convergence

Does an EA converge to the global optimal solutions, i.e., $\lim_{t\to+\infty} P(\xi_t \in X^*) = 1$ ($X^*$ considered as closed)?

An EA that
1. uses global operators (gain of optimality > 0), and
2. preserves the best solution (loss of optimality = 0)
always converges to the optimal solutions.

But life is limited! How fast does it converge?

Page 27:

Theoretical Analysis


Examples in simple cases (Problem / Algorithm / Measurement):
–  Problems: OneMax, LeadingOnes, Linear Pseudo-Boolean Functions, LongPath, ...
–  Algorithm: (1+1)-EA
–  Measurement: Expected Running Time (ERT)

Page 28:

Theoretical Analysis


Running time analysis

Running time of an EA: the number of solutions evaluated until an optimal solution of the given problem is reached for the first time (solution evaluation is the most time-consuming step, and it may be met many times).

Running time analysis studies the running time with respect to the problem size (e.g. n):
–  the expected running time (ERT), e.g. expected running time $O(n^2)$
–  ERT with high probability, e.g. running time $O(n \ln n)$ with probability at least $1 - \frac{1}{2^n}$
This is the computational complexity of the EA.

Page 29:

Theoretical Analysis


A simple EA: (1+1)-EA

1: s ← a randomly drawn solution from X
2: for t = 1, 2, ... do
3:   s′ ← mutate(s)
4:   if f(s′) ≥ f(s) then
5:     s ← s′
6:   end if
7:   terminate if a stopping criterion is met
8: end for

An extremely simplified EA, missing some features of real EAs: no population, no crossover. For maximization; the acceptance rule f(s′) ≥ f(s) allows neutral changes; the goal is to find an optimal solution. Two mutation operators:
–  one-bit mutation: randomly choose one bit and change its value
–  bitwise mutation: change every bit independently with some probability (e.g. 1/n)
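The pseudocode translates into a short runnable program. As a concrete fitness function it uses OneMax (introduced on page 30 of the slides); the OneMax-specific stopping test `fs == n` and the step limit are additions for the demo.

```python
import random

def one_plus_one_ea(f, n, max_steps=100_000, seed=0):
    """(1+1)-EA with bitwise mutation: flip each bit independently with
    probability 1/n; accept the offspring if it is not worse (this allows
    neutral changes).  Returns (final solution, number of evaluations)."""
    rng = random.Random(seed)
    s = [rng.randint(0, 1) for _ in range(n)]
    fs, evals = f(s), 1
    for _ in range(max_steps):
        s2 = [1 - b if rng.random() < 1 / n else b for b in s]
        fs2, evals = f(s2), evals + 1
        if fs2 >= fs:                 # elitist acceptance, neutral moves allowed
            s, fs = s2, fs2
        if fs == n:                   # stop once the OneMax optimum is reached
            break
    return s, evals

one_max = sum                          # OneMax: f(x) = number of 1-bits
solution, evals = one_plus_one_ea(one_max, n=30)
print(sum(solution), evals)
```

Counting `evals` here is exactly the running-time measure defined on the previous page.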

Page 30:

Theoretical Analysis


OneMax problem: count the number of 1-bits

fitness: $f(x) = \sum_{i=1}^{n} x_i$, i.e., solve $\arg\max_{x \in \{0,1\}^n} \sum_{i=1}^{n} x_i$

EAs do not have knowledge of the problem; they are only able to call f(x), so to the EA the problem looks no different from any other function $f : \{0,1\}^n \to \mathbb{R}$. The EA is not only optimizing the problem, but also guessing the problem.

Page 31:

Theoretical Analysis


OneMax: $f(x) = \sum_{i=1}^n x_i$

(1+1)-EA with bitwise mutation (flip each bit with probability 1/n): the probability of flipping i particular bits (and no others) is
$\left(\tfrac{1}{n}\right)^i \left(1 - \tfrac{1}{n}\right)^{n-i},$
which is monotonically decreasing in i, but always positive.

[Figure omitted: the flip probability plotted against i = 0, ..., 10.]
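The two properties stated above (decreasing in i, always positive) can be checked numerically; a quick sketch:

```python
def flip_prob(n, i):
    """Probability that bitwise mutation with rate 1/n flips i given bits
    and leaves the other n - i bits unchanged: (1/n)^i (1 - 1/n)^(n - i)."""
    return (1 / n) ** i * (1 - 1 / n) ** (n - i)

n = 10
probs = [flip_prob(n, i) for i in range(n + 1)]
# monotonically decreasing in i, but always positive
assert all(probs[i] > probs[i + 1] for i in range(n))
assert all(p > 0 for p in probs)
print([round(p, 4) for p in probs[:3]])
```

The ratio between consecutive terms is 1/(n-1), which is why the decrease is so steep.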

Page 32:

Theoretical Analysis


OneMax: $f(x) = \sum_{i=1}^n x_i$; (1+1)-EA with bitwise mutation (flip each bit with probability 1/n).

Solutions with the same number of 1-bits share the same f value, so the solution space can be partitioned by the number of 1-bits: subspaces $S_m, \ldots, S_2, S_1, S_0 = S^*$, from the solutions with 0 one-bits up to the solutions with n one-bits. There are many transitions between these subspaces.

Page 33:

Theoretical Analysis


OneMax: $f(x) = \sum_{i=1}^n x_i$; (1+1)-EA with bitwise mutation (flip each bit with probability 1/n).

With the solution space partitioned by the number of 1-bits ($S_m, \ldots, S_1, S_0 = S^*$), an upper bound on the running time follows from a path that visits all the subspaces, from 0 one-bits up to n one-bits.

Page 34:

Theoretical Analysis


OneMax: $f(x) = \sum_{i=1}^n x_i$; (1+1)-EA with bitwise mutation (flip each bit with probability 1/n).

From a solution with 0 one-bits, the probability of moving to a better subspace by flipping exactly one bit is
$p = n \cdot \frac{1}{n} \left(1 - \frac{1}{n}\right)^{n-1}.$

Page 35:

Theoretical Analysis


Similarly, from a solution with one 1-bit, the probability of moving to a better subspace is at least
$p \ge \binom{n-1}{1} \frac{1}{n} \left(1 - \frac{1}{n}\right)^{n-1}$
(flip exactly one of the n−1 zero-bits and keep the other bits unchanged).

Page 36:

Theoretical Analysis


In general, from a solution with i one-bits, the probability of moving to a better subspace is at least
$p \ge \binom{n-i}{1} \frac{1}{n} \left(1 - \frac{1}{n}\right)^{n-1}.$

Page 37:

Theoretical Analysis


OneMax, (1+1)-EA with bitwise mutation: from the probability of a transition, the expected number of steps until the transition happens is its reciprocal:
$\frac{1}{p} \le \frac{1}{n-i} \cdot n \cdot \left(1 + \frac{1}{n-1}\right)^{n-1} \sim \frac{1}{n-i} \cdot n \cdot e.$

Summing over the levels i = 0, ..., n−1:
$\sum_{i=0}^{n-1} \frac{en}{n-i} = e\,n\,H_n \sim e\,n \ln n,$
which gives the ERT upper bound $O(n \ln n)$.

Page 38:

Theoretical Analysis


General analysis tools

Running time analysis is commonly problem-specific: to derive the ERT of an EA on a problem, one needs to know “where to look” and “what to calculate”. General tools serve as a guide telling what to look at and what to follow to accomplish the analysis:
–  Fitness Level Method
–  Drift Analysis
–  Convergence-rate Based Method

Page 39:

Theoretical Analysis


Fitness Level Method

Partition the solution space into subspaces $S_1, S_2, \ldots, S_m = S^*$ with increasing fitness [Wegener, 02]. A population is treated as the best solution in the population.

Then calculate:
1.  the probability $\pi_0(S_i)$ that initialization puts the EA in each subspace

Note that elitist (i.e. never losing the best solution) EAs select solutions with better fitness. The level sets intuitively form stairs: an upper bound can be derived by summing up the time taken for getting off every stair, and a lower bound is the minimum time of getting off a stair. This is formally described in Lemma 2.

Lemma 2 (Fitness Level Method [39]). For an elitist EA process $\xi$ on a problem $f$, let $S_1, \ldots, S_m$ be a $<_f$-partition, let $v_i \le P(\xi_{t+1} \in \cup_{j=i+1}^m S_j \mid \xi_t = x)$ for all $x \in S_i$, and $u_i \ge P(\xi_{t+1} \in \cup_{j=i+1}^m S_j \mid \xi_t = x)$ for all $x \in S_i$. Then the DCFHT of the EA process is at most
$\sum_{1 \le i \le m-1} \pi_0(S_i) \cdot \sum_{j=i}^{m-1} \frac{1}{v_j},$
and is at least
$\sum_{1 \le i \le m-1} \pi_0(S_i) \cdot \frac{1}{u_i}.$

Later on, a more elaborate fitness level method was discovered by Sudholt [36], which we call the refined fitness level method in this paper, as in Lemma 3. In the lemma, when the EA uses a population of solutions, the notation $\xi \in S$ denotes that the best solution of the population is in the solution space $S$.

Lemma 3 (Refined Fitness Level Method [35, 36]). For an elitist EA process $\xi$ with a fitness function $f$, let $S_1, \ldots, S_m$ be a $<_f$-partition, let $v_i \le \min_{j>i} \frac{1}{\gamma_{i,j}} P(\xi_{t+1} \in S_j \mid \xi_t = x)$ and $u_i \ge \max_{j>i} \frac{1}{\gamma_{i,j}} P(\xi_{t+1} \in S_j \mid \xi_t = x)$ for all $x \in S_i$, where $\sum_{j=i+1}^m \gamma_{i,j} = 1$, and let $\chi_u, \chi_l \in [0,1]$ be constants such that $\chi_u \ge \gamma_{i,j} / \sum_{k=j}^m \gamma_{i,k} \ge \chi_l$ for all $j > i$, $\chi_u \ge 1 - v_{j+1}/v_j$ and $\chi_l \le 1 - u_{j+1}/u_j$ for all $1 \le j \le m-2$. Then the DCFHT of the process is at most
$\sum_{i=1}^{m-1} \pi_0(S_i) \cdot \Big( \frac{1}{v_i} + \chi_u \sum_{j=i+1}^{m-1} \frac{1}{v_j} \Big),$
and is at least
$\sum_{i=1}^{m-1} \pi_0(S_i) \cdot \Big( \frac{1}{u_i} + \chi_l \sum_{j=i+1}^{m-1} \frac{1}{u_j} \Big).$

The refined fitness level method follows the general idea of the fitness level method, while introducing a variable $\gamma$ that reflects the distribution of the probability that the EA jumps to better levels. When $\gamma$ is small, the EA has a high probability of jumping across many levels and thus makes large progress; when $\gamma$ is large, the EA can only make small progress in every step. Obviously, $\gamma$ can take 1 for upper bounds and 0 for lower bounds, which degrades the refined method to the original fitness level method. Therefore, the original fitness level method is a special case of the refined one.



2. bounds of the progress probability for every $x \in S_i$:


the ERT is then upper bounded and lower bounded as given in Lemma 2.

Page 40:

Theoretical Analysis


Example in OneMax

partition: $S_1 = $ solutions with 0 1-bits, $S_2 = $ solutions with 1 1-bit, $S_3 = $ solutions with 2 1-bits, $\ldots$, $S_m = $ solutions with $n$ 1-bits $= S^*$

progress probability for $x \in S_i$, a lower bound (flipping one 0-bit but no 1-bits):

$$v_i = \binom{n-i}{1} \cdot \frac{1}{n} \cdot \left(\frac{n-1}{n}\right)^{n-1}$$

initialization distribution: $\pi_0(S_i) = \binom{n}{i} \big/ 2^n$

ERT:


$$\sum_{1 \le i \le m-1} \pi_0(S_i) \cdot \sum_{j=i}^{m-1} \frac{1}{v_j} \le \sum_{j=1}^{m-1} \frac{1}{v_j} \in O(n \ln n)$$
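To make the $O(n \ln n)$ bound concrete, the fitness-level sum can be evaluated directly. This is a sketch under the slide's assumptions, not code from the lecture: levels are indexed by the number $j$ of 1-bits, and `fitness_level_upper_bound` is a name of our own choosing.

```python
import math

def fitness_level_upper_bound(n: int) -> float:
    """Upper bound on the ERT of the (1+1) EA on OneMax via the fitness level
    method: sum over levels j (j one-bits, j = 0..n-1) of 1/v_j, where
    v_j = (n - j) * (1/n) * (1 - 1/n)^(n - 1) lower-bounds the probability
    of leaving level j by flipping exactly one 0-bit and no other bit."""
    stay = (1.0 - 1.0 / n) ** (n - 1)  # probability of keeping the other n-1 bits
    return sum(1.0 / ((n - j) * (1.0 / n) * stay) for j in range(n))

if __name__ == "__main__":
    n = 1000
    bound = fitness_level_upper_bound(n)
    harmonic = sum(1.0 / k for k in range(1, n + 1))
    # since (1 - 1/n)^(n-1) >= 1/e, the sum is at most e * n * H_n, i.e. O(n ln n)
    print(bound, math.e * n * harmonic)
```

Printing both values shows the computed sum sits below $e \cdot n \cdot H_n$, which grows as $n \ln n$.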

Page 41:

Theoretical Analysis


Drift Analysis [Hajek, 82][Sasaki & Hajek, 88][He & Yao, 01][He & Yao, 04]

a distance function $V$ measures the "distance" of a solution to the optimal solutions, with $V(x^*) = 0$

(figure: a solution $x$ at distance $V(x)$ from the optimal set $S^*$)

Then calculate:
1. initialization probability of solutions $\pi_0(x)$
2. bounds of the progress distance for every step
the ERT is then upper bounded and lower bounded accordingly.

…progress toward the optimum of every step of an EA, which directly derives the number of steps the EA takes to arrive at the optimum. Note that in Lemma 8, the distance function can be arbitrary (possibly depending on the objective function), thus it covers the variants including the original definition of drift analysis [17] and the multiplicative drift analysis [10].

Definition 12 (Distance Function)
For a space $\mathcal{X}$ with the optimal subspace $\mathcal{X}^*$, a function $V$ satisfying $V(x) = 0$ for all $x \in \mathcal{X}^*$ and $V(x) > 0$ for all $x \in \mathcal{X} - \mathcal{X}^*$ is called a distance function.

Lemma 8 (Drift Analysis)
For an EA process $\xi \in \mathcal{X}$, let $V$ be a distance function. If there exists a positive value $c_l$ such that

$$\forall t: \; c_l \le \mathbb{E}[V(\xi_t) - V(\xi_{t+1}) \mid \xi_t],$$

we have $\mathbb{E}[\tau \mid \xi_0 \sim \pi_0] \le \sum_{x \in \mathcal{X}} \pi_0(x) V(x) / c_l$; and if there exists a non-negative value $c_u$ such that

$$\forall t: \; c_u \ge \mathbb{E}[V(\xi_t) - V(\xi_{t+1}) \mid \xi_t],$$

we have $\mathbb{E}[\tau \mid \xi_0 \sim \pi_0] \ge \sum_{x \in \mathcal{X}} \pi_0(x) V(x) / c_u$.

It should be noted that, when $c_l$ or $c_u$ is negative, the obtained running time bound is also negative and thus meaningless. In this case, we say that the drift is invalid and the analysis fails.

Characterization 3 (Drift Analysis)
For an EA process $\xi \in \mathcal{X}$, the drift analysis $\mathcal{A}_{DA}$ is defined by its parameters, input and output:
Parameters: a distance function $V$.
Input: $c_l > 0$ for upper bound analysis such that $c_l \le \mathbb{E}[V(\xi_t) - V(\xi_{t+1}) \mid \xi_t]$ for all $t \ge 0$; $c_u > 0$ for lower bound analysis such that $c_u \ge \mathbb{E}[V(\xi_t) - V(\xi_{t+1}) \mid \xi_t]$ for all $t \ge 0$.
Output: $\mathcal{A}^u_{DA} = \sum_{x \in \mathcal{X}} \pi_0(x) V(x) / c_l$; $\mathcal{A}^l_{DA} = \sum_{x \in \mathcal{X}} \pi_0(x) V(x) / c_u$.

7.2. The Power of Switch Analysis from Drift Analysis
Theorem 4: $\mathcal{A}_{DA}$ is reducible to $\mathcal{A}_{SA}$.
Lemma 9: $\mathcal{A}_{DA}$ is upper-bound reducible to $\mathcal{A}_{SA}$.


Most simplified version:

$$c_l \le \mathbb{E}[V(\xi_t) - V(\xi_{t+1}) \mid \xi_t], \qquad c_u \ge \mathbb{E}[V(\xi_t) - V(\xi_{t+1}) \mid \xi_t]$$
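Lemma 8 can be sanity-checked numerically. The sketch below, which is ours rather than the lecture's, runs the (1+1) EA on OneMax with distance $V(x) = n - |x|_1$; a valid additive drift lower bound is $c_l = \frac{1}{n}(1 - \frac{1}{n})^{n-1}$ (the worst case, with a single 0-bit left), so the drift theorem gives $\mathbb{E}[\tau] \le \mathbb{E}[V(\xi_0)]/c_l$. The function names are hypothetical.

```python
import random

def one_plus_one_ea_onemax(n, rng):
    """(1+1) EA on OneMax; returns the number of generations until all ones."""
    x = [rng.randint(0, 1) for _ in range(n)]
    t = 0
    while sum(x) < n:
        # standard bit mutation: flip each bit independently with probability 1/n
        y = [b ^ (rng.random() < 1.0 / n) for b in x]
        if sum(y) >= sum(x):  # elitist selection: never lose the best solution
            x = y
        t += 1
    return t

def additive_drift_bound(n):
    """E[V(xi_0)] / c_l with V(x) = n - |x|_1 and the worst-case drift
    c_l = (1/n) * (1 - 1/n)^(n - 1): flip the last 0-bit, keep the rest."""
    c_l = (1.0 / n) * (1.0 - 1.0 / n) ** (n - 1)
    return (n / 2.0) / c_l  # uniform initialization: E[V(xi_0)] = n/2

if __name__ == "__main__":
    rng = random.Random(0)
    n, runs = 20, 100
    avg = sum(one_plus_one_ea_onemax(n, rng) for _ in range(runs)) / runs
    print(avg, additive_drift_bound(n))
```

The empirical mean (around $e n \ln n$) stays comfortably below the drift bound, which is loose here because $c_l$ takes the worst-case drift over all states.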

Page 42:

Theoretical Analysis


Example in LeadingOnes

LeadingOnes Problem: count the number of leading 1-bits, e.g., $f(11011111) = 2$.

fitness: $f(x) = \sum_{i=1}^{n} \prod_{j=1}^{i} x_j$, objective: $\arg\max_{x \in \{0,1\}^n} \sum_{i=1}^{n} \prod_{j=1}^{i} x_j$

Distance function: $V(x) = n - f(x)$, so the distance of optimal solutions is zero.

The drift:

$$\begin{aligned} \mathbb{E}[V(\xi_t) - V(\xi_{t+1}) \mid \xi_t] ={} & I(V(\xi_t) > V(\xi_{t+1}))\, \mathbb{E}[V(\xi_t) - V(\xi_{t+1}) \mid \xi_t] \\ {}+{} & I(V(\xi_t) < V(\xi_{t+1}))\, \mathbb{E}[V(\xi_t) - V(\xi_{t+1}) \mid \xi_t] \\ {}+{} & I(V(\xi_t) = V(\xi_{t+1}))\, \mathbb{E}[V(\xi_t) - V(\xi_{t+1}) \mid \xi_t] \end{aligned}$$

Only need to care about the expected progress: for a solution $11\ldots10\ldots$, keep the leading 1-bits and flip the first 0-bit;
probability of making progress $\ge$ probability of gaining at least one leading 1-bit

$$\mathbb{E}[V(\xi_t) - V(\xi_{t+1}) \mid \xi_t] \ge 1 \cdot \frac{1}{n}\left(1 - \frac{1}{n}\right)^{i} \ge \frac{1}{n}\left(1 - \frac{1}{n}\right)^{n-1} \ge \frac{1}{en}$$
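The chain of inequalities above rests on $(1 - \frac{1}{n})^{n-1} \ge \frac{1}{e}$, which can be checked numerically. A quick sketch, not part of the lecture; the function name is ours.

```python
import math

def leadingones_drift_lower_bound(n: int, i: int) -> float:
    """Lower bound on the one-step drift when i leading 1-bits are present:
    flip the next bit (prob 1/n) while keeping the i leading 1-bits."""
    return (1.0 / n) * (1.0 - 1.0 / n) ** i

if __name__ == "__main__":
    for n in (10, 100, 1000):
        # the worst case over i = 0..n-1 is i = n-1, and it still beats 1/(e*n)
        worst = min(leadingones_drift_lower_bound(n, i) for i in range(n))
        print(n, worst, 1.0 / (math.e * n))
```

For every $n$ the worst-case drift stays above $\frac{1}{en}$, since $(1 - \frac{1}{n})^{n-1}$ decreases toward $\frac{1}{e}$ from above.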

Page 43:

Theoretical Analysis


Example in LeadingOnes

LeadingOnes Problem, with the same fitness and distance function as before: $V(x) = n - f(x)$, so the distance of optimal solutions is zero.

$$\mathbb{E}[V(\xi_t) - V(\xi_{t+1}) \mid \xi_t] \ge 1 \cdot \frac{1}{n}\left(1 - \frac{1}{n}\right)^{i} \ge \frac{1}{n}\left(1 - \frac{1}{n}\right)^{n-1} \ge \frac{1}{en}$$

ERT is then upper bounded as

$$\sum_{x \in \mathcal{X}} \pi_0(x) V(x) \Big/ \frac{1}{en} \le V(00\ldots0) \Big/ \frac{1}{en} = en^2 \in O(n^2)$$

the exact expected running time is approximately $0.86 n^2$ [Böttcher et al., 10]

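The $O(n^2)$ bound can be illustrated with a small simulation of the (1+1) EA on LeadingOnes. This is a hedged sketch rather than code from the lecture; the function names are our own.

```python
import math
import random

def leading_ones(x):
    """Count the number of leading 1-bits."""
    c = 0
    for bit in x:
        if bit != 1:
            break
        c += 1
    return c

def one_plus_one_ea_leadingones(n, rng):
    """(1+1) EA with standard bit mutation on LeadingOnes;
    returns the number of generations until the all-ones string is found."""
    x = [rng.randint(0, 1) for _ in range(n)]
    t = 0
    while leading_ones(x) < n:
        # flip each bit independently with probability 1/n
        y = [b ^ (rng.random() < 1.0 / n) for b in x]
        if leading_ones(y) >= leading_ones(x):  # elitist selection
            x = y
        t += 1
    return t

if __name__ == "__main__":
    rng = random.Random(0)
    n, runs = 20, 50
    avg = sum(one_plus_one_ea_leadingones(n, rng) for _ in range(runs)) / runs
    print(avg, math.e * n * n)  # empirical mean vs. the e*n^2 drift bound
```

The empirical mean lands well under the $en^2$ drift bound, consistent with the sharper $\approx 0.86 n^2$ result cited above.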

Page 44:

Theoretical Analysis

•  Additional Remarks
–  There are more analytical tools than those we have covered today.
–  Different analytical tools may suit different algorithms/problems (at least from an intuitive perspective).
–  Theoretical analysis of EAs has been extended to more complicated problems (e.g., NP-hard problems such as Traveling Salesman, Minimum Vertex Cover, etc.).
–  In addition to deepening our understanding, theoretical analysis may also help configure EAs for a specific problem.


Page 45:

End of Lecture 11

45