
GoBack - Department of Cyberneticslabe.felk.cvut.cz/~posik/xe33scp/xe33scp-eda2.pdf · Finite number of elements 0000 1000 0100 1100 ... By analogy to discrete EDAs... No Interactions

Mar 26, 2018


P. Pošík © 2007 Soft Computing – 1 / 23

8. Estimation of Distribution Algorithms. Continuous Domain.

Petr Pošík

Czech Technical University in Prague

Faculty of Electrical Engineering

Department of Cybernetics


Contents


Last week. . .

Features of continuous spaces

Real-valued EDAs

Optimization using Gaussian distribution


Last week. . .


Intro to EDAs


Black-box optimization

GA vs. EDA

✔ GA approach: select — crossover — mutate

✔ EDA approach: select — model — sample

EDA with binary representation

✔ the best possible (general, flexible) model: joint probability

✘ determine the probability of each possible combination of bits

✘ $2^D - 1$ parameters, exponential complexity

✔ less precise (less flexible), but simpler probabilistic models
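A minimal sketch of the select, model, sample loop for the binary case, using the simplest kind of model: one marginal probability per bit, as in UMDA. It assumes a hypothetical OneMax fitness (the number of ones), truncation selection of the τ·N best individuals, and hypothetical names `onemax` and `umda_binary`; the clipping of the probabilities is an added safeguard, not something the slides prescribe.

```python
import numpy as np


def onemax(pop: np.ndarray) -> np.ndarray:
    """Hypothetical test fitness: number of ones in each row (higher is better)."""
    return pop.sum(axis=1)


def umda_binary(fitness, dim=20, pop_size=100, tau=0.3, generations=50, seed=0):
    """Minimal binary UMDA sketch: sample, select (truncation), model (marginals)."""
    rng = np.random.default_rng(seed)
    p = np.full(dim, 0.5)  # D parameters p(X_d = 1) instead of 2^D - 1 joint ones
    for _ in range(generations):
        # sample: every bit independently from its own marginal
        pop = (rng.random((pop_size, dim)) < p).astype(int)
        # select: keep the tau * N best individuals
        n_sel = max(1, int(tau * pop_size))
        best = pop[np.argsort(fitness(pop))[-n_sel:]]
        # model: re-estimate the marginals from the selected individuals
        p = best.mean(axis=0)
        # assumption: clip so that no bit value becomes impossible to generate
        p = np.clip(p, 1.0 / dim, 1.0 - 1.0 / dim)
    return p


p_final = umda_binary(onemax)
print(np.round(p_final, 2))
```

The model has only D parameters instead of the $2^D - 1$ of the full joint distribution; the price is that it cannot represent any dependency between bits.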


Content of the lectures


Binary EDAs

✔ Without interactions

✘ 1-dimensional marginal probabilities p(X = x)

✘ PBIL, UMDA, cGA

✔ Pairwise interactions

✘ conditional probabilities p(X = x|Y = y)

✘ sequences (MIMIC), trees (COMIT), forest (BMDA)

✔ Multivariate interactions

✘ conditional probabilities p(X = x|Y = y, Z = z, . . .)

✘ Bayesian networks (BOA, EBNA, LFDA)

Continuous EDAs

✔ Histograms

✔ Gaussian distribution

✔ Evolutionary strategies, CMA-ES


Features of continuous spaces


The difference of binary and real space


Binary space

✔ Each possible solution is placed in one of the corners of a D-dimensional hypercube

✔ No values lying between them

✔ Finite number of elements

[Figure: the 16 corners of a 4-dimensional binary hypercube, labelled 0000 to 1111]

Real space

✔ The space in each dimension need not be bounded

✔ Even when bounded by a hypercube, there are infinitely many points between the bounds (theoretically; in practice we are limited by the numerical precision of the given machine)

✔ Infinitely many (even uncountably many) candidate solutions


Local neighborhood


How do you define a local neighborhood?

✔ . . . as a set of points whose distance to a reference point does not exceed a threshold?

✘ The volume of the local neighborhood relative to the volume of the whole space drops exponentially with the dimension

✘ With increasing dimensionality, the neighborhood becomes increasingly local

✔ . . . as the set of points closest to the reference point whose union covers a part of the search space of a certain (constant) size?

✘ The size of the local neighborhood grows with the dimensionality of the search space

✘ With increasing dimensionality of the search space, the neighborhood is increasingly less local

Curse of dimensionality!
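Both effects can be checked numerically. The sketch below assumes the search space is the cube [−1, 1]^D for the first function and the unit cube [0, 1]^D for the second; the helper names are hypothetical. The first function gives the fraction of the cube occupied by a ball of radius 1 (a distance threshold), the second gives the edge length a sub-cube needs in order to cover a fixed fraction of the unit cube.

```python
import math


def ball_to_cube_ratio(dim: int) -> float:
    """Volume of the unit-radius ball divided by the volume of the cube [-1, 1]^dim."""
    ball = math.pi ** (dim / 2) / math.gamma(dim / 2 + 1)
    return ball / 2 ** dim


def edge_for_fraction(fraction: float, dim: int) -> float:
    """Edge length of a sub-cube covering `fraction` of the unit cube [0, 1]^dim."""
    return fraction ** (1.0 / dim)


for d in (1, 2, 5, 10, 20):
    print(f"D={d:2d}  ball/cube={ball_to_cube_ratio(d):.2e}  "
          f"edge for 10%={edge_for_fraction(0.1, d):.3f}")
```

The ratio falls from 1 at D = 1 to about 0.0025 at D = 10, while the edge needed to cover 10 % of the unit cube grows to about 0.79 of the whole range at D = 10.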


Real-valued EDAs


By analogy to discrete EDAs. . .


Without interactions

✔ UMDA: the same principle, only the marginal probability model is of a different type

✔ Univariate histograms?

✔ Univariate Gaussian distribution?

✔ Univariate mixture of Gaussians?

Pairwise and higher-order interactions:

✔ Many different types of interactions!

✔ A model that would describe all possible kinds of interaction is virtually impossible to find!


No Interactions Among Variables


UMDA: EDA with a marginal product model:

$$p(\mathbf{x}) = \prod_{d=1}^{D} p(x_d) \qquad (1)$$

The following univariate models were compared:

✔ Equi-width histogram

✔ Equi-height histogram

✔ Max-diff histogram

✔ Univariate mixture of Gaussians

Features of the equi-width histogram:

✔ the most straightforward analogy with discrete histograms

✔ if any bin is empty, there is no way to create a new individual in that bin

[Figure: Equi-width histogram]
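To make the empty-bin problem concrete, here is a sketch of fitting and sampling an equi-width histogram for one variable. It assumes the variable is bounded by [low, high]; the function names are hypothetical. Sampling picks a bin according to its relative frequency and then a uniform point inside that bin.

```python
import numpy as np


def fit_equiwidth(x, n_bins, low, high):
    """Equi-width histogram on [low, high]: fixed edges, bin probabilities from counts."""
    edges = np.linspace(low, high, n_bins + 1)
    counts, _ = np.histogram(x, bins=edges)
    return edges, counts / counts.sum()


def sample_histogram(edges, probs, n, rng):
    """Choose bin k with probability probs[k], then a uniform point inside that bin."""
    bins = rng.choice(len(probs), size=n, p=probs)
    return rng.uniform(edges[bins], edges[bins + 1])


rng = np.random.default_rng(1)
# selected points with a gap: nothing between 5 and 15
x = np.concatenate([rng.uniform(0, 5, 50), rng.uniform(15, 20, 50)])
edges, probs = fit_equiwidth(x, n_bins=10, low=0.0, high=20.0)
new = sample_histogram(edges, probs, 1000, rng)
print(np.round(probs, 2))
print(((new > 6) & (new < 14)).sum())  # empty bins [6, 14) never produce offspring
```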


Features of the equi-height histogram:

✔ instead of fixing the bin width, fix the number of points in each bin

✔ no empty bins, so it is always possible to generate any point in the hyperrectangle

[Figure: Equi-height histogram]
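The same sampling scheme with equi-height bins, as a sketch under the assumption that the outer edges are placed at the bounds of the search space (the slides do not say how the outer bins are closed); the names are hypothetical. Every bin gets probability 1/n_bins, so the wide bins spanning the gap keep a small but nonzero density.

```python
import numpy as np


def fit_equiheight(x, n_bins, low, high):
    """Equi-height histogram: inner edges at empirical quantiles, each bin has probability 1/n_bins."""
    inner = np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1)[1:-1])
    edges = np.concatenate([[low], inner, [high]])  # assumption: outer edges at the box bounds
    return edges, np.full(n_bins, 1.0 / n_bins)


def sample_histogram(edges, probs, n, rng):
    """Choose bin k with probability probs[k], then a uniform point inside that bin."""
    bins = rng.choice(len(probs), size=n, p=probs)
    return rng.uniform(edges[bins], edges[bins + 1])


rng = np.random.default_rng(1)
x = np.concatenate([rng.uniform(0, 5, 50), rng.uniform(15, 20, 50)])
edges, probs = fit_equiheight(x, n_bins=10, low=0.0, high=20.0)
new = sample_histogram(edges, probs, 1000, rng)
print(np.round(edges, 2))
print(((new > 6) & (new < 14)).sum())  # the gap is sampled sparsely, but it is sampled
```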


Features of the max-diff histogram:

✔ place the bin boundaries in the largest gaps between the points

✔ no empty bins, so it is always possible to generate any point in the hyperrectangle

[Figure: Max-diff histogram]
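A sketch of the max-diff bin boundaries for one variable. It assumes the n_bins − 1 cuts are placed at the midpoints of the largest gaps between the sorted points and that the outer edges sit at the box bounds; both details and the function name are assumptions. Offspring would then be sampled exactly as for the other histograms.

```python
import numpy as np


def fit_maxdiff(x, n_bins, low, high):
    """Max-diff histogram: cut at the midpoints of the n_bins - 1 largest gaps between sorted points."""
    xs = np.sort(x)
    gaps = np.diff(xs)
    cut = np.sort(np.argsort(gaps)[-(n_bins - 1):])  # indices of the largest gaps
    inner = (xs[cut] + xs[cut + 1]) / 2
    edges = np.concatenate([[low], inner, [high]])    # assumption: outer edges at the box bounds
    counts, _ = np.histogram(x, bins=edges)
    return edges, counts / counts.sum()


x = np.array([1.0, 1.2, 1.4, 5.0, 5.1, 9.0, 9.3, 9.5])
edges, probs = fit_maxdiff(x, n_bins=3, low=0.0, high=10.0)
print(edges)  # inner boundaries at 3.2 and 7.05, inside the two largest gaps
print(probs)  # 3/8, 2/8, 3/8
```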


Features of the univariate mixture of Gaussians:

✔ built by the EM algorithm (a probabilistic version of k-means clustering)

✔ more suitable for unbounded spaces

[Figure: Mixture of Gaussians]
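A compact sketch of a univariate mixture of Gaussians fitted by EM, using NumPy only. The initialisation from random data points, the fixed number of iterations and the small variance floor are assumptions made for this illustration, not details of the compared models; the function names are hypothetical. Because each component is a Gaussian, sampled offspring are not confined to any box.

```python
import numpy as np


def fit_mog_1d(x, k, iters=200, seed=0):
    """1-D mixture of k Gaussians fitted by EM (a soft, probabilistic version of k-means)."""
    rng = np.random.default_rng(seed)
    w = np.full(k, 1.0 / k)                    # mixture weights
    mu = rng.choice(x, size=k, replace=False)  # initial means: k distinct data points
    var = np.full(k, x.var())                  # initial variances: overall variance
    for _ in range(iters):
        # E-step: responsibility r[i, j] of component j for point i
        dens = w * np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: weighted ML estimates of weights, means and variances
        nk = r.sum(axis=0)
        w = nk / len(x)
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-9  # tiny floor for stability
    return w, mu, var


def sample_mog(w, mu, var, n, rng):
    """Pick a component by its weight, then sample from that Gaussian (unbounded support)."""
    comp = rng.choice(len(w), size=n, p=w)
    return rng.normal(mu[comp], np.sqrt(var[comp]))


rng = np.random.default_rng(2)
x = np.concatenate([rng.normal(3.0, 1.0, 200), rng.normal(15.0, 2.0, 200)])
w, mu, var = fit_mog_1d(x, k=2)
new = sample_mog(w, mu, var, 1000, rng)
print(np.round(w, 2), np.round(mu, 1), np.round(np.sqrt(var), 1))
```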


The winner of the comparison: the equi-height histogram

✔ precise

✔ non-parametric


Results: Two Peaks Function


✔ optimum in (1, 1, . . . , 1)

✔ $2^D$ local optima

[Figure: the Two Peaks function]

Evolution of bin boundaries (component centers for MOG):

[Figures: Evolution of bin boundaries for the HEH model; Evolution of bin boundaries for the HMD model; Evolution of component centers for the MOG model]


Histogram UMDA: Summary


Suitable when

✔ the search space is bounded by a hyperrectangle

✔ there are no strong interactions among variables

Possible extension:

✔ Rotation of the coordinate system → UMDA is then able to work with linear interactions
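The slides do not say how the rotation is chosen; one common choice, assumed here, is principal component analysis of the selected individuals. The sketch rotates a strongly correlated 2-D sample into the eigenvector basis of its sample covariance matrix, where the coordinates are uncorrelated and a univariate model can be applied; `pca_rotation` is a hypothetical name.

```python
import numpy as np


def pca_rotation(selected):
    """Mean and orthonormal eigenvector basis of the sample covariance of the selected points."""
    mean = selected.mean(axis=0)
    _, R = np.linalg.eigh(np.cov(selected, rowvar=False))  # columns of R are eigenvectors
    return mean, R


rng = np.random.default_rng(3)
x1 = rng.normal(0.0, 1.0, 500)
sel = np.column_stack([x1, 0.9 * x1 + rng.normal(0.0, 0.1, 500)])  # linear interaction
mean, R = pca_rotation(sel)
z = (sel - mean) @ R  # coordinates in the rotated system
print(np.corrcoef(sel, rowvar=False)[0, 1])  # close to 1
print(np.corrcoef(z, rowvar=False)[0, 1])    # close to 0
# A univariate model (e.g. one histogram per column of z) would be fitted here;
# new points z_new are mapped back to the original space by z_new @ R.T + mean.
x_back = z @ R.T + mean
```

Such a rotation removes only linear dependencies; nonlinear interactions among the variables remain.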


Optimization using Gaussian distribution


Case study: Quadratic function


Consider a simple EDA with the following settings:

✔ Truncation selection: use the τ · N best individuals to build the model

✔ Gaussian distribution: fit the Gaussian using the maximum likelihood (ML) estimate

Two situations:

Population centered around the optimum (population in the valley):

[Figure: population in the valley, τ = 0.8]

Population far away from the optimum (population on the slope):

[Figure: population on the slope, τ = 0.2]
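The simple EDA described above fits in a few lines for a 1-D quadratic function. This is a minimal sketch assuming minimization of f(x) = x², a population of 500 and 60 generations; `gaussian_eda` and `sphere` are hypothetical names. Started in the valley with τ = 0.8 it contracts onto the optimum; started at x = −100 with τ = 0.2 its mean stops after moving only a few units, although the optimum is 100 units away.

```python
import numpy as np


def gaussian_eda(f, mu0, sigma0, tau, pop_size=500, generations=60, seed=0):
    """1-D Gaussian EDA: keep the tau*N best (lowest f), refit mean and std by ML."""
    rng = np.random.default_rng(seed)
    mu, sigma = mu0, sigma0
    for _ in range(generations):
        pop = rng.normal(mu, sigma, pop_size)
        best = pop[np.argsort(f(pop))[: int(tau * pop_size)]]  # truncation selection
        mu, sigma = best.mean(), best.std()                     # ML estimates (ddof=0)
    return mu, sigma


def sphere(x):
    """Quadratic function with its optimum at 0."""
    return x ** 2


print(gaussian_eda(sphere, mu0=0.0, sigma0=1.0, tau=0.8))     # in the valley
print(gaussian_eda(sphere, mu0=-100.0, sigma0=1.0, tau=0.2))  # on the slope: stalls around -97
```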


What happens on the slope?


The change of population statistics in 1 generation:

Expected value:

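$$\mu^{t+1} = E(X \mid X > x_{\min}) = \mu^t + \sigma^t \cdot d(\tau),$$

where $x_{\min}$ is the truncation threshold, $\varphi$ and $\Phi$ are the density and the distribution function of the standard normal distribution, and

$$d(\tau) = \frac{\varphi(\Phi^{-1}(\tau))}{\tau}.$$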


Variance:

$$(\sigma^{t+1})^2 = \mathrm{Var}(X \mid X > x_{\min}) = (\sigma^t)^2 \cdot c(\tau),$$

where

$$c(\tau) = 1 + \frac{\Phi^{-1}(1-\tau) \cdot \varphi(\Phi^{-1}(\tau))}{\tau} - d(\tau)^2.$$


[Figure: d and c as functions of τ, each shown for the population on the slope and in the valley]
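The two factors can be evaluated with the Python standard library's `statistics.NormalDist`, whose `pdf` and `inv_cdf` methods play the roles of φ and Φ⁻¹. A sketch, valid for 0 < τ < 1; the function names simply mirror the symbols in the formulas.

```python
from statistics import NormalDist

N01 = NormalDist()  # standard normal: pdf = phi, inv_cdf = Phi^{-1}


def d(tau: float) -> float:
    """Mean shift, in units of sigma^t, caused by truncation selection of the best tau fraction."""
    return N01.pdf(N01.inv_cdf(tau)) / tau


def c(tau: float) -> float:
    """Ratio (sigma^{t+1})^2 / (sigma^t)^2 after truncation selection."""
    return 1 + N01.inv_cdf(1 - tau) * N01.pdf(N01.inv_cdf(tau)) / tau - d(tau) ** 2


for tau in (0.2, 0.5, 0.8):
    print(f"tau = {tau}: d = {d(tau):.4f}, c = {c(tau):.4f}")
```

For τ = 0.5 this gives d ≈ 0.798 and c = 1 − 2/π ≈ 0.363: one generation shifts the mean by about 0.8σ and shrinks the variance to roughly a third.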


What happens on the slope (cont.)


Population statistics in generation t:

$$\mu^t = \mu^0 + \sigma^0 \cdot d(\tau) \cdot \sum_{i=1}^{t} \sqrt{c(\tau)}^{\,i-1}$$

$$\sigma^t = \sigma^0 \cdot \sqrt{c(\tau)}^{\,t}$$

Convergence of population statistics:

$$\lim_{t\to\infty} \mu^t = \mu^0 + \sigma^0 \cdot d(\tau) \cdot \frac{1}{1 - \sqrt{c(\tau)}}$$

$$\lim_{t\to\infty} \sigma^t = 0$$


The sum over $i$ is a geometric series with ratio $\sqrt{c(\tau)} < 1$, so it converges.

The distance the population can “travel” in this algorithm is bounded!

Premature convergence!
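Summing the geometric series shows how short the journey is. The sketch below uses hypothetical helper names and the d(τ) and c(τ) defined earlier, and evaluates the partial sums and their limit in units of σ⁰.

```python
from statistics import NormalDist

N01 = NormalDist()  # standard normal distribution


def d_and_c(tau):
    """d(tau) and c(tau) for truncation selection of the best tau fraction."""
    z = N01.inv_cdf(tau)
    d = N01.pdf(z) / tau
    c = 1 + N01.inv_cdf(1 - tau) * N01.pdf(z) / tau - d ** 2
    return d, c


def distance_after(tau, t):
    """(mu^t - mu^0) / sigma^0 = d(tau) * sum_{i=1}^{t} sqrt(c(tau))^(i-1)."""
    d, c = d_and_c(tau)
    return d * sum(c ** ((i - 1) / 2) for i in range(1, t + 1))


def distance_limit(tau):
    """Limit for t -> infinity: d(tau) / (1 - sqrt(c(tau)))."""
    d, c = d_and_c(tau)
    return d / (1 - c ** 0.5)


for tau in (0.2, 0.5, 0.8):
    print(f"tau = {tau}: 10 generations {distance_after(tau, 10):.3f}, "
          f"limit {distance_limit(tau):.3f} (in units of sigma^0)")
```

For τ = 0.5 the mean can move at most about 2σ⁰, and for τ = 0.2 about 2.6σ⁰, however many generations are run.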


Preventing premature convergence


Artificially enlarge the ML estimate of variance:

✔ keep variance at values greater than a specified threshold

✔ use self-adaptation (let the variance be part of the chromosome)

✔ adaptive variance scaling when the population is on the slope, ML estimate of variance when the population is in the valley

✔ . . .

Conclusions:

✔ Maximum likelihood estimates are suitable when the model fits the fitness function well (at least in a local neighborhood)

✘ The Gaussian distribution is suitable in the neighborhood of the optimum

✘ The Gaussian distribution is not suitable on the slope of the fitness function!
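The first remedy, a lower bound on the variance, needs only one changed line in the simple Gaussian EDA. A sketch assuming a fixed threshold σ_min = 0.5 chosen purely for illustration, with hypothetical names: on the slope the mean now advances by roughly σ_min · d(τ) per generation instead of stalling, at the price that the population never contracts below that spread.

```python
import numpy as np


def gaussian_eda_min_sigma(f, mu0, sigma0, tau, sigma_min, pop_size=500, generations=300, seed=0):
    """1-D Gaussian EDA with truncation selection; the ML std is never allowed below sigma_min."""
    rng = np.random.default_rng(seed)
    mu, sigma = mu0, sigma0
    for _ in range(generations):
        pop = rng.normal(mu, sigma, pop_size)
        best = pop[np.argsort(f(pop))[: int(tau * pop_size)]]  # the tau*N lowest f values
        mu = best.mean()
        sigma = max(best.std(), sigma_min)                     # ML estimate, floored at the threshold
    return mu, sigma


def sphere(x):
    """Quadratic function with its optimum at 0."""
    return x ** 2


mu_end, sigma_end = gaussian_eda_min_sigma(sphere, mu0=-100.0, sigma0=1.0, tau=0.2, sigma_min=0.5)
print(mu_end, sigma_end)  # the mean reaches the optimum region; sigma stays at the floor
```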


Evolutionary Strategies


✔ (µ, λ)-ES or (µ + λ)-ES (µ parents, λ offspring)

✘ (µ, λ)-ES: offspring completely replace parents

✘ (µ + λ)-ES: parents are joined with offspring, both fight for survival

✔ offspring individuals are created using mutation as

$$x' = x + N_D(0, \sigma),$$

where $x$ is the parent, $x'$ is the offspring, and $N_D(0, \sigma)$ is the D-dimensional isotropic Gaussian distribution
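Both replacement schemes with a constant σ can be sketched in a few lines. The sketch minimizes the sphere function f(x) = Σ xᵢ² (our choice of test function); choosing parents uniformly at random and using no recombination are simplifying assumptions, and the names are hypothetical.

```python
import random

def sphere(x):
    """Test function to minimize: sum of squares, optimum at the origin."""
    return sum(xi * xi for xi in x)

def es(f, x0, sigma, mu=5, lam=20, plus=False, generations=200, seed=0):
    """(mu, lambda)-ES (plus=False) or (mu + lambda)-ES (plus=True) with constant sigma."""
    rng = random.Random(seed)
    parents = [list(x0) for _ in range(mu)]
    for _ in range(generations):
        offspring = []
        for _ in range(lam):
            x = rng.choice(parents)                                      # random parent, no recombination
            offspring.append([xi + rng.gauss(0.0, sigma) for xi in x])   # x' = x + N_D(0, sigma)
        pool = offspring + parents if plus else offspring                # comma: parents are discarded
        parents = sorted(pool, key=f)[:mu]                               # the mu best survive
    return parents[0]                                                    # best of the final parents

best_comma = es(sphere, [5.0, 5.0, 5.0], sigma=0.2)
best_plus = es(sphere, [5.0, 5.0, 5.0], sigma=0.2, plus=True)
print(sphere(best_comma), sphere(best_plus))
```

Because σ never changes, progress slows down drastically in both variants once the distance to the optimum becomes comparable to σ.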

Increasing flexibility of ES: σ is not constant during the ES run

✔ decrease σ deterministically

✔ feedback control of σ (1/5 rule)

✔ self-adaptation of σ

✘ σ is part of the chromosome

$$\sigma' = \sigma \cdot \exp\left(N_1(0, \Delta\sigma)\right)$$

$$x' = x + N_D(0, \sigma')$$
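The self-adaptive variant can be sketched by storing σ with each individual and mutating it log-normally before it is used to mutate x, in the order of the two formulas above. The learning rate Δσ = 1/√D, the population sizes and the sphere test function are assumptions made for illustration only.

```python
import math
import random

def sphere(x):
    """Test function to minimize: sum of squares, optimum at the origin."""
    return sum(xi * xi for xi in x)

def self_adaptive_es(f, x0, sigma0=1.0, mu=3, lam=15, generations=300, seed=0):
    """(mu, lambda)-ES in which every individual carries its own step size sigma."""
    rng = random.Random(seed)
    delta_sigma = 1.0 / math.sqrt(len(x0))                       # assumed learning rate for log(sigma)
    parents = [(list(x0), sigma0) for _ in range(mu)]
    for _ in range(generations):
        offspring = []
        for _ in range(lam):
            x, sigma = rng.choice(parents)
            s = sigma * math.exp(rng.gauss(0.0, delta_sigma))   # sigma' = sigma * exp(N_1(0, delta_sigma))
            y = [xi + rng.gauss(0.0, s) for xi in x]            # x' = x + N_D(0, sigma')
            offspring.append((y, s))
        parents = sorted(offspring, key=lambda ind: f(ind[0]))[:mu]   # comma selection on f(x)
    return parents[0]

x_best, s_best = self_adaptive_es(sphere, [5.0, 5.0, 5.0])
print(sphere(x_best), s_best)
```

Selection acts only on f(x), so σ values survive indirectly: those that produced good offspring are inherited, which lets the step size shrink as the population approaches the optimum.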

ES: Increasing model flexibility

Use a different σ in each dimension

✔ Gaussian with diagonal covariance matrix

$$\sigma = (\sigma_1, \sigma_2, \ldots, \sigma_D)$$

$$x' = x + N_D(0, I \cdot \sigma)$$

✔ Gaussian with full covariance matrix

$$x' = x + N_D(0, C)$$

✔ Self-adaptation of σ and C

✔ Changes in the covariance structure are still very random!
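The two mutation operators can be sketched in plain Python. We read $N_D(0, I \cdot \sigma)$ as independent coordinates with standard deviations σ₁, …, σ_D (an interpretation consistent with σ being a standard deviation in $N_D(0, \sigma)$), and we sample $N_D(0, C)$ by multiplying standard normal noise by the Cholesky factor of C, a standard technique that is not taken from the lecture. Function names are hypothetical.

```python
import math
import random

def cholesky(C):
    """Lower-triangular L with L L^T = C, for symmetric positive definite C."""
    n = len(C)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = C[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def mutate_diagonal(x, sigmas, rng):
    """x' = x + N_D(0, diag(sigma)): independent noise, one standard deviation per coordinate."""
    return [xi + rng.gauss(0.0, si) for xi, si in zip(x, sigmas)]

def mutate_full(x, C, rng):
    """x' = x + N_D(0, C): correlated noise L z, where z ~ N(0, I) and L L^T = C."""
    L = cholesky(C)
    z = [rng.gauss(0.0, 1.0) for _ in x]
    return [xi + sum(L[i][k] * z[k] for k in range(i + 1)) for i, xi in enumerate(x)]

rng = random.Random(1)
print(mutate_diagonal([0.0, 0.0, 0.0], [0.1, 1.0, 10.0], rng))
print(mutate_full([0.0, 0.0], [[1.0, 0.9], [0.9, 1.0]], rng))
```

A diagonal matrix can only stretch the mutation distribution along the coordinate axes; the full matrix can also rotate it, which matters when variables interact.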

CMA-ES

✔ Derandomized evolutionary strategy

✔ (1, λ)-ES with covariance matrix adaptation:

1. Generate λ offspring

$$x^t = \mu^t + \sigma^t \cdot N_D(0, C^t)$$

$$\mu^t \in \mathbb{R}^D, \quad \sigma^t \in \mathbb{R}^+, \quad C^t \in \mathbb{R}^{D \times D}$$

2. Based on the offspring, adapt the model parameters:

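Adapt the Gaussian center µ and covariance matrix C using maximum-likelihood estimation:

$$\mu^{t+1} = \arg\max_{\mu^{t+1}} P\left(x^t_{sel} \mid \mu^{t+1}\right), \qquad \mu^{t+1} = \bar{x}^t_{sel}$$

$$C^{t+1} = \arg\max_{C^{t+1}} P\left(\frac{x^t_{sel} - \mu^t}{\sigma^t} \,\middle|\, C^{t+1}\right), \qquad C^{t+1} = \mathrm{Cov}\left(\frac{x^t_{sel} - \mu^t}{\sigma^t}\right)$$

Adapt the global step size σ so that two consecutive steps, $\mu^t \to \mu^{t+1}$ and $\mu^{t+1} \to \mu^{t+2}$, are conjugate, i.e. conceptually

$$\left(\mu^{t+2} - \mu^{t+1}\right)^\top C^{-1} \, \frac{\mu^{t+1} - \mu^t}{(\sigma^{t+1})^2} = 0$$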
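One generation of the model update on this slide can be sketched as follows. This is not full CMA-ES: the real algorithm also uses evolution paths and recombination weights, and here the conjugacy criterion is only measured, not enforced, so σ is returned unchanged. Because the steps $(x^t_{sel} - \mu^t)/\sigma^t$ are modeled as $N_D(0, C)$, the sketch takes the ML estimate of C as their second moment about zero. That reading of Cov(·), the number of selected offspring `n_sel`, and all function names are our assumptions.

```python
import math
import random

def cholesky(C):
    """Lower-triangular L with L L^T = C, for symmetric positive definite C."""
    n = len(C)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = C[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def forward_sub(L, b):
    """Solve L y = b for lower-triangular L."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y

def cma_like_generation(f, mu, sigma, C, lam, n_sel, rng):
    """Sample, select and re-estimate mu and C once (minimizing f); sigma is left unchanged."""
    D = len(mu)
    L = cholesky(C)
    offspring = []
    for _ in range(lam):
        z = [rng.gauss(0.0, 1.0) for _ in range(D)]
        y = [sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(D)]   # y ~ N_D(0, C)
        offspring.append([m + sigma * yi for m, yi in zip(mu, y)])         # x = mu + sigma * y
    selected = sorted(offspring, key=f)[:n_sel]                              # truncation selection
    new_mu = [sum(x[i] for x in selected) / n_sel for i in range(D)]         # ML mean of selected points
    steps = [[(x[i] - mu[i]) / sigma for i in range(D)] for x in selected]   # (x_sel - mu^t) / sigma^t
    new_C = [[sum(s[i] * s[j] for s in steps) / n_sel for j in range(D)]    # ML estimate for N_D(0, C)
             for i in range(D)]
    return new_mu, sigma, new_C

def conjugacy_measure(mu0, mu1, mu2, C, sigma1):
    """(mu2 - mu1)^T C^-1 (mu1 - mu0) / sigma1^2, computed as (L^-1 v) . (L^-1 w)."""
    L = cholesky(C)
    a = forward_sub(L, [p - q for p, q in zip(mu2, mu1)])
    b = forward_sub(L, [p - q for p, q in zip(mu1, mu0)])
    return sum(ai * bi for ai, bi in zip(a, b)) / sigma1 ** 2

rng = random.Random(0)
new_mu, sigma, new_C = cma_like_generation(lambda x: x[0] + x[1], [3.0, 3.0], 1.0,
                                           [[1.0, 0.0], [0.0, 1.0]], lam=20, n_sel=5, rng=rng)
print(new_mu, new_C)
```

A positive value of `conjugacy_measure` means consecutive steps point in similar directions (σ could grow); a negative value means they partly cancel (σ could shrink). How the lecture's algorithm turns this into an update of σ is not shown on the slide.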

CMA-ES: Summary

✔ CMA-ES has its roots in ES, but it seems to be an instance of an EDA (it learns a probabilistic model)

✔ it behaves like a local optimizer, but often does not get stuck in a local optimum

✔ it is the state of the art in real-valued black-box optimization; its advantages are significant already in spaces of 5–10 dimensions

✔ it has been used to solve many real-world problems (tuning of electronic filters, aerodynamic and hydrodynamic design, etc.)

Real-valued EDAs: Summary

✔ Much less developed than EDAs for binary representations

✔ The difficulties are caused mainly by

✘ much more severe effects of the curse of dimensionality

✘ many different types of interactions among variables

✔ Despite that, EDAs (and EAs in general) are able to achieve better results than conventional optimization techniques (line search, Nelder-Mead search, . . . )