Analysis of Epistasis Correlation on NK Landscapes with Nearest Neighbor Interactions
Martin Pelikan
Missouri Estimation of Distribution Algorithms Laboratory (MEDAL)
University of Missouri, St. Louis, MO
http://medal.cs.umsl.edu/
[email protected]
MEDAL Report No. 2011002
http://medal.cs.umsl.edu/files/2011002.pdf
Epistasis correlation is a measure that estimates the strength of interactions between problem variables. This paper presents an empirical study of epistasis correlation on a large number of random problem instances of NK landscapes with nearest neighbor interactions. The results are analyzed with respect to the performance of hybrid variants of two evolutionary algorithms: (1) the genetic algorithm with uniform crossover and (2) the hierarchical Bayesian optimization algorithm.
Analysis of Epistasis Correlation on NK Landscapes with Nearest Neighbor Interactions
Martin Pelikan
Missouri Estimation of Distribution Algorithms Laboratory (MEDAL)
University of Missouri, St. Louis, MO
http://medal.cs.umsl.edu/
- Important for understanding and estimating problem difficulty.
- Should be useful in designing, choosing and setting up optimization algorithms.
- Most past work considers few isolated instances.
This study
- Focuses on measures of epistasis (variable interactions).
- Analyzes epistasis measures on a large number of instances of nearest-neighbor NK landscapes.
- Compares the measures with actual performance of hybrid GA.
- Complements last year's GECCO paper on other measures.
Outline
1. Epistasis.
2. Epistasis variance and epistasis correlation.
3. NK landscapes.
4. Experiments.
5. Conclusions and future work.
Epistasis
- Epistasis refers to interactions between problem variables.
- Effects of one variable depend on the values of other variable(s).
- In biology, the phenotypic effect of one gene is affected by other genes.
Why should we care?
- Absence of epistasis indicates a simple, linear problem.
- Epistasis may make a problem more difficult.
Critical View on Epistasis
Criticism
- Epistasis is of little use unless we understand its nature.
- There exist many easy problems with high epistasis.
- There exist many hard problems with little epistasis.
- Epistasis is difficult to measure using finite samples.
Examples
- Epistasis in a difficult problem
  - Needle in a haystack.
  - Deceptive problem.
- Epistasis in a simple problem
  - Onemax with an additional fitness contribution for the optimum (simple).
Linear Fitness Approximation
- Assume candidate solutions are n-bit binary strings.
- Assume a population P of N solutions.
- P_i(v_i) denotes the solutions with v_i ∈ {0, 1} in position i.
- N_i(v_i) is the number of solutions in P_i(v_i).
- f_i(v_i) approximates the contribution of v_i to fitness, with f̄(P) denoting the average fitness of P:

  f_i(v_i) = (1 / N_i(v_i)) Σ_{x ∈ P_i(v_i)} f(x) − f̄(P)

- Approximate fitness as follows:

  f_lin(X_1, X_2, ..., X_n) = Σ_{i=1}^{n} f_i(X_i) + f̄(P)
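The two formulas above can be turned into a short sketch (a minimal illustration, not the author's code; the name `linear_approximation` and the array-based population representation are my assumptions):

```python
import numpy as np

def linear_approximation(pop, fvals):
    """Build the linear fitness approximation f_lin from a population.

    pop   : (N, n) array of 0/1 values (the population P)
    fvals : (N,) array of fitness values f(x), one per row of pop
    Returns a function mapping an n-bit vector to f_lin(x).
    """
    pop = np.asarray(pop)
    fvals = np.asarray(fvals, dtype=float)
    n = pop.shape[1]
    f_mean = fvals.mean()  # average fitness of the population

    # fi[i][v] = average fitness of solutions with value v in position i,
    # minus the population average (the f_i(v_i) of the slide)
    fi = np.zeros((n, 2))
    for i in range(n):
        for v in (0, 1):
            mask = pop[:, i] == v
            if mask.any():
                fi[i, v] = fvals[mask].mean() - f_mean

    def f_lin(x):
        return sum(fi[i, x[i]] for i in range(n)) + f_mean

    return f_lin
```

For a fully enumerated population and a linear fitness function such as onemax, f_lin reproduces f exactly.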
Epistasis Variance
Epistasis variance (Davidor, 1990)
- In short: the root-mean-square difference between f and f_lin.
- Epistasis variance ξ_P(f) is defined as

  ξ_P(f) = √( (1/N) Σ_{x ∈ P} (f(x) − f_lin(x))² )
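In code, with f and f_lin already evaluated on every member of the population, the definition translates directly (a sketch; the function name is my assumption):

```python
import numpy as np

def epistasis_variance(fvals, flin_vals):
    """Epistasis variance (Davidor, 1990): root-mean-square difference
    between the fitness f and its linear approximation f_lin over P."""
    fvals = np.asarray(fvals, dtype=float)
    flin_vals = np.asarray(flin_vals, dtype=float)
    return np.sqrt(np.mean((fvals - flin_vals) ** 2))
```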
Epistasis Correlation
Epistasis correlation (Rochet et al., 1997)
- In short: the correlation coefficient between f and f_lin.
- Sum of square differences between f and its average f̄(P):

  s_P(f) = Σ_{x ∈ P} (f(x) − f̄(P))²

- Sum of square differences between f_lin and its average f̄_lin(P):

  s_P(f_lin) = Σ_{x ∈ P} (f_lin(x) − f̄_lin(P))²

- Epistasis correlation epic_P(f) is defined as

  epic_P(f) = ( Σ_{x ∈ P} (f(x) − f̄(P)) (f_lin(x) − f̄_lin(P)) ) / √( s_P(f) s_P(f_lin) )
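Since epic_P(f) is simply the Pearson correlation coefficient between f and f_lin over the population, it can be sketched as (the function name is my assumption):

```python
import numpy as np

def epistasis_correlation(fvals, flin_vals):
    """Epistasis correlation epic_P(f): Pearson correlation between
    the fitness f and its linear approximation f_lin over P."""
    f = np.asarray(fvals, dtype=float)
    g = np.asarray(flin_vals, dtype=float)
    df, dg = f - f.mean(), g - g.mean()
    return np.sum(df * dg) / np.sqrt(np.sum(df ** 2) * np.sum(dg ** 2))
```

A perfectly linear fitness gives epic_P(f) = 1, and the value is unchanged by any linear transformation of f.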
Evaluating Epistasis Measures
Epistasis variance
- Not invariant w.r.t. linear transformations of f.
- Not within a fixed range of values.
- Smaller epistasis variance indicates weaker epistasis.
Epistasis correlation
- Invariant w.r.t. linear transformations of f.
- Value is within the range [0, 1].
- Greater epistasis correlation indicates weaker epistasis.
Experiments: Algorithms
Genetic algorithm (Holland, 1975)
- Uniform crossover.
- Bit-flip mutation.
- Tournament selection.
- Restricted tournaments for niching.
- Steepest ascent hill climber for local search.
Hierarchical BOA (Pelikan et al., 2001)
- Variation by learning and sampling Bayesian networks with decision trees.
- Tournament selection.
- Restricted tournaments for niching.
- Steepest ascent hill climber for local search.
Experiments: Problems
NK landscapes with nearest neighbors
- Defined on n-bit binary strings.
- Fitness is the sum of n subproblems of order k + 1.
- Subproblem i uses the ith variable and the following k variables.
- Neighborhoods wrap around (as on a circle).
- Subproblems are defined as lookup tables with values generated uniformly from [0, 1).
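The construction above can be sketched as follows (a hedged illustration; the function names and the list-of-lookup-tables representation are my assumptions, not the paper's code):

```python
import random

def random_nk_instance(n, k, seed=0):
    """Nearest-neighbor NK instance: one lookup table of 2^(k+1) uniform
    [0, 1) values per subproblem."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(2 ** (k + 1))] for _ in range(n)]

def nk_fitness(x, tables, k):
    """Fitness of bit string x: sum of n subproblems; subproblem i reads
    bits i..i+k with wrap-around (as on a circle)."""
    n = len(x)
    total = 0.0
    for i in range(n):
        idx = 0
        for j in range(k + 1):
            idx = (idx << 1) | x[(i + j) % n]  # pack k+1 bits into a table index
        total += tables[i][idx]
    return total
```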
Experiments: Problems
NK parameters
- k ∈ {2, 3, 4, 5, 6}
- n ∈ {20, 30, 40, 50, 60, 70, 80, 90, 100}
- For each (n, k), we use 10,000 instances.
Difficulty of nearest-neighbor NK landscapes
- Difficulty grows with k.
- Polynomially solvable using dynamic programming.
- For larger n and k, hBOA outperforms GA.
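The dynamic-programming claim can be illustrated with a sketch that fixes the first k bits, sweeps the remaining positions as a chain, and closes the circle at the end. This is an assumption-laden illustration (it requires n ≥ 2k), and the table format is the same assumed lookup-table representation: tables[i] holds 2^(k+1) values indexed by bits i..i+k (mod n), most significant bit first.

```python
import itertools

def nk_optimum_dp(tables, k):
    """Exact maximum fitness of a nearest-neighbor NK instance in time
    O(n * 4^k): fix the first k bits, run a chain DP over the remaining
    positions, then add the wrap-around subproblems (assumes n >= 2k)."""
    n = len(tables)

    def table_value(i, bits):
        idx = 0
        for b in bits:
            idx = (idx << 1) | b
        return tables[i][idx]

    best = float("-inf")
    for head in itertools.product((0, 1), repeat=k):  # bits x_0..x_{k-1}
        states = {head: 0.0}  # key: last k bits fixed so far, value: best sum
        for pos in range(k, n):
            nxt = {}
            for state, val in states.items():
                for bit in (0, 1):
                    # subproblem pos-k uses bits pos-k..pos, now all known
                    v = val + table_value(pos - k, state + (bit,))
                    ns = (state + (bit,))[1:]
                    if v > nxt.get(ns, float("-inf")):
                        nxt[ns] = v
            states = nxt
        # close the circle: subproblems n-k..n-1 also read the head bits
        for state, val in states.items():
            bits = state + head  # x_{n-k}..x_{n-1} followed by x_0..x_{k-1}
            v = val + sum(table_value(n - k + j, bits[j:j + k + 1])
                          for j in range(k))
            best = max(best, v)
    return best
```

For small instances the result can be checked against brute-force enumeration of all 2^n strings.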
Results: Scatter Plot for hBOA
- Epistasis correlation decreases with k (expected).
- For any k, epistasis correlation does not seem to closely correspond to the actual problem difficulty.
Results: Epistasis Correlation vs. n and k for hBOA
Figure 3: Epistasis correlation with respect to the number n of bits and the number k of neighbors of nearest-neighbor NK landscapes. (a) Epistasis correlation with respect to n (curves for k = 2, 3, 4, 5, 6). (b) Epistasis correlation with respect to k (average epistasis correlation for n = 100).
an increased level of epistasis. In fact, for GA, the results are in agreement with our understanding of epistasis and problem difficulty even for larger values of k, although the differences between the values of epistasis in different subsets decrease with k.

The differences between the results for hBOA and GA confirm that the effect of epistasis should be weaker for hBOA than for GA because hBOA can deal with epistasis better than conventional GAs by detecting and using interactions between problem variables. The differences are certainly small, but so are the differences between the epistasis correlation values between the subsets of problems that are even orders of magnitude different in terms of computational time. The differences between a conventional GA with no linkage learning and one of the most advanced EDAs are among the most interesting results in this paper.

5. SUMMARY AND CONCLUSIONS

This paper discussed epistasis and its relationship with problem difficulty. To measure epistasis, epistasis correlation was used. The empirical analysis considered hybrids of two qualitatively different evolutionary algorithms and a large number of instances of nearest-neighbor NK landscapes.

The use of epistasis correlation in assessing problem difficulty has received a lot of criticism [23, 35]. The main reason for this is that although the absence of epistasis does imply that a problem is easy, the presence of epistasis does not necessarily imply that the problem is difficult. Nonetheless, given our current understanding of problem difficulty, there is no doubt that introducing epistasis increases the potential of a problem to be difficult.

This paper indicated that for randomly generated NK landscapes with nearest-neighbor interactions, epistasis correlation correctly captures the fact that the problem instances become more difficult as the order of interactions (number of neighbors) increases. Additionally, the results confirmed that for a fixed problem size and order of interactions, sets of more difficult problem instances have lower values of epistasis correlation (and, thus, stronger epistasis). The results also indicated that evolutionary algorithms capable of linkage learning are less sensitive to epistasis than conventional evolutionary algorithms.

The bad news is that the results confirmed that epistasis correlation does not provide a single input for the practitioner to assess problem difficulty, even if we assume that the problem size and the order of interactions are fixed and all instances are generated from the same distribution. In many cases, simple problems included strong epistasis and hard problems included weak epistasis. A similar observation has been made in ref. [25] for the correlation length and the fitness distance correlation. However, compared to these other popular measures of problem difficulty, epistasis correlation is one of the more accurate ones, at least for the class of randomly generated NK landscapes with nearest-neighbor interactions.

One of the important topics of future work would be to compile some of the past results from the analysis of various measures of problem difficulty together with the results presented here, and to explore the ways in which different measures of problem difficulty can be combined to give the practitioner a better indication of which problem instances are more difficult and which are easier. The experimental study presented in this paper should also be extended to other classes of problems, especially those that allow one to generate a large set of random problem instances. Classes of spin glass optimization problems and graph problems are good candidates for these efforts.
Acknowledgments
- Epistasis correlation does not change with n.
- Epistasis correlation decreases with k.
Results: Problem Difficulty and Epistasis Correlation
Table 1: Epistasis correlation for easy and hard instances for hBOA. The difficulty of instances is measured by the overall number of steps of the local searcher.
Table 2: Epistasis correlation for easy and hard instances for GA with uniform crossover. The difficulty of instances is measured by the overall number of steps of the local searcher.
- For fixed n and k, epistasis correlation changes only a little.
- Epistasis is stronger for more difficult problems, but the differences are nearly negligible.
Conclusions and Future Work
Conclusions
- For NK landscapes, epistasis correlation is certainly not useless; it provided some input on problem difficulty.
- Epistasis correlation succeeded in providing a clear indication that problem difficulty increases with k.
- Epistasis correlation failed to capture the increase of problem difficulty with problem size.
- Epistasis correlation failed to provide a clear indication of problem difficulty for fixed n and k.
Future work
- Compare different measures of problem difficulty.
- Identify problem features that these measures do not capture.
- Create new problem difficulty measures that provide better input for optimization practitioners.
- Key goals of these efforts:
  - Tune the algorithm to the problem (parameters, operators).
  - Choose the best optimization algorithm.
  - Drive the design of new optimization algorithms.
Acknowledgments
- NSF; NSF CAREER grant ECS-0547013.
- University of Missouri; High Performance Computing Collaboratory sponsored by Information Technology Services; Research Award; Research Board.