Analysis of Epistasis Correlation on NK Landscapes with Nearest Neighbor Interactions
Martin Pelikan
Missouri Estimation of Distribution Algorithms Laboratory (MEDAL)
University of Missouri, St. Louis, MO
http://medal.cs.umsl.edu/
[email protected]
MEDAL Report No. 2011002
http://medal.cs.umsl.edu/files/2011002.pdf
Epistasis correlation is a measure that estimates the strength of interactions between problem variables. This paper presents an empirical study of epistasis correlation on a large number of random problem instances of NK landscapes with nearest neighbor interactions. The results are analyzed with respect to the performance of hybrid variants of two evolutionary algorithms: (1) the genetic algorithm with uniform crossover and (2) the hierarchical Bayesian optimization algorithm.
Analysis of Epistasis Correlation on NK Landscapes with Nearest Neighbor Interactions
Martin Pelikan
Missouri Estimation of Distribution Algorithms Laboratory (MEDAL)
University of Missouri, St. Louis, MO
http://medal.cs.umsl.edu/
- Important for understanding and estimating problem difficulty.
- Should be useful in designing, choosing and setting up optimization algorithms.
- Most past work considers few isolated instances.
This study
- Focuses on measures of epistasis (variable interactions).
- Analyzes epistasis measures on a large number of instances of nearest-neighbor NK landscapes.
- Compares the measures with actual performance of hybrid GA.
- Complements last year's GECCO paper on other measures.
Outline
1. Epistasis.
2. Epistasis variance and epistasis correlation.
3. NK landscapes.
4. Experiments.
5. Conclusions and future work.
Epistasis
- Epistasis refers to interactions between problem variables.
- Effects of one variable depend on the values of other variable(s).
- In biology, the phenotypic effect of one gene is affected by other genes.
Why should we care?
- Absence of epistasis indicates a simple, linear problem.
- Epistasis may make a problem more difficult.
Critical View on Epistasis
Criticism
- Epistasis is of little use unless we understand its nature.
- There exist many easy problems with high epistasis.
- There exist many hard problems with little epistasis.
- Epistasis is difficult to measure using finite samples.
Examples
- Epistasis in a difficult problem
  - Needle in a haystack.
  - Deceptive problem.
- Epistasis in a simple problem
  - Onemax with an additional fitness contribution for the optimum (simple).
Linear Fitness Approximation
- Assume candidate solutions are n-bit binary strings.
- Assume a population P of N solutions.
- P_i(v_i) denotes the solutions with v_i ∈ {0, 1} in position i.
- N_i(v_i) is the number of solutions in P_i(v_i).
- f_i(v_i) approximates the contribution of v_i to fitness, with f̄(P) denoting the average fitness of P:

  f_i(v_i) = (1 / N_i(v_i)) Σ_{x ∈ P_i(v_i)} f(x) − f̄(P)

- Approximate fitness as follows:

  f_lin(X_1, X_2, ..., X_n) = Σ_{i=1}^{n} f_i(X_i) + f̄(P)
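The two formulas above can be turned into a short sketch (a minimal illustration, not the author's code; the name `linear_approximation` and the array-based population representation are my assumptions):

```python
import numpy as np

def linear_approximation(pop, fvals):
    """Build the linear fitness approximation f_lin from a population.

    pop   : (N, n) array of 0/1 values (the population P)
    fvals : (N,) array of fitness values f(x), one per row of pop
    Returns a function mapping an n-bit vector to f_lin(x).
    """
    pop = np.asarray(pop)
    fvals = np.asarray(fvals, dtype=float)
    n = pop.shape[1]
    f_mean = fvals.mean()  # average fitness of the population

    # fi[i][v] = average fitness of solutions with value v in position i,
    # minus the population average (the f_i(v_i) of the slide)
    fi = np.zeros((n, 2))
    for i in range(n):
        for v in (0, 1):
            mask = pop[:, i] == v
            if mask.any():
                fi[i, v] = fvals[mask].mean() - f_mean

    def f_lin(x):
        return sum(fi[i, x[i]] for i in range(n)) + f_mean

    return f_lin
```

For a fully enumerated population and a linear fitness function such as onemax, f_lin reproduces f exactly.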
Epistasis Variance
Epistasis variance (Davidor, 1990)
- In short: the root-mean-square difference between f and f_lin.
- Epistasis variance ξ_P(f) is defined as

  ξ_P(f) = √( (1/N) Σ_{x ∈ P} (f(x) − f_lin(x))² )
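In code, with f and f_lin already evaluated on every member of the population, the definition translates directly (a sketch; the function name is my assumption):

```python
import numpy as np

def epistasis_variance(fvals, flin_vals):
    """Epistasis variance (Davidor, 1990): root-mean-square difference
    between the fitness f and its linear approximation f_lin over P."""
    fvals = np.asarray(fvals, dtype=float)
    flin_vals = np.asarray(flin_vals, dtype=float)
    return np.sqrt(np.mean((fvals - flin_vals) ** 2))
```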
Epistasis Correlation
Epistasis correlation (Rochet et al., 1997)
- In short: the correlation coefficient between f and f_lin.
- Sum of square differences between f and its average f̄(P):

  s_P(f) = Σ_{x ∈ P} (f(x) − f̄(P))²

- Sum of square differences between f_lin and its average f̄_lin(P):

  s_P(f_lin) = Σ_{x ∈ P} (f_lin(x) − f̄_lin(P))²

- Epistasis correlation epic_P(f) is defined as

  epic_P(f) = ( Σ_{x ∈ P} (f(x) − f̄(P)) (f_lin(x) − f̄_lin(P)) ) / √( s_P(f) s_P(f_lin) )
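Since epic_P(f) is simply the Pearson correlation coefficient between f and f_lin over the population, it can be sketched as (the function name is my assumption):

```python
import numpy as np

def epistasis_correlation(fvals, flin_vals):
    """Epistasis correlation epic_P(f): Pearson correlation between
    the fitness f and its linear approximation f_lin over P."""
    f = np.asarray(fvals, dtype=float)
    g = np.asarray(flin_vals, dtype=float)
    df, dg = f - f.mean(), g - g.mean()
    return np.sum(df * dg) / np.sqrt(np.sum(df ** 2) * np.sum(dg ** 2))
```

A perfectly linear fitness gives epic_P(f) = 1, and the value is unchanged by any linear transformation of f.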
Evaluating Epistasis Measures
Epistasis variance
- Not invariant w.r.t. linear transformations of f.
- Not within a fixed range of values.
- Smaller epistasis variance indicates weaker epistasis.
Epistasis correlation
- Invariant w.r.t. linear transformations of f.
- Value is within the range [0, 1].
- Greater epistasis correlation indicates weaker epistasis.
Experiments: Algorithms
Genetic algorithm (Holland, 1975)
- Uniform crossover.
- Bit-flip mutation.
- Tournament selection.
- Restricted tournaments for niching.
- Steepest ascent hill climber for local search.
Hierarchical BOA (Pelikan et al., 2001)
- Variation by learning and sampling Bayesian networks with decision trees.
- Tournament selection.
- Restricted tournaments for niching.
- Steepest ascent hill climber for local search.
Experiments: Problems
NK landscapes with nearest neighbors
- Defined on n-bit binary strings.
- Fitness is the sum of n subproblems of order k + 1.
- Subproblem i uses the ith variable and the following k variables.
- Neighborhoods wrap around (as on a circle).
- Subproblems are defined as lookup tables with values generated uniformly from [0, 1).
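The construction above can be sketched as follows (a hedged illustration; the function names and the list-of-lookup-tables representation are my assumptions, not the paper's code):

```python
import random

def random_nk_instance(n, k, seed=0):
    """Nearest-neighbor NK instance: one lookup table of 2^(k+1) uniform
    [0, 1) values per subproblem."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(2 ** (k + 1))] for _ in range(n)]

def nk_fitness(x, tables, k):
    """Fitness of bit string x: sum of n subproblems; subproblem i reads
    bits i..i+k with wrap-around (as on a circle)."""
    n = len(x)
    total = 0.0
    for i in range(n):
        idx = 0
        for j in range(k + 1):
            idx = (idx << 1) | x[(i + j) % n]  # pack k+1 bits into a table index
        total += tables[i][idx]
    return total
```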
Experiments: Problems
NK parameters
- k ∈ {2, 3, 4, 5, 6}
- n ∈ {20, 30, 40, 50, 60, 70, 80, 90, 100}
- For each (n, k), we use 10,000 instances.
Difficulty of nearest-neighbor NK landscapes
- Difficulty grows with k.
- Polynomially solvable using dynamic programming.
- For larger n and k, hBOA outperforms GA.
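The dynamic-programming claim can be illustrated with a sketch that fixes the first k bits, sweeps the remaining positions as a chain, and closes the circle at the end. This is an assumption-laden illustration (it requires n ≥ 2k), and the table format is the same assumed lookup-table representation: tables[i] holds 2^(k+1) values indexed by bits i..i+k (mod n), most significant bit first.

```python
import itertools

def nk_optimum_dp(tables, k):
    """Exact maximum fitness of a nearest-neighbor NK instance in time
    O(n * 4^k): fix the first k bits, run a chain DP over the remaining
    positions, then add the wrap-around subproblems (assumes n >= 2k)."""
    n = len(tables)

    def table_value(i, bits):
        idx = 0
        for b in bits:
            idx = (idx << 1) | b
        return tables[i][idx]

    best = float("-inf")
    for head in itertools.product((0, 1), repeat=k):  # bits x_0..x_{k-1}
        states = {head: 0.0}  # key: last k bits fixed so far, value: best sum
        for pos in range(k, n):
            nxt = {}
            for state, val in states.items():
                for bit in (0, 1):
                    # subproblem pos-k uses bits pos-k..pos, now all known
                    v = val + table_value(pos - k, state + (bit,))
                    ns = (state + (bit,))[1:]
                    if v > nxt.get(ns, float("-inf")):
                        nxt[ns] = v
            states = nxt
        # close the circle: subproblems n-k..n-1 also read the head bits
        for state, val in states.items():
            bits = state + head  # x_{n-k}..x_{n-1} followed by x_0..x_{k-1}
            v = val + sum(table_value(n - k + j, bits[j:j + k + 1])
                          for j in range(k))
            best = max(best, v)
    return best
```

For small instances the result can be checked against brute-force enumeration of all 2^n strings.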
Results: Scatter Plot for hBOA
- Epistasis correlation decreases with k (expected).
- For any k, epistasis correlation does not seem to closely correspond to the actual problem difficulty.
Results: Epistasis Correlation vs. n and k for hBOA
Figure 3: Epistasis correlation with respect to the number n of bits and the number k of neighbors of nearest-neighbor NK landscapes. (a) Epistasis correlation with respect to n (curves for k = 2, 3, 4, 5, 6). (b) Epistasis correlation with respect to k (average epistasis correlation for n = 100).
an increased level of epistasis. In fact, for GA, the results are in agreement with our understanding of epistasis and problem difficulty even for larger values of k, although the differences between the values of epistasis in different subsets decrease with k.

The differences between the results for hBOA and GA confirm that the effect of epistasis should be weaker for hBOA than for GA because hBOA can deal with epistasis better than conventional GAs by detecting and using interactions between problem variables. The differences are certainly small, but so are the differences between the epistasis correlation values between the subsets of problems that are even orders of magnitude different in terms of computational time. The differences between a conventional GA with no linkage learning and one of the most advanced EDAs are among the most interesting results in this paper.

5. SUMMARY AND CONCLUSIONS

This paper discussed epistasis and its relationship with problem difficulty. To measure epistasis, epistasis correlation was used. The empirical analysis considered hybrids of two qualitatively different evolutionary algorithms and a large number of instances of nearest-neighbor NK landscapes.

The use of epistasis correlation in assessing problem difficulty has received a lot of criticism [23, 35]. The main reason for this is that although the absence of epistasis does imply that a problem is easy, the presence of epistasis does not necessarily imply that the problem is difficult. Nonetheless, given our current understanding of problem difficulty, there is no doubt that introducing epistasis increases the potential of a problem to be difficult.

This paper indicated that for randomly generated NK landscapes with nearest-neighbor interactions, epistasis correlation correctly captures the fact that the problem instances become more difficult as the order of interactions (number of neighbors) increases. Additionally, the results confirmed that for a fixed problem size and order of interactions, sets of more difficult problem instances have lower values of epistasis correlation (and, thus, stronger epistasis). The results also indicated that evolutionary algorithms capable of linkage learning are less sensitive to epistasis than conventional evolutionary algorithms.

The bad news is that the results confirmed that epistasis correlation does not provide a single input for the practitioner to assess problem difficulty, even if we assume that the problem size and the order of interactions are fixed and all instances are generated from the same distribution. In many cases, simple problems included strong epistasis and hard problems included weak epistasis. A similar observation has been made in ref. [25] for the correlation length and the fitness distance correlation. However, compared to these other popular measures of problem difficulty, epistasis correlation is one of the more accurate ones, at least for the class of randomly generated NK landscapes with nearest-neighbor interactions.

One of the important topics of future work would be to compile some of the past results from the analysis of various measures of problem difficulty together with the results presented here, and to explore the ways in which different measures of problem difficulty can be combined to give the practitioner a better indication of which problem instances are more difficult and which are easier. The experimental study presented in this paper should also be extended to other classes of problems, especially those that allow one to generate a large set of random problem instances. Classes of spin glass optimization problems and graph problems are good candidates for these efforts.
Acknowledgments
- Epistasis correlation does not change with n.
- Epistasis correlation decreases with k.
Results: Problem Difficulty and Epistasis Correlation
Table 1: Epistasis correlation for easy and hard instances for hBOA. The difficulty of instances is measured by the overall number of steps of the local searcher.
Table 2: Epistasis correlation for easy and hard instances for GA with uniform crossover. The difficulty of instances is measured by the overall number of steps of the local searcher.
- For fixed n and k, epistasis correlation changes only a little.
- Epistasis is stronger for more difficult problems, but the differences are nearly negligible.
Conclusions and Future Work
Conclusions
- For NK landscapes, epistasis correlation is certainly not useless; it provided some input on problem difficulty.
- Epistasis correlation succeeded in providing a clear indication that problem difficulty increases with k.
- Epistasis correlation failed to capture the increase of problem difficulty with problem size.
- Epistasis correlation failed to provide a clear indication of problem difficulty for fixed n and k.
Future work
- Compare different measures of problem difficulty.
- Identify problem features that these measures do not capture.
- Create new problem difficulty measures that provide better input for optimization practitioners.
- Key goals of these efforts:
  - Tune the algorithm to the problem (parameters, operators).
  - Choose the best optimization algorithm.
  - Drive the design of new optimization algorithms.
Acknowledgments
- NSF; NSF CAREER grant ECS-0547013.
- University of Missouri; High Performance Computing Collaboratory sponsored by Information Technology Services; Research Award; Research Board.