Evolutionary Computing
Computer Science 348
Dr. T presents…
Feb 02, 2016
Introduction
The field of Evolutionary Computing studies the theory and application of Evolutionary Algorithms.
Evolutionary Algorithms can be described as a class of stochastic, population-based local search algorithms inspired by neo-Darwinian Evolution Theory.
Computational Basis
- Trial-and-error (aka generate-and-test)
- Graduated solution quality
- Stochastic local search of the solution landscape
Biological Metaphors: Darwinian Evolution
- Macroscopic view of evolution
- Natural selection
- Survival of the fittest
- Random variation
Biological Metaphors: (Mendelian) Genetics
- Genotype (functional unit of inheritance)
- Genotypes vs. phenotypes
- Pleiotropy: one gene affects multiple phenotypic traits
- Polygeny: one phenotypic trait is affected by multiple genes
- Chromosomes (haploid vs. diploid)
- Loci and alleles
EA Pros
- More general purpose than traditional optimization algorithms, i.e., less problem-specific knowledge required
- Ability to solve difficult problems
- Solution availability
- Robustness
- Inherent parallelism

EA Cons
- Fitness function and genetic operators often not obvious
- Premature convergence
- Computationally intensive
- Difficult parameter optimization
EA Components
- Search spaces: representation & size
- Evaluation of trial solutions: fitness function
- Exploration versus exploitation
- Selective pressure rate
- Premature convergence
Nature versus the digital realm

Nature      | Digital realm
Environment | Problem (search space)
Fitness     | Fitness function
Population  | Set
Individual  | Data structure
Genes       | Elements
Alleles     | Data type
EA Strategy Parameters
- Population size
- Initialization-related parameters
- Selection-related parameters
- Number of offspring
- Recombination chance
- Mutation chance
- Mutation rate
- Termination-related parameters
Problem Solving Steps
- Collect problem knowledge
- Choose gene representation
- Design fitness function
- Creation of initial population
- Parent selection
- Decide on genetic operators
- Competition / survival
- Choose termination condition
- Find good parameter values
Function Optimization Problem

Given the function

    f(x,y) = x^2y + 5xy - 3xy^2

for what integer values of x and y is f(x,y) minimal?

- Solution space: Z x Z
- Trial solution: (x,y)
- Gene representation: integer
- Gene initialization: random
- Fitness function: -f(x,y)
- Population size: 4
- Number of offspring: 2
- Parent selection: exponential
Function Optimization Problem (cont.)
- Genetic operators:
  - 1-point crossover
  - Mutation (-1, 0, 1)
- Competition: remove the two individuals with the lowest fitness value

f(x,y) = x^2y + 5xy - 3xy^2
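The slide's parameter choices fit into a short script. This is a minimal sketch, not a reference implementation from the course: the random seed, the initialization range, and the use of uniform parent sampling in place of the exponential selection scheme are illustrative assumptions.

```python
import random

def f(x, y):
    # Objective from the slide: f(x,y) = x^2*y + 5xy - 3xy^2
    return x**2 * y + 5 * x * y - 3 * x * y**2

def fitness(ind):
    # Minimizing f is maximizing -f
    x, y = ind
    return -f(x, y)

def mutate(ind):
    # Creep mutation: add -1, 0, or 1 to each gene
    return tuple(g + random.choice((-1, 0, 1)) for g in ind)

random.seed(0)                                 # illustrative seed
pop = [(random.randint(-5, 5), random.randint(-5, 5)) for _ in range(4)]
for _ in range(50):
    p1, p2 = random.sample(pop, 2)             # stand-in for exponential parent selection
    kids = [mutate((p1[0], p2[1])),            # 1-point crossover between the two genes
            mutate((p2[0], p1[1]))]
    # competition: remove the two worst of the 4 parents + 2 offspring
    pop = sorted(pop + kids, key=fitness, reverse=True)[:4]
```

Note that f is unbounded below over Z x Z, so the population fitness keeps growing; the sketch only illustrates the loop structure.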
Measuring Performance
- Case 1: goal unknown or never reached
  - Solution quality: global average/best population fitness
- Case 2: goal known and sometimes reached
  - Optimal solution reached percentage
- Case 3: goal known and always reached
  - Convergence speed
Initialization
- Uniform random
- Heuristic based
- Knowledge based
- Genotypes from previous runs
- Seeding
Representation (2.3.1)
- Genotype space
- Phenotype space
- Encoding & decoding
- Knapsack Problem (2.4.2)
- Surjective, injective, and bijective decoder functions
Simple Genetic Algorithm (SGA)
- Representation: bit-strings
- Recombination: 1-point crossover
- Mutation: bit flip
- Parent selection: fitness proportional
- Survival selection: generational
Trace Example Errata
- Page 39, line 5: 729 -> 784
- Table 3.4, x value: 26 -> 28, 18 -> 20
- Table 3.4, fitness: 676 -> 784, 324 -> 400, 2354 -> 2538, 588.5 -> 634.5, 729 -> 784
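The corrected values can be cross-checked with a few lines of arithmetic, assuming (as in the textbook's SGA trace) that fitness is f(x) = x^2 over a population of 4:

```python
old = {"x": (26, 18), "fitness": (676, 324), "sum": 2354, "avg": 588.5}
new = {"x": (28, 20), "fitness": (784, 400), "sum": 2538, "avg": 634.5}

# fitness values are the squares of the corrected x values
assert all(x**2 == fit for x, fit in zip(new["x"], new["fitness"]))
# the corrected sum reflects exactly the two fitness corrections
assert new["sum"] == old["sum"] + (784 - 676) + (400 - 324)
# the corrected average is the corrected sum over the 4 individuals
assert new["avg"] == new["sum"] / 4
```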
Representations
- Bit strings
  - Scaling, Hamming cliffs
  - Binary vs. Gray coding (Appendix A)
- Integers
  - Ordinal vs. cardinal attributes
- Permutations
  - Absolute order vs. adjacency
- Real-valued, etc.
- Homogeneous vs. heterogeneous
Permutation Representation
- Order based (e.g., job shop scheduling)
- Adjacency based (e.g., TSP)

Example:
- Problem space: [A,B,C,D]
- Permutation: [3,1,2,4]
- Mapping 1: [C,A,B,D]
- Mapping 2: [B,C,A,D]
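The two mappings correspond to two standard ways of decoding a permutation; the variable names below are illustrative:

```python
space = ['A', 'B', 'C', 'D']
perm = [3, 1, 2, 4]                 # 1-based, as on the slide

# Mapping 1: perm[i] names WHICH element occupies position i
mapping1 = [space[p - 1] for p in perm]

# Mapping 2: perm[i] names WHERE element i is placed
mapping2 = [None] * len(perm)
for i, p in enumerate(perm):
    mapping2[p - 1] = space[i]

assert mapping1 == ['C', 'A', 'B', 'D']
assert mapping2 == ['B', 'C', 'A', 'D']
```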
Mutation vs. Recombination
- Mutation = stochastic unary variation operator
- Recombination = stochastic multi-ary variation operator
Mutation
- Bit-string representation:
  - Bit flip
  - E[#flips] = L * pm
- Integer representation:
  - Random reset (cardinal attributes)
  - Creep mutation (ordinal attributes)
Mutation (cont.)
- Floating-point:
  - Uniform
  - Nonuniform from a fixed distribution (Gaussian, Cauchy, Levy, etc.)
- Permutation:
  - Swap
  - Insert
  - Scramble
  - Inversion
Permutation Mutation
- Swap mutation
- Insert mutation
- Scramble mutation
- Inversion mutation (good for adjacency-based problems)
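Each of the four operators fits in a few lines of Python; the function names are illustrative, and each operator draws its own random positions and returns a new permutation:

```python
import random

def swap(p):
    # exchange two randomly chosen positions
    i, j = random.sample(range(len(p)), 2)
    q = p[:]
    q[i], q[j] = q[j], q[i]
    return q

def insert(p):
    # move one element next to another, shifting the rest
    i, j = sorted(random.sample(range(len(p)), 2))
    q = p[:]
    q.insert(i + 1, q.pop(j))
    return q

def scramble(p):
    # randomly reorder a contiguous segment
    i, j = sorted(random.sample(range(len(p)), 2))
    mid = p[i:j + 1]
    random.shuffle(mid)
    return p[:i] + mid + p[j + 1:]

def inversion(p):
    # reverse a contiguous segment (preserves most adjacency information)
    i, j = sorted(random.sample(range(len(p)), 2))
    return p[:i] + p[i:j + 1][::-1] + p[j + 1:]
```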
Recombination
- Recombination rate: asexual vs. sexual
- N-point crossover (positional bias)
- Uniform crossover (distributional bias)
- Discrete recombination (no new alleles)
- (Uniform) arithmetic recombination
  - Simple recombination
  - Single arithmetic recombination
  - Whole arithmetic recombination
Recombination (cont.)
- Adjacency-based permutation:
  - Partially Mapped Crossover (PMX)
  - Edge crossover
- Order-based permutation:
  - Order crossover
  - Cycle crossover
PMX
1. Choose 2 random crossover points & copy the mid-segment from p1 to the offspring
2. Look for elements in the mid-segment of p2 that were not copied
3. For each of these (i), look in the offspring to see what was copied in its place (j)
4. Place i into the position occupied by j in p2
5. If the place occupied by j in p2 is already filled in the offspring by k, put i in the position occupied by k in p2
6. Fill the rest of the offspring by copying from p2
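The steps above can be sketched as follows. This is an illustrative implementation (the function name and the optional fixed cut points are not from the slides); steps 3-5 become a chain-following loop:

```python
import random

def pmx(p1, p2, cut1=None, cut2=None):
    n = len(p1)
    if cut1 is None:
        cut1, cut2 = sorted(random.sample(range(n + 1), 2))
    child = [None] * n
    child[cut1:cut2] = p1[cut1:cut2]          # step 1: copy mid-segment from p1
    for i in range(cut1, cut2):               # step 2: unplaced mid-segment elements of p2
        elem = p2[i]
        if elem in child[cut1:cut2]:
            continue
        pos = i
        while cut1 <= pos < cut2:             # steps 3-5: follow the mapping chain
            pos = p2.index(p1[pos])
        child[pos] = elem
    for i in range(n):                        # step 6: fill the rest from p2
        if child[i] is None:
            child[i] = p2[i]
    return child
```

For p1 = [1,2,3,4,5,6,7,8,9], p2 = [9,3,7,8,2,6,5,1,4] and cuts at positions 3 and 7 this yields [9,3,2,4,5,6,7,1,8].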
Order Crossover
1. Choose 2 random crossover points & copy the mid-segment from p1 to the offspring
2. Starting from the 2nd crossover point in p2, copy the unused numbers into the offspring in the order they appear in p2, wrapping around at the end of the list
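The two steps translate directly into code; again the function name and optional fixed cut points are illustrative:

```python
import random

def order_crossover(p1, p2, cut1=None, cut2=None):
    n = len(p1)
    if cut1 is None:
        cut1, cut2 = sorted(random.sample(range(n + 1), 2))
    child = [None] * n
    child[cut1:cut2] = p1[cut1:cut2]          # step 1: copy mid-segment from p1
    used = set(p1[cut1:cut2])
    # step 2: scan p2 from the 2nd cut point, wrapping, skipping used values
    fill = [p2[(cut2 + k) % n] for k in range(n) if p2[(cut2 + k) % n] not in used]
    positions = list(range(cut2, n)) + list(range(cut1))
    for pos, value in zip(positions, fill):
        child[pos] = value
    return child
```

For p1 = [1,2,3,4,5,6,7,8,9], p2 = [9,3,7,8,2,6,5,1,4] and cuts at positions 3 and 7 this yields [3,8,2,4,5,6,7,1,9].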
Population Models
- Two historical models:
  - Generational model
  - Steady-state model
- Generational gap
- General model:
  - Population size
  - Mating pool size
  - Offspring pool size
Parent Selection
- Fitness Proportional Selection (FPS)
  - High risk of premature convergence
  - Uneven selective pressure
  - Fitness function not transposition invariant
  - Windowing, sigma scaling
- Rank-Based Selection
  - Mapping function (a la SA cooling schedule)
  - Linear ranking vs. exponential ranking
Sampling Methods
- Roulette wheel
- Stochastic Universal Sampling (SUS)
- Rank-based sampling methods:
  - Tournament selection
  - Tournament size
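Both sampling methods fit in a few lines. These sketches assume non-negative, to-be-maximized fitness values; the function names are illustrative:

```python
import random

def sus(population, fitnesses, n):
    # Stochastic Universal Sampling: one spin of the wheel,
    # then n equally spaced pointers
    total = sum(fitnesses)
    step = total / n
    start = random.uniform(0, step)
    chosen, cum, i = [], 0.0, 0
    for k in range(n):
        pointer = start + k * step
        while cum + fitnesses[i] < pointer:
            cum += fitnesses[i]
            i += 1
        chosen.append(population[i])
    return chosen

def tournament(population, fitnesses, k):
    # Tournament selection: return the fittest of k uniformly drawn contenders
    contenders = random.sample(range(len(population)), k)
    return population[max(contenders, key=lambda i: fitnesses[i])]
```

With equal fitnesses and n equal to the population size, SUS selects every individual exactly once, illustrating its low sampling variance compared to repeated roulette-wheel spins.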
Survivor Selection
- Age-based
- Fitness-based
  - Truncation
  - Elitism
Termination
- CPU time / wall time
- Number of fitness evaluations
- Lack of fitness improvement
- Lack of genetic diversity
- Solution quality / solution found
- Combination of the above
Behavioral Observables
- Selective pressure
- Population diversity
  - Fitness values
  - Phenotypes
  - Genotypes
  - Alleles
Report Writing Tips
- Use easily readable fonts, including in tables & graphs (11 pt fonts are typically best; 10 pt is the absolute smallest)
- Number all figures and tables and refer to each and every one in the main text body (hint: use autonumbering)
- Capitalize named articles (e.g., "see Table 5", not "see table 5")
- Keep important figures and tables as close to the referring text as possible, while placing less important ones in an appendix
- Always provide standard deviations (typically in parentheses) when listing averages
Report Writing Tips (cont.)
- Use descriptive titles and captions on tables and figures so that they are self-explanatory
- Always include axis labels in graphs
- Write in a formal style (never use first person; instead say, for instance, "the author")
- Format tabular material in proper tables with grid lines
- Provide all the required information, but avoid extraneous data (information is good, data is bad)
Evolution Strategies (ES)
- Birth year: 1963
- Birthplace: Technical University of Berlin, Germany
- Parents: Ingo Rechenberg & Hans-Paul Schwefel
ES History & Parameter Control
- Two-membered ES: (1+1)
- Original multi-membered ES: (μ+1)
- Multi-membered ES: (μ+λ), (μ,λ)
- Parameter tuning vs. parameter control
- Fixed parameter control
  - Rechenberg's 1/5 success rule
- Self-adaptation
  - Mutation step size control
Mutation case 1: Uncorrelated mutation with one σ
- Chromosomes: ⟨x1,…,xn, σ⟩
- σ' = σ · exp(τ · N(0,1))
- x'i = xi + σ' · Ni(0,1)
- Typically the learning rate τ ∝ 1/√n
- And we have a boundary rule: σ' < ε0 ⇒ σ' = ε0
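A sketch of this mutation rule in Python, with the proportionality constant for τ set to 1 and an illustrative default for the boundary ε0:

```python
import math
import random

def mutate_one_sigma(x, sigma, eps0=1e-6):
    # sigma' = sigma * exp(tau * N(0,1));  x'_i = x_i + sigma' * N_i(0,1)
    n = len(x)
    tau = 1.0 / math.sqrt(n)                 # learning rate ~ 1/sqrt(n)
    sigma_new = sigma * math.exp(tau * random.gauss(0, 1))
    if sigma_new < eps0:                     # boundary rule
        sigma_new = eps0
    x_new = [xi + sigma_new * random.gauss(0, 1) for xi in x]
    return x_new, sigma_new
```

Mutating σ first and then using the mutated σ' for the object variables is what lets the step size be evaluated through the offspring it produces.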
Mutants with equal likelihood
- Circle: mutants having the same chance to be created
Mutation case 2: Uncorrelated mutation with n σ's
- Chromosomes: ⟨x1,…,xn, σ1,…,σn⟩
- σ'i = σi · exp(τ' · N(0,1) + τ · Ni(0,1))
- x'i = xi + σ'i · Ni(0,1)
- Two learning rate parameters:
  - τ' (overall learning rate) ∝ 1/√(2n)
  - τ (coordinate-wise learning rate) ∝ 1/√(2√n)
  - τ and τ' have individual proportionality constants, which both have default value 1
- Boundary rule: σ'i < ε0 ⇒ σ'i = ε0
Mutants with equal likelihood
- Ellipse: mutants having the same chance to be created
Mutation case 3: Correlated mutations
- Chromosomes: ⟨x1,…,xn, σ1,…,σn, α1,…,αk⟩ where k = n(n-1)/2
- The covariance matrix C is defined as:
  - cii = σi²
  - cij = 0 if i and j are not correlated
  - cij = ½ · (σi² - σj²) · tan(2αij) if i and j are correlated
- Note the numbering / indices of the α's
Correlated mutations (cont'd)
The mutation mechanism is then:
- σ'i = σi · exp(τ' · N(0,1) + τ · Ni(0,1))
- α'j = αj + β · N(0,1)
- x' = x + N(0, C')
  - x stands for the vector ⟨x1,…,xn⟩
  - C' is the covariance matrix C after mutation of the σ and α values
- τ' ∝ 1/√(2n), τ ∝ 1/√(2√n), and β ≈ 5°
- Boundary rules: σ'i < ε0 ⇒ σ'i = ε0, and |α'j| > π ⇒ α'j = α'j - 2π · sign(α'j)
Mutants with equal likelihood
- Ellipse: mutants having the same chance to be created
Recombination
- Creates one child
- Acts per variable / position by either:
  - Averaging parental values, or
  - Selecting one of the parental values
- From two or more parents by either:
  - Using two selected parents to make a child
  - Selecting two parents anew for each position
Names of Recombinations

                                 | Two fixed parents  | Two parents selected for each i
zi = (xi + yi)/2                 | Local intermediary | Global intermediary
zi is xi or yi, chosen randomly  | Local discrete     | Global discrete
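The four variants in the table can be sketched as follows (function names illustrative; the global variants draw two parents anew for every position):

```python
import random

def local_intermediary(x, y):
    # two fixed parents, average each position
    return [(a + b) / 2 for a, b in zip(x, y)]

def local_discrete(x, y):
    # two fixed parents, pick one parental value per position
    return [random.choice((a, b)) for a, b in zip(x, y)]

def global_intermediary(pop):
    # two parents selected anew for each position i, then averaged
    child = []
    for i in range(len(pop[0])):
        p, q = random.sample(pop, 2)
        child.append((p[i] + q[i]) / 2)
    return child

def global_discrete(pop):
    # two parents selected anew for each position i, one value chosen
    child = []
    for i in range(len(pop[0])):
        p, q = random.sample(pop, 2)
        child.append(random.choice((p[i], q[i])))
    return child
```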
Evolutionary Programming (EP)
- Traditional application domain: machine learning by FSMs
- Contemporary application domain: (numerical) optimization
- Arbitrary representation and mutation operators, no recombination
- Contemporary EP = traditional EP + ES
  - Self-adaptation of parameters
EP Technical Summary Tableau

Representation     | Real-valued vectors
Recombination      | None
Mutation           | Gaussian perturbation
Parent selection   | Deterministic
Survivor selection | Probabilistic (μ+μ)
Specialty          | Self-adaptation of mutation step sizes (in meta-EP)
Historical EP Perspective
- EP aimed at achieving intelligence
- Intelligence viewed as adaptive behaviour
- Prediction of the environment was considered a prerequisite to adaptive behaviour
- Thus: the capability to predict is key to intelligence
Prediction by Finite State Machines
- Finite state machine (FSM):
  - States S
  - Inputs I
  - Outputs O
  - Transition function δ: S × I → S × O
- Transforms an input stream into an output stream
- Can be used for predictions, e.g., to predict the next input symbol in a sequence
FSM Example
- Consider the FSM with:
  - S = {A, B, C}
  - I = {0, 1}
  - O = {a, b, c}
- given by a diagram
FSM as Predictor
- Consider the following FSM
- Task: predict next input
- Quality: % of out_i = in_(i+1)
- Given initial state C
- Input sequence: 011101
- Leads to output: 110111
- Quality: 3 out of 5
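The quality measure (the fraction of outputs out_i that equal the next input in_(i+1)) can be checked directly against the given sequences; the function name is illustrative:

```python
def prediction_quality(inputs, outputs):
    # out_i predicts in_(i+1): count hits over the len-1 comparable pairs
    hits = sum(o == nxt for o, nxt in zip(outputs, inputs[1:]))
    return hits, len(inputs) - 1

# The slide's sequences: input 011101, output 110111
hits, total = prediction_quality("011101", "110111")
assert (hits, total) == (3, 5)
```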
Introductory example: evolving FSMs to predict primes
- P(n) = 1 if n is prime, 0 otherwise
- I = N = {1, 2, 3, …, n, …}
- O = {0, 1}
- Correct prediction: out_i = P(in_(i+1))
- Fitness function:
  - 1 point for correct prediction of next input
  - 0 points for incorrect prediction
  - Penalty for too many states
Introductory example: evolving FSMs to predict primes (cont.)
- Parent selection: each FSM is mutated once
- Mutation operators (one selected randomly):
  - Change an output symbol
  - Change a state transition (i.e., redirect an edge)
  - Add a state
  - Delete a state
  - Change the initial state
- Survivor selection: (μ+μ)
- Results: overfitting; after 202 inputs the best FSM had one state and both outputs were 0, i.e., it always predicted "not prime"
Modern EP
- No predefined representation in general
- Thus: no predefined mutation (must match representation)
- Often applies self-adaptation of mutation parameters
- In the sequel we present one EP variant, not the canonical EP
Representation
- For continuous parameter optimisation
- Chromosomes consist of two parts:
  - Object variables: x1,…,xn
  - Mutation step sizes: σ1,…,σn
- Full size: ⟨x1,…,xn, σ1,…,σn⟩
Mutation
- Chromosomes: ⟨x1,…,xn, σ1,…,σn⟩
- σ'i = σi · (1 + α · N(0,1))
- x'i = xi + σ'i · Ni(0,1)
- α ≈ 0.2
- Boundary rule: σ' < ε0 ⇒ σ' = ε0
- Other variants proposed & tried:
  - Lognormal scheme as in ES
  - Using variance instead of standard deviation
  - Mutate σ last
  - Other distributions, e.g., Cauchy instead of Gaussian
Recombination: None
- Rationale: one point in the search space stands for a species, not for an individual, and there can be no crossover between species
- Much historical debate: mutation vs. crossover
- Pragmatic approach seems to prevail today
Parent Selection
- Each individual creates one child by mutation
- Thus: deterministic
- Not biased by fitness
Survivor Selection
- P(t): parents, P'(t): offspring
- Pairwise competitions, round-robin format:
  - Each solution x from P(t) ∪ P'(t) is evaluated against q other randomly chosen solutions
  - For each comparison, a "win" is assigned if x is better than its opponent
  - The solutions with the greatest number of wins are retained to be parents of the next generation
- Parameter q allows tuning selection pressure (typically q = 10)
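A sketch of the round-robin survivor selection. Here `combined` stands for P(t) ∪ P'(t), `fitness` maps a solution to a score to maximize, and μ is taken as half the combined pool; all names are illustrative:

```python
import random

def ep_survivor_selection(combined, fitness, q=10):
    # Round-robin: each solution meets q random opponents; a "win" is
    # scored when it beats the opponent; keep the mu highest win counts.
    n = len(combined)
    mu = n // 2
    wins = [0] * n
    for i in range(n):
        for j in random.sample([k for k in range(n) if k != i], q):
            if fitness(combined[i]) > fitness(combined[j]):
                wins[i] += 1
    ranked = sorted(range(n), key=lambda i: wins[i], reverse=True)
    return [combined[i] for i in ranked[:mu]]
```

With q equal to the full pool size minus one, the tournament becomes exhaustive and the selection reduces to deterministic truncation.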
Example application: the Ackley function (Bäck et al. '93)
- The Ackley function (with n = 30):
  f(x) = -20 · exp(-0.2 · sqrt((1/n) · Σ xi²)) - exp((1/n) · Σ cos(2π · xi)) + 20 + e
- Representation:
  - -30 < xi < 30 (coincidence of 30's!)
  - 30 variances as step sizes
- Mutation with changing object variables first!
- Population size = 200, selection q = 10
- Termination after 200,000 fitness evals
- Results: average best solution is 1.4 · 10^-2
Example application: evolving checkers players (Fogel '02)
- Neural nets for evaluating future values of moves are evolved
- NNs have a fixed structure with 5046 weights; these are evolved, plus one weight for kings
- Representation:
  - vector of 5046 real numbers for object variables (weights)
  - vector of 5046 real numbers for σ's
- Mutation:
  - Gaussian, lognormal scheme with σ-first
  - plus a special mechanism for the kings' weight
- Population size 15
Example application: evolving checkers players (Fogel '02, cont.)
- Tournament size q = 5
- Programs (with the NN inside) play against other programs; no human trainer or hard-wired intelligence
- After 840 generations (6 months!) the best strategy was tested against humans via the Internet
- The program earned an expert-class ranking, outperforming 99.61% of all rated players
Genetic Programming (GP)
- Characteristic property: variable-size hierarchical representation vs. fixed-size linear in traditional EAs
- Application domain: model optimization vs. input values in traditional EAs
- Unifying paradigm: program induction
Program Induction Examples
- Optimal control
- Planning
- Symbolic regression
- Automatic programming
- Discovering game-playing strategies
- Forecasting
- Inverse problem solving
- Decision tree induction
- Evolution of emergent behavior
- Evolution of cellular automata
GP Specification
- S-expressions
- Function set
- Terminal set
- Arity
- Correct expressions
- Closure property
- Strongly typed GP
GP Notes
- Mutation or recombination (not both)
- Bloat ("survival of the fattest")
- Parsimony pressure
Learning Classifier Systems (LCS)
- Note: LCS is technically not a type of EA, but can utilize an EA
- Condition-Action Rule Based Systems; rule format: <condition: action>
- Reinforcement Learning
- LCS rule format: <condition: action> plus a predicted payoff
- Don't-care symbols
LCS Specifics
- Multi-step credit allocation: Bucket Brigade algorithm
- Rule discovery cycle: EA
- Pitt approach: each individual represents a complete rule set
- Michigan approach: each individual represents a single rule; a population represents the complete rule set
Parameter Tuning vs. Control
- Parameter tuning: a priori optimization of fixed strategy parameters
- Parameter control: on-the-fly optimization of dynamic strategy parameters
Parameter Tuning Methods
- Start with stock parameter values
- Manually adjust based on user intuition
- Monte Carlo sampling of parameter values on a few (short) runs
- Meta-tuning algorithm (e.g., meta-EA)
Parameter Tuning Drawbacks
- Exhaustive search for optimal parameter values, even assuming independence, is infeasible
- Parameter dependencies
- Extremely time consuming
- Optimal values are very problem specific
- Different values may be optimal at different evolutionary stages
Parameter Control Methods
- Deterministic
  - Example: replace pi with pi(t), akin to a cooling schedule in Simulated Annealing
- Adaptive
  - Example: Rechenberg's 1/5 success rule
- Self-adaptive
  - Example: mutation step size control in ES
Evaluation Function Control
- Example 1: parsimony pressure in GP
- Example 2: penalty functions in Constraint Satisfaction Problems (aka Constrained Optimization Problems)
Penalty Function Control
- eval(x) = f(x) + W · penalty(x)
- Deterministic example: W = W(t) = (C · t)^α with constants C and α ≥ 1
- Adaptive example (page 135 of textbook)
- Self-adaptive example (pages 135-136 of textbook)
  - Note: this allows evolution to cheat!
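The penalized evaluation and the deterministic schedule translate directly into code; the default constants C = 0.5 and α = 2 below are illustrative choices, not prescribed by the slides:

```python
def eval_with_penalty(x, f, penalty, W):
    # eval(x) = f(x) + W * penalty(x)
    return f(x) + W * penalty(x)

def W_deterministic(t, C=0.5, alpha=2):
    # Deterministic schedule W(t) = (C * t)^alpha: pressure grows with time t
    return (C * t) ** alpha
```

Early on, a small W lets infeasible but promising solutions survive; as t grows, the rising weight pushes the population toward feasibility.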
Parameter Control Aspects
- What is changed? Parameters vs. operators
- What evidence informs the change? Absolute vs. relative
- What is the scope of the change? Gene vs. individual vs. population
  - Ex: one-bit allele for recombination operator selection (pairwise vs. vote)
Parameter Control Examples
- Representation (GP: ADFs, delta coding)
- Evaluation function (objective function)
- Mutation (ES)
- Recombination (Davis' adaptive operator fitness: implicit bucket brigade)
- Selection (Boltzmann)
- Population
- Multiple
Parameterless EAs
- Previous work
- Dr. T's EvoFree project
Multimodal Problems
- Multimodal def.: multiple local optima, and at least one local optimum is not globally optimal
- Basins of attraction & niches
- Motivation for identifying a diverse set of high-quality solutions:
  - Allow for human judgement
  - Sharp-peak niches may be overfitted
Restricted Mating
- Panmictic vs. restricted mating
- Finite population size + panmictic mating -> genetic drift
- Local adaptation (environmental niche)
- Punctuated equilibria
  - Evolutionary stasis
  - Demes
- Speciation (end result of increasingly specialized adaptation to particular environmental niches)
EA Spaces

Biology      | EA
Geographical | Algorithmic
Genotype     | Representation
Phenotype    | Solution
Implicit diverse solution identification (1)
- Multiple runs of a standard EA
  - Non-uniform basins of attraction are problematic
- Island model (coarse-grain parallel)
  - Punctuated equilibria
  - Epoch, migration
  - Communication characteristics
  - Initialization: number of islands and respective population sizes
Implicit diverse solution identification (2)
- Diffusion model EAs
  - Single population, single species
  - Overlapping demes distributed within algorithmic space (e.g., grid)
  - Equivalent to cellular automata
- Automatic speciation
  - Genotype/phenotype mating restrictions
Explicit diverse solution identification
- Fitness sharing: individuals share fitness within their niche
- Crowding: replace similar parents
Game-Theoretic Problems
- Adversarial search: multi-agent problem with conflicting utility functions
Ultimatum Game
- Select two subjects, A and B
- Subject A gets 10 units of currency
- A has to make an offer (ultimatum) to B, anywhere from 0 to 10 of his units
- B has the option to accept or reject (no negotiation)
- If B accepts, A keeps the remaining units and B the offered units; otherwise they both lose all units
Real-World Game-Theoretic Problems
- Real-world examples:
  - economic & military strategy
  - arms control
  - cyber security
  - bargaining
- Common problem: real-world games are typically incomputable
Arms Races
- Military arms races
- Prisoner's Dilemma
- Biological arms races
Approximating Incomputable Games
- Consider the space of each player's actions
- Perform local search in these spaces
- Solution quality in one space is dependent on the search in the other spaces
- The simultaneous search of co-dependent spaces is naturally modeled as an arms race
Evolutionary Arms Races
- Iterated evolutionary arms races
- Biological arms races revisited
- Iterated arms race optimization is doomed!
Coevolutionary Algorithm (CoEA)
- A special type of EA where the fitness of an individual depends on other individuals (i.e., individuals are explicitly part of the environment)
- Single species vs. multiple species
- Cooperative vs. competitive coevolution
CoEA difficulties (1): Disengagement
- Occurs when one population evolves so much faster than the other that all individuals of the other are utterly defeated, making it impossible to differentiate between better and worse individuals, without which there can be no evolution
CoEA difficulties (2): Cycling
- Occurs when populations have lost the genetic knowledge of how to defeat an earlier-generation adversary and that adversary re-evolves
- Potentially this can cause an infinite loop in which the populations continue to evolve but do not improve
CoEA difficulties (3): Suboptimal Equilibrium (aka Mediocre Stability)
- Occurs when the system stabilizes in a suboptimal equilibrium
Case Study from Critical Infrastructure Protection
Infrastructure Hardening
- Hardenings (defenders) versus contingencies (attackers)
- Hardenings need to balance spare flow capacity with flow control
Case Study from Automated Software Engineering
Automated Software Correction
- Programs (defenders) versus test cases (attackers)
- Programs encoded with Genetic Programming
- Program specification encoded in the fitness function (correctness critical!)
Multi-Objective EAs (MOEAs)
- Extension of a regular EA which maps multiple objective values to a single fitness value
- Objectives typically conflict
- In a standard EA, an individual A is said to be better than an individual B if A has a higher fitness value than B
- In a MOEA, an individual A is said to be better than an individual B if A dominates B
Domination in MOEAs
- An individual A is said to dominate individual B iff:
  - A is no worse than B in all objectives
  - A is strictly better than B in at least one objective
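The domination test translates directly into code; this sketch assumes all objectives are minimized and represents an individual's objective values as a tuple:

```python
def dominates(a, b):
    # a dominates b iff a is no worse in every objective and strictly
    # better in at least one (objective vectors as tuples, minimization)
    return (all(ai <= bi for ai, bi in zip(a, b)) and
            any(ai < bi for ai, bi in zip(a, b)))
```

Note that two individuals can be mutually non-dominating (incomparable), which is exactly why MOEAs rank by non-domination levels rather than by a total order.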
Pareto Optimality (Vilfredo Pareto)
Given a set of alternative allocations of, say, goods or income for a set of individuals, a movement from one allocation to another that can make at least one individual better off without making any other individual worse off is called a Pareto Improvement. An allocation is Pareto Optimal when no further Pareto Improvements can be made. This is often called a Strong Pareto Optimum (SPO).
Pareto Optimality in MOEAs
- Among a set of solutions P, the non-dominated subset of solutions P' consists of those that are not dominated by any member of P
- The non-dominated subset of the entire feasible search space S is the globally Pareto-optimal set
Goals of MOEAs
- Identify the global Pareto-optimal set of solutions (aka the Pareto-optimal front)
- Find a sufficient coverage of that set
- Find an even distribution of solutions
MOEA Metrics
- Convergence: how close is a generated solution set to the true Pareto-optimal front?
- Diversity: are the generated solutions evenly distributed, or are they in clusters?
Deterioration in MOEAs
- Competition can result in the loss of a non-dominated solution which dominated a previously generated solution
- This loss in turn can result in the previously generated solution being regenerated and surviving
NSGA-II
- Initialization before the primary loop:
  - Create initial population P0
  - Sort P0 on the basis of non-domination
  - Best level is level 1
  - Fitness is set to level number; lower number, higher fitness
  - Binary tournament selection
  - Mutation and recombination create Q0
NSGA-II (cont.)
- Primary loop:
  - Rt = Pt + Qt
  - Sort Rt on the basis of non-domination
  - Create Pt+1 by adding the best individuals from Rt
  - Create Qt+1 by performing binary tournament selection, mutation, and recombination on Pt+1
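The non-domination sorting used in both the initialization and the primary loop can be sketched with a simple front-peeling procedure (quadratic per front; NSGA-II proper uses a faster bookkeeping scheme). Minimization is assumed and the function name is illustrative:

```python
def non_dominated_sort(points):
    # Assign each point its non-domination level (1 = best front), minimization
    def dominates(a, b):
        return (all(u <= v for u, v in zip(a, b)) and
                any(u < v for u, v in zip(a, b)))
    remaining = set(range(len(points)))
    levels, level = {}, 1
    while remaining:
        # current front: points dominated by no other remaining point
        front = {i for i in remaining
                 if not any(dominates(points[j], points[i])
                            for j in remaining if j != i)}
        for i in front:
            levels[i] = level
        remaining -= front
        level += 1
    return levels
```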
Epsilon-MOEA
- Steady state
- Elitist
- No deterioration
Epsilon-MOEA (cont.)
- Create an initial population P(0)
- Epsilon non-dominated solutions from P(0) are put into an archive population E(0)
- Choose one individual from E, and one from P
- These individuals mate and produce an offspring, c
- A special array B is created for c, which consists of abbreviated versions of the objective values from c
Epsilon-MOEA (cont.)
- An attempt is made to insert c into the archive population E
- The domination check is conducted using the B array instead of the actual objective values
- If c dominates a member of the archive, that member is replaced with c
- The individual c can also be inserted into P in a similar manner, using a standard domination check
SNDL-MOEA
- Desired features:
  - Deterioration prevention
  - Stored non-domination levels (NSGA-II)
  - Number and size of levels user configurable
  - Selection methods utilizing levels in different ways
  - Problem-specific representation
  - Problem-specific compartments (epsilon-MOEA)
  - Problem-specific mutation and crossover