Evolutionary Computing
Computer Science 348
Dr. T presents…
Feb 02, 2016
Introduction
The field of Evolutionary Computing studies the theory and application of Evolutionary Algorithms.
Evolutionary Algorithms can be described as a class of stochastic, population-based local search algorithms inspired by neo-Darwinian Evolution Theory.
Computational Basis
- Trial-and-error (aka generate-and-test)
- Graduated solution quality
- Stochastic local search of the solution landscape
Biological Metaphors: Darwinian Evolution
- Macroscopic view of evolution
- Natural selection
- Survival of the fittest
- Random variation
Biological Metaphors: (Mendelian) Genetics
- Genotype (functional unit of inheritance)
- Genotypes vs. phenotypes
- Pleiotropy: one gene affects multiple phenotypic traits
- Polygeny: one phenotypic trait is affected by multiple genes
- Chromosomes (haploid vs. diploid)
- Loci and alleles
EA Pros
- More general purpose than traditional optimization algorithms, i.e., less problem-specific knowledge required
- Ability to solve difficult problems
- Solution availability
- Robustness
- Inherent parallelism

EA Cons
- Fitness function and genetic operators often not obvious
- Premature convergence
- Computationally intensive
- Difficult parameter optimization
EA Components
- Search spaces: representation & size
- Evaluation of trial solutions: fitness function
- Exploration versus exploitation
- Selective pressure rate
- Premature convergence
Nature versus the digital realm

Nature      | Digital realm
Environment | Problem (search space)
Fitness     | Fitness function
Population  | Set
Individual  | Data structure
Genes       | Elements
Alleles     | Data type
EA Strategy Parameters
- Population size
- Initialization-related parameters
- Selection-related parameters
- Number of offspring
- Recombination chance
- Mutation chance
- Mutation rate
- Termination-related parameters
Problem Solving Steps
- Collect problem knowledge
- Choose gene representation
- Design fitness function
- Creation of initial population
- Parent selection
- Decide on genetic operators
- Competition / survival
- Choose termination condition
- Find good parameter values
Function Optimization Problem

Given the function

    f(x,y) = x^2y + 5xy - 3xy^2

for what integer values of x and y is f(x,y) minimal?

- Solution space: Z x Z
- Trial solution: (x,y)
- Gene representation: integer
- Gene initialization: random
- Fitness function: -f(x,y)
- Population size: 4
- Number of offspring: 2
- Parent selection: exponential
Function Optimization Problem (cont.)
- Genetic operators:
  - 1-point crossover
  - Mutation (-1, 0, 1)
- Competition: remove the two individuals with the lowest fitness value

f(x,y) = x^2y + 5xy - 3xy^2
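The slide's parameter choices fit into a short script. This is a minimal sketch, not a reference implementation from the course: the random seed, the initialization range, and the use of uniform parent sampling in place of the exponential selection scheme are illustrative assumptions.

```python
import random

def f(x, y):
    # Objective from the slide: f(x,y) = x^2*y + 5xy - 3xy^2
    return x**2 * y + 5 * x * y - 3 * x * y**2

def fitness(ind):
    # Minimizing f is maximizing -f
    x, y = ind
    return -f(x, y)

def mutate(ind):
    # Creep mutation: add -1, 0, or 1 to each gene
    return tuple(g + random.choice((-1, 0, 1)) for g in ind)

random.seed(0)                                 # illustrative seed
pop = [(random.randint(-5, 5), random.randint(-5, 5)) for _ in range(4)]
for _ in range(50):
    p1, p2 = random.sample(pop, 2)             # stand-in for exponential parent selection
    kids = [mutate((p1[0], p2[1])),            # 1-point crossover between the two genes
            mutate((p2[0], p1[1]))]
    # competition: remove the two worst of the 4 parents + 2 offspring
    pop = sorted(pop + kids, key=fitness, reverse=True)[:4]
```

Note that f is unbounded below over Z x Z, so the population fitness keeps growing; the sketch only illustrates the loop structure.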
Measuring Performance
- Case 1: goal unknown or never reached
  - Solution quality: global average/best population fitness
- Case 2: goal known and sometimes reached
  - Optimal solution reached percentage
- Case 3: goal known and always reached
  - Convergence speed
Initialization
- Uniform random
- Heuristic based
- Knowledge based
- Genotypes from previous runs
- Seeding
Representation (2.3.1)
- Genotype space
- Phenotype space
- Encoding & decoding
- Knapsack Problem (2.4.2)
- Surjective, injective, and bijective decoder functions
Simple Genetic Algorithm (SGA)
- Representation: bit-strings
- Recombination: 1-point crossover
- Mutation: bit flip
- Parent selection: fitness proportional
- Survival selection: generational
Trace Example Errata
- Page 39, line 5: 729 -> 784
- Table 3.4, x value: 26 -> 28, 18 -> 20
- Table 3.4, fitness: 676 -> 784, 324 -> 400, 2354 -> 2538, 588.5 -> 634.5, 729 -> 784
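The corrected values can be cross-checked with a few lines of arithmetic, assuming (as in the textbook's SGA trace) that fitness is f(x) = x^2 over a population of 4:

```python
old = {"x": (26, 18), "fitness": (676, 324), "sum": 2354, "avg": 588.5}
new = {"x": (28, 20), "fitness": (784, 400), "sum": 2538, "avg": 634.5}

# fitness values are the squares of the corrected x values
assert all(x**2 == fit for x, fit in zip(new["x"], new["fitness"]))
# the corrected sum reflects exactly the two fitness corrections
assert new["sum"] == old["sum"] + (784 - 676) + (400 - 324)
# the corrected average is the corrected sum over the 4 individuals
assert new["avg"] == new["sum"] / 4
```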
Representations
- Bit strings
  - Scaling, Hamming cliffs
  - Binary vs. Gray coding (Appendix A)
- Integers
  - Ordinal vs. cardinal attributes
- Permutations
  - Absolute order vs. adjacency
- Real-valued, etc.
- Homogeneous vs. heterogeneous
Permutation Representation
- Order based (e.g., job shop scheduling)
- Adjacency based (e.g., TSP)

Example:
- Problem space: [A,B,C,D]
- Permutation: [3,1,2,4]
- Mapping 1: [C,A,B,D]
- Mapping 2: [B,C,A,D]
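The two mappings correspond to two standard ways of decoding a permutation; the variable names below are illustrative:

```python
space = ['A', 'B', 'C', 'D']
perm = [3, 1, 2, 4]                 # 1-based, as on the slide

# Mapping 1: perm[i] names WHICH element occupies position i
mapping1 = [space[p - 1] for p in perm]

# Mapping 2: perm[i] names WHERE element i is placed
mapping2 = [None] * len(perm)
for i, p in enumerate(perm):
    mapping2[p - 1] = space[i]

assert mapping1 == ['C', 'A', 'B', 'D']
assert mapping2 == ['B', 'C', 'A', 'D']
```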
Mutation vs. Recombination
- Mutation = stochastic unary variation operator
- Recombination = stochastic multi-ary variation operator
Mutation
- Bit-string representation:
  - Bit flip
  - E[#flips] = L * pm
- Integer representation:
  - Random reset (cardinal attributes)
  - Creep mutation (ordinal attributes)
Mutation (cont.)
- Floating-point:
  - Uniform
  - Nonuniform from a fixed distribution (Gaussian, Cauchy, Levy, etc.)
- Permutation:
  - Swap
  - Insert
  - Scramble
  - Inversion
Permutation Mutation
- Swap mutation
- Insert mutation
- Scramble mutation
- Inversion mutation (good for adjacency-based problems)
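Each of the four operators fits in a few lines of Python; the function names are illustrative, and each operator draws its own random positions and returns a new permutation:

```python
import random

def swap(p):
    # exchange two randomly chosen positions
    i, j = random.sample(range(len(p)), 2)
    q = p[:]
    q[i], q[j] = q[j], q[i]
    return q

def insert(p):
    # move one element next to another, shifting the rest
    i, j = sorted(random.sample(range(len(p)), 2))
    q = p[:]
    q.insert(i + 1, q.pop(j))
    return q

def scramble(p):
    # randomly reorder a contiguous segment
    i, j = sorted(random.sample(range(len(p)), 2))
    mid = p[i:j + 1]
    random.shuffle(mid)
    return p[:i] + mid + p[j + 1:]

def inversion(p):
    # reverse a contiguous segment (preserves most adjacency information)
    i, j = sorted(random.sample(range(len(p)), 2))
    return p[:i] + p[i:j + 1][::-1] + p[j + 1:]
```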
Recombination
- Recombination rate: asexual vs. sexual
- N-point crossover (positional bias)
- Uniform crossover (distributional bias)
- Discrete recombination (no new alleles)
- (Uniform) arithmetic recombination
  - Simple recombination
  - Single arithmetic recombination
  - Whole arithmetic recombination
Recombination (cont.)
- Adjacency-based permutation:
  - Partially Mapped Crossover (PMX)
  - Edge crossover
- Order-based permutation:
  - Order crossover
  - Cycle crossover
PMX
1. Choose 2 random crossover points & copy the mid-segment from p1 to the offspring
2. Look for elements in the mid-segment of p2 that were not copied
3. For each of these (i), look in the offspring to see what was copied in its place (j)
4. Place i into the position occupied by j in p2
5. If the place occupied by j in p2 is already filled in the offspring by k, put i in the position occupied by k in p2
6. Fill the rest of the offspring by copying from p2
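The steps above can be sketched as follows. This is an illustrative implementation (the function name and the optional fixed cut points are not from the slides); steps 3-5 become a chain-following loop:

```python
import random

def pmx(p1, p2, cut1=None, cut2=None):
    n = len(p1)
    if cut1 is None:
        cut1, cut2 = sorted(random.sample(range(n + 1), 2))
    child = [None] * n
    child[cut1:cut2] = p1[cut1:cut2]          # step 1: copy mid-segment from p1
    for i in range(cut1, cut2):               # step 2: unplaced mid-segment elements of p2
        elem = p2[i]
        if elem in child[cut1:cut2]:
            continue
        pos = i
        while cut1 <= pos < cut2:             # steps 3-5: follow the mapping chain
            pos = p2.index(p1[pos])
        child[pos] = elem
    for i in range(n):                        # step 6: fill the rest from p2
        if child[i] is None:
            child[i] = p2[i]
    return child
```

For p1 = [1,2,3,4,5,6,7,8,9], p2 = [9,3,7,8,2,6,5,1,4] and cuts at positions 3 and 7 this yields [9,3,2,4,5,6,7,1,8].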
Order Crossover
1. Choose 2 random crossover points & copy the mid-segment from p1 to the offspring
2. Starting from the 2nd crossover point in p2, copy the unused numbers into the offspring in the order they appear in p2, wrapping around at the end of the list
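The two steps translate directly into code; again the function name and optional fixed cut points are illustrative:

```python
import random

def order_crossover(p1, p2, cut1=None, cut2=None):
    n = len(p1)
    if cut1 is None:
        cut1, cut2 = sorted(random.sample(range(n + 1), 2))
    child = [None] * n
    child[cut1:cut2] = p1[cut1:cut2]          # step 1: copy mid-segment from p1
    used = set(p1[cut1:cut2])
    # step 2: scan p2 from the 2nd cut point, wrapping, skipping used values
    fill = [p2[(cut2 + k) % n] for k in range(n) if p2[(cut2 + k) % n] not in used]
    positions = list(range(cut2, n)) + list(range(cut1))
    for pos, value in zip(positions, fill):
        child[pos] = value
    return child
```

For p1 = [1,2,3,4,5,6,7,8,9], p2 = [9,3,7,8,2,6,5,1,4] and cuts at positions 3 and 7 this yields [3,8,2,4,5,6,7,1,9].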
Population Models
- Two historical models:
  - Generational model
  - Steady-state model
- Generational gap
- General model:
  - Population size
  - Mating pool size
  - Offspring pool size
Parent Selection
- Fitness Proportional Selection (FPS)
  - High risk of premature convergence
  - Uneven selective pressure
  - Fitness function not transposition invariant
  - Windowing, sigma scaling
- Rank-Based Selection
  - Mapping function (a la SA cooling schedule)
  - Linear ranking vs. exponential ranking
Sampling Methods
- Roulette wheel
- Stochastic Universal Sampling (SUS)
- Rank-based sampling methods:
  - Tournament selection
  - Tournament size
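Both sampling methods fit in a few lines. These sketches assume non-negative, to-be-maximized fitness values; the function names are illustrative:

```python
import random

def sus(population, fitnesses, n):
    # Stochastic Universal Sampling: one spin of the wheel,
    # then n equally spaced pointers
    total = sum(fitnesses)
    step = total / n
    start = random.uniform(0, step)
    chosen, cum, i = [], 0.0, 0
    for k in range(n):
        pointer = start + k * step
        while cum + fitnesses[i] < pointer:
            cum += fitnesses[i]
            i += 1
        chosen.append(population[i])
    return chosen

def tournament(population, fitnesses, k):
    # Tournament selection: return the fittest of k uniformly drawn contenders
    contenders = random.sample(range(len(population)), k)
    return population[max(contenders, key=lambda i: fitnesses[i])]
```

With equal fitnesses and n equal to the population size, SUS selects every individual exactly once, illustrating its low sampling variance compared to repeated roulette-wheel spins.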
Survivor Selection
- Age-based
- Fitness-based
  - Truncation
  - Elitism
Termination
- CPU time / wall time
- Number of fitness evaluations
- Lack of fitness improvement
- Lack of genetic diversity
- Solution quality / solution found
- Combination of the above
Behavioral Observables
- Selective pressure
- Population diversity
  - Fitness values
  - Phenotypes
  - Genotypes
  - Alleles
Report Writing Tips
- Use easily readable fonts, including in tables & graphs (11 pt fonts are typically best; 10 pt is the absolute smallest)
- Number all figures and tables and refer to each and every one in the main text body (hint: use autonumbering)
- Capitalize named articles (e.g., "see Table 5", not "see table 5")
- Keep important figures and tables as close to the referring text as possible, while placing less important ones in an appendix
- Always provide standard deviations (typically in parentheses) when listing averages
Report Writing Tips (cont.)
- Use descriptive titles and captions on tables and figures so that they are self-explanatory
- Always include axis labels in graphs
- Write in a formal style (never use first person; instead say, for instance, "the author")
- Format tabular material in proper tables with grid lines
- Provide all the required information, but avoid extraneous data (information is good, data is bad)
Evolution Strategies (ES)
- Birth year: 1963
- Birthplace: Technical University of Berlin, Germany
- Parents: Ingo Rechenberg & Hans-Paul Schwefel
ES History & Parameter Control
- Two-membered ES: (1+1)
- Original multi-membered ES: (μ+1)
- Multi-membered ES: (μ+λ), (μ,λ)
- Parameter tuning vs. parameter control
- Fixed parameter control
  - Rechenberg's 1/5 success rule
- Self-adaptation
  - Mutation step size control
Mutation case 1: Uncorrelated mutation with one σ
- Chromosomes: ⟨x1,…,xn, σ⟩
- σ' = σ · exp(τ · N(0,1))
- x'i = xi + σ' · Ni(0,1)
- Typically the learning rate τ ∝ 1/√n
- And we have a boundary rule: σ' < ε0 ⇒ σ' = ε0
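A sketch of this mutation rule in Python, with the proportionality constant for τ set to 1 and an illustrative default for the boundary ε0:

```python
import math
import random

def mutate_one_sigma(x, sigma, eps0=1e-6):
    # sigma' = sigma * exp(tau * N(0,1));  x'_i = x_i + sigma' * N_i(0,1)
    n = len(x)
    tau = 1.0 / math.sqrt(n)                 # learning rate ~ 1/sqrt(n)
    sigma_new = sigma * math.exp(tau * random.gauss(0, 1))
    if sigma_new < eps0:                     # boundary rule
        sigma_new = eps0
    x_new = [xi + sigma_new * random.gauss(0, 1) for xi in x]
    return x_new, sigma_new
```

Mutating σ first and then using the mutated σ' for the object variables is what lets the step size be evaluated through the offspring it produces.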
Mutants with equal likelihood
- Circle: mutants having the same chance to be created
Mutation case 2: Uncorrelated mutation with n σ's
- Chromosomes: ⟨x1,…,xn, σ1,…,σn⟩
- σ'i = σi · exp(τ' · N(0,1) + τ · Ni(0,1))
- x'i = xi + σ'i · Ni(0,1)
- Two learning rate parameters:
  - τ' (overall learning rate) ∝ 1/√(2n)
  - τ (coordinate-wise learning rate) ∝ 1/√(2√n)
  - τ and τ' have individual proportionality constants, which both have default value 1
- Boundary rule: σ'i < ε0 ⇒ σ'i = ε0
Mutants with equal likelihood
- Ellipse: mutants having the same chance to be created
Mutation case 3: Correlated mutations
- Chromosomes: ⟨x1,…,xn, σ1,…,σn, α1,…,αk⟩ where k = n(n-1)/2
- The covariance matrix C is defined as:
  - cii = σi²
  - cij = 0 if i and j are not correlated
  - cij = ½ · (σi² - σj²) · tan(2αij) if i and j are correlated
- Note the numbering / indices of the α's
Correlated mutations (cont'd)
The mutation mechanism is then:
- σ'i = σi · exp(τ' · N(0,1) + τ · Ni(0,1))
- α'j = αj + β · N(0,1)
- x' = x + N(0, C')
  - x stands for the vector ⟨x1,…,xn⟩
  - C' is the covariance matrix C after mutation of the σ and α values
- τ' ∝ 1/√(2n), τ ∝ 1/√(2√n), and β ≈ 5°
- Boundary rules: σ'i < ε0 ⇒ σ'i = ε0, and |α'j| > π ⇒ α'j = α'j - 2π · sign(α'j)
Mutants with equal likelihood
- Ellipse: mutants having the same chance to be created
Recombination
- Creates one child
- Acts per variable / position by either:
  - Averaging parental values, or
  - Selecting one of the parental values
- From two or more parents by either:
  - Using two selected parents to make a child
  - Selecting two parents anew for each position
Names of Recombinations

                                 | Two fixed parents  | Two parents selected for each i
zi = (xi + yi)/2                 | Local intermediary | Global intermediary
zi is xi or yi, chosen randomly  | Local discrete     | Global discrete
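The four variants in the table can be sketched as follows (function names illustrative; the global variants draw two parents anew for every position):

```python
import random

def local_intermediary(x, y):
    # two fixed parents, average each position
    return [(a + b) / 2 for a, b in zip(x, y)]

def local_discrete(x, y):
    # two fixed parents, pick one parental value per position
    return [random.choice((a, b)) for a, b in zip(x, y)]

def global_intermediary(pop):
    # two parents selected anew for each position i, then averaged
    child = []
    for i in range(len(pop[0])):
        p, q = random.sample(pop, 2)
        child.append((p[i] + q[i]) / 2)
    return child

def global_discrete(pop):
    # two parents selected anew for each position i, one value chosen
    child = []
    for i in range(len(pop[0])):
        p, q = random.sample(pop, 2)
        child.append(random.choice((p[i], q[i])))
    return child
```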
Evolutionary Programming (EP)
- Traditional application domain: machine learning by FSMs
- Contemporary application domain: (numerical) optimization
- Arbitrary representation and mutation operators, no recombination
- Contemporary EP = traditional EP + ES
  - Self-adaptation of parameters
EP Technical Summary Tableau

Representation     | Real-valued vectors
Recombination      | None
Mutation           | Gaussian perturbation
Parent selection   | Deterministic
Survivor selection | Probabilistic (μ+μ)
Specialty          | Self-adaptation of mutation step sizes (in meta-EP)
Historical EP Perspective
- EP aimed at achieving intelligence
- Intelligence viewed as adaptive behaviour
- Prediction of the environment was considered a prerequisite to adaptive behaviour
- Thus: the capability to predict is key to intelligence
Prediction by Finite State Machines
- Finite state machine (FSM):
  - States S
  - Inputs I
  - Outputs O
  - Transition function δ: S × I → S × O
- Transforms an input stream into an output stream
- Can be used for predictions, e.g., to predict the next input symbol in a sequence
FSM Example
- Consider the FSM with:
  - S = {A, B, C}
  - I = {0, 1}
  - O = {a, b, c}
- given by a diagram
FSM as Predictor
- Consider the following FSM
- Task: predict next input
- Quality: % of out_i = in_(i+1)
- Given initial state C
- Input sequence: 011101
- Leads to output: 110111
- Quality: 3 out of 5
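The quality measure (the fraction of outputs out_i that equal the next input in_(i+1)) can be checked directly against the given sequences; the function name is illustrative:

```python
def prediction_quality(inputs, outputs):
    # out_i predicts in_(i+1): count hits over the len-1 comparable pairs
    hits = sum(o == nxt for o, nxt in zip(outputs, inputs[1:]))
    return hits, len(inputs) - 1

# The slide's sequences: input 011101, output 110111
hits, total = prediction_quality("011101", "110111")
assert (hits, total) == (3, 5)
```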
Introductory example: evolving FSMs to predict primes
- P(n) = 1 if n is prime, 0 otherwise
- I = N = {1, 2, 3, …, n, …}
- O = {0, 1}
- Correct prediction: out_i = P(in_(i+1))
- Fitness function:
  - 1 point for correct prediction of next input
  - 0 points for incorrect prediction
  - Penalty for too many states
Introductory example: evolving FSMs to predict primes (cont.)
- Parent selection: each FSM is mutated once
- Mutation operators (one selected randomly):
  - Change an output symbol
  - Change a state transition (i.e., redirect an edge)
  - Add a state
  - Delete a state
  - Change the initial state
- Survivor selection: (μ+μ)
- Results: overfitting; after 202 inputs the best FSM had one state and both outputs were 0, i.e., it always predicted "not prime"
Modern EP
- No predefined representation in general
- Thus: no predefined mutation (must match representation)
- Often applies self-adaptation of mutation parameters
- In the sequel we present one EP variant, not the canonical EP
Representation
- For continuous parameter optimisation
- Chromosomes consist of two parts:
  - Object variables: x1,…,xn
  - Mutation step sizes: σ1,…,σn
- Full size: ⟨x1,…,xn, σ1,…,σn⟩
Mutation
- Chromosomes: ⟨x1,…,xn, σ1,…,σn⟩
- σ'i = σi · (1 + α · N(0,1))
- x'i = xi + σ'i · Ni(0,1)
- α ≈ 0.2
- Boundary rule: σ' < ε0 ⇒ σ' = ε0
- Other variants proposed & tried:
  - Lognormal scheme as in ES
  - Using variance instead of standard deviation
  - Mutate σ last
  - Other distributions, e.g., Cauchy instead of Gaussian
Recombination: None
- Rationale: one point in the search space stands for a species, not for an individual, and there can be no crossover between species
- Much historical debate: mutation vs. crossover
- Pragmatic approach seems to prevail today
Parent Selection
- Each individual creates one child by mutation
- Thus: deterministic
- Not biased by fitness
Survivor Selection
- P(t): parents, P'(t): offspring
- Pairwise competitions, round-robin format:
  - Each solution x from P(t) ∪ P'(t) is evaluated against q other randomly chosen solutions
  - For each comparison, a "win" is assigned if x is better than its opponent
  - The solutions with the greatest number of wins are retained to be parents of the next generation
- Parameter q allows tuning selection pressure (typically q = 10)
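A sketch of the round-robin survivor selection. Here `combined` stands for P(t) ∪ P'(t), `fitness` maps a solution to a score to maximize, and μ is taken as half the combined pool; all names are illustrative:

```python
import random

def ep_survivor_selection(combined, fitness, q=10):
    # Round-robin: each solution meets q random opponents; a "win" is
    # scored when it beats the opponent; keep the mu highest win counts.
    n = len(combined)
    mu = n // 2
    wins = [0] * n
    for i in range(n):
        for j in random.sample([k for k in range(n) if k != i], q):
            if fitness(combined[i]) > fitness(combined[j]):
                wins[i] += 1
    ranked = sorted(range(n), key=lambda i: wins[i], reverse=True)
    return [combined[i] for i in ranked[:mu]]
```

With q equal to the full pool size minus one, the tournament becomes exhaustive and the selection reduces to deterministic truncation.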
Example application: the Ackley function (Bäck et al. '93)
- The Ackley function (with n = 30):
  f(x) = -20 · exp(-0.2 · sqrt((1/n) · Σ xi²)) - exp((1/n) · Σ cos(2π · xi)) + 20 + e
- Representation:
  - -30 < xi < 30 (coincidence of 30's!)
  - 30 variances as step sizes
- Mutation with changing object variables first!
- Population size = 200, selection q = 10
- Termination after 200,000 fitness evals
- Results: average best solution is 1.4 · 10^-2
Example application: evolving checkers players (Fogel '02)
- Neural nets for evaluating future values of moves are evolved
- NNs have a fixed structure with 5046 weights; these are evolved, plus one weight for kings
- Representation:
  - vector of 5046 real numbers for object variables (weights)
  - vector of 5046 real numbers for σ's
- Mutation:
  - Gaussian, lognormal scheme with σ-first
  - plus a special mechanism for the kings' weight
- Population size 15
Example application: evolving checkers players (Fogel '02, cont.)
- Tournament size q = 5
- Programs (with the NN inside) play against other programs; no human trainer or hard-wired intelligence
- After 840 generations (6 months!) the best strategy was tested against humans via the Internet
- The program earned an expert-class ranking, outperforming 99.61% of all rated players
Genetic Programming (GP)
- Characteristic property: variable-size hierarchical representation vs. fixed-size linear in traditional EAs
- Application domain: model optimization vs. input values in traditional EAs
- Unifying paradigm: program induction
Program Induction Examples
- Optimal control
- Planning
- Symbolic regression
- Automatic programming
- Discovering game-playing strategies
- Forecasting
- Inverse problem solving
- Decision tree induction
- Evolution of emergent behavior
- Evolution of cellular automata
GP Specification
- S-expressions
- Function set
- Terminal set
- Arity
- Correct expressions
- Closure property
- Strongly typed GP
GP Notes
- Mutation or recombination (not both)
- Bloat ("survival of the fattest")
- Parsimony pressure
Learning Classifier Systems (LCS)
- Note: LCS is technically not a type of EA, but can utilize an EA
- Condition-Action Rule Based Systems; rule format: <condition: action>
- Reinforcement Learning
- LCS rule format: <condition: action> plus a predicted payoff
- Don't-care symbols
LCS Specifics
- Multi-step credit allocation: Bucket Brigade algorithm
- Rule discovery cycle: EA
- Pitt approach: each individual represents a complete rule set
- Michigan approach: each individual represents a single rule; a population represents the complete rule set
Parameter Tuning vs. Control
- Parameter tuning: a priori optimization of fixed strategy parameters
- Parameter control: on-the-fly optimization of dynamic strategy parameters
Parameter Tuning Methods
- Start with stock parameter values
- Manually adjust based on user intuition
- Monte Carlo sampling of parameter values on a few (short) runs
- Meta-tuning algorithm (e.g., meta-EA)
Parameter Tuning Drawbacks
- Exhaustive search for optimal parameter values, even assuming independence, is infeasible
- Parameter dependencies
- Extremely time consuming
- Optimal values are very problem specific
- Different values may be optimal at different evolutionary stages
Parameter Control Methods
- Deterministic
  - Example: replace pi with pi(t), akin to a cooling schedule in Simulated Annealing
- Adaptive
  - Example: Rechenberg's 1/5 success rule
- Self-adaptive
  - Example: mutation step size control in ES
Evaluation Function Control
- Example 1: parsimony pressure in GP
- Example 2: penalty functions in Constraint Satisfaction Problems (aka Constrained Optimization Problems)
Penalty Function Control
- eval(x) = f(x) + W · penalty(x)
- Deterministic example: W = W(t) = (C · t)^α with constants C and α ≥ 1
- Adaptive example (page 135 of textbook)
- Self-adaptive example (pages 135-136 of textbook)
  - Note: this allows evolution to cheat!
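The penalized evaluation and the deterministic schedule translate directly into code; the default constants C = 0.5 and α = 2 below are illustrative choices, not prescribed by the slides:

```python
def eval_with_penalty(x, f, penalty, W):
    # eval(x) = f(x) + W * penalty(x)
    return f(x) + W * penalty(x)

def W_deterministic(t, C=0.5, alpha=2):
    # Deterministic schedule W(t) = (C * t)^alpha: pressure grows with time t
    return (C * t) ** alpha
```

Early on, a small W lets infeasible but promising solutions survive; as t grows, the rising weight pushes the population toward feasibility.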
Parameter Control Aspects
- What is changed? Parameters vs. operators
- What evidence informs the change? Absolute vs. relative
- What is the scope of the change? Gene vs. individual vs. population
  - Ex: one-bit allele for recombination operator selection (pairwise vs. vote)
Parameter Control Examples
- Representation (GP: ADFs, delta coding)
- Evaluation function (objective function)
- Mutation (ES)
- Recombination (Davis' adaptive operator fitness: implicit bucket brigade)
- Selection (Boltzmann)
- Population
- Multiple
Parameterless EAs
- Previous work
- Dr. T's EvoFree project
Multimodal Problems
- Multimodal def.: multiple local optima, and at least one local optimum is not globally optimal
- Basins of attraction & niches
- Motivation for identifying a diverse set of high-quality solutions:
  - Allow for human judgement
  - Sharp-peak niches may be overfitted
Restricted Mating
- Panmictic vs. restricted mating
- Finite population size + panmictic mating -> genetic drift
- Local adaptation (environmental niche)
- Punctuated equilibria
  - Evolutionary stasis
  - Demes
- Speciation (end result of increasingly specialized adaptation to particular environmental niches)
EA Spaces

Biology      | EA
Geographical | Algorithmic
Genotype     | Representation
Phenotype    | Solution
Implicit diverse solution identification (1)
- Multiple runs of a standard EA
  - Non-uniform basins of attraction are problematic
- Island model (coarse-grain parallel)
  - Punctuated equilibria
  - Epoch, migration
  - Communication characteristics
  - Initialization: number of islands and respective population sizes
Implicit diverse solution identification (2)
- Diffusion model EAs
  - Single population, single species
  - Overlapping demes distributed within algorithmic space (e.g., grid)
  - Equivalent to cellular automata
- Automatic speciation
  - Genotype/phenotype mating restrictions
Explicit diverse solution identification
- Fitness sharing: individuals share fitness within their niche
- Crowding: replace similar parents
Game-Theoretic Problems
- Adversarial search: multi-agent problem with conflicting utility functions
Ultimatum Game
- Select two subjects, A and B
- Subject A gets 10 units of currency
- A has to make an offer (ultimatum) to B, anywhere from 0 to 10 of his units
- B has the option to accept or reject (no negotiation)
- If B accepts, A keeps the remaining units and B the offered units; otherwise they both lose all units
Real-World Game-Theoretic Problems
- Real-world examples:
  - economic & military strategy
  - arms control
  - cyber security
  - bargaining
- Common problem: real-world games are typically incomputable
Arms Races
- Military arms races
- Prisoner's Dilemma
- Biological arms races
Approximating Incomputable Games
- Consider the space of each player's actions
- Perform local search in these spaces
- Solution quality in one space is dependent on the search in the other spaces
- The simultaneous search of co-dependent spaces is naturally modeled as an arms race
Evolutionary Arms Races
- Iterated evolutionary arms races
- Biological arms races revisited
- Iterated arms race optimization is doomed!
Coevolutionary Algorithm (CoEA)
- A special type of EA where the fitness of an individual depends on other individuals (i.e., individuals are explicitly part of the environment)
- Single species vs. multiple species
- Cooperative vs. competitive coevolution
CoEA difficulties (1): Disengagement
- Occurs when one population evolves so much faster than the other that all individuals of the other are utterly defeated, making it impossible to differentiate between better and worse individuals, without which there can be no evolution
CoEA difficulties (2): Cycling
- Occurs when populations have lost the genetic knowledge of how to defeat an earlier-generation adversary and that adversary re-evolves
- Potentially this can cause an infinite loop in which the populations continue to evolve but do not improve
CoEA difficulties (3): Suboptimal Equilibrium (aka Mediocre Stability)
- Occurs when the system stabilizes in a suboptimal equilibrium
Case Study from Critical Infrastructure Protection
Infrastructure Hardening
- Hardenings (defenders) versus contingencies (attackers)
- Hardenings need to balance spare flow capacity with flow control
Case Study from Automated Software Engineering
Automated Software Correction
- Programs (defenders) versus test cases (attackers)
- Programs encoded with Genetic Programming
- Program specification encoded in the fitness function (correctness critical!)
Multi-Objective EAs (MOEAs)
- Extension of a regular EA which maps multiple objective values to a single fitness value
- Objectives typically conflict
- In a standard EA, an individual A is said to be better than an individual B if A has a higher fitness value than B
- In a MOEA, an individual A is said to be better than an individual B if A dominates B
Domination in MOEAs
- An individual A is said to dominate individual B iff:
  - A is no worse than B in all objectives
  - A is strictly better than B in at least one objective
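The domination test translates directly into code; this sketch assumes all objectives are minimized and represents an individual's objective values as a tuple:

```python
def dominates(a, b):
    # a dominates b iff a is no worse in every objective and strictly
    # better in at least one (objective vectors as tuples, minimization)
    return (all(ai <= bi for ai, bi in zip(a, b)) and
            any(ai < bi for ai, bi in zip(a, b)))
```

Note that two individuals can be mutually non-dominating (incomparable), which is exactly why MOEAs rank by non-domination levels rather than by a total order.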
Pareto Optimality (Vilfredo Pareto)
Given a set of alternative allocations of, say, goods or income for a set of individuals, a movement from one allocation to another that can make at least one individual better off without making any other individual worse off is called a Pareto Improvement. An allocation is Pareto Optimal when no further Pareto Improvements can be made. This is often called a Strong Pareto Optimum (SPO).
Pareto Optimality in MOEAs
- Among a set of solutions P, the non-dominated subset of solutions P' consists of those that are not dominated by any member of P
- The non-dominated subset of the entire feasible search space S is the globally Pareto-optimal set
Goals of MOEAs
- Identify the global Pareto-optimal set of solutions (aka the Pareto-optimal front)
- Find a sufficient coverage of that set
- Find an even distribution of solutions
MOEA Metrics
- Convergence: how close is a generated solution set to the true Pareto-optimal front?
- Diversity: are the generated solutions evenly distributed, or are they in clusters?
Deterioration in MOEAs
- Competition can result in the loss of a non-dominated solution which dominated a previously generated solution
- This loss in turn can result in the previously generated solution being regenerated and surviving
NSGA-II
- Initialization before the primary loop:
  - Create initial population P0
  - Sort P0 on the basis of non-domination
  - Best level is level 1
  - Fitness is set to level number; lower number, higher fitness
  - Binary tournament selection
  - Mutation and recombination create Q0
NSGA-II (cont.)
- Primary loop:
  - Rt = Pt + Qt
  - Sort Rt on the basis of non-domination
  - Create Pt+1 by adding the best individuals from Rt
  - Create Qt+1 by performing binary tournament selection, mutation, and recombination on Pt+1
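The non-domination sorting used in both the initialization and the primary loop can be sketched with a simple front-peeling procedure (quadratic per front; NSGA-II proper uses a faster bookkeeping scheme). Minimization is assumed and the function name is illustrative:

```python
def non_dominated_sort(points):
    # Assign each point its non-domination level (1 = best front), minimization
    def dominates(a, b):
        return (all(u <= v for u, v in zip(a, b)) and
                any(u < v for u, v in zip(a, b)))
    remaining = set(range(len(points)))
    levels, level = {}, 1
    while remaining:
        # current front: points dominated by no other remaining point
        front = {i for i in remaining
                 if not any(dominates(points[j], points[i])
                            for j in remaining if j != i)}
        for i in front:
            levels[i] = level
        remaining -= front
        level += 1
    return levels
```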
Epsilon-MOEA
- Steady state
- Elitist
- No deterioration
Epsilon-MOEA (cont.)
- Create an initial population P(0)
- Epsilon non-dominated solutions from P(0) are put into an archive population E(0)
- Choose one individual from E, and one from P
- These individuals mate and produce an offspring, c
- A special array B is created for c, which consists of abbreviated versions of the objective values from c
Epsilon-MOEA (cont.)
- An attempt is made to insert c into the archive population E
- The domination check is conducted using the B array instead of the actual objective values
- If c dominates a member of the archive, that member is replaced with c
- The individual c can also be inserted into P in a similar manner, using a standard domination check
SNDL-MOEA
- Desired features:
  - Deterioration prevention
  - Stored non-domination levels (NSGA-II)
  - Number and size of levels user configurable
  - Selection methods utilizing levels in different ways
  - Problem-specific representation
  - Problem-specific compartments (epsilon-MOEA)
  - Problem-specific mutation and crossover