Transcript
  • Evolutionary Computing — Computer Science 348 — Dr. T presents…

  • Introduction: The field of Evolutionary Computing studies the theory and application of Evolutionary Algorithms.

    Evolutionary Algorithms can be described as a class of stochastic, population-based local search algorithms inspired by neo-Darwinian Evolution Theory.

  • Computational Basis

    - Trial-and-error (aka generate-and-test)
    - Graduated solution quality
    - Stochastic local search of the solution landscape

  • Biological Metaphors: Darwinian Evolution
    - Macroscopic view of evolution
    - Natural selection
    - Survival of the fittest
    - Random variation

  • Biological Metaphors: (Mendelian) Genetics
    - Genotype (functional unit of inheritance)
    - Genotypes vs. phenotypes
    - Pleiotropy: one gene affects multiple phenotypic traits
    - Polygeny: one phenotypic trait is affected by multiple genes
    - Chromosomes (haploid vs. diploid)
    - Loci and alleles

  • EA Pros
    - More general purpose than traditional optimization algorithms, i.e., less problem-specific knowledge required
    - Ability to solve difficult problems
    - Solution availability
    - Robustness
    - Inherent parallelism

  • EA Cons
    - Fitness function and genetic operators often not obvious
    - Premature convergence
    - Computationally intensive
    - Difficult parameter optimization

  • EA components

    - Search spaces: representation & size
    - Evaluation of trial solutions: fitness function
    - Exploration versus exploitation
    - Selective pressure rate
    - Premature convergence

  • Nature versus the digital realm

    Nature → digital realm:
    - Environment → Problem (search space)
    - Fitness → Fitness function
    - Population → Set
    - Individual → Data structure
    - Genes → Elements
    - Alleles → Data type

  • EA Strategy Parameters
    - Population size
    - Initialization-related parameters
    - Selection-related parameters
    - Number of offspring
    - Recombination chance
    - Mutation chance
    - Mutation rate
    - Termination-related parameters

  • Problem solving steps
    1. Collect problem knowledge
    2. Choose gene representation
    3. Design fitness function
    4. Creation of initial population
    5. Parent selection
    6. Decide on genetic operators
    7. Competition / survival
    8. Choose termination condition
    9. Find good parameter values

  • Function optimization problem

    Given the function

    f(x,y) = x²y + 5xy - 3xy²

    for what integer values of x and y is f(x,y) minimal?

  • Function optimization problem
    - Solution space: ℤ × ℤ
    - Trial solution: (x, y)
    - Gene representation: integer
    - Gene initialization: random
    - Fitness function: -f(x,y)
    - Population size: 4
    - Number of offspring: 2
    - Parent selection: exponential

  • Function optimization problem
    - Genetic operators: 1-point crossover; mutation (-1, 0, +1)
    - Competition: remove the two individuals with the lowest fitness value

  • f(x,y) = x²y + 5xy - 3xy²
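
A minimal sketch of the EA described on the preceding slides. The initialization range, seed, and number of generations are illustrative assumptions not given in the slides, and parent selection is simplified to picking the two fittest individuals rather than the exponential scheme mentioned above:

```python
import random

def f(x, y):
    # Objective from the slide: f(x, y) = x^2*y + 5*x*y - 3*x*y^2
    return x**2 * y + 5 * x * y - 3 * x * y**2

def fitness(ind):
    # Minimization turned into maximization by negating f
    return -f(*ind)

def mutate(ind):
    # Creep mutation: add -1, 0, or +1 to each gene
    return tuple(g + random.choice((-1, 0, 1)) for g in ind)

def crossover(p1, p2):
    # 1-point crossover on a 2-gene chromosome: swap the second genes
    return (p1[0], p2[1]), (p2[0], p1[1])

random.seed(0)
population = [(random.randint(-10, 10), random.randint(-10, 10)) for _ in range(4)]

for generation in range(50):
    # Parent selection simplified to the two fittest (the slides use exponential ranking)
    p1, p2 = sorted(population, key=fitness, reverse=True)[:2]
    c1, c2 = crossover(p1, p2)
    population += [mutate(c1), mutate(c2)]
    # Competition: remove the two individuals with the lowest fitness
    population = sorted(population, key=fitness, reverse=True)[:4]

print("best (x, y):", population[0], "f:", f(*population[0]))
```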

  • Measuring performance
    - Case 1: goal unknown or never reached → solution quality: global average/best population fitness
    - Case 2: goal known and sometimes reached → optimal solution reached percentage
    - Case 3: goal known and always reached → convergence speed

  • Initialization
    - Uniform random
    - Heuristic based
    - Knowledge based
    - Genotypes from previous runs
    - Seeding

  • Representation (2.3.1)
    - Genotype space
    - Phenotype space
    - Encoding & decoding
    - Knapsack Problem (2.4.2)
    - Surjective, injective, and bijective decoder functions

  • Simple Genetic Algorithm (SGA)

    - Representation: bit-strings
    - Recombination: 1-point crossover
    - Mutation: bit flip
    - Parent selection: fitness proportional
    - Survival selection: generational
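
A compact sketch of the SGA as summarized above (bit-strings, fitness-proportional selection, 1-point crossover, bit-flip mutation, generational replacement). The crossover/mutation rates and the OneMax demo fitness are illustrative assumptions:

```python
import random

def sga(fitness, length=20, pop_size=30, pc=0.7, pm=0.01, generations=100):
    """Minimal Simple GA sketch."""
    pop = [[random.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        fits = [fitness(ind) for ind in pop]
        total = sum(fits)

        def select():
            # Fitness-proportional (roulette-wheel) selection; assumes non-negative fitness
            r = random.uniform(0, total)
            acc = 0.0
            for ind, fit in zip(pop, fits):
                acc += fit
                if acc >= r:
                    return ind
            return pop[-1]

        new_pop = []
        while len(new_pop) < pop_size:
            p1, p2 = select(), select()
            if random.random() < pc:              # 1-point crossover
                cut = random.randint(1, length - 1)
                c1, c2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            else:
                c1, c2 = p1[:], p2[:]
            for child in (c1, c2):                # bit-flip mutation
                for i in range(length):
                    if random.random() < pm:
                        child[i] = 1 - child[i]
                new_pop.append(child)
        pop = new_pop[:pop_size]                  # generational survivor selection
    return max(pop, key=fitness)

# Demo: OneMax (count of 1-bits)
print(sga(sum))
```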

  • Trace example errata
    - Page 39, line 5: 729 → 784
    - Table 3.4, x value: 26 → 28, 18 → 20
    - Table 3.4, fitness: 676 → 784, 324 → 400, 2354 → 2538, 588.5 → 634.5, 729 → 784

  • Representations
    - Bit strings
      - Scaling Hamming cliffs
      - Binary vs. Gray coding (Appendix A)
    - Integers
      - Ordinal vs. cardinal attributes
    - Permutations
      - Absolute order vs. adjacency
    - Real-valued, etc.
    - Homogeneous vs. heterogeneous

  • Permutation Representation
    - Order based (e.g., job shop scheduling)
    - Adjacency based (e.g., TSP)

    Example:
    - Problem space: [A, B, C, D]
    - Permutation: [3, 1, 2, 4]
    - Mapping 1: [C, A, B, D]
    - Mapping 2: [B, C, A, D]

  • Mutation vs. Recombination

    Mutation = Stochastic unary variation operator

    Recombination = Stochastic multi-ary variation operator

  • Mutation
    - Bit-string representation: bit-flip; E[#flips] = L · pm
    - Integer representation: random reset (cardinal attributes), creep mutation (ordinal attributes)

  • Mutation cont.
    - Floating-point: uniform; nonuniform from a fixed distribution (Gaussian, Cauchy, Lévy, etc.)
    - Permutation: swap, insert, scramble, inversion

  • Permutation Mutation
    - Swap mutation
    - Insert mutation
    - Scramble mutation
    - Inversion mutation (good for adjacency-based problems)
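
Possible implementations of the four permutation mutation operators listed above (sketches; each returns a mutated copy):

```python
import random

def swap_mutation(perm):
    p = perm[:]
    i, j = random.sample(range(len(p)), 2)
    p[i], p[j] = p[j], p[i]                     # exchange two positions
    return p

def insert_mutation(perm):
    p = perm[:]
    i, j = sorted(random.sample(range(len(p)), 2))
    p.insert(i + 1, p.pop(j))                   # move element j to just after i
    return p

def scramble_mutation(perm):
    p = perm[:]
    i, j = sorted(random.sample(range(len(p)), 2))
    segment = p[i:j + 1]
    random.shuffle(segment)                     # randomize the segment
    p[i:j + 1] = segment
    return p

def inversion_mutation(perm):
    # Reverses a segment; preserves most adjacency information,
    # hence well suited to adjacency-based problems such as the TSP
    p = perm[:]
    i, j = sorted(random.sample(range(len(p)), 2))
    p[i:j + 1] = reversed(p[i:j + 1])
    return p

print(inversion_mutation(list(range(8))))
```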

  • Recombination
    - Recombination rate: asexual vs. sexual
    - N-point crossover (positional bias)
    - Uniform crossover (distributional bias)
    - Discrete recombination (no new alleles)
    - (Uniform) arithmetic recombination
    - Simple recombination
    - Single arithmetic recombination
    - Whole arithmetic recombination

  • Recombination (cont.)
    - Adjacency-based permutation: Partially Mapped Crossover (PMX), Edge Crossover
    - Order-based permutation: Order Crossover, Cycle Crossover

  • Permutation Recombination
    - Adjacency-based problems: Partially Mapped Crossover (PMX), Edge Crossover
    - Order-based problems: Order Crossover, Cycle Crossover

  • PMX
    1. Choose 2 random crossover points & copy the mid-segment from p1 to the offspring
    2. Look for elements in the mid-segment of p2 that were not copied
    3. For each of these (i), look in the offspring to see what was copied in its place (j)
    4. Place i into the position occupied by j in p2
    5. If the place occupied by j in p2 is already filled in the offspring by k, put i in the position occupied by k in p2
    6. Fill the rest of the offspring by copying from p2
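
A sketch of the PMX steps listed above, producing one offspring; the crossover points can be passed in explicitly so the classic textbook trace is reproducible:

```python
import random

def pmx(p1, p2, a=None, b=None):
    """Partially Mapped Crossover, one offspring (sketch)."""
    n = len(p1)
    if a is None:
        a, b = sorted(random.sample(range(n), 2))
    child = [None] * n
    child[a:b + 1] = p1[a:b + 1]                  # step 1: copy mid-segment from p1
    for i in range(a, b + 1):                     # step 2: uncopied elements of p2's segment
        elem = p2[i]
        if elem in child[a:b + 1]:
            continue
        pos = i
        while child[pos] is not None:             # steps 3-5: follow the mapping via p2
            pos = p2.index(p1[pos])
        child[pos] = elem
    for i in range(n):                            # step 6: fill the rest from p2
        if child[i] is None:
            child[i] = p2[i]
    return child

p1 = [1, 2, 3, 4, 5, 6, 7, 8, 9]
p2 = [9, 3, 7, 8, 2, 6, 5, 1, 4]
print(pmx(p1, p2, 3, 6))   # -> [9, 3, 2, 4, 5, 6, 7, 1, 8]
```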

  • Order Crossover
    1. Choose 2 random crossover points & copy the mid-segment from p1 to the offspring
    2. Starting from the 2nd crossover point in p2, copy the unused numbers into the offspring in the order they appear in p2, wrapping around at the end of the list
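
A corresponding sketch of Order Crossover; again the cut points can be supplied explicitly for a deterministic demo:

```python
import random

def order_crossover(p1, p2, a=None, b=None):
    """Order Crossover (OX), one offspring: copy a segment from p1, then fill the
    remaining positions with the unused elements in the order they appear in p2,
    starting after the second cut point and wrapping around."""
    n = len(p1)
    if a is None:
        a, b = sorted(random.sample(range(n), 2))
    child = [None] * n
    child[a:b + 1] = p1[a:b + 1]
    used = set(child[a:b + 1])
    scan = [(b + 1 + k) % n for k in range(n)]              # p2 positions from 2nd cut
    positions = [i for i in scan if child[i] is None]        # empty slots, same order
    unused = [p2[i] for i in scan if p2[i] not in used]      # p2 elements not yet copied
    for pos, elem in zip(positions, unused):
        child[pos] = elem
    return child

p1 = [1, 2, 3, 4, 5, 6, 7, 8, 9]
p2 = [9, 3, 7, 8, 2, 6, 5, 1, 4]
print(order_crossover(p1, p2, 3, 6))   # -> [3, 8, 2, 4, 5, 6, 7, 1, 9]
```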

  • Population Models
    - Two historical models: Generational Model, Steady-State Model
    - Generational Gap
    - General model: population size, mating pool size, offspring pool size

  • Parent selection
    - Fitness Proportional Selection (FPS)
      - High risk of premature convergence
      - Uneven selective pressure
      - Fitness function not transposition invariant
      - Windowing, sigma scaling
    - Rank-Based Selection
      - Mapping function (à la SA cooling schedule)
      - Linear ranking vs. exponential ranking

  • Sampling methods
    - Roulette Wheel
    - Stochastic Universal Sampling (SUS)
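
Sketches of both sampling methods; SUS uses a single spin with n equally spaced pointers, which lowers sampling variance compared with n independent roulette-wheel spins:

```python
import random

def roulette_wheel(population, fitnesses, n):
    """n independent spins of the wheel (fitness-proportional sampling)."""
    total = sum(fitnesses)
    chosen = []
    for _ in range(n):
        r = random.uniform(0, total)
        acc = 0.0
        for ind, fit in zip(population, fitnesses):
            acc += fit
            if acc >= r:
                chosen.append(ind)
                break
    return chosen

def stochastic_universal_sampling(population, fitnesses, n):
    """SUS: one spin, n equally spaced pointers."""
    total = sum(fitnesses)
    step = total / n
    start = random.uniform(0, step)
    pointers = [start + i * step for i in range(n)]
    chosen, acc, i = [], 0.0, 0
    for ind, fit in zip(population, fitnesses):
        acc += fit
        while i < n and pointers[i] <= acc:
            chosen.append(ind)
            i += 1
    return chosen

pop = ["A", "B", "C", "D"]
fits = [1.0, 2.0, 3.0, 4.0]
print(roulette_wheel(pop, fits, 4), stochastic_universal_sampling(pop, fits, 4))
```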

  • Rank-based sampling methods
    - Tournament Selection
    - Tournament size

  • Survivor selection
    - Age-based
    - Fitness-based
      - Truncation
      - Elitism

  • Termination
    - CPU time / wall time
    - Number of fitness evaluations
    - Lack of fitness improvement
    - Lack of genetic diversity
    - Solution quality / solution found
    - Combination of the above

  • Behavioral observables
    - Selective pressure
    - Population diversity
    - Fitness values
    - Phenotypes
    - Genotypes
    - Alleles

  • Report writing tips
    - Use easily readable fonts, including in tables & graphs (11 pt fonts are typically best; 10 pt is the absolute smallest)
    - Number all figures and tables and refer to each and every one in the main text body (hint: use autonumbering)
    - Capitalize named articles (e.g., "see Table 5", not "see table 5")
    - Keep important figures and tables as close to the referring text as possible, while placing less important ones in an appendix
    - Always provide standard deviations (typically in parentheses) when listing averages

  • Report writing tips
    - Use descriptive titles and captions on tables and figures so that they are self-explanatory
    - Always include axis labels in graphs
    - Write in a formal style (never use first person; instead say, for instance, "the author")
    - Format tabular material in proper tables with grid lines
    - Provide all the required information, but avoid extraneous data (information is good, data is bad)

  • Evolution Strategies (ES)

    - Birth year: 1963
    - Birthplace: Technical University of Berlin, Germany
    - Parents: Ingo Rechenberg & Hans-Paul Schwefel

  • ES history & parameter control
    - Two-membered ES: (1+1)
    - Original multi-membered ES: (μ+1)
    - Multi-membered ES: (μ+λ), (μ,λ)
    - Parameter tuning vs. parameter control
    - Fixed parameter control
    - Rechenberg's 1/5 success rule
    - Self-adaptation
    - Mutation step control

  • Uncorrelated mutation with one σ
    - Chromosomes: ⟨x1, …, xn, σ⟩
    - σ' = σ · exp(τ · N(0,1))
    - x'i = xi + σ' · Ni(0,1)
    - Typically the learning rate τ ∝ 1/√n
    - And we have a boundary rule: σ' < ε0 ⇒ σ' = ε0
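
A small sketch of this mutation scheme; the ε0 lower bound and the proportionality constant of 1 in τ = 1/√n are assumed defaults:

```python
import math
import random

def mutate_one_sigma(x, sigma, eps0=1e-6):
    """Uncorrelated ES mutation with a single step size sigma."""
    n = len(x)
    tau = 1.0 / math.sqrt(n)                      # learning rate ~ 1/sqrt(n)
    sigma_new = sigma * math.exp(tau * random.gauss(0, 1))
    sigma_new = max(sigma_new, eps0)              # boundary rule
    x_new = [xi + sigma_new * random.gauss(0, 1) for xi in x]
    return x_new, sigma_new

print(mutate_one_sigma([0.5, -1.2, 3.0], sigma=0.1))
```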

  • Mutants with equal likelihood
    - Circle: mutants having the same chance to be created

  • Mutation case 2: Uncorrelated mutation with n σ's
    - Chromosomes: ⟨x1, …, xn, σ1, …, σn⟩
    - σ'i = σi · exp(τ' · N(0,1) + τ · Ni(0,1))
    - x'i = xi + σ'i · Ni(0,1)
    - Two learning rate parameters: τ' (overall learning rate) and τ (coordinate-wise learning rate)
    - τ' ∝ 1/√(2n) and τ ∝ 1/√(2√n); τ and τ' have individual proportionality constants which both have default values of 1
    - σ'i < ε0 ⇒ σ'i = ε0
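
The n-step-size variant as a sketch, with the two learning rates τ' and τ using the default proportionality constants of 1:

```python
import math
import random

def mutate_n_sigmas(x, sigmas, eps0=1e-6):
    """Uncorrelated ES mutation with one step size per coordinate."""
    n = len(x)
    tau_prime = 1.0 / math.sqrt(2 * n)               # overall learning rate
    tau = 1.0 / math.sqrt(2 * math.sqrt(n))          # coordinate-wise learning rate
    common = tau_prime * random.gauss(0, 1)           # drawn once per individual
    new_sigmas = [max(s * math.exp(common + tau * random.gauss(0, 1)), eps0)
                  for s in sigmas]
    new_x = [xi + s * random.gauss(0, 1) for xi, s in zip(x, new_sigmas)]
    return new_x, new_sigmas

print(mutate_n_sigmas([0.5, -1.2, 3.0], [0.1, 0.1, 0.1]))
```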

  • Mutants with equal likelihood
    - Ellipse: mutants having the same chance to be created

  • Mutation case 3: Correlated mutations
    - Chromosomes: ⟨x1, …, xn, σ1, …, σn, α1, …, αk⟩ where k = n·(n-1)/2
    - The covariance matrix C is defined as:
      - cii = σi²
      - cij = 0 if i and j are not correlated
      - cij = ½·(σi² - σj²)·tan(2αij) if i and j are correlated
    - Note the numbering / indices of the α's

  • Correlated mutations cont'd
    - The mutation mechanism is then:
      - σ'i = σi · exp(τ' · N(0,1) + τ · Ni(0,1))
      - α'j = αj + β · N(0,1)
      - x' = x + N(0, C')
    - x stands for the vector ⟨x1, …, xn⟩
    - C' is the covariance matrix C after mutation of the α values
    - τ' ∝ 1/√(2n), τ ∝ 1/√(2√n), and β ≈ 5°
    - σ'i < ε0 ⇒ σ'i = ε0 and |α'j| > π ⇒ α'j = α'j - 2π·sign(α'j)

  • Mutants with equal likelihood
    - Ellipse: mutants having the same chance to be created

  • Recombination
    - Creates one child
    - Acts per variable / position by either averaging parental values, or selecting one of the parental values
    - From two or more parents, by either using two selected parents to make a child, or selecting two parents anew for each position

  • Names of recombinations

    - zi = (xi + yi)/2: local intermediary (two fixed parents) / global intermediary (two parents selected for each i)
    - zi is xi or yi, chosen randomly: local discrete (two fixed parents) / global discrete (two parents selected for each i)

  • Evolutionary Programming (EP)
    - Traditional application domain: machine learning by FSMs
    - Contemporary application domain: (numerical) optimization
    - Arbitrary representation and mutation operators, no recombination
    - Contemporary EP = traditional EP + ES
    - Self-adaptation of parameters

  • EP technical summary tableau

    - Representation: real-valued vectors
    - Recombination: none
    - Mutation: Gaussian perturbation
    - Parent selection: deterministic
    - Survivor selection: probabilistic (μ+μ)
    - Specialty: self-adaptation of mutation step sizes (in meta-EP)

  • Historical EP perspective
    - EP aimed at achieving intelligence
    - Intelligence viewed as adaptive behaviour
    - Prediction of the environment was considered a prerequisite to adaptive behaviour
    - Thus: the capability to predict is key to intelligence

  • Prediction by finite state machines
    - Finite state machine (FSM):
      - States S
      - Inputs I
      - Outputs O
      - Transition function δ: S × I → S × O
    - Transforms an input stream into an output stream
    - Can be used for predictions, e.g., to predict the next input symbol in a sequence

  • FSM example
    - Consider the FSM with S = {A, B, C}, I = {0, 1}, O = {a, b, c}, given by a diagram

  • FSM as predictor
    - Consider the following FSM
    - Task: predict the next input
    - Quality: % of predictions where outi = in(i+1)
    - Given initial state C and input sequence 011101, the FSM produces output 110111
    - Quality: 3 out of 5
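
A sketch of how such a predictor can be scored. The transition table below is a made-up two-state machine over {0, 1}, not the FSM from the slide's diagram:

```python
def run_fsm(transitions, start_state, inputs):
    """transitions maps (state, input) -> (next_state, output)."""
    state, outputs = start_state, []
    for symbol in inputs:
        state, out = transitions[(state, symbol)]
        outputs.append(out)
    return outputs

def prediction_quality(inputs, outputs):
    # Count positions where output_i correctly predicts input_(i+1)
    pairs = list(zip(outputs, inputs[1:]))
    hits = sum(1 for out, nxt in pairs if out == nxt)
    return hits, len(pairs)

# Hypothetical 2-state predictor (illustrative only)
fsm = {("A", 0): ("A", 0), ("A", 1): ("B", 1),
       ("B", 0): ("A", 1), ("B", 1): ("B", 1)}
inputs = [0, 1, 1, 1, 0, 1]
outputs = run_fsm(fsm, "A", inputs)
print(outputs, prediction_quality(inputs, outputs))
```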

  • Introductory example: evolving FSMs to predict primes
    - P(n) = 1 if n is prime, 0 otherwise
    - I = ℕ = {1, 2, 3, …, n, …}
    - O = {0, 1}
    - Correct prediction: outi = P(in(i+1))
    - Fitness function: 1 point for a correct prediction of the next input, 0 points for an incorrect prediction, penalty for too many states

  • Introductory example: evolving FSMs to predict primes
    - Parent selection: each FSM is mutated once
    - Mutation operators (one selected randomly):
      - Change an output symbol
      - Change a state transition (i.e., redirect an edge)
      - Add a state
      - Delete a state
      - Change the initial state
    - Survivor selection: (μ+μ)
    - Results: overfitting; after 202 inputs the best FSM had one state and both outputs were 0, i.e., it always predicted "not prime"

  • Modern EP
    - No predefined representation in general
    - Thus: no predefined mutation (must match representation)
    - Often applies self-adaptation of mutation parameters
    - In the sequel we present one EP variant, not the canonical EP

  • Representation
    - For continuous parameter optimisation
    - Chromosomes consist of two parts:
      - Object variables: x1, …, xn
      - Mutation step sizes: σ1, …, σn
    - Full size: ⟨x1, …, xn, σ1, …, σn⟩

  • Mutation
    - Chromosomes: ⟨x1, …, xn, σ1, …, σn⟩
    - σ'i = σi · (1 + α · N(0,1))
    - x'i = xi + σ'i · Ni(0,1)
    - α ≈ 0.2
    - Boundary rule: σ' < ε0 ⇒ σ' = ε0
    - Other variants proposed & tried:
      - Lognormal scheme as in ES
      - Using variance instead of standard deviation
      - Mutate σ last
      - Other distributions, e.g., Cauchy instead of Gaussian

  • Recombination: none
    - Rationale: a point in the search space stands for a species, not for an individual, and there can be no crossover between species
    - Much historical debate: mutation vs. crossover
    - The pragmatic approach seems to prevail today

  • Parent selection
    - Each individual creates one child by mutation
    - Thus: deterministic, not biased by fitness

  • Survivor selection
    - P(t): parents, P'(t): offspring
    - Pairwise competitions, round-robin format:
      - Each solution x from P(t) ∪ P'(t) is evaluated against q other randomly chosen solutions
      - For each comparison, a "win" is assigned if x is better than its opponent
      - The solutions with the greatest number of wins are retained to be parents of the next generation
    - Parameter q allows tuning of selection pressure (typically q = 10)
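
A sketch of this round-robin survivor selection, assuming equal numbers of parents and offspring; a candidate may occasionally draw itself as an opponent, which simply counts as a loss in this simplified version:

```python
import random

def round_robin_survivors(candidates, fitness, q=10):
    """Each candidate (parents + offspring) plays q pairwise comparisons against
    random opponents; the mu candidates with the most wins survive."""
    mu = len(candidates) // 2                 # assumes |offspring| == |parents|
    wins = []
    for x in candidates:
        opponents = random.sample(candidates, q)
        wins.append(sum(1 for y in opponents if fitness(x) > fitness(y)))
    ranked = sorted(zip(wins, range(len(candidates))), reverse=True)
    return [candidates[i] for _, i in ranked[:mu]]

# Toy usage: maximize -x^2 over candidate scalars
cands = [random.uniform(-5, 5) for _ in range(20)]
print(round_robin_survivors(cands, lambda x: -x * x, q=10))
```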

  • Example application: the Ackley function (Bäck et al. '93)

    The Ackley function (with n = 30):

    - Representation: -30 < xi < 30 (coincidence of 30s!)
    - 30 variances as step sizes
    - Mutation with changing object variables first!
    - Population size = 200, selection q = 10
    - Termination after 200,000 fitness evaluations
    - Results: average best solution is 1.4 · 10⁻²
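
For reference, the Ackley function itself in its standard form with the usual constants 20, 0.2, and 2π (the slide's formula image is not reproduced in this transcript):

```python
import math
import random

def ackley(x):
    """Ackley's function; global minimum f(0, ..., 0) = 0."""
    n = len(x)
    s1 = sum(xi * xi for xi in x)
    s2 = sum(math.cos(2 * math.pi * xi) for xi in x)
    return (-20.0 * math.exp(-0.2 * math.sqrt(s1 / n))
            - math.exp(s2 / n) + 20.0 + math.e)

x = [random.uniform(-30, 30) for _ in range(30)]
print(ackley(x), ackley([0.0] * 30))
```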

  • Example application: evolving checkers players (Fogel '02)
    - Neural nets for evaluating future values of moves are evolved
    - NNs have a fixed structure with 5046 weights; these are evolved, plus one weight for kings
    - Representation:
      - vector of 5046 real numbers for object variables (weights)
      - vector of 5046 real numbers for σ's
    - Mutation: Gaussian, lognormal scheme with σ first, plus a special mechanism for the king's weight
    - Population size 15

  • Example application: evolving checkers players (Fogel '02)
    - Tournament size q = 5
    - Programs (with the NN inside) play against other programs; no human trainer or hard-wired intelligence
    - After 840 generations (6 months!) the best strategy was tested against humans via the Internet
    - The program earned an expert-class ranking, outperforming 99.61% of all rated players

  • Genetic Programming (GP)
    - Characteristic property: variable-size hierarchical representation vs. fixed-size linear representation in traditional EAs
    - Application domain: model optimization vs. input values in traditional EAs
    - Unifying paradigm: program induction

  • Program induction examples
    - Optimal control
    - Planning
    - Symbolic regression
    - Automatic programming
    - Discovering game playing strategies
    - Forecasting
    - Inverse problem solving
    - Decision tree induction
    - Evolution of emergent behavior
    - Evolution of cellular automata

  • GP specification
    - S-expressions
    - Function set
    - Terminal set
    - Arity
    - Correct expressions
    - Closure property
    - Strongly typed GP

  • GP notes

    - Mutation or recombination (not both)
    - Bloat ("survival of the fattest")
    - Parsimony pressure

  • Learning Classifier Systems (LCS)
    - Note: LCS is technically not a type of EA, but can utilize an EA
    - Condition-Action Rule Based Systems; rule format: condition → action
    - Reinforcement Learning
    - LCS rule format: condition → action → predicted payoff
    - "Don't care" symbols

  • LCS specifics
    - Multi-step credit allocation: Bucket Brigade algorithm
    - Rule discovery cycle: EA
    - Pitt approach: each individual represents a complete rule set
    - Michigan approach: each individual represents a single rule; a population represents the complete rule set

  • Parameter Tuning vs Control

    Parameter Tuning: A priori optimization of fixed strategy parameters

    Parameter Control: On-the-fly optimization of dynamic strategy parameters

  • Parameter Tuning methods
    - Start with stock parameter values
    - Manually adjust based on user intuition
    - Monte Carlo sampling of parameter values on a few (short) runs
    - Meta-tuning algorithm (e.g., meta-EA)

  • Parameter Tuning drawbacks
    - Exhaustive search for optimal parameter values, even assuming independence, is infeasible
    - Parameter dependencies
    - Extremely time consuming
    - Optimal values are very problem specific
    - Different values may be optimal at different evolutionary stages

  • Parameter Control methods
    - Deterministic — example: replace pi with pi(t), akin to the cooling schedule in Simulated Annealing
    - Adaptive — example: Rechenberg's 1/5 success rule
    - Self-adaptive — example: mutation step size control in ES

  • Evaluation Function Control
    - Example 1: parsimony pressure in GP
    - Example 2: penalty functions in Constraint Satisfaction Problems (aka Constrained Optimization Problems)

  • Penalty Function Control
    - eval(x) = f(x) + W · penalty(x)

    - Deterministic example: W = W(t) = (C · t)^α with C, α ≥ 1

    - Adaptive example (page 135 of the textbook)

    - Self-adaptive example (pages 135-136 of the textbook)
      Note: this allows evolution to cheat!
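
A sketch of the deterministic penalty-weight schedule W(t) = (C·t)^α wrapped as an evaluation-function factory; the constants and the toy penalty (a count of violated constraints) are illustrative assumptions:

```python
def make_eval(f, penalty, C=1.0, alpha=2):
    """Deterministic penalty-weight schedule W(t) = (C*t)**alpha: constraint
    violations are punished more heavily as the run progresses."""
    def eval_at(t):
        W = (C * t) ** alpha
        return lambda x: f(x) + W * penalty(x)
    return eval_at

# Toy usage: sum of genes as base objective, count of negative genes as penalty
objective = lambda x: sum(x)
violations = lambda x: sum(1 for xi in x if xi < 0)
evaluator = make_eval(objective, violations)
print(evaluator(1)([1, -2, 3]), evaluator(100)([1, -2, 3]))
```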

  • Parameter Control aspects
    - What is changed? Parameters vs. operators
    - What evidence informs the change? Absolute vs. relative
    - What is the scope of the change? Gene vs. individual vs. population
    - Example: a one-bit allele for recombination operator selection (pairwise vs. vote)

  • Parameter control examples
    - Representation (GP: ADFs, delta coding)
    - Evaluation function (objective function/)
    - Mutation (ES)
    - Recombination (Davis' adaptive operator fitness: implicit bucket brigade)
    - Selection (Boltzmann)
    - Population
    - Multiple

  • Parameterless EAs
    - Previous work
    - Dr. T's EvoFree project

  • Multimodal Problems
    - Multimodal def.: multiple local optima and at least one local optimum is not globally optimal
    - Basins of attraction & niches
    - Motivation for identifying a diverse set of high-quality solutions:
      - Allows for human judgement
      - Sharp-peak niches may be overfitted

  • Restricted Mating
    - Panmictic vs. restricted mating
    - Finite population size + panmictic mating → genetic drift
    - Local adaptation (environmental niche)
    - Punctuated equilibria
    - Evolutionary stasis
    - Demes
    - Speciation (end result of increasingly specialized adaptation to particular environmental niches)

  • EA spaces

    Biology → EA:
    - Geographical → Algorithmic
    - Genotype → Representation
    - Phenotype → Solution

  • Implicit diverse solution identification (1)
    - Multiple runs of a standard EA — non-uniform basins of attraction are problematic
    - Island Model (coarse-grain parallel)
      - Punctuated equilibria
      - Epochs, migration
      - Communication characteristics
      - Initialization: number of islands and respective population sizes

  • Implicit diverse solution identification (2)
    - Diffusion Model EAs
      - Single population, single species
      - Overlapping demes distributed within algorithmic space (e.g., a grid)
      - Equivalent to cellular automata
    - Automatic speciation
      - Genotype/phenotype mating restrictions

  • Explicit diverse solution identification
    - Fitness sharing: individuals share fitness within their niche
    - Crowding: replace similar parents

  • Game-Theoretic Problems
    - Adversarial search: multi-agent problem with conflicting utility functions

    Ultimatum Game:
    - Select two subjects, A and B
    - Subject A gets 10 units of currency
    - A has to make an offer (ultimatum) to B, anywhere from 0 to 10 of his units
    - B has the option to accept or reject (no negotiation)
    - If B accepts, A keeps the remaining units and B the offered units; otherwise they both lose all units

  • Real-World Game-Theoretic Problems
    - Real-world examples: economic & military strategy, arms control, cyber security, bargaining
    - Common problem: real-world games are typically incomputable

  • Arms races

    - Military arms races
    - Prisoner's Dilemma
    - Biological arms races

  • Approximating incomputable games
    - Consider the space of each user's actions
    - Perform local search in these spaces
    - Solution quality in one space is dependent on the search in the other spaces
    - The simultaneous search of co-dependent spaces is naturally modeled as an arms race

  • Evolutionary arms races

    - Iterated evolutionary arms races
    - Biological arms races revisited
    - Iterated arms race optimization is doomed!

  • Coevolutionary Algorithm (CoEA)

    A special type of EA where the fitness of an individual depends on other individuals (i.e., individuals are explicitly part of the environment).

    - Single species vs. multiple species
    - Cooperative vs. competitive coevolution

  • CoEA difficulties (1): Disengagement
    Occurs when one population evolves so much faster than the other that all individuals of the other are utterly defeated, making it impossible to differentiate between better and worse individuals, without which there can be no evolution.

  • CoEA difficulties (2): Cycling
    Occurs when populations have lost the genetic knowledge of how to defeat an earlier-generation adversary and that adversary re-evolves. Potentially this can cause an infinite loop in which the populations continue to evolve but do not improve.

  • CoEA difficulties (3): Suboptimal Equilibrium (aka Mediocre Stability)
    Occurs when the system stabilizes in a suboptimal equilibrium.

  • Case Study from Critical Infrastructure Protection

    Infrastructure hardening:
    - Hardenings (defenders) versus contingencies (attackers)
    - Hardenings need to balance spare flow capacity with flow control

  • Case Study from Automated Software Engineering

    Automated software correction:
    - Programs (defenders) versus test cases (attackers)
    - Programs encoded with Genetic Programming
    - Program specification encoded in the fitness function (correctness critical!)

  • Multi-Objective EAs (MOEAs)

    - Extension of a regular EA which maps multiple objective values to a single fitness value
    - Objectives typically conflict
    - In a standard EA, an individual A is said to be better than an individual B if A has a higher fitness value than B
    - In a MOEA, an individual A is said to be better than an individual B if A dominates B

  • Domination in MOEAs

    An individual A is said to dominate individual B iff:
    - A is no worse than B in all objectives
    - A is strictly better than B in at least one objective
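
The domination test written out as a sketch for a minimization problem (for maximization, flip the comparisons):

```python
def dominates(a, b):
    """Pareto domination for minimization: a dominates b iff a is no worse in
    every objective and strictly better in at least one."""
    no_worse = all(ai <= bi for ai, bi in zip(a, b))
    strictly_better = any(ai < bi for ai, bi in zip(a, b))
    return no_worse and strictly_better

print(dominates((1.0, 2.0), (1.0, 3.0)))   # True
print(dominates((1.0, 2.0), (2.0, 1.0)))   # False: mutually non-dominated
```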

  • Pareto Optimality (Vilfredo Pareto)Given a set of alternative allocations of, say, goods or income for a set of individuals, a movement from one allocation to another that can make at least one individual better off without making any other individual worse off is called a Pareto Improvement. An allocation is Pareto Optimal when no further Pareto Improvements can be made. This is often called a Strong Pareto Optimum (SPO).

  • Pareto Optimality in MOEAs
    - Among a set of solutions P, the non-dominated subset of solutions P' consists of those that are not dominated by any member of the set P
    - The non-dominated subset of the entire feasible search space S is the globally Pareto-optimal set

  • Goals of MOEAs
    - Identify the global Pareto-optimal set of solutions (aka the Pareto-optimal front)
    - Find a sufficient coverage of that set
    - Find an even distribution of solutions

  • MOEA metrics

    Convergence: How close is a generated solution set to the true Pareto-optimal front

    Diversity: Are the generated solutions evenly distributed, or are they in clusters

  • Deterioration in MOEAs
    - Competition can result in the loss of a non-dominated solution which dominated a previously generated solution
    - This loss in its turn can result in the previously generated solution being regenerated and surviving

  • NSGA-II
    - Initialization before the primary loop:
      - Create initial population P0
      - Sort P0 on the basis of non-domination
      - Best level is level 1
      - Fitness is set to level number; lower number, higher fitness
      - Binary tournament selection
      - Mutation and recombination create Q0

  • NSGA-II (cont.)
    - Primary loop:
      - Rt = Pt ∪ Qt
      - Sort Rt on the basis of non-domination
      - Create Pt+1 by adding the best individuals from Rt
      - Create Qt+1 by performing binary tournament selection, mutation, and recombination on Pt+1
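
A naive sketch of the non-dominated sorting that both NSGA-II steps rely on (the real algorithm uses a faster bookkeeping scheme and also computes crowding distance, both omitted here):

```python
def non_dominated_sort(objectives):
    """Sort points (tuples of objective values, minimization) into
    non-domination levels: level 1 = non-dominated front, and so on."""
    def dominates(a, b):
        return (all(x <= y for x, y in zip(a, b)) and
                any(x < y for x, y in zip(a, b)))
    remaining = list(range(len(objectives)))
    levels = []
    while remaining:
        front = [i for i in remaining
                 if not any(dominates(objectives[j], objectives[i])
                            for j in remaining if j != i)]
        levels.append(front)
        remaining = [i for i in remaining if i not in front]
    return levels   # index 0 is level 1 (highest fitness in NSGA-II)

pts = [(1, 5), (2, 3), (4, 1), (3, 4), (5, 5)]
print(non_dominated_sort(pts))   # -> [[0, 1, 2], [3], [4]]
```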

  • Epsilon-MOEA
    - Steady state
    - Elitist
    - No deterioration

  • Epsilon-MOEA (cont.)
    - Create an initial population P(0)
    - Epsilon non-dominated solutions from P(0) are put into an archive population E(0)
    - Choose one individual from E, and one from P
    - These individuals mate and produce an offspring, c
    - A special array B is created for c, which consists of abbreviated versions of the objective values from c

  • Epsilon-MOEA (cont.)
    - An attempt is made to insert c into the archive population E
    - The domination check is conducted using the B array instead of the actual objective values
    - If c dominates a member of the archive, that member will be replaced with c
    - The individual c can also be inserted into P in a similar manner, using a standard domination check

  • SNDL-MOEA
    - Desired features:
      - Deterioration prevention
      - Stored non-domination levels (NSGA-II)
      - Number and size of levels user configurable
      - Selection methods utilizing levels in different ways
      - Problem specific representation
      - Problem specific compartments (E-MOEA)
      - Problem specific mutation and crossover