Genetic
Programming in
Mathematica
by Hussein Suleman
submitted in fulfilment of the requirements for the degree of Magister Scientiae in
the Department of Computer Science in the Faculty of Science at the University of
Durban-Westville.
Supervisor : Dr. M. Hajek
Date Submitted : 15 January 1997
Declaration
I, Hussein Suleman, Reg. No. : 9144784,
hereby declare that the thesis entitled
Genetic Programming in Mathematica
is the result of my own investigation and research and that it has not been submitted in
part or in full for any other degree or to any other University.
………………………. …………………….
Signature Date
ACKNOWLEDGEMENTS
My heartfelt thanks go to my supervisor, Dr M. Hajek, for his ever-willing assistance
throughout my studies, the staff of the Department of Computer Science, my family
and friends for supporting all my endeavours, and God, without whose guidance none
of this would have been possible.
TABLE 5.7. GP PARAMETERS FOR CSTR ...............................................................................................112
TABLE 5.8. GP PARAMETERS FOR PID CONTROLLER ............................................................................116
TABLE 5.9. CRITERIA FOR PID CONTROLLERS........................................................................................120
TABLE 5.10. GP PARAMETERS FOR MAGIC STAR...................................................................................124
ABSTRACT
Genetic Programming (GP) is an implementation of evolutionary programming, where
the problem-solving domain is modelled on computer and the algorithm attempts to
find a solution by the process of simulated evolution, employing the biological theory
of genetics and the Darwinian principle of survival of the fittest. GP is distinct from
other techniques because of its tree representation and manipulation of all solutions.
GP has traditionally been implemented in LISP but there is a slow migration towards
faster languages like C++. The choice of implementation language is dictated not only by the
speed of the platform but also by the desirability of such an implementation. With a
large number of scientists migrating to scientifically-biased programming languages
like Mathematica, such an environment provides an ideal testbed for GP.
In this study an attempt was made to implement GP on a Mathematica platform, exploiting
the advantages of Mathematica’s unique capabilities. Wherever possible,
optimizations have been applied to drive the GP algorithm towards realistic goals. At
an early stage it was noted that the standard GP algorithm could be significantly
speeded up by parallelisation and the distribution of processing. This was incorporated
into the algorithm, using known techniques and Mathematica-specific knowledge.
Benchmark problems were tested on both the serial and parallel algorithms to assess
the ability of the implementation to effectively solve problems using GP. Mostly
well-known problems were used, since the aim was to test the implementation and not
the capabilities of the algorithm itself.
Mathematica has been found to be suitable for the implementation of GP in cases
where the problem domain has been modelled already in this environment. Although
Mathematica is not an optimal environment for the execution of a GP, it is highly
adaptable to different problem domains, thus promoting the implementation of
problem-solving techniques like GP.
CHAPTER 1 : INTRODUCTION
The Evolutionary Paradigm of Programming
Computer Science had its beginnings when scientists built the first computers and
realised that these machines needed to be constantly tended. This tending took the
form of writing programs and thereafter maintaining these programs and their data. At
first it was a rather haphazard process, with programmers writing code on the spur of
the moment and then changing their programs to suit changes in the environment or
the requirements. As time passed, this disorderly process caused more problems than
solutions and Computer Science began to turn its head towards the formal
specification of programming.
The programming of computers can be considered as the focus of research in
Computer Science. In recent years, people have been asking very pertinent questions
regarding the speed and size of programs. There has been a quest to write programs
that run faster and use less memory and storage. Also, some programs are sought
simply for parsimony or the ability to prove correctness mathematically. But, like any
other scientific field, the thrust of work is not on efficiency but on new developments.
Problems from all aspects of life are modelled on computer and new solutions are
being constantly sought.
People from varied disciplines implement their problem-solving methodologies on
computer. In many cases an existing sequence of steps is known and this simply needs
to be converted into a computer program. In other situations, only raw data is
available and this then needs to be processed to generate useful information. Both
scenarios require that computer programs be written, whether by the user or an
external party.
Programming, by its very creative nature, is an intuitive process that cannot be broken
down into finite determinate steps. Many people argue for and against this standpoint.
Software engineers argue quite strongly that software can be created using a
pre-defined series of steps in a determinate manner [Schach, 1992]. But they also agree
that innovations in programming cannot follow this same process. Ultimately, a
program has to be written and that program cannot always be created in a definite
manner. This implies that a programmer will have to intuitively devise a new
algorithm, using and incorporating existing algorithms. Being a creative process, it
takes an unknown amount of time and resources to accomplish. Also, the programmer
never knows for certain whether the problem will be solved (except for some cases
where this is proven mathematically in advance) by the program. Some problems do
not even lend themselves to a program, although most of these are ferreted out by the
experienced programmer.
Whatever the case may be, an experienced programmer has to devote an unknown
amount of time in order to solve any moderately complex problem. This in itself is a
problem worthy of study. How can this programming task be made easier? Classical
computer science has proposed many techniques to ease programming by
modularising the data and programs e.g. object-orientation. Artificial intelligence
suggests different approaches which consider computer programs as simply “black
boxes” which convert input into the appropriate output.
Neural networks are a popular strategy for problem solving nowadays. Using this
approach, a computer model of the human brain is created and this then learns the
relationship between the input and output. Information is stored internally in the form
of a matrix of weights, where each weight refers to the relative ability of one neuron to
fire another one. This “connectionist” approach is used widely because of its ability to
simulate the learning and recollection process of human thought. However, it does
have some disadvantages, namely the requirement that the inter-neuron connections be
seeded before learning can begin (in back-propagation learning). This initial state has
to be determined experimentally and this makes it somewhat similar to the classical
program because an expert needs to set up the neural network.
The “non-connectionist” school of artificial intelligence has tried to implement the
black-box computer component by modelling it on existing systems other than the
human brain. One of the most popular approaches is to model the computer on nature.
Nature has succeeded in solving a rather complex problem, that of creating and
sustaining life. In order to do this, simple living organisms were first introduced into
the environment. Then these organisms underwent a transformation process through
evolution, lasting many millions of years. The current set of organisms that inhabits
the world is far stronger and better adapted to its environment than its predecessors.
For example, the ratio of diameters of blood vessels in the human body allows for
better flow according to modern fluid dynamics [Heitkotter, 1995]. But this was a
result of evolution and not some individual’s calculations. So if problem-solving is
modelled on evolution, it may be possible to discover solutions that are optimal or
better than the analytical ones.
Evolution was a theory proposed by Darwin [Darwin, 1959] to explain the creation of
life. He proposed that the nature of living creatures changed over the years, resulting
in the formation of stronger specimens, better suited to the environment. The better
specimens would then dominate and the lesser individuals would eventually cease to
exist. This is commonly known as “survival of the fittest”. This does not preclude the
evolutionary process creating individuals that are less fit than their predecessors. In
such cases, the new generation individuals would simply perish and their ancestors
would continue to thrive, until they can generate better specimens.
This does not suggest that evolutionary techniques are the solution to all our
problems. Evolution itself does not guarantee the creation of fitter individuals. It does
however, explore many possibilities that may lead to stronger individuals. There is no
ultimate goal or problem that must be solved by natural evolution. Instead organisms
are constantly changed to suit the environment, which changes just as rapidly.
Similarly, in an artificial environment of simulated evolution, solutions can be
gradually adapted to satisfy the problem specification with greater accuracy.
According to modern theory of genetics, the fabric of our being is stored as a set of
attributes in our DNA (genes). An individual’s genes are like a blueprint to create that
individual, since it is a complete description. When two parents mate to produce
offspring, the children receive some genetic material from each parent. This crossing
over of the genetic material allows nature to create individuals different from either
parent.
For example, consider a monkey population where long tails are desired and long
noses are not. If one parent with a long tail and short nose mates with another with a
short tail and long nose, the offspring could have any combination of these features. If
the child has a long nose and short tail, that child would not be very strong since it
cannot hang from branches and its nose would always get in the way - it would
probably not reproduce since none of the other monkeys would be attracted to a weak
individual. On the other hand, a child with a long tail and short nose would be ideally
suited to the monkey’s environment. This child would be the fitter of the two and
would propagate its genes in future generations.
Computer programs modelled on nature normally associate possible solutions with
the populations of individuals from nature. Then these solutions undergo a simulated
evolution to attempt to produce better individuals. Just like nature, this process is
quasi-random and solutions generated can be either better or worse than their parents.
However, the probability of producing better solutions in this way is much higher than
a blind random search through the solution space [Koza, 1992]. There exist many
different approaches to this modelling, the most common being Genetic Algorithms,
Evolutionary Programming, Evolution Strategies and Genetic Programming [Kinnear,
1994]. Collectively these are known as Evolutionary Algorithms. An evolutionary
algorithm has the following general structure :
// initialise a random generation of individuals
Pop = initpopulation (G)
// evaluate the fitnesses of individuals in the population
evaluate (G)
while not done do
    // select couples for reproduction
    Pop1 = select (Pop);
    // apply genetic operations to genes
    Pop1 = genetic operations (Pop1);
    // evaluate fitnesses of new population
    evaluate (Pop1);
    // merge new individuals into the existing population
    Pop = merge (Pop1);
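The loop above can be sketched in Python (an illustrative sketch only, not the thesis's Mathematica implementation; the function names mirror the pseudocode and the toy fitness function, which peaks at x = 3, is an assumption made purely for demonstration):

```python
import random

def evaluate(pop, fitness):
    # compute the fitness of every individual in the population
    return [fitness(ind) for ind in pop]

def select(pop, fits, n):
    # choose n individuals, favouring the fitter ones (2-tournament)
    chosen = []
    for _ in range(n):
        i, j = random.randrange(len(pop)), random.randrange(len(pop))
        chosen.append(pop[i] if fits[i] >= fits[j] else pop[j])
    return chosen

def genetic_operations(pop):
    # placeholder genetic operation: perturb each individual slightly
    return [ind + random.uniform(-0.1, 0.1) for ind in pop]

def merge(pop, pop1, fitness):
    # keep the fittest individuals of the old and new populations combined
    combined = sorted(pop + pop1, key=fitness, reverse=True)
    return combined[:len(pop)]

def evolve(fitness, generations=40, size=30):
    pop = [random.uniform(-5, 5) for _ in range(size)]
    for _ in range(generations):
        fits = evaluate(pop, fitness)
        pop1 = select(pop, fits, len(pop))
        pop1 = genetic_operations(pop1)
        # merge re-evaluates the new population while combining it with the old
        pop = merge(pop, pop1, fitness)
    return max(pop, key=fitness)

random.seed(0)
best = evolve(lambda x: -abs(x - 3))   # toy fitness, maximal at x = 3
```

Because the merge step retains the fittest individuals of both generations, the best solution found never degrades from one generation to the next.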
Genetic Algorithms
In order to understand Genetic Programming, it is first vital to consider the alternative
approaches to evolutionary programming that led to its creation. Most discussions on
genetic programming begin with an explanation of genetic algorithms, being the direct
predecessor of genetic programming [Koza, 1992; Andre, 1994].
Genetic Algorithms (GAs) are evolutionary programs that manipulate a population of
individuals represented by fixed-format strings of information. Their acceptance as a
means to solve real-world optimization problems is readily attributable to the theory
of artificial adaptation discussed in the ground-breaking work of Holland [Holland,
1992]. An initial population of individuals (solutions) is generated for the problem
domain and these then undergo evolution by means of reproduction, crossover and
mutation of individuals until an acceptable solution is found.
Genetic algorithms, like most other evolutionary computation techniques, require that
only the parameters for the problem be specified. Thereafter the algorithm applied to
search for a solution is mostly problem-independent.
As an inheritance from its biological counterpart, in genetic algorithms each character
in the individual’s data string is called a gene. Each possible value that the gene can
take on is called an allele. These concepts are elaborated upon in numerous texts on
biological genetics e.g. Hartl [Hartl, 1988].
For the purposes of the following discussion of genetic algorithms, the problem being
solved is finding the square root of 2.
Representation of Problem
The representation of the problem domain is one of the most important factors when
designing a genetic algorithm. Genetic algorithms usually represent all solutions in the
form of fixed length character strings, analogous to the DNA that is found in living
organisms. There are a few genetic algorithm implementations that make use of
variable-length strings and other representations [Michalewicz, 1992] but these are not
common. The reason for the fixed length character strings is to allow easier
manipulation, storage, modelling and implementation of the genetic algorithm.
Consider the example of finding the square root of two. The first step would be to
identify a possible range of solutions. Assuming no knowledge of the solution, it
would be possible to deduce that the solution lies between zero and the number itself
(in this case 2). Since it is known that the square of 1 is one, all numbers less than one
can be removed. Also, the square of two will produce 4 so that can be eliminated as
well. Thus the range is reduced to numbers greater than 1 and less than 2 - no solution
to this problem can lie outside of this range. Of course, negative numbers can also
produce the same results but since negative numbers are only different in sign, only
the positive numbers need be considered. The next step is to represent all numbers
between 1 and 2 with a fixed length character string. Binary numbers are usually
utilised for numerical computations such as this. The reasons for this are outlined
below. Binary numbers also allow for easy conversion to and from the exact solution.
However, since there are obviously infinitely many real numbers between 1 and 2,
fixed-length strings pose an additional problem for the programmer. To solve this, the
real number range must be discretized into a finite number of constituent real number
segments, corresponding to each binary number used in the character string. Suppose
that the character strings have a length of n=10. Then the possible values for the
character string would be from 0000000000 to 1111111111.
These binary numbers must be mapped onto the range of possible solutions, viz. the
numbers between 1 and 2. There are 1024 (2^n) distinct numbers in the binary range,
hence the numbers start from 0 and end at 1023 (2^n - 1). The 1 (solution space) is
mapped onto the 0 (binary) and the 2 (solution space) is mapped onto the 1023
(binary). All other binary numbers are mapped linearly onto the real solution range.

    0 0 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 1
    ...
    1 1 1 1 1 1 1 1 1 1

Figure 1.1. Bit-string GA representation
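This linear mapping can be sketched as a small Python helper (an illustrative sketch, not code from the thesis; the function name is an invention for this example):

```python
def decode(bits, lower=1.0, upper=2.0):
    """Map a fixed-length binary string linearly onto [lower, upper]."""
    value = int(bits, 2)                  # integer value, 0 .. 2**n - 1
    maximum = 2 ** len(bits) - 1          # 1023 for a 10-bit string
    return lower + (upper - lower) * value / maximum

# boundary checks from the text: 0000000000 -> 1 and 1111111111 -> 2
print(decode("0000000000"))   # 1.0
print(decode("1111111111"))   # 2.0
```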
One of the reasons for using binary numbers is to disallow incorrectly formatted
solutions automatically. Every combination of 1’s and 0’s corresponds to a possible
solution. Decimal numbers can be used but since the solution range is between 1 and
2, a remapping process would have to be carried out to exclude the numbers greater
than 2 or less than 1. In binary, it is easier to visualise some characteristics being
present (by a 1) or absent (by a 0). This is more applicable to non-numeric problem
domains. In addition, there are only two possible binary values (1 and 0). This means
that all possible binary values can be generated by these two values. Thus the binary
individuals 0000000000 and 1111111111 contain all the genetic material possible i.e.
they span the solution space. With representations of a larger order (e.g. decimal), the
number of individuals needed to span the solution space is much larger and this has
repercussions on the speed at which the genetic algorithm finds a solution and the size
of the parameters needed.
Population of Solutions
A collection of possible solutions is kept throughout the life cycle of the genetic
algorithm. This collection is generally known as the population since it is analogous to
a population of living organisms. The population can be either of fixed or variable size
but fixed size populations are used more often so that the exact amount of computer
resources can be pre-determined. The population of solutions is stored in main
memory or on secondary storage, depending on the type of genetic algorithm and
computer resources available.

    binary                   binary value   real equivalent
    0 0 0 0 0 0 0 0 0 0      0              1 + (0/1023) = 1
    0 0 0 0 0 0 0 0 0 1      1              1 + (1/1023)
    0 0 0 0 0 0 0 0 1 0      2              1 + (2/1023)
    ...                      ...            ...
    1 1 1 1 1 1 1 1 1 1      1023           1 + (1023/1023) = 2

Figure 1.2. Conversion from bit-string to real representation
At the very beginning of the algorithm, a population of solutions is generated
randomly. In the case of the square root problem, a fixed number of 10 character
binary strings are generated randomly.
This population is then modified through the mechanisms of evolution to result
eventually in individuals that are closer to the solution than these initial random ones.
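Generating such an initial random population might look like this in Python (an illustrative sketch; the population size of 100 follows Figure 1.3, and the names are inventions for this example):

```python
import random

def init_population(size=100, n_bits=10):
    # each individual is a random fixed-length binary character string
    return ["".join(random.choice("01") for _ in range(n_bits))
            for _ in range(size)]

random.seed(42)          # seeded only to make the example reproducible
pop = init_population()
```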
Fitness
Darwinian evolution of a population implies that the strongest individuals will
survive. To implement such a principle necessitates a means of evaluating the relative
strength, or fitness, of each individual. In terms of the genetic algorithm, the fitness of
an individual is a numerical assessment of that individual’s ability to solve the
problem at hand - it is the ability of the individual to satisfy the requirements of the
environment.
    individual no     random individual
    1                 0 0 1 0 0 1 0 1 1 0
    2                 0 1 1 0 1 1 0 0 1 1
    3                 1 1 0 1 0 0 1 1 1 0
    ...               ...
    100               1 0 1 0 0 1 0 0 0 1

Figure 1.3. Initial random population

In terms of the square root problem, the perfect individual is the numerical value
approximated by 1.414213562373. This can therefore be regarded as the fittest
solution. Since fitness is quantified numerically, maximum and minimum fitness
values of 1 and 0 are normally used. In the scale used here, fitness measures error, so
the perfect solution above represents a fitness of 0. The maximum fitness of 1 must
correspond to the absolutely worst solution possible, to ensure that all solutions are in
the range 0-1. In the square root problem, the worst solution is “2”, hence the fitness
of the solution “2” would be 1. Although it is possible to find distinct best and worst
case values in this problem it is not possible for all problem domains. However, every
possible individual in the solution space must be restricted to the fitness range 0-1.
Fitness is normally defined as a function that takes as its single parameter the
individual and returns a real number representing the fitness value of that individual.
Fitness cannot be calculated by comparing the perfect solution with the individual
simply because the perfect solution is not known at the time of calculation. Thus it has
to be calculated from other information in the specification. In the case of the square
root problem, the fitness of an individual can be calculated by squaring its numerical
value and then comparing this to 2. The results can then be scaled to fit in the range 0
to 1. The following fitness function satisfies these criteria :

    fitness (x) = |x^2 - 2| / 2          (Equation 1.1)
In addition to assigning the boundary values, the fitness function must also be able to
assign values to every other solution in the solution space. The intuitively better
solutions must be allocated better fitnesses than the worse solutions. This is necessary
so that the better solution can be selected over the worse one when comparisons are
being made. For numerical calculations the fitness function is chosen as a relative
error (as is done above in Equation 1.1) to achieve this aim. In economic problems,
the profit can be used to generate a fitness function - greater profit tends towards a
perfect solution while lesser profit has lower fitness values.
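As a sketch, the relative-error fitness function can be written directly in Python; the computed values below reproduce the fitness column of Figure 1.4 (an illustrative sketch, with function names invented for this example):

```python
def decode(bits_value, n=10):
    # convert the integer value of the bit string to the range [1, 2]
    return 1 + bits_value / (2 ** n - 1)

def fitness(x):
    # relative error of x*x with respect to 2; 0 indicates a perfect solution
    return abs(x * x - 2) / 2

# the four rows of Figure 1.4 (binary values 278, 435, 846 and 657)
for value in (278, 435, 846, 657):
    x = decode(value)
    print(round(x, 4), round(fitness(x), 4))
```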
The table in Figure 1.4 represents some sample solutions in the initial random
population, together with their associated actual values and their fitnesses. The best
solution displayed is in the second line, as it has the lowest fitness - it is also the value
closest to the perfect solution, as expected.
Reproduction
The vehicle of all evolutionary change in the genetic algorithm is reproduction. The
reproduction operation allows the population to progress from one generation into the
next. This progression occurs in the most natural way possible, favouring the fitter
individuals. Individuals are selected from one generation of the population to be
injected into the next generation. This new generation is a selection (possibly
containing duplicates) drawn from the original population and when completely
formed, it replaces the original population.
The selection process is based on the fitnesses of the individuals. Generally,
individuals with a higher fitness are selected more often than individuals with a lower
fitness. There have been many strategies to implement this tendency to select fitter
individuals.
The most common method is called fitness-proportionate reproduction. In this
approach, the probability of selecting each individual is proportionate to its fitness.
    random individual        binary value   solution   fitness
    0 0 1 0 0 1 0 1 1 0      278            1.2717     0.1913
    0 1 1 0 1 1 0 0 1 1      435            1.4252     0.0156
    1 1 0 1 0 0 1 1 1 0      846            1.8270     0.6689
    1 0 1 0 0 1 0 0 0 1      657            1.6422     0.3485
    ...                      ...            ...        ...

Figure 1.4. Selected individuals with corresponding real values and fitnesses

Thus the fitter individuals get selected more often than the less fit individuals. This
leads to some individuals being selected more than once and others not being selected
at all, which is only natural as the better individuals flourish while those that are not
good enough perish.
The roulette wheel implementation implicitly forces fitness-proportionate
reproduction. In this approach, the fitnesses of all individuals in the population are
arranged into a list and then summed. A random number in the range of the sum is
generated. Then the fitnesses in the list are summated again until the random number
is reached or exceeded. The last individual in the list is the one chosen. The method
works because the individuals with higher fitnesses occupy a larger portion of the
range from which a random number is being selected - therefore they can be selected
more often. This process is repeated until enough individuals are selected to replace
the whole of the last generation.
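Roulette wheel selection might be sketched in Python as follows (an illustrative sketch assuming, as in the general description above, that higher fitness is better; the names are inventions for this example):

```python
import random

def roulette_select(population, fitnesses):
    """Pick one individual with probability proportional to its fitness."""
    total = sum(fitnesses)
    spin = random.uniform(0, total)          # where the "wheel" stops
    running = 0.0
    for individual, fit in zip(population, fitnesses):
        running += fit
        if running >= spin:
            return individual
    return population[-1]                    # guard against rounding error

random.seed(1)
# an individual holding all of the fitness mass occupies the whole wheel
winner = roulette_select(["a", "b", "c"], [0.0, 0.0, 1.0])
print(winner)   # c
```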
Another common approach to selecting individuals is tournament selection. Two
individuals are selected from the population and their fitnesses are compared. The one
with the higher fitness is progressed into the next generation. The tournament can also
be carried out among more than 2 individuals (K-tournament selection).
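K-tournament selection admits an equally short sketch (illustrative only; the names are inventions for this example):

```python
import random

def tournament_select(population, fitnesses, k=2):
    """Pick k random contestants and return the fittest of them."""
    contestants = random.sample(range(len(population)), k)
    best = max(contestants, key=lambda i: fitnesses[i])
    return population[best]

# with k equal to the population size, the overall fittest always wins
pick = tournament_select(["a", "b", "c"], [0.2, 0.9, 0.5], k=3)   # -> "b"
```

Larger values of k increase the selection pressure, since weaker individuals are less likely to win any tournament they enter.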
Elitism is a strategy where the highly fit individuals are explicitly favoured. This can
be useful when the fitnesses are linear and the problem has a single solution.
However, most fitness functions do not produce a linear relationship between
individuals and their fitnesses i.e. there are local minima in the range of fitness values.
Figure 1.5. Roulette wheel individual selection
The restrictive nature of elitism could cause convergence to one of those local
minima, which is most likely a far from optimal solution.
Crossover
Reproduction on its own cannot cause a population of solutions to evolve since the
individuals from one generation are simply being copied into the next generation of
the population. In order for the fitnesses of individuals to improve, there must be a
sharing of genetic material. Crossover swaps some of the genetic material of two
individuals, creating two new individuals (children), who are possibly better than their
parents. This is analogous to genetic crossover as observed in living organisms.
In genetic algorithms, crossover is implemented by selecting a point in the character
string and swapping all characters after that point. This selection point is generated
randomly and the operation is applied to two individuals of the newly reproduced
population.
    parent 1 :  0 0 1 0 0 1 0 | 1 1 0
    parent 2 :  1 0 1 0 0 1 0 | 0 0 1
                CROSSOVER (| marks the crossover point)
    child 1  :  0 0 1 0 0 1 0 | 0 0 1
    child 2  :  1 0 1 0 0 1 0 | 1 1 0

Figure 1.6. Crossover of two individuals in GA

The result of the crossover genetic operation is two individuals who are possibly fitter
than their parents. In any event, these individuals are added to the new generation
being created. The simplest strategy is to replace the parents with the children. That
way each parent only participates in crossover once. An alternative is to inject the
children into the population and replace a pair of individuals with relatively low
fitness. Using fitness-proportionate reproduction, this strategy is unnecessary since the
population potentially contains more than one copy of the fitter individuals.
This genetic operator does not have to use only one crossover point. Instead, many
crossover points can be chosen, and the genetic material exchanged at each point. If
two crossover points are chosen, then, effectively, the genes between the points are
exchanged.
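One-point crossover of two bit strings, as in Figure 1.6, might be sketched as (an illustrative sketch, not code from the thesis):

```python
def crossover(parent1, parent2, point):
    """Swap all genes after the crossover point, yielding two children."""
    child1 = parent1[:point] + parent2[point:]
    child2 = parent2[:point] + parent1[point:]
    return child1, child2

# the parents of Figure 1.6, with the crossover point after the 7th gene
c1, c2 = crossover("0010010110", "1010010001", 7)
print(c1, c2)   # 0010010001 1010010110
```

A two-point variant would apply the same swap twice, so that only the genes lying between the two points are exchanged.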
Mutation
During reproduction, fitter individuals in a population are selected more often than
others. This leads to some individuals not being selected for promotion into the next
generation. These are generally the least fit individuals. However, they may contain
within their structure genes which are part of a better solution. This genetic material is
lost to the population since the individuals are no longer propagated.
In order to recover from this loss of genetic material, the individuals are allowed to
change their genes randomly. This is a slight perturbation in the genetic material
which occurs with a much lower frequency than crossover. A random point or points
are chosen in the character string. A random allele is then generated and inserted at
each of the mutation points.
Like crossover, mutation can create individuals who replace their parents in the new
generation, or they can be added to the population. Individuals must be removed so
that the population does not grow unmanageably large. The primary reason for this is
to make genetic algorithms feasible for practical implementation.
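Point mutation as described, a randomly generated allele inserted at a randomly chosen point, can be sketched as follows (illustrative only; the deterministic call reproduces the parent and child of Figure 1.7):

```python
import random

def mutate(individual, point=None, allele=None):
    """Replace the gene at a mutation point with a randomly generated allele."""
    if point is None:
        point = random.randrange(len(individual))   # random mutation point
    if allele is None:
        allele = random.choice("01")                # random allele
    return individual[:point] + allele + individual[point + 1:]

# Figure 1.7: the allele 1 inserted at the fifth gene of the parent
child = mutate("0010010110", point=4, allele="1")
print(child)   # 0010110110
```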
    parent :  0 0 1 0 0 1 0 1 1 0
    child  :  0 0 1 0 1 1 0 1 1 0
    (mutation point : gene 5 ; random allele : 1)

Figure 1.7. Mutation of an individual in GA

General Algorithm

// start with an initial generation
G = 0
// initialise a random generation of fixed-format strings
Pop = initpopulation (G)
// evaluate the fitnesses of individuals in the population
evaluate (G)
while not done do
    // increase generation counter
    G++
    // generate new population using fitness-proportionate reproduction
    Pop1 = select (Pop);
    // crossover genes
    Pop1 = crossover (Pop1);
    // mutate genes
    Pop1 = mutate (Pop1);
    // evaluate fitnesses of new population
    evaluate (Pop1);
    // replace population with new generation
    Pop = Pop1;

There are various alternatives and modifications of this algorithm but the essential
structure is always the same. One common change is to incorporate the reproduction
operation into the crossover and mutation operations - individuals are selected fitness-
proportionately, crossed over (or mutated) and inserted into the new generation in a
single operation.
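Putting the pieces together, the general algorithm might be sketched end-to-end for the square root example (an illustrative Python sketch, not the thesis's Mathematica implementation; tournament selection is used for simplicity, and since the fitness here is a relative error, lower values are better):

```python
import random

N_BITS, POP_SIZE = 10, 100

def decode(bits):
    return 1 + int(bits, 2) / (2 ** N_BITS - 1)

def error(bits):
    # relative error of the candidate's square with respect to 2
    x = decode(bits)
    return abs(x * x - 2) / 2

def select(pop):
    # 2-tournament: the individual with the lower error wins
    a, b = random.choice(pop), random.choice(pop)
    return a if error(a) < error(b) else b

def crossover(p1, p2):
    point = random.randrange(1, N_BITS)
    return p1[:point] + p2[point:], p2[:point] + p1[point:]

def mutate(bits, rate=0.02):
    # each gene is replaced by a random allele with a small probability
    return "".join(random.choice("01") if random.random() < rate else b
                   for b in bits)

def run(generations=50):
    pop = ["".join(random.choice("01") for _ in range(N_BITS))
           for _ in range(POP_SIZE)]
    best = min(pop, key=error)
    for _ in range(generations):
        new_pop = []
        while len(new_pop) < POP_SIZE:
            c1, c2 = crossover(select(pop), select(pop))
            new_pop += [mutate(c1), mutate(c2)]
        pop = new_pop
        best = min([best] + pop, key=error)   # remember the best ever seen
    return decode(best)

random.seed(0)
approx = run()   # close to 1.41421..., within the 10-bit resolution
```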
John Holland’s Schema Theorem [Holland, 1992] is widely accepted as mathematical
proof that the genetic algorithm, due to its fitness-proportionate reproduction,
converges to better solutions. According to the schema theorem, individuals are
grouped into schemata according to particular subsets of their genes. The number of
individuals in each group increases if the fitness of that group relative to the entire
population is high, and decreases if it is low. This result is slightly modified by the crossover and
mutation operations which create new individuals from the existing population,
implicitly changing the schemata into which individuals fall.
Evolutionary Programming and Evolution Strategies
Genetic algorithms are just one example of a paradigm of evolutionary programming.
Other techniques were created, with many similarities to genetic algorithms as
discussed by Heitkotter and Kinnear [Heitkotter, 1995; Kinnear, 1994].
Evolutionary Programming, conceived by Fogel in the early 1960s, uses only mutation as a
means to improve the fitness of individuals. Individuals can be represented by any
convenient syntax, since there is no crossover operation. The population is propagated
from one generation to another by applying the mutation operation in varying degrees
according to the proximity of the individual to the expected solution.
Simultaneously with the development of evolutionary programming, a group of
students in Germany, Rechenberg and Schwefel, developed a strategy to optimise
shapes of bodies in a wind tunnel. Their technique uses a population of solutions,
changed by normally distributed random mutations. Each individual contains both
objective and strategy variables - objective variables are representations of the
problem domain while strategy variables indicate the decreasing mutation rates to be
deployed.
Genetic Programming
Genetic algorithms, although very useful for simple problems, can be restrictive for
complex problems due to their inability to represent individuals other than fixed-format character
strings. Genetic Programming is a generalisation of genetic algorithms devised by
Koza [Koza, 1992]. It is readily accepted that the most general form of a solution to a
computer-modelled problem is a computer program. Genetic Programming (hereafter
known as GP) takes cognizance of this and attempts to use computer programs as its
data representation.
Similarly to genetic algorithms, genetic programming needs only that the problem be
specified. Then the program searches for a solution in a problem-independent manner.
Most genetic operators can be implemented, albeit somewhat differently from their
predecessors. Although Koza has suggested definitional guidelines for GP, these have
been relaxed in attempts to achieve greater efficiency with reduced computer
resources.
Representation
Each individual in a genetic program is a computer program. However, this definition
is a little vague since there is no general structure for all computer programs. On
different platforms with differing compilers and interpreters, the structure of the
programs can be different. GP is not specific in this regard - it can be applied in all
cases.
Most classical programming languages can have their programs represented as
sequences of functions. These functions can operate on constants or variables or the
results of other functions. This lends itself to a tree structure for a typical program.
Computer programs in GP are viewed as free-format trees, consisting of leaves
(variables and constants) and non-terminal nodes (functions).
Any mathematical expression can be considered as a computer program since it takes
input, processes the input and produces output. The expressions in Figure 1.8 are
therefore proper programs and can be used to generalise the capabilities of the GP
algorithm. The tree representation indicates how the GP ought to store the program
internally. The method of storage is not critical as long as the algorithm can
manipulate the individual solutions as trees.
In the illustrated example, there are only two variables, two constants and three
functions, which totally define the expression. However, real-life computer programs
can use many hundreds of variables and functions to solve a modestly complex
problem. Although such problems are still not feasible for solution by GP, it has been
recognised that the number of variables and functions has a significant impact on the
efficiency and scale of GP. Hence, the number of variables, constants and functions
needs to be reduced by eliminating those not necessary in a particular problem
domain. The functions, appearing only in intermediate nodes, are called the non-
terminals. Variables and constants, appearing only on the leaves of the tree, are
appropriately called terminals. The non-terminal set for the example is {+, /, *} and
the terminal set is {x, y, 3, 5}.
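As a concrete illustration, the sets of the worked example and an expression built from them can be encoded as nested lists. This is a Python sketch for illustration only (the thesis's implementation is in Mathematica); the `count_nodes` helper is hypothetical.

```python
# Trees as nested Python lists: a list is [function, arg1, arg2, ...],
# anything else is a terminal. The sets are those of the worked example.
NON_TERMINALS = ["+", "/", "*"]
TERMINALS = ["x", "y", 3, 5]

# One expression constructible from these sets: 3*x + 5/y
tree = ["+", ["*", 3, "x"], ["/", 5, "y"]]

def count_nodes(t):
    """Count all nodes: terminals are leaves, functions internal nodes."""
    if not isinstance(t, list):
        return 1
    return 1 + sum(count_nodes(c) for c in t[1:])

print(count_nodes(tree))  # 7: one +, one *, one /, and four terminals
```

Any tree whose internal nodes come from the non-terminal set and whose leaves come from the terminal set is a point in the search space.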
[Figure: the expression 3xy² + 5/y shown in standard notation and as a tree, with + at the root, * and / as internal nodes, and the terminals 3, x, y and 5 at the leaves.]
Figure 1.8. Representation of individuals as trees in GP
The terminal set is the set of all alleles that can appear at the leaves of a GP tree while
the non-terminals are the acceptable functions. These two sets define the search space
for the problem - every tree constructed has to get its nodes from the terminal and
non-terminal sets. The size of the search space is determined by the sizes of these two
sets. An increase in the size of the non-terminal set results in a linear increase in the
size of the search space. However, an increase in the size of the terminal set results in
an exponential increase in the search space size, since the combinations of parameters
available to every function are also increased.
On the other hand, if a terminal or non-terminal set does not contain sufficient variety,
it may not be possible to represent some solutions. For example, the expression “-3”
cannot in any way be represented by selecting terminals and non-terminals from the
given sets. Thus there are two important considerations when selecting terminal and
non-terminal sets. Firstly, the set must span the solution space completely. Secondly,
these sets must be as compact as possible, to prevent extraneous searches.
For example, if Boolean functions are being considered, then the non-terminal set
needs only contain {AND, OR, NOT} [Koza, 1992]. These functions are not the
absolute minimum to span the solution space, but the inclusion of a small degree of
redundancy allows for the formation of smaller computer programs (expressions).
Koza has also suggested that every function in the non-terminal set must operate only
within the scope of the terminal set. The functions must be capable of taking on every
combination of terminals possible, and the return values must be in the range of the
terminal set. By requiring this of all functions, there is no possibility of parameter
incompatibilities. It also allows functions to be nested without restriction. This is an
obvious feature of some functions but exceptions must be catered for. If the terminal
set contains integers and the non-terminal set the standard operators {+, -, /, *}, then
division by zero is a distinct possibility. To cater for this, the division operation can be
modified or overloaded so that division by zero returns a large number instead of an
error. This protection of functions enables closure of the non-terminal set.
Alternatives to closure include the use of strongly-typed GP, where each non-terminal
has a pre-specified return value type, which may be different for various functions.
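The protection of the division operator described above can be sketched as follows (Python for illustration, not the thesis's Mathematica code; the fallback value is an arbitrary choice labelled as such):

```python
def protected_divide(a, b):
    """Division modified to preserve closure: instead of raising an
    error on division by zero, return a large constant. The text
    suggests returning "a large number"; 1e6 here is an arbitrary
    illustrative choice."""
    if b == 0:
        return 1e6
    return a / b

print(protected_divide(10, 4))  # 2.5
print(protected_divide(10, 0))  # 1000000.0, not an error
```

Because `protected_divide` always returns a number for numeric inputs, it can be nested freely with the other arithmetic operators, which is exactly what closure requires.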
Haynes [Haynes, 1995] has used this strategy successfully to optimise an artificial
predator/prey scenario in a manner better than the standard GP.
Population of Solutions
Similarly to a GA, genetic programming first constructs a population of random
individuals and then processes these by simulated evolution. The random individuals
in this case are random trees. Due to the closure property of the non-terminal set, it is
possible to recursively create any combination of terminals and non-terminals.
Populations in GP are normally much larger than those in genetic algorithms. This is
chiefly because of the unrestrained nature of the representation. While a GA allows
only fixed-format strings, trees have much greater diversity of size and structure. To
accommodate this greater diversity, larger populations are necessary.
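The recursive construction of random trees enabled by closure might look like this (a Python sketch; the function and terminal sets and the depth-control constants are illustrative assumptions):

```python
import random

# Hypothetical sets for illustration; the dictionary maps each
# function to its arity (number of arguments).
FUNCTIONS = {"+": 2, "-": 2, "*": 2, "/": 2}
TERMINALS = ["x", "y", 3, 5]

def random_tree(max_depth):
    """Recursively grow a random tree. Closure of the non-terminal
    set guarantees that any nesting of these nodes is valid."""
    if max_depth <= 1 or random.random() < 0.3:
        return random.choice(TERMINALS)
    f, arity = random.choice(sorted(FUNCTIONS.items()))
    return [f] + [random_tree(max_depth - 1) for _ in range(arity)]

random.seed(42)
population = [random_tree(5) for _ in range(10)]
print(len(population))  # 10 random individuals
```

Because the trees vary in both size and shape, a GP population drawn this way is far more diverse than a GA population of fixed-length strings, which is why larger populations are needed.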
[Figure: two individuals from a population shown as trees together with their corresponding expressions, 3xy² + 5/y and xy/3.]
Figure 1.9. Extract from population of GP trees and corresponding expression representation
Fitness
Since individuals are represented as computer programs, the obvious method of
testing the effectiveness of the solutions would be to execute the programs. Then some
means of measuring the performance (error, time taken, etc.) can be used as the fitness
measure. This adds extra overhead to the GP algorithm since each individual has to be
executed to determine its fitness. Also, most programming languages do not support
the execution of data items or dynamic conversion between data and code. In such
cases, an interpreter has to be incorporated into the algorithm.
The raw fitness of an individual is the fitness value calculated directly from the
execution of the program. This value is not bound to any range so it needs to be
modified before it can be used constructively. The standardised fitness converts the
raw fitness to a zero-centric function - the standardised fitness of an individual is zero
for the best individual and higher for individuals of lower fitness. The standardised
fitness attempts to restrict the fitnesses to the range of positive real numbers only. The
adjusted fitness changes the fitness value so that it lies strictly within the 0-1 range.
This is useful to standardise the result designation and make statistics more
meaningful. The adjusted fitness can be generated trivially from the standardised
fitness by the following function.
AdjustedFitness(x) = 1 / (1 + StandardizedFitness(x))    ....... (1.2)
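Equation 1.2 can be checked numerically with a minimal Python sketch (the function name is mine, chosen to match the text):

```python
def adjusted_fitness(standardized):
    """Equation 1.2: maps standardised fitness (0 for the best
    individual, larger values for worse ones) into the range (0, 1]."""
    return 1.0 / (1.0 + standardized)

print(adjusted_fitness(0))   # 1.0  -> best possible individual
print(adjusted_fitness(3))   # 0.25
print(adjusted_fitness(99))  # 0.01 -> poor individual
```

Note that a higher adjusted fitness is better, which is the orientation required by fitness-proportionate selection.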
Kinnear [Kinnear, 1994] stresses the importance of using a fitness function that not
only generates the right boundary conditions but also allocates appropriate fitness
values for all other expressions. If partial credit is not given for containing features
that lead to a better solution, then the fitness function would not be effective.
Reproduction
Fitness-proportionate reproduction in GP is identical to that in GAs, since the change in
representation has no effect on the copying of individuals. In order to produce a new
generation, only the fitnesses need be known, and these are gleaned from the adjusted
fitness function applied to all the individuals in the original population.
Crossover
Crossover is applied to a pair of individuals from the newly reproduced population in
order to exchange genetic material. In the case of the classic GA, genetic material took
the form of sub-strings of the character string representation. GP, on the other hand,
exchanges sub-trees of the individuals in order to create new individuals. Since the
non-terminals have achieved closure, it is possible to exchange a sub-tree rooted with
a non-terminal with one rooted by a terminal since the non-terminal function produces
a return value in the range of the terminal set.
Another difference between GAs and GP is in the selection of crossover points. In
GAs, a single crossover point was chosen and applied to both individuals. In GP this
is not possible since the individuals may have different structures, so instead different
crossover points are generated for each individual.
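Subtree crossover with independently chosen crossover points might be sketched as follows (Python nested lists standing in for Mathematica expressions; all helper names are my own):

```python
import copy
import random

def all_paths(tree, path=()):
    """Yield the index path of every node, root included."""
    yield path
    if isinstance(tree, list):
        for i, child in enumerate(tree[1:], start=1):
            yield from all_paths(child, path + (i,))

def get_subtree(tree, path):
    for i in path:
        tree = tree[i]
    return tree

def replace_subtree(tree, path, subtree):
    if not path:
        return copy.deepcopy(subtree)
    new = copy.deepcopy(tree)
    node = new
    for i in path[:-1]:
        node = node[i]
    node[path[-1]] = copy.deepcopy(subtree)
    return new

def crossover(parent1, parent2):
    """Choose a crossover point in each parent separately, then swap
    the selected subtrees to form two children."""
    p1 = random.choice(list(all_paths(parent1)))
    p2 = random.choice(list(all_paths(parent2)))
    child1 = replace_subtree(parent1, p1, get_subtree(parent2, p2))
    child2 = replace_subtree(parent2, p2, get_subtree(parent1, p1))
    return child1, child2

random.seed(0)
c1, c2 = crossover(["+", "x", ["*", "y", 3]], ["/", 5, "y"])
```

Because of closure, a subtree rooted at a function node can legally replace a terminal and vice versa, so any pair of points is acceptable.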
Mutation
Mutation is not necessary in GP because the large population sizes almost always
ensure that the genetic material cannot be easily lost. However, large population sizes
[Figure: two parent trees with independently chosen crossover points, and the two children produced by swapping the selected subtrees.]
Figure 1.10. Crossover of two individuals in GP
require lots of resources and, in the absence of these, steps have to be taken to recover
the genetic material. Also, taking into account the successes of mutation-based
evolutionary computing, this genetic operator cannot be simply ignored.
Just as in crossover, mutation is applied to a randomly chosen sub-tree in the
individual. This sub-tree is removed from the individual and replaced with a new
randomly created sub-tree.
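Subtree mutation can be sketched in the same nested-list style (a Python illustration; the probabilities and the small function set are arbitrary assumptions, not values from the thesis):

```python
import random

TERMINALS = ["x", "y", 3, 5]
FUNCTIONS = {"+": 2, "*": 2}  # illustrative subset

def grow(depth):
    """Create a fresh random subtree to serve as mutation material."""
    if depth <= 1 or random.random() < 0.4:
        return random.choice(TERMINALS)
    f, arity = random.choice(sorted(FUNCTIONS.items()))
    return [f] + [grow(depth - 1) for _ in range(arity)]

def mutate(tree, depth=3):
    """Descend to a randomly chosen node and replace that whole
    subtree with a newly generated random subtree."""
    if not isinstance(tree, list) or random.random() < 0.3:
        return grow(depth)  # this node is the mutation point
    i = random.randrange(1, len(tree))
    new = list(tree)
    new[i] = mutate(new[i], depth)
    return new

random.seed(7)
child = mutate(["+", ["*", "x", "y"], 5])
```

The child differs from its parent only below the mutation point, so mutation reintroduces genetic material without disturbing the rest of the individual.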
General Algorithm
// start with an initial generation
G = 0
// initialise a random generation of trees from the terminals and non-terminals
Pop = initpopulation (G)
// evaluate the fitnesses of individuals in the population
evaluate (G)
while not done do
  // increase generation counter
  G++
  // generate new population using fitness-proportionate
[Figure: a parent tree with a chosen mutation point, and the child produced by replacing that subtree with a new random subtree.]
Figure 1.11. Mutation of an individual in GP
  // reproduction
  Pop1 = select (Pop);
  // crossover sub-trees
  Pop1 = crossover (Pop1);
  // mutate sub-trees
  Pop1 = mutate (Pop1);
  // evaluate fitnesses of new population
  evaluate (Pop1);
  // replace population with new generation
  Pop = Pop1;
It is apparent that the general algorithm for GP is nearly identical to the GA. As far as
implementation is concerned, the major difference is in the representation. But this
difference is sufficient to necessitate changes in the genetic operators and all other
manipulation routines in the algorithm. There are also implicit differences that affect
the efficiency or conceptualisation of GP as compared to standard GAs.
Applications of GP
In traditional evolutionary algorithms (evolutionary programming and evolution
strategies), the optimization of existing solutions is a large research area because the
algorithms are better suited to slight perturbations than to outright changes. GAs have the
limitation that the structure of the solution needs to be known in advance in order that
it may be modelled by the fixed character string. Although some work has been done
on variable-length GA strings, this is sufficiently different from the original algorithm
to fall within the ambit of GP itself. GP has no such restrictions on representation
therefore the scope of applications is much broader. In an ideal situation, any
application which requires a solution in the form of a computer program can be solved
using a GP.
Koza [Koza, 1992] applied GP to many benchmark problems that are still used to
test the capabilities of GP systems. The most famous of those problems is that of
symbolic regression. A set of points is generated from some test data and an equation
passing through the points is sought. There exists no definite analytic method to find
such an equation if the form of the equation is not known in advance. Statistical
methods assume a form for the equation and then try to optimise the coefficients for
the equation. GP can find both the structure and the coefficients for the equation.
Oakley successfully extended symbolic regression to chaotic data [Oakley, 1994].
Another popular area of application is the control of artificial animals and robots.
Reynolds generated programs to control a robot in order to avoid obstacles [Reynolds,
1994]. Spencer used GP to teach a 6-legged robot how to walk, in terms of the
sequence of mechanical actions that had to be performed [Spencer, 1994].
Economic optimization, a complex field for analytical study, has also lent itself to
evolutionary computation techniques. Andrews modelled a double auctioning system
which used GP to generate a better automatic auctioning program than those
previously known [Andrews, 1994].
Koza et al have applied GP to the problem of designing electrical circuits. They
trained an artificial animal in maximal food foraging - the algorithm being produced
in the form of an electronic circuit discovered by GP [Koza, 1996-1]. In a similar
manner, an electronic circuit was successfully built to implement an operational
amplifier with desirable amplifier characteristics [Koza, 1996-2].
Andre used GP to learn rules for optical character recognition [Andre, 1996]. It is a
laborious task to write rules manually to distinguish among different characters in a
character set, especially when different fonts and sizes are used. GP successfully
found rules to classify characters with few errors.
GP can also be applied to classification problems. A finite automaton, when
duplicated and arranged in a regular formation, can exhibit aggregate behaviour across
the whole structure. A classic problem is to find a Boolean-valued automaton that
relaxes the whole structure into a steady state corresponding to the value that occurred
most often in the start state. This is known as the Majority Classification Problem and
can be solved in numerous ways. Andre used GP to find a rule for the cellular
automata that was better than any previously known rule (for a particular
configuration) [Andre, 1996].
Hand-in-hand with new applications of GP goes the development of new
implementations. The early Koza-based implementation of GP was done in LISP, but
attempts are being made to port the GP paradigm to other programming environments.
C++ and other 3GLs are useful for implementation but require complex modelling for
non-trivial problems. Other platforms (eg. Mathematica) are considered to circumvent
this complexity.
2CHAPTER 2 : A MATHEMATICA IMPLEMENTATION
Implementation Languages
Lisp
The first implementations of GP done by Koza used the LISP programming language
[Koza, 1992]. LISP (LISt Processor) has some unique characteristics compared to
other commonly used languages, which makes it an ideal platform for the
implementation of GP.
In LISP, there are only two basic syntactic constructs. The atom is a terminal part of
an expression, being either a variable or constant. The other construct is a list. Any
program can be represented solely using lists of atoms. Lists can also be nested and
embedded recursively. Lists use a prefix notation, as opposed to popular programming
languages which prefer infix notation for its more obvious interpretation. These lists
in LISP are known as S-expressions.
It can be shown that all computer programs are essentially sequences of functions.
LISP generalises this by requiring all programs to be in the form of a list. The first
element of the list is the name of the function while the rest constitute its arguments.
Thus, in Table 2.1, “+” is the name of the function and its arguments are the numbers
“1” and “2”. These lists can also be represented as trees since they allow nesting. This
LISP                               normal interpretation
( + 1 2 )                          1 + 2
( * a b )                          a * b
( + ( * a b ) ( / c d ) 8 )        a*b + c/d + 8

Table 2.1. Sample LISP expressions
tree visualisation is ideal since GP requires a tree representation for its various
manipulations.
LISP makes no distinction between code and data. Both the program and the data it
works on are represented as lists. Thus it is possible to execute an item of data as if it
was code. Alternatively, it is also possible to manipulate a program as if it was pure
data. The primary reason most people implement GP in LISP is that they can
exploit this feature to make the evaluation of fitnesses easier. Instead of writing an
interpreter to execute the individuals, they can be run directly on the computer by
virtue of this almost unique LISP feature.
Although these features of LISP are conducive to a GP implementation, LISP is not
widely used because programs do not execute fast enough (compared to 3GL
languages) and compilers/interpreters are uncommon. It is used by AI researchers but
not by many other people.
C++
In order to create a GP implementation that is both fast and portable, C++ is an ideal
choice. Of the wide range of 3GL languages available, C++ compilers are available on
most platforms. Thus the code can be written in a platform-independent manner. C++
also has an adequate library of functions to enable greater flexibility when designing
internal representations and manipulation functions.
Keith discusses some of the problems that accompany a C++ implementation,
especially the issue of representation [Keith, 1994]. Since tree structures are not
native to C++, these have to be simulated using data structures. In a direct conversion
from LISP, these trees can be created using pointers and objects. However, it is also
possible to convert the tree into postfix or prefix notation and use a one-dimensional
array to store the tree. These different methods have a direct effect on the functions
that manipulate the expressions in terms of complexity and speed.
The greatest advantage of LISP over C++ is its ability to execute the individuals
directly to gauge their fitnesses. C++ has to use an interpreter to perform this task.
This interpreter will have to take the data structure that corresponds to an individual
and simulate execution. For simple problems, such an interpreter may be trivial to
build, but a larger non-terminal set may require a complex interpreter on the scale of
the compiler itself.
This can be a prohibitive factor since the interpreter will have to be written as part of
the GP implementation. In addition, the problem domain will have to be modelled in
C++. The complexity of such modelling cannot be predetermined, so its effect
is not obvious. However, without the aid of function libraries, mathematical modelling
in C++ is a non-trivial task which may require more development time than the actual
GP algorithm.
Mathematica
Mathematica is an environment in which mathematical computations are easily
performed. It is essentially an interpreter which takes expressions as input and
attempts to make conclusions from these expressions. Most Mathematica users only
utilise this subset of its capabilities.
Mathematica can be compared to the BASIC (Beginner's All-purpose Symbolic
Instruction Code) interpreter which was bundled with older versions of MS-DOS
(Microsoft Disk Operating System). It can execute one command at a time or it can
take input from a file, thus processing a batch of input at once. This batch processing
allows the user to write programs in Mathematica.
Mathematica stores all expressions internally as trees. This makes it easier to
implement GP in Mathematica since GP requires a tree representation. Mathematica
also has available a library of functions for manipulation of these trees, and these are
useful for genetic operators.
Similarly to LISP, Mathematica makes no distinction between program code and data.
Thus a program can be manipulated and modified as if it was plain data, and data
could be executed as if it was code. Unlike C++, it is unnecessary to use an interpreter
to evaluate the fitnesses of individuals, since the individuals can be executed within
the framework of the Mathematica environment.
The most important factor supporting the implementation of GP in Mathematica is the
large body of existing and ongoing mathematical modelling in this environment, as
demonstrated by the number of conferences and publications devoted to it.
Mathematica is becoming a platform of choice because of its ingrained orientation
towards the analysis and presentation of mathematical solutions. The ease with which
complex problems can be implemented in Mathematica makes it feasible to
implement GP on this platform. Since GP is problem-independent, the majority of
work done to solve a problem is in the modelling stage. By choosing a platform like
Mathematica which supports easier modelling, productivity can be increased.
Nachbar was the first person to document a GP implementation in Mathematica but,
subsequently, there has been little work done in this field [Nachbar, 1994]. This study
explores the implementation of GP on a Mathematica platform, making full use of the
multiple paradigms, optimizations and other advanced features available in the
language.
Introduction to Mathematica
The following overview of Mathematica is focused on the aspects that are relevant to
the GP implementation. A more in-depth discussion can be found in [Wolfram, 1991],
[Wolfram, 1992], [Wickham-Jones, 1994], [Maeder, 1991] and [Abell, 1992].
Platforms and Organisation
Mathematica is available on many different hardware platforms and operating system
combinations e.g. DOS, Windows 3.x, Sun, Silicon Graphics. However, the
underlying kernel of the environment is the same in all instances. This kernel is a
single-line text input processing system. A line of Mathematica code is typed in at the
keyboard, this expression is immediately evaluated and the results are output to the
screen.
In modern GUI (graphical user interface) operating systems, this method of inputting
data into the environment would not be acceptable since it does not conform to the
user interface and the advantages of the operating system would be lost. To make
Mathematica easier to use, a front-end processor was included. This is a graphical
program that takes input from the user in the most natural way possible and passes this
input to the Mathematica kernel. The output from the kernel is then re-directed back to
the front-end, which formats it in a more natural way. The input and output are both
displayed as a single document, much in the same way as a word processor displays a
text document. This allows the user to edit and re-evaluate expressions, which could
not be done in the line-by-line version. Also, having both the input and output on a
single page allows for easier publishing of results from the session. This document,
containing Mathematica input, output and other formatting is known as a Notebook.
Variables
Mathematica can do both numerical and symbolic calculations, attempting at all times
to produce a result which is as accurate as possible. If the answer to a calculation is a
fraction, then that fraction would be output instead of its numerical equivalent, to
preserve computational precision.
The basic data types are String, Integer and Real. These can then be compounded into
lists. Values are assigned to variables by means of the standard assignment operator
“=”.
X=12
In an actual Mathematica environment, these input and output operations may be
preceded by an internal numbering system, which allows the user to refer to results
from previous calculations.
After such a definition, all occurrences of X (taking case into account) are replaced by
its associated value. If the input is simply X then the output would be “12”. Obviously,
the value of one variable can be assigned to another using the same syntax. Variables
can be created on-the-fly, without the need to declare the list of variables in advance.
A list of values is denoted by curly braces.
TestList = {1, 2, 3}
There are no pointers in Mathematica since it does its own memory management.
Lists can grow as large as memory and hard disk space (used for virtual memory)
allow. They can be embedded and nested to form trees, which are the most general
form of data structure directly supported in Mathematica.
Functions
Mathematica is first and foremost a functional programming language. It contains a
large collection of pre-defined functions and allows the user to define further
functions or even enhance the built-in definitions. A program in Mathematica is
simply a sequence of calls to these functions. These calls can themselves be embedded
within another function, allowing modular programming.
Functions are called by the exact name of the function, followed by the parameters
within square brackets. For example,
Plus[2, 2]
would produce the following output:
4
All operations without exception can be written in this form. Even simple functions
like addition and subtraction can use this notation. However, in order to make
inputting of expressions easier, the kernel allows an alternative notation for some
common expressions, like addition and multiplication. Thus the expression
2+2
is equivalent to the one above and would produce the same output.
Function calls can be nested and the expression is then evaluated depth-first (in most
cases). Thus it is possible to write
Times[12, Plus[2, 1]]
which would evaluate to “36”.
Functions are defined using the following general syntax:
NewFunction [x_, y_] := 2 * x + y
The name of the function will be NewFunction. This will be added to the list of
built-in functions. There is no distinction between built-in functions and user-defined
functions, allowing the Mathematica environment to be easily extended.
The parameters within brackets are the formal parameters. The underscores after the
names of the formal parameters indicate that they are simply placeholders for actual
parameters. Mathematica uses a system of pattern-matching to implement its function
mechanism. When the function is called, the actual parameters are replaced for the
formal parameters wherever they occur in the expression, then the expression is
evaluated. If the underscores are omitted, Mathematica would try to match the exact
parameters in the list, without any form of pattern-matching. Thus, only
NewFunction[x, y] would be successfully parsed.
The “:=” indicates that the RHS expression is not to be evaluated until the function is
used within another expression. This ensures that parameter substitution by means of
pattern-matching gets highest precedence. If the colon was not prefixed to the
assignment operator then the RHS would be evaluated when the function is defined; if
x and y are global variables then their values would be substituted, instead of the
parameters, and the result of the function would be that constant value generated.
The expression on the RHS of the function definition is the body of the function. The
variables used are subject to parameter pattern-matching. The result of the function
call is the evaluation of this expression. Thus
NewFunction [7, 3]
would result in
17
It is also possible to do symbolic calculations. Variables can be used as input to the
function, whether they have a value or not. Consider the following code fragment:
a=12; NewFunction [a, b]
The output would be
24 + b
If two statements are separated by a semi-colon, then they are executed in sequence
and the result of the expression is the result of the second expression. In the above
example, a has an associated value while b does not. The kernel therefore replaces the
a with its value when calling the function. The second actual parameter is b since it
doesn’t have a value. Thus the answer is as accurate as possible with the limited
information provided. Using this technique of defining values for variables it is also
possible to perform symbolic calculations in Mathematica.
Overloading of functions is an integral part of the environment, allowing for multi-
part functions and different parameter types and ranges. Functions are very flexible
when pattern-matching. It is possible to write functions that only accept parameters of
particular types or ranges or even parameters that obey specific rules. Varying
numbers of parameters are also catered for.
Paradigms
Although Mathematica focuses mainly on the functional aspects of programming,
there are also mechanisms that enable the user to write procedural and declarative
code.
By simple virtue of the fact that function overloading and pattern-matching is
NewGen creates a new generation of individuals. The newgen is first initialised to an
empty list. The sum of fitnesses (maxwheel) and the size of the population (lenx)
are calculated. It can be argued that the PopulationSize can be used. However,
by generating the population size dynamically, it is possible to apply this function to
subsets of the population as well.
CalcFitnessSum creates the list of partial sums needed for the binary search. A
new generation is then created iteratively. A random number is generated and the
associated individual is selected by the Search function. The individual is then
appended to the new generation in newgen.
Finally, the value of newgen is returned as the result of the function, being the new
population.
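The selection scheme described here (partial sums of fitness plus a binary search per spin of the roulette wheel) can be sketched in Python; the variable names `maxwheel` and `newgen` follow the text, while the rest of the code is an illustrative assumption:

```python
import bisect
import random
from itertools import accumulate

def new_generation(pop, fitnesses):
    """Fitness-proportionate selection: build the list of partial
    fitness sums once (as CalcFitnessSum does), then pick each new
    individual by binary search over the sums (as Search does)."""
    sums = list(accumulate(fitnesses))  # partial sums
    maxwheel = sums[-1]                 # total fitness of the wheel
    newgen = []
    for _ in range(len(pop)):           # one spin per slot (lenx spins)
        r = random.uniform(0, maxwheel)
        newgen.append(pop[bisect.bisect_left(sums, r)])
    return newgen

random.seed(3)
print(new_generation(["a", "b", "c"], [1.0, 0.0, 0.0]))
# ['a', 'a', 'a'] -- zero-fitness individuals are never selected
```

Precomputing the partial sums makes each selection O(log n) instead of the O(n) cost of walking the wheel linearly, which matters when the function is applied repeatedly to large populations.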
Crossover
Two child expressions are produced from a pair of parents by means of the crossover
genetic operator. Cross1 takes two individuals and performs crossover.
Crossover applies this function to an entire population.
(* Get list of all indices of internal points in expression *)
RemoveZero[x_] := If[Position[x, 0] == {}, x, {}]
Points[x_] := Union[Map[RemoveZero, Position[x, _]], {}]
GetInternal[{x___}] := x
The unique position of any node or subtree in a tree can be specified by a list of
indices, which represent the path from the root to the node. Points is a function
which generates a list of the positions of every subtree of a given tree.
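The same index-path idea can be illustrated in Python, with nested lists in place of Mathematica expressions (the function name mirrors `Points`, but the code is only a sketch of the concept, not a translation):

```python
def points(tree, path=()):
    """List the index path of every node in the tree, from the root
    (the empty path) down to each leaf."""
    paths = [path]
    if isinstance(tree, list):
        for i, child in enumerate(tree[1:], start=1):
            paths += points(child, path + (i,))
    return paths

print(points(["+", ["*", 3, "x"], 5]))
# [(), (1,), (1, 1), (1, 2), (2,)]
```

Each path uniquely identifies a subtree, which is what allows the crossover and mutation operators to pick their working points by choosing one path at random.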
On[DeleteFile::nffil];
Map[(DeleteDirectory[#, DeleteContents->True])&, FileNames["PROC*"]];

(* paragraph 2 *)
Genetic`Parameters`GlobalSolution = 1;
Genetic`Parameters`GlobalSolutionFitness = 0;
Genetic`Parameters`GlobalSolutionSet = {};
Genetic`Parameters`TotTime = 0;

(* paragraph 3 *)
Save["pop.log", Genetic`Parameters`GlobalSolution];
Save["pop.log", Genetic`Parameters`GlobalSolutionFitness];
Save["pop.log", Genetic`Parameters`GlobalSolutionSet];
Save["pop.log", Genetic`Parameters`TotTime];

(* paragraph 4 *)
MakePossibilities;

(* paragraph 5 *)
Save["calced.m", Genetic`Parameters`GPossibilities];
Save["calced.m", Genetic`Parameters`GPossParameter];
Save["calced.m", Genetic`Parameters`GTermLength];
Save["calced.m", Genetic`Parameters`GPossLength];

(* paragraph 6 *)
InitNames;

(* paragraph 7 *)
Save["calced.m", Genetic`Parameters`PopulationNames];
Save["calced.m", Genetic`Parameters`MigrationPairs];

(* paragraph 8 *)
Genetic`Parameters`PopulationSize =
  Genetic`Parameters`PopulationSize / Genetic`Parameters`NoOfSubpopulations;

(* paragraph 9 *)
GInformation;

(* paragraph 10 *)
Map[InitializePop, Genetic`Parameters`PopulationNames];

(* paragraph 11 *)
CheckGlobalSolutions;
]
Initialize initialises all variables and sub-populations in preparation for the
execution of the GP algorithm.
All traces of previous runs are erased. This includes log files created and directories
used to store processor information (paragraph 1). Global variables are initialised (2)
and stored in the global information file (3). In order to save time during the
generation of individuals, the terminal and function sets are joined during initialisation
and stored in a disk file, CALCED.M (4/5). The names of the populations are generated
together with the migration pairs, and these are stored in the same disk file (6/7). The
population size is divided by the number of sub-populations (8) and information on
the run is displayed (9). Each sub-population is then initialised with random individuals
(10), and finally the fitnesses are evaluated and global statistics are calculated (11).
After initialising the variables, each processor must be registered for scheduling
purposes. This registration simply creates a unique directory for each processor.
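The directory-based registration scheme can be sketched in Python (a hypothetical re-creation for illustration, not the thesis code, which performs registration from the Mathematica system and the C++ scheduler). The key property is that directory creation is atomic, so a successfully created directory doubles as an exclusive claim on a processor identity:

```python
import os
import tempfile

def register_processor(base_dir, proc_id):
    """Register a processor by creating a unique directory for it.

    Directory creation is atomic, so two processes can never both
    succeed in claiming the same name; the loser gets None back.
    """
    path = os.path.join(base_dir, f"PROC{proc_id}")
    try:
        os.mkdir(path)          # fails if the directory already exists
        return path
    except FileExistsError:
        return None             # already registered by another process

base = tempfile.mkdtemp()
first = register_processor(base, 1)    # succeeds
second = register_processor(base, 1)   # duplicate registration is rejected
```

The `PROC*` prefix mirrors the directories that Initialize erases at the start of a run.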
The first iterations of the experiment attempted to compare the performance of
various configurations of workstations/processors. The parameters for the run were
held constant at the values indicated in Table 5.2.
For this experiment, migration took place on a single computer after each generation
was evolved, i.e. one computer performed migration on the entire set of sub-
populations.
As shown in the table, the number of sub-populations is 9, implying that the sub-
populations were distributed on a 3x3 grid. Although this does not assist in preserving
the variety of the population, it does make it possible to execute the algorithm in
parallel, which was the primary focus of this experiment.
The experiment was repeated 15 times, 5 times each using 1 processor, 3 processors
and 9 processors. In all instances the perfect solution, as indicated by Equation 5.1,
was evolved. The times taken to achieve these results are shown in Table 5.3.
Parameter               Value
Population Size         450
No of Sub-populations   9
Max no of Generations   51
Max initial size        5
Max size                17
Maximum complexity      50
Min solution fitness    1
Mutation probability    0.1
Crossover probability   0.9
Terminal set            {x}
Function set            {PPlus, PPlus, PTimes, PTimes, PMinus, PDivide}
Table 5.2. Parameters for parallel symbolic regression - Exp 2.1
First the algorithm was run on a single machine (Run1A-Run1E) and this found the
solution in an average time of 2 hours, 2 minutes and 45 seconds. When the algorithm
was run on a network of 3 computers, it took only an average of 41 minutes and 55
seconds to find the solution. When 9 processors were used, the increase in speed was
minimal and the average time taken was reduced to only 41 minutes and 30 seconds.
Table 5.3. Time taken to run parallel symbolic regression on multiple processors
Figure 5.3 illustrates the differences in time taken during the three runs. There is a
substantial decrease in time when the number of processors is increased to 3, but little
further improvement when the number of processors is increased to 9. This is due
to the serial nature of migration. When 9 processors were used, the time taken for
evolution was small compared to the time taken for migration. At this point it was
decided to parallelise the migration operation as well.
Although the time taken for a complete evolutionary run is significant, it is not the
best metric for comparative analysis since the length of each run is most probably
different. Thus, when comparing the time taken to reach a solution with different
numbers of processors, it is more accurate to use the average times taken to evolve
each new generation. Using this data, Figure 5.4 was generated.
[Figure 5.3: bar graph of overall time (h:m:s, from 0:00:00 to 2:09:36) against number of processors (1, 3, 9)]
Figure 5.3. Graph showing overall time taken vs. no of processors - Exp 2.1
It can be seen that the total time taken for each run is related almost proportionately to
the time taken for evolution of a single generation.
However, if one compares the average total time taken for 1 processor (2h 02m 45s) to
that of 3 processors (41m 55s), it superficially seems that the latter case achieves
greater than linear speedup. This is, of course, not the case: since migration and
collation of results were still serial operations, the increases in speed were lower than
optimal, and the 3 processors ought to have achieved less than linear speedup of
execution. If the average time taken to evolve a single generation is used instead, then
meaningful comparisons can be made between different numbers of processors. The
average time taken to process one generation was 485 seconds for 1 processor and
220 seconds for 3 processors, a ratio of roughly 2.2:1, which is below 3:1, as was expected.
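The sub-linear per-generation speedup is exactly what Amdahl's law predicts when part of the work (here migration and result collation) remains serial. A rough back-of-envelope check in Python, using the per-generation times quoted above; the implied serial fraction is an estimate derived from the observation, not a measured quantity:

```python
# Measured average time to evolve one generation (seconds), from the text
t1, t3 = 485.0, 220.0
n = 3

speedup = t1 / t3                 # observed speedup on 3 processors (~2.2)

# Amdahl's law: speedup(n) = 1 / (s + (1 - s) / n), where s is the
# serial fraction of the work; invert it to estimate s from the
# observed 3-processor speedup.
s = (n / speedup - 1) / (n - 1)
print(f"observed speedup {speedup:.2f}, implied serial fraction {s:.2f}")
```

An implied serial fraction of roughly 18% is consistent with the decision, made later in the chapter, to parallelise the migration operation as well.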
Experiment 2.2
In order to verify that the parallel algorithm really does speed up execution, a
single-population model was also tested, with all parameters kept the same except the
number of sub-populations, as indicated in Table 5.4.
[Figure 5.4: bar graph of time per generation (seconds, from 0 to 500) against number of processors (1, 3, 9)]
Figure 5.4. Graph showing time taken per generation vs. no of processors - Exp 2.1
The times taken for each run of the experiment are indicated in Table 5.5.
It was expected that the single-population algorithm would be outperformed by both
the 3-processor and 9-processor models. However, the single-population model
outperformed all configurations of the parallel algorithm. This occurred primarily
because the serial migration operation consumed a substantial percentage of the total
computation time.
Parameter               Value
Population Size         450
No of Sub-populations   1
Max no of Generations   51
Max initial size        5
Max size                17
Maximum complexity      50
Min solution fitness    1
Mutation probability    0.1
Crossover probability   0.9
Terminal set            {x}
Function set            {PPlus, PPlus, PTimes, PTimes, PMinus, PDivide}
Table 5.4. Parameters for parallel symbolic regression - Exp 2.2
Table 5.6. Time taken to run parallel symbolic regression on 3 processors with parallelised
migration operation
Experiment 3: CSTR Controller
A Continuous Stirred Tank Reactor (CSTR) is a chemical reactor that was modelled
in Mathematica for a simple exothermic reaction [Hajek, 1994]. For some reactions it
is desirable to attain a particular state of the reactor, in terms of the temperature,
concentration of reactant and other parameters; with optimal control of the reaction,
the chemical reactor may produce maximal yield. Control of the reactor was attempted
by means of changes in the coolant and reactant inflows. Hajek applied fuzzy logic,
optimised by a genetic algorithm, to generate equations that control the reactor
towards a known unstable steady state. The Mathematica model for this reactor was
obtained by personal contact with the author, and GP was applied in an attempt to
find controlling equations that achieve the objective with as little control deviation as
possible.
The fitness function was pre-specified to be the sum of differences between the
desired set points and the control variables, temperature and reactant concentration,
over a set of discrete time intervals. This summation included four scenarios of the
experiment with different starting points (temperature and reactant concentration).
This is discussed further in [Hajek, 1994].
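The fitness criterion described above, the cumulative deviation from the set points summed over discrete time steps and over the four starting scenarios, can be sketched as follows. This is a Python sketch rather than the thesis's Mathematica code; the `simulate` function is a hypothetical stand-in for the CSTR model, and the use of absolute error is an assumption made for illustration:

```python
def control_error(simulate, controllers, scenarios, setpoint, steps):
    """Cumulative control deviation: sum of absolute deviations of
    temperature and reactant concentration from the set point, over
    all time steps and all starting scenarios. Lower is better."""
    total = 0.0
    for start in scenarios:
        # simulate() stands in for the Mathematica CSTR model: it
        # returns the (temperature, concentration) trajectory produced
        # under the two evolved control equations from this start state.
        for x, y in simulate(controllers, start, steps):
            total += abs(x - setpoint[0]) + abs(y - setpoint[1])
    return total

def fake_simulate(controllers, start, steps):
    # Trivial stand-in: a constant trajectory, just to exercise the sum
    return [(1.0, 2.0)] * steps

# Two scenarios of three steps, each step deviating by |1| + |2| = 3
err = control_error(fake_simulate, None,
                    [(300.0, 0.5), (350.0, 0.4)], (0.0, 0.0), 3)
```

Under this criterion a value of 18613.4 is fitter than the heuristic's 25337.2, matching the comparison reported for Experiment 3.1.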
The function set contained only the four standard arithmetic operators (Plus, Minus,
Times, Divide), to streamline the genetic processes. The terminal set contained the
two control variables, temperature (x) and concentration of reactant (y), as well as
some constant values. The parameters used for the GP run are indicated in Table 5.7.
The constants in the terminal set were introduced in order to allow greater scaling of
the variables i.e. to increase the range of values spanned by the control variables.
Many copies of the control variables (x and y) were included in the terminal set in
order to increase the probability of selection of the variables relative to the constants
in the same set.
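Duplicating entries in the terminal set is a simple way of biasing uniform random selection, since every copy receives an equal share of the probability mass. A minimal Python illustration (the terminal values follow Table 5.7; the sampling check is purely illustrative):

```python
import random

# Five copies each of x and y against seven distinct constants,
# mirroring the terminal set of Table 5.7.
terminals = ["x"] * 5 + ["y"] * 5 + [1000, 100, 10, 1, 0.01, 0.001, 0.0001]

# Uniform choice over the list => P(variable) = 10/17, P(constant) = 7/17
p_variable = (terminals.count("x") + terminals.count("y")) / len(terminals)

random.seed(0)
draws = [random.choice(terminals) for _ in range(10000)]
frac_vars = sum(1 for t in draws if t in ("x", "y")) / len(draws)
print(p_variable, frac_vars)   # frac_vars should be close to 10/17, about 0.59
```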
Since two equations were sought, the genetic operators were modified to cater for this.
Each individual was generalised to be a list of expressions rather than a single
expression. Then, all operations could be applied to the lists. Crossover on a list of
expressions was extended to operate on a single expression from the list, chosen with
uniform randomness - the corresponding expression is chosen from another
individual. Mutation was changed similarly to operate on one of the expressions
within the list.
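These list-level operators can be sketched in Python (a hypothetical re-creation for illustration; the actual implementation is the list overloads of Cross1 and Mutate shown in Appendix C, and `subtree_crossover` here stands in for single-expression crossover):

```python
import random

def list_crossover(ind_a, ind_b, subtree_crossover):
    """Crossover for individuals that are lists of expressions: pick one
    slot with uniform randomness and cross the corresponding expressions
    of the two individuals; all other slots are copied unchanged."""
    pos = random.randrange(len(ind_a))
    a, b = list(ind_a), list(ind_b)
    a[pos], b[pos] = subtree_crossover(ind_a[pos], ind_b[pos])
    return a, b

def list_mutate(ind, mutate_expr):
    """Mutation likewise operates on exactly one expression in the list."""
    pos = random.randrange(len(ind))
    out = list(ind)
    out[pos] = mutate_expr(out[pos])
    return out

# Demonstration with a trivial "crossover" that swaps whole expressions
random.seed(1)
a, b = list_crossover(["ax", "ay"], ["bx", "by"], lambda p, q: (q, p))
```

Because only the chosen slot is touched, the two control equations of a CSTR individual evolve semi-independently while still being selected as a unit.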
Experiment 3.1
The raw fitness criteria of evolved solutions were compared to a supplied heuristic
estimate for the control functions, which produced a value of 25337.2. This criterion
corresponds to the cumulative error, so lower values are indicative of better solutions.
The GP algorithm produced comparatively better criteria.
Parameter               Value
Population Size         360
No of Sub-populations   9
Max no of Generations   51
Max initial size        5
Max size                17
Maximum complexity      50
Min solution fitness    1
Mutation probability    0.1
Crossover probability   0.9
Terminal set            {x, x, x, x, x, y, y, y, y, y, 1000, 100, 10, 1, 0.01, 0.001, 0.0001}
Function set            {PPlus, PPlus, PTimes, PTimes, PMinus, PDivide}
Table 5.7. GP Parameters for CSTR
Figure 5.5 shows the control trajectory achieved (top left graph) as well as the values
of the control functions during each time interval (coolant inflow on the left and
reactant inflow on the right). The criterion was 18613.4, which corresponded to a fitter
APPENDIX B : SCHEDULER

#define WIN31
#include <dir.h>
#include <owl.h>

// -------------------------------------------------------------------- \\
// Class declaration for a general item of data in a linked list
class Thing
{
  public:
    Thing () {};
    Thing *Next, *Prev;
};

// -------------------------------------------------------------------- \\
// Class declaration and definition for a general linked list
class ThingList
{
  public:
    ThingList ();
    void AddThing ( Thing *p );
    Thing *PopThing ();
  protected:
    Thing *Head, *Tail;
};

ThingList::ThingList ()
{
  Head=NULL;
  Tail=NULL;
}

void ThingList::AddThing ( Thing *p )
{
  p->Prev=Tail;
  if (Head==NULL)
    Head=p;
  else
    Tail->Next=p;
  p->Next=NULL;
  Tail=p;
}

Thing *ThingList::PopThing ()
{
  if (Head==NULL)
    return NULL;
  if (Head==Tail)
  {
    Thing *p=Head;
    Tail=Head=NULL;
    return p;
  }
  else
  {
    Thing *p=Head;
    Head=Head->Next;
    Head->Prev=NULL;
    return p;
  }
}

// -------------------------------------------------------------------- \\
// Declaration and definition for a list of jobs
class Job : public Thing
{
  public:
    Job ( char *n );
    char *GetName ();
  protected:
    char Name[80];
};

Job::Job ( char *n )
{
  lstrcpy (Name, n);
}

char *Job::GetName ()
{
  return Name;
}

// -------------------------------------------------------------------- \\
// Declaration and definition for a list of jobs
class JobList : public ThingList
{
  public:
    JobList ( PTDialog ptd );
    JobList ( PTDialog ptd, int );
    void Refresh ();
    void AddJob ( Job *p );
  protected:
    PTDialog Parent;
};

JobList::JobList ( PTDialog ptd, int )
{
  Parent=ptd;
}

JobList::JobList ( PTDialog ptd )
{
  struct ffblk ffblk;
  int done;
  Parent=ptd;
  done = findfirst ("*.*", &ffblk, FA_DIREC);
  while (!done)
  {
    if ((ffblk.ff_name[0]=='P') && (ffblk.ff_name[1]=='O') &&
        (ffblk.ff_name[2]=='P') &&
        (ffblk.ff_name[lstrlen (ffblk.ff_name)-3]=='L') &&
        (ffblk.ff_name[lstrlen (ffblk.ff_name)-2]=='O') &&
        (ffblk.ff_name[lstrlen (ffblk.ff_name)-1]=='G') &&
        (ffblk.ff_name[3]!='.'))
    {
      ffblk.ff_name[lstrlen (ffblk.ff_name)-4]=0;
      Job *p=new Job (ffblk.ff_name);
      AddJob (p);
    }
    done = findnext (&ffblk);
  }
  Refresh ();
}

void JobList::Refresh ()
{
  Job *p=(Job *)Head;
  Parent->SendDlgItemMsg (102, LB_RESETCONTENT, 0, 0);
  while (p!=NULL)
  {
    if (p!=NULL)
      Parent->SendDlgItemMsg (102, LB_ADDSTRING, 0, (long)(p->GetName()));
    p=(Job *)p->Next;
  }
}

void JobList::AddJob ( Job *p )
{
  AddThing (p);
}

// -------------------------------------------------------------------- \\
// Declaration and definition for a list of migration jobs
class MigrateJobList
{
  public:
    MigrateJobList ( PTDialog ptd, int nos );
    ~MigrateJobList ();
    void AddJob ( char *s );
    void Refresh ();
    char *GetJob ();
    BOOL MoreJobs ();
    void ClearJob ( char *s );
  protected:
    unsigned char *Matrix, *List;
    PTDialog Parent;
    unsigned long Size;
    char tJob[256];
    unsigned long GetPos ( int r, int c );
};

MigrateJobList::MigrateJobList ( PTDialog ptd, int nos )
{
  Parent=ptd;
  Size=nos;
  Matrix=new unsigned char [GetPos (nos-1, nos)+1];
  memset (Matrix, 0, GetPos (nos-1, nos)+1);
  List=new unsigned char [nos];
  memset (List, 0, nos);
}

MigrateJobList::~MigrateJobList ()
{
  delete Matrix;
  delete List;
}

unsigned long MigrateJobList::GetPos ( int r, int c )
{
  unsigned long Pos, r1=r, c1=c;
  Pos=(((r1-1)*(2*(Size-1)-r1+2))/2)+c1-r1-1;
  return Pos;
}

void MigrateJobList::AddJob ( char *s )
{
  s++;
  unsigned long Code=atol (s);
  unsigned long r=(Code / Size)+1;
  unsigned long c=(Code % Size)+1;
  Matrix[GetPos (r, c)]=1;
}

char *MigrateJobList::GetJob ()
{
  for ( int a=1; a<Size; a++ )
    for ( int b=a+1; b<=Size; b++ )
      if ((List[a-1]==0) && (List[b-1]==0) && (Matrix[GetPos (a, b)]==1))
      {
        List[a-1]=1;
        List[b-1]=1;
    TickTimer=GetTickCount ();
  }
}

// -------------------------------------------------------------------- \\
// -------------------------------------------------------------------- \\
// Main program body
// -------------------------------------------------------------------- \\
int PASCAL WinMain ( HINSTANCE hInstance, HINSTANCE hPrevInstance,
                     LPSTR lpCmdLine, int nCmdShow )
{
  MApplication M ("GPNet", hInstance, hPrevInstance, lpCmdLine, nCmdShow);
  M.Run ();
  return M.Status;
}
APPENDIX C : PARALLEL GP
time.m

(* Genetic Programming *)
(* Time output routines *)
(* H. Suleman *)
(* 24 October 1995 *)

BeginPackage["Genetic`Time`"]

Time::usage = "Time[x] outputs the time taken in seconds, minutes and hours. Time[x, Stuff] outputs Stuff followed by time taken."

Begin["`Private`"]

Time[x_ Second /; x>=3600, Stuff___] :=
 Module[{h, m, s},
  s = x;
  h = Floor[s/3600]; s -= h*3600;
  m = Floor[s/60]; s -= m*60;
  Print[Stuff, h, " Hours, ", m, " Minutes, ", s, " Seconds"]]

Time[x_ Second /; x>=60, Stuff___] :=
 Module[{m, s},
  s = x;
  m = Floor[s/60]; s -= m*60;
  Print[Stuff, m, " Minutes, ", s, " Seconds"]]

Time[x_ Second, Stuff___] := Print[Stuff, x, " Seconds"]

End[]
Protect[Time]
EndPackage[]
operator.m

(* Genetic Programming *)
(* Genetic operator routines *)
(* H. Suleman *)
(* 9 June 1996 *)

(* Get parameters *)
Needs["Genetic`Parameters`", "default.m"]

BeginPackage["Genetic`Operators`"]

Cross1::usage = "Cross1[x, y] performs crossover on x and y to produce {x1, y1}."
Crossover::usage = "Crossover[x] performs crossover on the population in list x."
Mutate::usage = "Mutate[x] randomly mutates expression x."

Begin["`Private`"]

(* Get list of all indices of internal points in expression *)
RemoveZero[x_]:=If[Position[x, 0]=={}, x, {}]
Points[x_]:=Union[Map[RemoveZero, Position[x, _]], {}]
GetInternal[{x___}]:=x

(* Perform crossover operation on two expressions *)
Cross1[x_, y_]:=Module[
 {spot1, spot2, point1, point2, temp1, temp2},
 If[
  Random[]<Genetic`Parameters`CrossoverProbability,
  point1=Points[x];
  spot1=Random[Integer, {1, Length[point1]}];
  point2=Points[y];
  spot2=Random[Integer, {1, Length[point2]}];
  temp1=x[[GetInternal[point1[[spot1]]]]];
  temp2=y[[GetInternal[point2[[spot2]]]]];
  {
   If[
    point1[[spot1]]=={},
    temp2,
    ReplacePart[x, temp2, point1[[spot1]]]
   ],
   If[
    point2[[spot2]]=={},
    temp1,
    ReplacePart[y, temp1, point2[[spot2]]]
   ]
  },
  {x, y}
 ]
]

(* perform crossover on corresponding elements in lists *)
Cross1[x_ /; Head[x]==List, y_ /; Head[y]==List]:=
Module[
 {z, pos, xnew, ynew},
 pos=Random[Integer, {1, Length[x]}];
 z=Cross1[x[[pos]], y[[pos]]];
 xnew=x; ynew=y;
 xnew[[pos]]=z[[1]];
 ynew[[pos]]=z[[2]];
 {xnew, ynew}
]

(* Perform mutation operation on an expression *)
Mutate[x_]:=Module[
 {spot1, point1, y, xold, xnew},
 xold=x; xnew=x;
 If[
  Random[]<Genetic`Parameters`MutationProbability,
  y=Genetic`Initialization`Generate[
    Random[Integer, {1, Genetic`Parameters`MaxInitialSize}]];
  point1=Points[xnew];
  spot1=Random[Integer, {1, Length[point1]}];
  xnew=If[
   point1[[spot1]]=={},
   y,
   ReplacePart[x, y, point1[[spot1]]]
  ];
  If[
   ((Depth[xnew]<=Genetic`Parameters`MaxSize) &&
    (LeafCount[xnew]<=Genetic`Parameters`MaxComplexity)),
   xnew,
   xold
  ],
  xold
 ]
]

(* perform mutation on an element within a list *)
Mutate[x_ /; Head[x]==List]:=
Module[
 {z, xnew},
 z=Random[Integer, {1, Length[x]}];
 xnew=x;
 xnew[[z]]=Mutate[x[[z]]];
 xnew
]

(* Perform crossover on all expressions in new generation *)
Crossover[x_] := Module[
 {newx, oldx, n2, leno, origlen},
 oldx=x; newx={};
 leno=Length[oldx]; origlen=leno;
 While[
  leno>0,
  If[
   leno==1,
   newx=Append[newx, First[oldx]];
   oldx=Rest[oldx],
   n2=Cross1[oldx[[1]], oldx[[2]]];
   If[((Depth[n2[[1]]]<=Genetic`Parameters`MaxSize) &&
       (LeafCount[n2[[1]]]<=Genetic`Parameters`MaxComplexity)),
    newx=Append[newx, n2[[1]]],
    newx=Append[newx, oldx[[1]]]
   ];
   If[((Depth[n2[[2]]]<=Genetic`Parameters`MaxSize) &&
       (LeafCount[n2[[2]]]<=Genetic`Parameters`MaxComplexity)),
    newx=Append[newx, n2[[2]]],
    newx=Append[newx, oldx[[2]]]
   ];
   oldx=Drop[oldx, 2];
  ];
  leno=Length[oldx]
 ];
 newx
]

End[]
Protect[Cross1, Crossover, Mutate]
EndPackage[]
initial.m

(* Genetic Programming *)
(* Initialization routines *)
(* H. Suleman *)
(* 9 June 1996 *)

(* Get time routines *)
Needs["Genetic`Time`", "time.m"]
(* Get extra definitions for basic arithmetic operations *)
Needs["Genetic`ExtraDefinitions`", "xtradefs.m"]
(* Get parameters *)
Needs["Genetic`Parameters`", "default.m"]
(* Get file locking routines *)
Needs["Genetic`Shares`", "shares.m"]

BeginPackage["Genetic`Initialization`", {"Genetic`Parameters`"}]

Generate::usage = "Generate[x] generates a random expression of depth x."
Initialize::usage = "Initialize initialises the various parameters and populations."
GInformation::usage = "GInformation[] lists information about the current parameters."
GPopInformation::usage = "GPopInformation[popname] lists information about the state and best individual in the current population."
CheckSolution::usage = "CheckSolution calculates fitnesses and checks for solutions."
CheckGlobalSolutions::usage = "CheckGlobalSolutions checks if the local solution betters the global one."
InitNames::usage = "InitNames initialises the table of name prefixes of populations."

Begin["`Private`"]

(* Make lists of terminals+functions, parameters, etc. *)
MakePossibilities:=Module[
 {},
 Genetic`Parameters`GPossibilities=Join[Terminals, Functions];
 Genetic`Parameters`GPossParameter=Join[
  Table[0, {Length[Terminals]}],
  Parameters
 ];
 Genetic`Parameters`GPossLength=Length[Genetic`Parameters`GPossibilities];
 Genetic`Parameters`GTermLength=Length[Terminals];
]

(* Generate random expression *)
GenerateNormal[d_]:=Module[
 {r},
 If[
  d>1,
  r=Random[Integer, {1, Genetic`Parameters`GPossLength}],
  r=Random[Integer, {1, Genetic`Parameters`GTermLength}]
 ];
 Switch[
  Genetic`Parameters`GPossParameter[[r]],
genmain.m

(* Genetic Programming *)
(* Main routines *)
(* H. Suleman *)
(* 28 May 1996 *)

(* Get normal distribution functionality *)
Needs["Statistics`NormalDistribution`"];
(* Get time routines *)
Needs["Genetic`Time`", "time.m"]
(* Get extra definitions for basic arithmetic operations *)
Needs["Genetic`ExtraDefinitions`", "xtradefs.m"]
(* Get parameters *)
Needs["Genetic`Parameters`", "default.m"]
(* Get initialization routines *)
Needs["Genetic`Initialization`", "initial.m"]
(* Get file locking routines *)
Needs["Genetic`Shares`", "shares.m"]
(* Get genetic operators *)
Needs["Genetic`Operators`", "operator.m"]

BeginPackage["Genetic`Main`",
 {"Genetic`Parameters`", "Genetic`Initialization`",
  "Genetic`Operators`", "Statistics`NormalDistribution`"}]

CreateNewGeneration::usage = "CreateNewGeneration[oldgen] creates a new generation from the old generation using fitness-proportionate reproduction."
StartRun::usage = "Starts the run of the genetic algorithm."
RegisterProc::usage = "Registers a processor."

Begin["`Private`"]

(* Make cumulative fitnesses vector *)
CalcFitnessSum:=Module[{fitsum, i},
 fitsum=Table[Apply[Plus, Take[Fitnesses, i]], {i, 1, Length[Fitnesses]}];
stats.m

(* Genetic Programming *)
(* Statistics routines *)
(* H. Suleman *)
(* 30 October 1996 *)

Needs["Graphics`Graphics`"];
Needs["Graphics`Animation`"];

BeginPackage["Genetic`Stats`",
 {"Graphics`Graphics`", "Graphics`Animation`", "Graphics`Graphics3D`"}]

GlobalCurve::usage = "GlobalCurve[] shows the global fitness curve."
GlobalHistogram::usage = "GlobalHistogram produces a set of histograms for the entire population."
MaxHistogram::usage = "MaxHistogram produces a set of 3-D histograms showing the progress of the solution fitness in each subpopulation."
AveHistogram::usage = "AveHistogram produces a set of 3-D histograms showing the average fitness in each subpopulation."
CalcHistogram::usage = "CalcHistogram calculates the global histograms and 3D histograms."

HistogramData={};
Histogram3DMax={};
Histogram3DAve={};

Begin["`Private`"]

GlobalCurve:=Module[
 {t, MaxG, MinG, AveG},
 BeginPackage["Genetic`Parameters`"];
 Get["pop.log"];
 EndPackage[];
 t=MapThread[List, Genetic`Parameters`GlobalSolutionSet];
 MaxG=ListPlot[MapThread[List, {t[[1]], t[[2]]}],
  PlotRange->{{0, Max[t[[1]]]}, {0, 1}},
  PlotStyle->{RGBColor[1,0,0]},
  Frame->True,