EVOLUTIONARY ARTIFICIAL NEURAL NETWORK DESIGN AND
TRAINING FOR WOOD VENEER CLASSIFICATION
Marco Castellani a,1 and Hefin Rowlands b
a Centro de Inteligência Artificial (CENTRIA), Departamento de Informática, Universidade
Nova de Lisboa, 2829-516 Caparica, Portugal
b Research & Enterprise Department, University of Wales, Newport, Allt-yr-yn
Campus, PO Box 180, NP20 5XR Newport, UK
ABSTRACT
This study addresses the design and the training of a multi-layer perceptron classifier for
identification of wood veneer defects from statistical features of wood sub-images.
Previous research utilised a neural network structure manually optimised using the
Taguchi method with the connection weights trained using the backpropagation rule.
The proposed approach uses the evolutionary ANNGaT algorithm to generate the neural
network system. The algorithm simultaneously evolves the neural network topology and
the weights. ANNGaT optimises the size of the hidden layer(s) of the neural network
structure through genetic mutations of the individuals. The number of hidden layers is a
system parameter. Experimental tests show that ANNGaT produces highly compact
neural network structures capable of accurate and robust learning. The tests show no
differences in accuracy between neural network architectures using one and two hidden
layers of processing units. Compared to the manual approach, the evolutionary
algorithm generates equally performing solutions using considerably smaller
architectures. Moreover, the proposed algorithm requires a lower design effort, since the
process is fully automated.

1 Corresponding author: Tel. +351 212948536, Fax +351 212948541, Email [email protected]
Keywords: Artificial Neural Networks, Evolutionary Algorithms, Artificial Neural
Network Design, Pattern Classification, Automated Visual Inspection
NOTATION
ANNGaT artificial neural network generation and training
ANN artificial neural network
MLP multi-layer perceptron
EA evolutionary algorithm
GA genetic algorithm
EP evolutionary programming
BP backpropagation
ANNT artificial neural network training
1 INTRODUCTION
Plywood is made of thin layers of wood, called veneers, joined together using an
adhesive. Defects of the veneer are identified by human inspectors as the sheets are
transported to assembly on a conveyor. The task is extremely stressful and demanding
and short disturbances or lapses of attention may result in misclassification. Two distinct
studies conducted on human inspectors in wood mills reported inspection accuracies
ranging from an optimistic estimate of 68% (Huber et al., 1985) to a more
conservative measure of 55% (Polzleitner and Schwingshakl, 1992).
An automatic visual inspection system (Pham and Alcock, 1996; Pham and Alcock,
1999a) was developed for this application by the Intelligent Systems Lab of the School
of Engineering at the University of Wales, Cardiff, UK and the Wood Research Institute
of Kuopio, Finland. Fig. 1 outlines the system. Monochrome images of the veneer are
pre-processed by automated algorithms that locate defect areas (Pham and Alcock,
1999b) where a set of numerical descriptors is extracted for further analysis. Seventeen
statistical attributes of the local grey level distribution were identified as relevant for
defect identification (Lappalainen et al., 1994; Pham and Alcock, 1999c). Twelve
possible defects of the veneer can be distinguished in addition to clear wood, giving 13
possible classes. For each data sample, a classifier takes the 17-dimensional vector of
image features and decides to which of the thirteen classes the pattern belongs.
Several algorithms were evaluated on their ability to correctly recognise wood veneer
defects. The best results were obtained using an Artificial Neural Network (ANN)
(Pham and Liu, 1995) classifier. In particular, Packianather (Packianather, 1997;
Packianather et al., 2000) reported 85% identification rates using a three-layered Multi-
Layer Perceptron (MLP) (Pham and Liu, 1995). The accuracy result was substantially
confirmed in an independent study by Pham and Sagiroglu (2000) using a four-layered
MLP classifier. Despite the similar classification accuracies obtained, the conclusions of
the two studies differed on the best ANN configuration.
This paper addresses the design of the MLP classifier system. To date, ANN
structure optimisation is still mainly a human expert’s job (Yao, 1999). Different ANN
architectures are usually trained according to some pre-defined induction algorithm and
their merit evaluated on the accuracy achieved. Unfortunately, training the frequently
large set of parameters (i.e., the connection weights) is one of the major problems in the
implementation of ANN systems. Since most ANN training procedures are based on
gradient descent of the error surface, they are prone to sub-optimal convergence to local
minima. This limitation in turn undermines the precise evaluation of candidate ANN
structures. A typical example of a gradient-based learning algorithm is the
Backpropagation (BP) rule (Rumelhart and McClelland, 1986) that is used to train the
MLP classifier of the automatic visual inspection system.
A growing body of literature reports efforts toward the automatic design of ANN
architectures (Branke, 1995; Yao, 1999). Constructive and destructive algorithms
(LeCun et al., 1990; Reed, 1993; Smieja, 1993) such as the Cascade Correlation
Learning Architecture (Fahlman and Lebiere, 1990) trim or enlarge the ANN structure
while parameter learning proceeds. The decision whether to add or delete further nodes
is based on greedy hill climbing of the ANN performance, thus leaving open the
problem of sub-optimal convergence to local structural optima (Angeline et al., 1994).
Thanks to their global search strategy, Evolutionary Algorithms (EAs) (Eiben and
Smith, 2003) are able to avoid being trapped into secondary peaks of performance and
can therefore provide an effective and robust solution to the problem of automated ANN
design and training (Balakrishnan and Honavar, 1995; Branke, 1995; Whitley, 1995;
Yao, 1999; Nikolaev, 2003). Three approaches have emerged: using EAs to generate the
ANN structure, using EAs for learning the parameters, and using EAs for concurrent
optimisation of both the ANN structure and the weights. The last approach presents the
most advantages in terms of reduced design effort and quality of the solutions.
However, the simultaneous evolution of the whole ANN system is not straightforward
due to the complexity of the learning task, which requires the optimisation of a large
number of mutually related parameters and variables.
This paper presents the application results of the algorithm ANNGaT, an EA for
concurrent structure design and training of ANN systems. ANNGaT is used to
automatically generate the MLP classifier for the wood veneer visual inspection system.
Section 2 introduces the problem domain. Section 3 surveys the literature on ANN
training and structure design algorithms. Section 4 describes the proposed algorithm.
Section 5 presents the experimental results of its application to the wood veneer defect
classification task. Conclusions and indications for further work are given in Section 6.
2 PROBLEM DOMAIN
The goal of this study is to design an MLP classifier that correctly recognises instances
of wood veneer defects. For this purpose, a set of 232 data samples representing
statistical features extracted from images of plywood defect areas is available. Each
datum corresponds to a 17-dimensional feature vector. There are 13 classes
corresponding to 12 possible defects and clear wood.
The data distribution is unbalanced, with two classes containing as few as 8 examples,
one class containing 16 examples and the remaining classes containing 20 examples.
There are no missing attributes. Table 1 details the class distribution of the data set.
Packianather et al. (2000) applied the Taguchi method (Roy, 2001) to optimise the MLP
architecture and the learning parameters of the BP training rule. The authors suggested
that one hidden layer of 45 neurons is sufficient for the task and reported a classification
accuracy estimated at 84.16%, with a confidence interval of ±1.52%.
Pham and Sagiroglu (2000) tried four different algorithms to train the MLP classifier to
identify the veneer defects. Different ANN topologies were also tested. The best results
were achieved using a manually designed MLP architecture comprising two hidden
layers, each containing 17 neurons, and training this solution using the BP rule. The
optimised classifier achieved 86.96% recognition accuracy. However, the results were
inconclusive, since many solutions obtained similar recognition accuracies and the
paper does not provide a confidence interval estimate for the measurements.
In both studies, the data set was randomly partitioned into a training set of examples
containing 80% of the instances and a test set containing the remaining 20%.
Packianather et al. (2000) trained the solutions until the performance stopped
improving. The final estimate for the classification accuracy refers to the average of 9
learning trials on 3 different random partitions of the data set. Pham and Sagiroglu
(2000) trained the solution for an experimentally optimised number of iterations. The
conclusions of the two studies differ on the structure of the ANN classifier, while the
performances of the two solutions are roughly in agreement and the differences are
likely to be due to statistical fluctuations.
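Where a confidence interval is reported, as in Packianather et al. (2000), it can be estimated from repeated learning trials. A minimal sketch, with purely illustrative accuracy values and an assumed 95% Student's t multiplier:

```python
import statistics

def confidence_interval(accuracies, t_value=2.306):
    """Mean accuracy and half-width of a 95% confidence interval.

    t_value is Student's t for the chosen confidence level and
    len(accuracies) - 1 degrees of freedom (2.306 for 8 d.o.f.).
    """
    mean = statistics.mean(accuracies)
    sem = statistics.stdev(accuracies) / len(accuracies) ** 0.5
    return mean, t_value * sem

# Hypothetical accuracies from 9 learning trials (illustrative only).
trials = [83.1, 85.0, 84.4, 82.9, 84.8, 85.2, 83.7, 84.1, 84.5]
mean, half_width = confidence_interval(trials)
```

Reporting the half-width alongside the mean is what makes comparisons such as 84.16% versus 86.96% interpretable.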
Given the current disagreement on the topology of the MLP classifier, a more
systematic search is required to determine the optimal ANN structure. For this task, a
machine learning approach allows a more exhaustive exploration of the space of the
possible MLP configurations. Furthermore, the automatic design of the ANN
architecture makes the system more easily re-configurable, since it removes the need for
time-consuming manual generation and testing of the candidate solutions.
The next section reviews the application of EAs for automatic design and training of
ANN structures.
3 EVOLUTIONARY GENERATION OF ANN SYSTEMS
The implementation of ANN systems requires the solution of two complex optimisation
tasks, that is, the design of the ANN architecture and the training of the frequently large
set of parameters.
The two tasks are closely related. On the one hand, since the worth of a candidate ANN
structure can only be assessed on the trained solution, the accuracy and the reliability of
the training procedure affects the outcome of the design process. On the other hand, the
choice of architecture has a considerable impact on the ANN processing power and
learning capabilities. Too small a topology may not possess enough computational
power to fully learn the desired input-output relationship, whereas a topology that is too
large may result in the ANN response modelling the training data too closely. The latter
case usually produces a solution with poor generalisation capabilities (Branke, 1995).
Many algorithms for ANN design and training use gradient-based search techniques,
such as constructive and destructive algorithms (Rychetsky et al., 1998; Parekh et al.,
2000) for structure optimisation and the BP rule and conjugate gradients (Johansson et
al., 1991) for weight training. Unfortunately, local gradient-based search methods can
easily get trapped by local optima or flat areas of the optimisation surface, and
time-consuming experimentation is required before a satisfactory solution is found. The
remainder of this section reviews the application of EAs, a popular class of global
search algorithms, to the automatic design and training of ANNs.
3.1 Evolutionary algorithms
EAs are stochastic search algorithms that aim to find an acceptable solution when time
or computational requirements make it impractical to find the best one. EAs are best
suited for search spaces that are multimodal, and include flat regions and points of
discontinuity where gradient-based methods would easily get stuck. EAs search
globally, whereas gradient-based algorithms can only find the optimum lying at the
end of the slope from the initial position. Being stochastic global optimisation
procedures, EAs are also robust to noisy fitness evaluations.
EAs are modelled on Darwin’s theory of natural evolution, where a species improves its
adaptation to the environment by means of a selection mechanism that encourages
individuals of higher fitness to reproduce more often than those of lower fitness. The
individuals improve until a stopping criterion is met. At the end of the process, the best
exemplar is chosen as the solution to the problem.
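The selection and reproduction cycle described above can be sketched generically; the init, fitness and mutate callables and all parameter values are assumptions standing in for a concrete problem:

```python
import random

def evolve(init, fitness, mutate, pop_size=20, generations=100):
    """Minimal evolutionary loop: rank-based selection plus mutation.

    init() creates a random candidate, fitness() scores it (higher is
    better) and mutate() returns a perturbed copy; all three are
    problem-specific and supplied by the caller.
    """
    population = [init() for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]   # fitter half reproduces
        population = parents + [mutate(random.choice(parents))
                                for _ in range(pop_size - len(parents))]
    return max(population, key=fitness)

# Toy usage: maximise -(x - 3)^2, whose optimum is at x = 3.
best = evolve(init=lambda: random.uniform(-10.0, 10.0),
              fitness=lambda x: -(x - 3.0) ** 2,
              mutate=lambda x: x + random.gauss(0.0, 0.5))
```

Keeping the fitter half unchanged makes the loop elitist, so the best candidate found so far is never lost.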
In EAs, the adaptation of an individual to the environment is defined by its ability to
perform the required task. A problem-specific fitness function is used for assessing the
quality of candidate solutions. The population is driven towards the optimal point(s) of
the search space by means of stochastic search operators inspired by the biological
mechanisms of selection, mutation and recombination.
Following biological terminology, in EAs each data cluster defining a solution is called
a chromosome, and each basic component of a chromosome is called a gene.
EAs originated in the mid-sixties with the creation of Evolution Strategies (Rechenberg,
1965) and Evolutionary Programming (EP) (Fogel, L. J. et al., 1966). Ten years later the
creation of Genetic Algorithms (GAs) by Holland (1975) made EAs popular. Evolution
Strategies, EP and GAs represent different metaphors of biological evolution with
different representations of the candidate solutions and different genetic operators.
However, recent research developments in each field and the mutual exchange of ideas
blurred the boundaries between the three main branches of EAs.
3.2 Evolutionary ANN training
The first applications of EAs to the training of ANNs date back to the end of the 80s
with the work of Montana and Davis (1989) and Whitley and Hanson (1989) in the field
of GAs and L. J. Fogel and his co-workers (Fogel, D. B. et al., 1990; Saravanan and
Fogel, D. B., 1994) in the area of EP. The common approach is to encode the
connection weights into genes that are then concatenated to build the genotype. Much
debated is the representation of the solutions. The popular GA practice of binary coding
(Whitley and Hanson, 1989; Srinivas and Patnaik, 1991; Haussler et al., 1995; Seiffert,
2001) gives rise to long bit-strings for any non-trivial ANN architecture, leading to the
dual problem of a large search space and increased disruptiveness of the crossover
operator. Moreover, the larger the strings are, the longer the processing time is.
For the above reasons, standard GAs are often modified to allow more compact and
efficient encodings (Montana and Davis, 1989; Menczer and Parisi, 1992) and they are
hybridised with other search algorithms (e.g., the BP rule) to speed up the learning
process (Montana and Davis, 1989; Skinner and Broughton, 1995; Yan et al., 1997).
Much debated is also the use of the crossover operator since there is no consensus on
which are the functional units to swap. Indeed, the distributed nature of the knowledge
base in connectionist systems favours the argument against point-to-point exchanges of
genetic material amongst solutions. Relevant to the efficiency of the crossover operator
is also the competing convention problem (Thierens et al., 1993), namely the many-to-
one mapping from the representation of the solutions (the genotype) to the actual ANN
(the phenotype). This problem leads to high disruption of the solutions' behaviour after
genetic recombination. A way to prevent competing conventions is to match pairs of
neurons of mated solutions according to their similarity prior to the crossover operation
(Thierens et al., 1993). Alternatively, sub-populations (species) of neurons are evolved,
each species corresponding to a position on a pre-defined ANN structure (Gomez and
Miikkulainen, 2003). Unfortunately, these approaches do not scale well to large ANNs.
Because of its real-valued encoding and the lack of a crossover operator, EP is often
regarded as a better approach to ANN optimisation. Several successful implementations
are reported in the literature, mainly using Gaussian (Fogel, D. B. et al., 1990;
Saravanan and Fogel, D. B., 1994; Angeline and Fogel, D. B., 1997; Darwen, 2000;
Fogel and Chellapilla, 2005) or Cauchy (Yao and Liu, 1997a) mutation as the main
search operator. For further insights on the evolutionary training of ANNs the interested
reader can find broad surveys on the topic in (Branke, 1995; Whitley, 1995; Yao, 1999).
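An EP-style Gaussian weight mutation can be sketched in a few lines (self-adaptive step sizes are omitted for brevity, and the sigma value is an assumption):

```python
import random

def gaussian_mutation(weights, sigma=0.05):
    """Return a mutated copy of a real-valued weight vector.

    Each weight is perturbed by zero-mean Gaussian noise, the main
    search operator in EP-style ANN training.
    """
    return [w + random.gauss(0.0, sigma) for w in weights]

parent = [0.1, -0.3, 0.7]
child = gaussian_mutation(parent)
```

Cauchy mutation differs only in the sampling distribution, whose heavier tails allow occasional large jumps out of local optima.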
3.3 Evolution of the ANN structure
Several studies report applications of EAs to the design of ANN architectures coupled
to customary weight training algorithms, a typical example being the evolution of MLP
topologies with BP training of the ANN parameters (Miller et al., 1989; Stepniewski
and Keane, 1996; Brown and Card, 1999). Fitness evaluation is generally expressed as a
multi-optimisation criterion that takes into account different requirements such as ANN
performance, size, learning speed etc.
Two main approaches for encoding the candidate solutions have emerged, namely direct
encoding and indirect encoding (Yao, 1999). Direct encoding specifies every ANN
connection and node, usually representing individuals by means of connection matrices.
The architecture of the final solution is therefore fully determined by the evolution
process. Following this approach, chromosomes are easy to decode but the algorithm
does not scale well to large ANN structures.
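A direct encoding via a connection matrix might look as follows; the connection probability and weight range are illustrative assumptions:

```python
import random

def random_direct_encoding(n_nodes, p_connect=0.5):
    """Direct encoding of an ANN as a connection matrix.

    matrix[i][j] holds the weight of the connection from node i to
    node j, or None when the connection is absent. Every connection
    and node is represented explicitly, which is why the genotype
    grows quadratically with the number of nodes.
    """
    return [[random.uniform(-0.2, 0.2) if random.random() < p_connect else None
             for _ in range(n_nodes)]
            for _ in range(n_nodes)]

genotype = random_direct_encoding(5)
```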
Indirect encoding specifies only a compact representation of the structure of the
solutions, generally through parameters describing the network size and connectivity
(Harp et al., 1990; Castillo et al., 2000) or via ANN developmental rules (Kitano, 1990;
Schiffmann, 2000; Jung, 2005). While indirect encoding seems more biologically
plausible and does not suffer the problem of the competing conventions, the action of
the genetic operators on the actual phenotype becomes less clear and the decoding of the
chromosomes more difficult. Moreover, small changes in the genotype can produce
large changes in the phenotype, creating a rugged and more difficult search surface.
The use of EAs to design ANNs that are then trained using some parameter learning
algorithm allows compact and effective structures to be built. However, imprecision in
the evaluation of the candidate solutions must be taken into account due to possible
sub-optimal convergence of the weight training procedure. Furthermore, the training of the
ANN weights may be excessively slow for adequate exploration of the search space. For
the above reasons, it is preferable to simultaneously optimise both the ANN architecture
and the parameters. This can be done either by alternating steps of evolutionary
structure optimisation with steps of standard (e.g. BP-driven) training of the parameters
(Cangelosi and Elman, 1995) or by evolving at the same time both the connectivity and
the weights (Srinivas and Patnaik, 1991; Angeline et al., 1994; Yao and Liu, 1997b;
Hüsken and Igel, 2002).
In the first case, the standard learning technique behaves similarly to an additional
problem-specific mutation operator. The genetic propagation of learnt knowledge
introduces an element of “Lamarckism” (Aboitiz, 1992) into the search, that is, the
permanent storing in the genotype of acquired behaviours resulting from learning by the
phenotype. In the second case, a set of mutation operators is needed for modification of
the ANN structure and weights. Standard ANN weight training algorithms (e.g., BP for
MLPs) are often used to speed up the search through Lamarckian learning (Yao, 1999).
For the reasons discussed in Section 3.2, the use of crossover is not customary. Due to
the difficulty of encoding the connection weights, the use of indirect encoding becomes
problematic once the whole ANN system is evolved.
The next section presents an EA for the simultaneous design and training of the wood
veneer defect MLP classifier.
4 THE ALGORITHM
The Artificial Neural Network Generation and Training (ANNGaT) algorithm is
designed for concurrent optimisation of the structure and the connection weights for
ANN systems. The population is evolved through a mix of random genetic mutations
and Lamarckian gradient based learning. Since it is more suitable for transmitting the
setting of the connection weights, the direct encoding approach is used for representing
the candidate solutions. This section presents the implementation of the algorithm to the
evolution of MLP classifiers for any pre-defined number of layers.
4.1 General overview
The EA architecture is shown in fig. 2. The algorithm comprises two components,
namely, a structure design module and an ANN training module, that act concurrently
on the same pool of individuals. The system is designed with the purpose of obtaining
maximum modularity between the two learning tasks.
The co-occurrence of the two modules is expected to be beneficial for the effectiveness
and the speed of the evolution procedure. That is, the presence of similarly performing
structural mutations of an individual is likely to favour population diversity. Moreover,
the EA fitness function calculates the accuracy of a solution as the difference between
the ANN output and the desired output. Manipulation of the topology modifies the
ANN output and hence the error surface, thus helping the weight training algorithm to
escape local peaks or flat areas of fitness. Finally, parallel distributed processing
systems such as ANNs possess well-known fault tolerance to addition or removal of
processing units. This capability minimises the number of fatal structural mutations,
since moderate changes of the ANN architecture are not likely to cause major disruption
to the progress of the learning procedure.
The genotype of each solution is characterised by a real-valued variable-length string
that encodes the setting of the connection weights. Each generation, the fitness of the
population is assessed, then a cycle of the structure design module and a cycle of the
ANN training module are executed. Evolution is achieved via random genetic mutations
affecting the ANN architecture and the weights. Fitness ranking (Fogel, D. B., 2000) is
used to select the pool of reproducing individuals. The BP rule is included into the ANN
training module as a problem-specific operator. Experimental tests carried out during
the algorithm optimisation phase showed that the use of the BP rule enhances the speed
and the accuracy of the weight training procedure.
Because of the real-valued encoding of the solutions and the lack of genetic crossover
the ANNGaT algorithm is conceptually akin to EP. This paradigm allows the candidate
solutions to be represented in a more compact format and avoids the many problems
stemming from the use of crossover.
As a result of the action of the two modules, a new population is produced through
genetic mutation and BP training of the existing individuals. New solutions replace
current ones via generational replacement (Fogel, D. B., 2000). The procedure is
repeated until a pre-defined number of iterations has elapsed and the fittest solution of
the last generation is picked.
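The generational cycle described in this section might be outlined as follows; the evaluate, structure_step, training_step and select callables are stubs standing in for the fitness function, the two modules and fitness ranking:

```python
def anngat_outline(population, evaluate, structure_step, training_step,
                   select, iterations):
    """Skeleton of the generational cycle described in the text:
    evaluate fitness, run one structure-design cycle and one training
    cycle, then replace the population generationally."""
    for _ in range(iterations):
        scores = [evaluate(ind) for ind in population]
        parents = select(population, scores)
        offspring = [training_step(structure_step(p)) for p in parents]
        population = offspring          # generational replacement
    scores = [evaluate(ind) for ind in population]
    return population[scores.index(max(scores))]

# Toy usage: individuals are numbers, "training" pulls them towards 2.
best = anngat_outline(population=[0.0, 5.0, -3.0],
                      evaluate=lambda x: -(x - 2.0) ** 2,
                      structure_step=lambda x: x,
                      training_step=lambda x: x + 0.5 * (2.0 - x),
                      select=lambda pop, scores: pop,
                      iterations=10)
```

The real algorithm replaces each stub with the operators of Sections 4.2 to 4.4; only the control flow is shown here.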
4.2 ANN structure design module
The proposed algorithm evolves the size (i.e. number of nodes) of the hidden layer(s) of
ANN classifier systems. The number of hidden layers is at present fixed a priori and
each layer is fully connected to the neighbouring ones.
Two genetic mutation operators of node addition and node deletion respectively add one
neuron to and delete one neuron from the ANN structure. When a new node is added, its
connection weights are initialised to small random values in order to avoid major
disruption of the ANN behaviour. Each weight of a new node is initialised to a random
value sampled with uniform probability from the interval [-0.2, 0.2]. To bias the search
towards compact ANN structures, node deletion is given a slightly higher probability
than node addition. If node deletion is chosen, the algorithm picks a node from
a randomly selected hidden layer. Different heuristic criteria were tested for selecting
the unit for removal: a randomly selected node, the node with the lowest average firing
strength, the node with the lowest maximum firing strength, the node yielding the
highest average error over a set of training patterns, and the node with the weakest
incoming connections. Experimental tests indicate that the latter choice produces the
best learning results.
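The two structural mutations can be sketched as follows, assuming a hidden layer stored as a list of per-node incoming-weight lists (a simplification of the full encoding):

```python
import random

def add_node(hidden_layer, n_inputs, init_range=0.2):
    """Node addition: append a neuron whose incoming weights are small
    random values in [-0.2, 0.2], limiting disruption to the ANN."""
    hidden_layer.append([random.uniform(-init_range, init_range)
                         for _ in range(n_inputs)])

def delete_node(hidden_layer):
    """Node deletion: remove the neuron with the weakest incoming
    connections (smallest sum of absolute incoming weights), the
    heuristic reported to give the best learning results."""
    weakest = min(range(len(hidden_layer)),
                  key=lambda i: sum(abs(w) for w in hidden_layer[i]))
    hidden_layer.pop(weakest)

layer = [[0.5, -0.4], [0.01, 0.02], [0.9, 0.3]]
delete_node(layer)            # removes the second node, whose weights are weakest
add_node(layer, n_inputs=2)   # appends a freshly initialised node
```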
Despite the complexity of assessing the contribution of a processing unit in a parallel
distributed architecture, further work in this direction could produce less disruptive
node deletion operators and consequently more balanced structure mutation procedures.
At present, there seems to be an evolutionary bias toward additive mutation operations,
as the addition of a new unit with small connection weights appears to be less disruptive
than the deletion of an existing node. This bias may favour the creation of non-minimal
ANN structures. Larger structures may also be favoured during the evolutionary
process, since they can better fit the training set point-to-point. Unfortunately, large
ANNs that closely fit the training set have poor generalisation capabilities. The
evolutionary bias toward larger structures is partly balanced by the superior learning
speed of smaller ANNs. A stronger bias towards minimal ANN representations is
produced by the fitness evaluation procedure (see Section 4.4).
4.3 ANN training module
This module evolves the ANN weights in order to minimise classification error.
Evolution is achieved via two genetic operators, namely mutation and the BP algorithm.
Genetic mutations slightly modify the weights of each node of a solution. For each
weight, the perturbation is randomly sampled with uniform probability from an interval
of pre-defined width.
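The weight mutation operator can be sketched as follows; the interval half-width is an assumed setting, not a value taken from the paper:

```python
import random

def mutate_weights(weights, half_width=0.1):
    """Perturb every weight of a node with noise sampled uniformly
    from a pre-defined interval [-half_width, half_width]."""
    return [w + random.uniform(-half_width, half_width) for w in weights]

node_weights = [0.4, -0.1, 0.25]
mutated = mutate_weights(node_weights)
```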
The BP rule is introduced as a deterministic mutation operator with the purpose of
speeding up the learning process. Individuals are randomly picked from the pool of
offspring for BP learning. Selected solutions undergo one cycle of BP learning over the
whole training set. Since BP learning is computationally expensive, the operator is used
with a moderate rate of occurrence. Other weight training procedures may be used as an
alternative to the BP method. The deterministic weight training operator is the only part
where the ANNGaT algorithm is specific to the ANN paradigm. If other ANN models
are to be trained, the BP rule can be substituted by other parameter learning procedures.
The ANN training module can be run independently of the structure optimisation
procedure as an algorithm on its own and it can be used as an alternative to the standard
ANN training techniques. It will henceforth be referred to as the ANN Training (ANNT)
algorithm.
4.4 Fitness evaluation function
The fitness of the candidate solutions is evaluated on their capability of accurately
classifying the training set of examples. To encourage the creation of compact and high
performing solutions, whenever the fitness score of two individuals is equal, preference
is given to the solution having the smallest structure. ANN optimisation therefore
follows a hierarchical criterion where accuracy has priority over compactness.
In general, there are cases where the difference in accuracy between some of the
solutions is very small (i.e., a few training examples) in comparison to the spread of the
population. In such cases, it is more efficient to consider those solutions to be equally
performing and give preference to the ones having the most economical structure.
The proposed algorithm considers the accuracy of two individuals to be equal when the
difference is less than one standard deviation of the average population accuracy. That
is, the population is divided into a number of bins of width equal to
width = max( std_dva · (1 − gen / duration), (best − worst) / popsize )    (1)
where width is the width of the bin, std_dva is the standard deviation of the fitness of
the population, gen is the current evolutionary cycle, duration is the duration of the
learning procedure, best and worst are the classification accuracies of respectively the
best and the worst individual and popsize is the population size.
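Under this reading of the bin-width rule (the exact functional form is inferred from the surrounding description), the computation is:

```python
def bin_width(std_dva, gen, duration, best, worst, popsize):
    """Bin width from equation (1): shrinks linearly over the run but
    never drops below (best - worst)/popsize, so the number of bins
    cannot exceed the population size."""
    return max(std_dva * (1.0 - gen / duration), (best - worst) / popsize)

# Illustrative values: wide bins early in the run, narrow bins late.
early = bin_width(std_dva=4.0, gen=0, duration=100,
                  best=90.0, worst=60.0, popsize=20)
late = bin_width(std_dva=4.0, gen=100, duration=100,
                 best=90.0, worst=60.0, popsize=20)
```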
The first bin is centred around the best performing solution while the centres of the
remaining bins are calculated according to the following formula:
centrei = best − i · width    (2)
where centrei is the centre of the ith bin and i is an integer number (i=1,…,n) that is
progressively increased until all the population is grouped.
The proposed procedure aims to filter out part of the noise that affects the evaluation
of the candidate solutions. As the algorithm proceeds, the width of the bins is
progressively shrunk to shift the emphasis on finer differences of accuracy. For each
evaluation of the EA population, equation (1) limits the number of bins to a value that is
no greater than the population size.
Solutions are awarded the following pair of measures as fitness score:
fitnessj = { n − i , sizej }    (3)
where fitnessj is the fitness score of the jth member of the population, i is the bin where
the jth solution lies, n is the total number of bins and sizej expresses the size of the MLP
architecture as the total number of connection weights. Since the ANN is fully
connected and the input and the output layers are fixed, sizej is determined by the size of
the hidden layer(s).
The first fitness measure is proportionally related to the classification accuracy. That is,
the best performing solution (grouped into the first bin) has an accuracy score equal to
n-1. All the solutions within half bin width from the accuracy of the best individual
obtain the same score. The solutions grouped into the second bin obtain an accuracy
score equal to n-2, and so forth until the last bin, where solutions achieve an
accuracy score equal to 0. Solutions having the same accuracy score (i.e., belonging to
the same bin) are further ranked according to the measure of their size by the fitness
ranking procedure.
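A simplified sketch of the binning and scoring scheme (the half-width centring of the first bin is ignored for brevity, and all values are illustrative):

```python
def fitness_scores(accuracies, sizes, width):
    """Hierarchical fitness: solutions are grouped into accuracy bins
    of the given width (bin 1 holds the best solution) and scored
    n - i, where i is the bin index and n the number of bins; ties on
    accuracy score are broken in favour of the smaller architecture,
    here by negating the size so that tuple comparison prefers it."""
    best = max(accuracies)
    bins = [int((best - a) // width) + 1 for a in accuracies]   # bin index i
    n = max(bins)
    return [(n - b, -s) for b, s in zip(bins, sizes)]

scores = fitness_scores(accuracies=[90.0, 89.8, 84.0, 70.0],
                        sizes=[500, 300, 400, 200], width=2.0)
ranked = sorted(range(4), key=lambda j: scores[j], reverse=True)
```

In this example the first two solutions fall into the same bin, so the smaller one (300 weights) outranks the slightly more accurate but much larger one.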
5 EXPERIMENTAL SETTINGS AND RESULTS
This section presents the experimental settings and the results of the application of the
ANNGaT algorithm to the generation of the MLP classifier for the wood veneer defect
recognition task discussed in Section 2.
5.1 Experimental set up
This section presents the results of five experiments: two concerning the BP training
of manually optimised ANN structures, one concerning the training of a manually
optimised structure using the ANNT algorithm, and two concerning the full ANNGaT
algorithm. The result reported for each of the experiments corresponds to
the average of 20 independent learning trials.
To simplify the training of the individuals, input data are normalised according to the
Mean-Variance procedure. A data balancing procedure is used. For each learning trial,
the size of the classes in the training set is made even by duplicating randomly picked
members of the smaller categories.
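The two pre-processing steps might be sketched as follows; the class labels are illustrative only:

```python
import random
import statistics

def mean_variance_normalise(column):
    """Scale a feature column to zero mean and unit standard deviation."""
    mu, sigma = statistics.mean(column), statistics.pstdev(column)
    return [(x - mu) / sigma for x in column]

def balance_classes(data_by_class):
    """Even out class sizes by duplicating randomly picked members of
    the smaller classes, as done before each learning trial."""
    target = max(len(samples) for samples in data_by_class.values())
    return {label: samples + [random.choice(samples)
                              for _ in range(target - len(samples))]
            for label, samples in data_by_class.items()}

col = mean_variance_normalise([1.0, 2.0, 3.0])
balanced = balance_classes({"defect_a": [[0.1], [0.2]],
                            "clear_wood": [[0.3]] * 20})
```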
For each learning trial, the data set is randomly split into a training set including 80% of
the examples and a test set including the other 20%. The classifier is trained on the
former and the final learning result is evaluated on the latter. To reduce the danger of
overfitting, the order of presentation of the training samples is randomly reshuffled for
every learning cycle of the algorithm under evaluation.
The first two experiments replicate the learning trials of Packianather et al. (2000) and
Pham and Sagiroglu (2000) and train the classifier using the BP rule with momentum
term. The first test is carried out using the ANN topology that Packianather et al. (2000)
optimised through the Taguchi method. This configuration is characterised by a hidden
layer of 45 processing units. Each neuron of the hidden layer receives 17+1 incoming
connections from the 17 input neurons and the bias neuron. Each neuron of the output
layer receives 45+1 incoming connections from the 45 hidden neurons and the bias
neuron. The MLP architecture is therefore composed of a total of 45x18+13x46=1408
connection weights. The second test uses the ANN configuration that Pham and
Sagiroglu (2000) have experimentally determined. This configuration is characterised
by two hidden layers of 17 processing units each. As in the previous case, the
total connectivity of the MLP architecture amounts to
17x18+17x18+13x18=846 connection weights. The duration of the BP learning
procedure is experimentally set to a fixed number of iterations. The purpose of the first
two experiments is to provide a baseline for the comparison of the results.
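The connectivity figures quoted above follow from counting, for each pair of consecutive layers, one weight per incoming connection plus one for the bias unit. A small helper (the function name is ours) reproduces both totals:

```python
def mlp_weight_count(layer_sizes):
    """Total connection weights of a fully connected MLP where every
    non-input layer also receives a connection from a bias neuron."""
    return sum((n_in + 1) * n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

# 17 inputs, one hidden layer of 45, 13 outputs: 45x18 + 13x46 = 1408
assert mlp_weight_count([17, 45, 13]) == 1408
# 17 inputs, two hidden layers of 17, 13 outputs: 846 weights
assert mlp_weight_count([17, 17, 17, 13]) == 846
```

The same count applied to the authors' manually optimised 35-node configuration of the third experiment gives 35x18 + 13x36 = 1098 weights.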
The third experiment uses the ANNT algorithm to train a manually designed MLP
classifier. The duration of the ANNT procedure and the size of the ANN architecture are
experimentally set to maximise the learning accuracy. The best performing solution
consists of one hidden layer of 35 processing units. The purpose of this test is to assess the performance
of the ANNT procedure, which is the evolutionary MLP training module of the
ANNGaT algorithm.
Finally, the last two experiments apply the full ANNGaT algorithm to generate and
train the wood veneer defect classifier. In the first test, the EA is used to design and
train a three-layer (one hidden layer) MLP classifier. In the second test, the EA is used
to design and train a four-layer (two hidden layers) MLP classifier. In both cases, the
size of the hidden layer(s) of the starting population is randomly initialised within the
interval of integer numbers [15, 25].
The parameters of the learning algorithms are tuned by experimental
trial and error. Table 2 shows the main MLP settings and the learning parameters used
in the five tests.
5.2 Experimental results
The results of the five experiments are reported in Table 3 and Table 4. Table 3 details
the design method used for the MLP classifier, the mean and the standard deviation of
the classification accuracy over the 20 learning trials, and the number of learning cycles.
Table 3 also reports the upper and lower bounds of the 95% confidence interval for the
accuracy results produced by the automatic and manual design methods. All accuracy
results refer to the percentage of successfully classified examples of the test set. The
number of learning cycles refers to the manually optimised fixed duration of the
algorithm. Table 4 shows the average, maximum and minimum size of the hidden
layer(s) evolved by ANNGaT.
The first two experiments substantially confirm the results reported in the literature.
That is, the average classification accuracy obtained by the three-layered MLP is within
one standard deviation from the 84.16% classification accuracy reported for the same
architecture by Packianather et al. (2000). The average classification accuracy obtained
by the four-layered MLP is within two standard deviations from the 86.96%
classification accuracy reported for the same architecture by Pham and Sagiroglu
(2000). The spread of the classification accuracy for the MLP configuration having 45
hidden neurons is higher than the 1.52% estimate of Packianather et al. (2000).
However, the latter estimate is calculated from 9 learning trials equally distributed
over 3 randomly initialised training and test set partitions, while the figure in Table 3 is
computed from 20 learning trials, each of them on a different randomly initialised data
set partition. The larger variability of the initial conditions may therefore explain the
increased spread of the data distribution. Overall, the two MLP configurations obtain
comparable learning results in terms of accuracy and robustness.
The learning trials involving the ANNT algorithm (third experiment) bring no
further improvement to the learning results. The average classification accuracy of the
solutions evolved by the ANNT procedure is within one standard deviation from the
average accuracy of the solutions trained using the BP rule. The spread of the
classification accuracy of the solutions trained using the ANNT algorithm is also
comparable to the spread of the BP trained solutions. This result may indicate that the
search capability of the BP algorithm is already adequate for the problem domain.
Nevertheless, the test demonstrates that the proposed procedure can successfully
train high-performing MLP solutions and provides a baseline for the evaluation of
the ANNGaT algorithm.
Finally, the learning trials involving the full ANNGaT algorithm generate solutions that
attain classification results comparable within one standard deviation to the results
obtained using the BP or the ANNT algorithm. However, the average size of the
evolutionary generated MLP structures is considerably smaller than the size of the
manually generated counterparts.
The average hidden layer of the evolved three-layer solutions is one-third the
size of the hidden layer of the solution suggested by Packianather et al.
(2000). The hidden layer of the evolutionary three-layer MLP is also less than half the
size of the solution that was manually optimised by the authors.
The four-layer evolutionary ANN is more compact than the four-layer solution
suggested by Pham and Sagiroglu (2000). On average, the two hidden layers generated
by the ANNGaT algorithm are two-thirds the size of the manually optimised
counterparts. This reduction in size entails a drastic reduction in the number of
connection weights. In fact, the automatically generated neural networks contain
on average half as many weights as the manually optimised structures.
The introduction of the second hidden layer brings no appreciable improvements to the
learning accuracy of the evolved solutions. This result confirms the conclusion of
Packianather et al. (2000) that a three-layer ANN configuration is fully adequate for the
task. Given the size of the hidden layers evolved by the ANNGaT algorithm for the two
configurations, a three-layer ANN structure represents the most compact choice.
The size of the minimal solutions generated during the 20 learning trials emphasises the
capability of the proposed algorithm to generate extremely compact structures.
The maximal solutions generated during the 20 learning trials are in both cases
(3-layered and 4-layered MLP) smaller than the manually generated classifiers. The use of
smaller MLP architectures gives advantages in terms of faster processing times, and
cheaper implementation on hardware boards.
On average, one run of the ANNGaT algorithm takes about 71 minutes for the
optimisation of the three-layer configuration and about 103 minutes for the
optimisation of the four-layer configuration. BP training of the manually
optimised solutions takes about 10 minutes. The latter figure, however, does not take into
account the time needed for the lengthy manual trial-and-error design of the solutions.
The manual optimisation of the MLP took the authors about one day of work. A
Pentium III 1 GHz processor with 512 MB of RAM was used for the tests.
Fig. 3a-b shows the learning curves for the three- and four-layer MLP configurations.
The figure refers to 20 independent learning trials that were run during the parameter
optimisation phase of the algorithm. The plot monitors the evolution of the average
classification accuracy of the EA population and of its fittest individual on the training
set of examples. The evolution of the classification accuracy of the fittest individual on
the test set is also reported. The plot refers to average values over the 20 learning trials
and monitors the learning process over 10000 generations. In both cases, the curves
show the standard learning pattern of EAs with a brisk initial improvement of the
population fitness followed by slow convergence to the maximum values.
The smooth, approximately exponential shape of the learning curves is the result of
the collective learning process of a population of individuals. While part of the
population may occasionally get stuck in local optima or flat regions of the fitness
landscape, other solutions improve their fitness and gradually monopolise the
population pool. This gradual spread of successful solutions gives rise to
learning curves of the kind shown in Fig. 3.
The classification accuracies usually reach their peak well before the last generation.
Due to some overfitting of the training set, the generalisation capability (i.e. test set
accuracy) of the solutions slightly deteriorates towards the end of the evolution span.
Fig. 4a-b illustrates the evolution of the average size of the hidden layer of the three-
and four-layer MLP configurations. The figure refers to the same 20 learning trials
monitored in Fig. 3 and shows the structure evolution process over 10000 generations.
The size of the MLP structures soon starts decreasing, and increasingly more
compact solutions are found. This process continues even after the training accuracy
reaches its peak, driven by the selection procedure that favours compactness when equal
accuracy is attained. In the second part of the evolution period, the population
converges to the final solution and the evolution curves flatten.
6 CONCLUSIONS AND FURTHER WORK
This study expands previous work on the classification of wood veneer defects from
statistical features of wood sub-images using an MLP classifier system. The generation
of the MLP classifier is automated following the introduction of ANNGaT, an
evolutionary procedure for concurrent structure design and training of ANN systems.
The proposed algorithm evolves the size of the ANN hidden layer(s) while training the
connection weights.
Experimental evidence shows that the ANNGaT algorithm builds highly compact MLP
structures capable of accurate and robust learning, at a reasonable computational cost.
Compared to the current approach based on the Taguchi method for the manual
optimisation of the MLP structure and on the BP rule for the training of the connection
weights, the proposed algorithm generates equally performing ANN solutions using
considerably smaller architectures. On average, the hidden layer of the
three-layer MLP structures created by the ANNGaT algorithm is one-third the
size of the empirically optimised three-layer configuration. Compared to an alternative
approach based on trial and error optimisation of the ANN structure, the ANNGaT
algorithm generates four-layer ANN structures that are on average two-thirds the size of
the manually generated counterparts. In both cases, the proposed algorithm clearly
requires lower design costs since the process is fully automated.
Further investigation on the impact of the structure modification operations on the ANN
behaviour could help to produce less disruptive mutation procedures. Improvement of
the proposed algorithm may also result from submitting the EA learning parameters,
namely the weight mutation step size and the BP learning rate, to the evolutionary
process. Finally, since different ANN architectures and learning parameters are
characterised by different learning curves, niching techniques could be used to let
differently parametrised sub-populations evolve separately before submitting them to
competition. Such an approach would allow more informed evaluations of the candidate
solutions.
REFERENCES
Aboitiz, F. (1992), Mechanisms of adaptive evolution - Darwinism and Lamarckism
restated, Medical Hypotheses, vol. 38, no. 3, pp. 194-202.
Angeline, P. J. and Fogel, D. B. (1997), An evolutionary program for the
identification of dynamical systems, in Proceedings of SPIE Volume 3077 : Application
and Science of Artificial Neural Networks III, Rogers, S. editor, SPIE The International
Society for Optical Engineering, Bellingham, WA, pp. 409-417.
Angeline, P. J., Saunders, G. M. and Pollack, J. B. (1994), An Evolutionary Algorithm
That Constructs Recurrent Neural Networks, IEEE Transactions Neural Networks, vol.
5, no. 1, pp. 54--65.
Balakrishnan, K. and Honavar, V. (1995), Evolutionary Design of Neural Architectures
- a Preliminary Taxonomy and Guide to Literature, Technical Report CS TR95-01.
Department of Computer Science, Iowa State University, Ames.
Branke, J. (1995), Evolutionary Algorithms for Neural Network Design and Training,
Technical Report no. 322, Institute AIFB, University Karlsruhe.
Brown, A. D. and Card, H. C. (1999), Cooperative-Competitive Algorithms for
Evolutionary Networks Classifying Noisy Digital Images, Neural Processing Letters, vol.
10, no. 3, pp. 223-229.
Cangelosi, A. and Elman, J. L. (1995), Gene Regulation and Biological Development in
Neural Networks: an Exploratory Model, Technical Report, CRL-UCSD, University of
California San Diego.
Castillo, P. A., Carpio, J., Merelo, J. J., Prieto, A., Rivas, V. and Romero, G. (2000),
Evolving Multilayer Perceptrons. Neural Processing Letters, vol. 12, no. 2.
Darwen, P. J. (2000), Black Magic: Interdependence Prevents Principled Parameter
Setting, Self-Adapting Costs Too Much Computation, In Applied Complexity: From
Neural Nets to Managed Landscapes, pages 227-237
Eiben, A. E. and Smith, J. E. (2003), Introduction to evolutionary computing, Springer,
New York.
Fahlman, S. E. and Lebiere, C. (1990), The Cascade-Correlation Learning Architecture,
in Advances in Neural Information Processing Systems 2, Touretzky, D. S. editor,
Morgan Kaufmann, San Mateo, CA, pp. 524-532.
Fogel, D. B. (2000), Evolutionary Computation: Toward a New Philosophy of Machine
Intelligence, 2nd ed., IEEE Press, New York.
Fogel, D. B. and Chellapilla, K. (2002), Verifying Anaconda’s Expert Rating by
Competing Against Chinook: Experiments in Co-Evolving a Neural Checkers Player,
Neurocomputing, no. 42, pp. 69-86.
Fogel, D. B., Fogel L. J. and Porto, V. W. (1990), Evolutionary programming for
training neural networks, Proceedings International Joint Conference on NNs, S. Diego
CA - June 1990, pp. 601-605.
Fogel, L. J., Owens A. J. and Walsh, M. J. (1966), Artificial intelligence through
simulated evolution, J. Wiley, New York.
Gomez, F. J. and Miikkulainen, R. (2003), Active Guidance for a Finless Rocket
through Neuroevolution, Proceedings 2003 Genetic and Evolutionary Computation
Conference (GECCO), Chicago IL, pp. 2084-2095.
Hancock, P. J. B. (1992), Genetic Algorithms and Permutation Problems: a Comparison
of Recombination Operators for Neural Structure Specification, Combinations of
Genetic Algorithms and Neural Networks, Whitely, D. and Schaffer, J. D. editors IEEE
Computer Society Press
Harp, S. A., Samad, T. and Guha, A. (1990), Designing Application-Specific Neural
Networks Using the Genetic Algorithm, in Advances in Neural Information Processing
Systems 2, D. S. Touretzky, editor, Morgan Kaufmann, San Mateo, CA, pp. 447-454.
Haussler, A., Li, Y., Ng, K. C., Murray-Smith, D. J. and Sharman, K. C. (1995),
Neurocontrollers Designed by a Genetic Algorithm, Proceedings GALESIA First
IEE/IEEE International Conference on GAs in Engineering Systems: Innovations and
Applications, Sheffield UK - 1995, pp. 536-542.
Holland, J. H. (1975), Adaptation in Natural and Artificial Systems, Ann Arbor, MI:
University of Michigan Press.
Huber, H. A., McMillin, C. W. and McKinney, J. P. (1985), Lumber Defect Detection
Abilities of Furniture Rough Mill Employees. Forest Products Journal, vol. 35, no.
11/12, pp 79 - 82.
Hüsken, M. and Igel, C. (2002), Balancing Learning And Evolution, Proceedings
Genetic and Evolutionary Computation Conference (GECCO-2002), San Francisco
CA, USA, pp. 391-398.
Johansson, E. M., Dowla, F. U. and Goodman, D. M. (1991), Backpropagation Learning
for Multilayer Feed-Forward Neural Networks Using the Conjugate Gradient Method,
International Journal Neural Systems, vol. 2, no. 4, pp. 291-301.
Jung, S. Y. (2005), A Topographical Method for the Development of Neural Networks
for Artificial Brain Evolution, Artificial Life, no. 11, pp. 293-316.
Kitano, H. (1990), Designing Neural Networks Using Genetic Algorithms with Graph
Generation System, Complex Systems, vol. 4, no. 4, pp. 461-476.
Lappalainen, T., Alcock, R. J. and Wani, M. A. (1994). Plywood Feature Definition and
Extraction. Report 3.1.2, QUAINT, BRITE/EURAM project 5560, Intelligent Systems
Laboratory, School of Engineering, University of Wales, Cardiff.
LeCun, Y., Denker, J. S. and Solla, S. A. (1990), Optimal Brain Damage, In Advances
in Neural Information Processing Systems 2, Touretzky, D. S. editor, Morgan
Kaufmann, San Mateo, CA, pp. 598-605.
Menczer, F. and Parisi, D. (1992), Evidence of Hyperplanes in the Genetic Learning of
Neural Networks, Biological Cybernetics, vol. 66, pp. 283-289
Miller, G. F., Todd, P. M. and Hegde, S.U. (1989), Designing neural networks using
genetic algorithms, Proceedings 3rd International Conference on GAs and Applications,
Arlington VA - 1989, pp. 379-384.
Montana, D. and Davis, L. (1989), Training feedforward neural networks using genetic
algorithms, Proceedings 11th International Joint Conference on AI, Detroit MI - 1989,
pp. 762-767.
Nikolaev, N. Y. (2003), Learning polynomial feedforward neural networks by genetic
programming and backpropagation, IEEE Transactions on Neural Networks, vol. 14,
no. 2, pp. 337-350
Packianather, M. (1997), Design and Optimisation of Neural Network Classifiers for
Automatic Visual Inspection of Wood Veneer, Ph.D. Thesis, University of Wales,
College of Cardiff (UWCC) – UK
Packianather, M. S., Drake, P. R. and Rowlands, H. (2000), Optimising the Parameters
of Multilayered Feedforward Neural Networks through Taguchi Design of Experiments.
Quality and Reliability Engineering International, vol. 16, pp. 461-473
Parekh, R., Yang, J. H., Honavar, V. (2000), Constructive neural-network learning
algorithms for pattern classification, IEEE Transactions Neural Networks, vol. 11, no. 2,
pp. 436-451
Pham, D. T. and Alcock, R. J. (1996). Automatic Detection of Defects on Birch Wood
Boards. Proceedings of Institution of Mechanical Engineers, part E, Journal of Process
Mechanical Engineering, vol. 210, pp. 45-52.
Pham, D. T. and Alcock, R. J. (1999a), Recent Developments in Automated Visual
Inspection of Wood Boards, Advances in Manufacturing: Decision, Control and
Information Technology, Tzafestas ed., Springer - London, pp. 80-88
Pham, D. T. and Alcock, R. J. (1999b), Plywood Image Segmentation Using Hardware-
Based Image Processing Functions, Proceedings of Institution of Mechanical
Engineers, part B, vol. 213, pp. 431-434.
Pham, D. T. and Alcock, R. J. (1999c), Automated Visual Inspection of Wood Boards:
Selection of Features for Defect Classification by a Neural Network, Proceedings of
Institution of Mechanical Engineers, part E, vol. 213, pp. 231-245.
Pham, D. T. and Liu, X. (1995), Neural Networks for Identification, Prediction and
Control, Springer-Verlag Ltd., London.
Pham, D. T. and Sagiroglu, S. (2000), Neural Network Classification of Defects in
Veneer Boards, Proceedings of Institution of Mechanical Engineers, part B, vol. 214,
pp. 255-258.
Polzleitner, W. and Schwingshakl, G. (1992), Real-Time Surface Grading of Profiled
Wooden Boards, Industrial Metrology, vol. 2, pp. 283 -298.
Rechenberg, I. (1965), Cybernetic Solution Path of an Experimental Problem, Library
Translation no. 1122, Ministry of Aviation, Royal Aircraft Establishment, Farnborough,
Hants UK.
Reed, R. (1993), Pruning algorithms - A survey, IEEE Transactions Neural Networks,
vol. 4, pp. 740-747.
Roy, R. K. (2001), Design of Experiments Using The Taguchi Approach : 16 Steps to
Product and Process Improvement, John Wiley and Sons Ltd, New York.
Rumelhart, D. E. and McClelland, J. L. (1986), Parallel distributed processing:
exploration in the micro-structure of cognition, vol. 1-2, Cambridge, MIT Press.
Rychetsky, M., Ortmann, S. and Glesner, M. (1998), Correlation and Regression Based
Neuron Pruning Strategies, In Fuzzy-Neuro-Systems '98, 5th Int Workshop, Munich D.
Saravanan, N. and Fogel, D. B. (1994), Evolving Neurocontrollers using evolutionary
programming, Proceedings First IEEE Conference on Evolutionary Computation
(ICEC), Orlando FL - 1994, pp. 217-222.
Schiffmann, W. (2000), Encoding feedforward networks for topology optimization by
simulated evolution, Proceedings 4th International Conference on Knowledge-Based
Intelligent Engineering Systems & Allied Technologies (KES '2000), vol. 1, pp. 361-
364.
Seiffert, U. (2001), Multiple Layer Perceptron Training Using Genetic Algorithms,
Proceedings 9th European Symposium on Artificial Neural Networks (ESANN 2001),
Bruges B, pp. 159-164.
Skinner, A. and Broughton, J. Q. (1995), Neural Networks in Computational Materials
Science: Training Algorithms, Modelling and Simulation in Materials Science and
Engineering, vol. 3, pp. 371-390.
Smieja, F. J. (1993), Neural Network Constructive Algorithms: Trading Generalization
for Learning Efficiency?, Circuits, Systems and Signal Processing, vol. 12, no. 2, pp.
331-374
Srinivas, M. and Patnaik, L. M. (1991), Learning Neural Network Weights Using
Genetic Algorithms - Improving Performance by Search-Space Reduction, in
Proceedings of 1991 IEEE International Joint Conference on Neural Networks
IJCNN'91, Singapore, vol. 3, IEEE Press, New York, NY, pp. 2331--2336.
Stepniewski, S. W. and Keane, A. J. (1996), Topology design of feedforward neural
networks by genetic algorithms, in Proceedings of the 4th Int Conference on Parallel
Problem Solving from Nature (PPSN IV), pp. 771-780.
Thierens, D., Suykens, J., Vanderwalle, J., and De Moor, B. (1993), Genetic Weight
Optimisation of a Feedforward Neural Network Controller, in Artificial Neural
Networks and Genetic Algorithms, Albrecht, R.F., Reeves, C.R., and Steele, N.C.
editors, Springer-Verlag Wien, pp. 658-663.
Whitley, D. (1995), Genetic Algorithms and Neural Networks, Genetic Algorithms in
Engineering and Computer Science, Winter, G., Periaux, J., Galan M. and Cuesta, P.
editors, John Wiley, pp. 203-216.
Whitley, D. and Hanson, T. (1989), Optimising neural networks using faster, more
accurate genetic search, Proceedings 3rd International Conference on GAs and
Applications, Arlington VA - 1989, pp. 391-396.
Yao, X. (1999), Evolving Artificial Neural Networks, Proceedings IEEE, vol. 87, no. 9,
pp. 1423-1447.
Yao, X. and Liu, Y. (1997a), Fast evolution strategies, Proceedings of the 6th Annual
Conference on Evolutionary Programming (EP97), Lecture Notes in Computer Science,
vol. 1213, Springer-Verlag, Berlin, pp. 151-161.
Yao, X. and Liu, Y. (1997b), A New Evolutionary System for Evolving Artificial
Neural Networks, IEEE Transactions on Neural Networks, vol. 8, no. 3, pp. 694-713.
Yan, W., Zhu, Z. and Hu, R. (1997), Hybrid Genetic/BP Algorithm and its Application
for Radar Target Classification, Proceedings 1997 IEEE National Aerospace and
Electronics Conference, NAECON part 2, pp. 981-984.
BIOGRAPHY OF AUTHORS

Doctor Marco Castellani obtained his Ph.D. degree in 2000 from the University of Wales, Cardiff, with a thesis on intelligent control of the manufacturing of fibre optic components. Between 2001 and 2002 he worked for a private company in Germany on machine learning applications to natural language processing. Between 2002 and 2005 he was at the New University of Lisbon, where his research work included machine learning, machine vision, remote sensing, pattern recognition and time series prediction.

Professor Hefin Rowlands is Director of Research & Enterprise at the University of Wales, Newport, with responsibility for developing the research culture and environment for staff and postgraduate students across the University. His doctoral thesis was on optimum design using the Taguchi method with neural networks and genetic algorithms. He was awarded a University of Wales Personal Chair in 2002, and his current research interests concern the benefits companies achieve from deploying business improvement techniques such as six sigma.

LIST OF FIGURES

Fig. 1: The AVI System.
Fig. 2: ANNGaT Architecture.
Fig. 3: ANNGaT Evolution Curves – Classification accuracy.
Fig. 4: ANNGaT Evolution Curves – ANN structure.

LIST OF TABLES

Table 1: Class Distribution of Wood Veneer Data Set.
Table 2: Parameter Setting of Multi-Layer Perceptron and Learning Algorithms.
Table 3: Experimental Results – Classification Accuracy.
Table 4: Experimental Results – Structure Optimisation.
Fig. 1: The AVI System.
[Block diagram: a CCD camera acquires images of the wood veneer; the processing pipeline comprises image acquisition, image segmentation, feature extraction and the classifier.]
[Block diagram of the ANNGaT architecture: the ANN design module applies random node addition/deletion to the genotype; the ANN training module applies random weights mutation and BP learning, with Lamarckian encoding of the learned weights back into the genotype. The evolutionary loop comprises random initialisation, fitness evaluation, selection, a reproduction pool and creation of the new population, repeated until the stop criterion is met, after which the solution is output.]
Fig. 2: ANNGaT Architecture.
[Two plots of classification accuracy (0-100%) versus generations (0-10000): a) Three-layer configuration; b) Four-layer configuration. Each plot shows three curves: Population Average - Training Set, Best Solution - Training Set, Best Solution - Test Set.]
Fig. 3: ANNGaT Evolution Curves – Classification accuracy.
[Two plots of hidden layer size versus generations (0-10000): a) Three-layer configuration, showing the size of the best solution's hidden layer; b) Four-layer configuration, showing the sizes of the best solution's first and second hidden layers.]
Fig. 4: ANNGaT Evolution Curves – ANN structure.
class                     number of examples
 1  bark                          20
 2  clear wood                    20
 3  coloured streaks              20
 4  curly grain                   16
 5  discoloration                 20
 6  holes                          8
 7  pin knots                     20
 8  rotten knots                  20
 9  roughness                     20
10  sound knots                   20
11  splits                        20
12  streaks                       20
13  worm holes                     8
total examples                   232
Table 1: Class Distribution of Wood Veneer Data Set.
Multi-Layer Perceptron Settings
Input nodes 17
Output nodes 13
Hidden layers *
MLP hidden nodes *
Activation function of hidden layer nodes Hyperbolic tangent
Activation function of output layer nodes Sigmoidal
Evolutionary Algorithm Settings BP ANNT ANNGaT
Trials 20 20 20
Learning Cycles * * *
Population size - 100 100
MLP structure mutation rate (node addition) - - 0.0225
MLP structure mutation rate (node deletion) - - 0.0275
MLP weights mutation rate - 0.25 0.25
Amplitude MLP weights mut. - 0.2 0.2
BP Lamarckian operator rate - 0.6 0.6
BP Learning coefficient 0.01 0.01 0.01
BP Momentum term 0.1 - -
Initialisation range for MLP weights [-0.05, 0.05] [-0.05, 0.05] [-0.05, 0.05]
Initialisation range for MLP hidden nodes - - [15, 25]
* depends on the test
Table 2: Parameter Setting of Multi-Layer Perceptron and Learning Algorithms.
                     BP                    ANNT        ANNGaT
MLP design method    Manual    Taguchi     manual      evolutionary   evolutionary
                     (P&S)     (P&O)       (authors)   (3 layers)     (4 layers)
Accuracy             78.09     82.02       82.34       80.00          80.64
Standard deviation   5.17      4.86        4.53        3.41           4.97
Upper bound*         80.50     84.29       84.46       81.60          82.97
Lower bound*         75.67     79.75       80.22       78.40          78.31
Learning cycles      4000      1500        2000        7000           2000

Manual (P&S): see Pham and Sagiroglu (2000); Taguchi (P&O): see Packianather et al. (2000); * 95% confidence interval
Table 3: Experimental Results – Classification Accuracy.
                  Manual   Taguchi   Manual      ANNGaT - 3 layers       ANNGaT - 4 layers
                  (P&S)    (P&O)     (authors)   Average   Max   Min     Average   Max   Min
hidden layer 1    17       45        35          14.15     17    8       12.31     14    9
hidden layer 2    17       -         -           -         -     -       12.25     16    9

Manual (P&S): see Pham and Sagiroglu (2000); Taguchi (P&O): see Packianather et al. (2000)
Table 4: Experimental Results – Structure Optimisation.