EVOLUTIONARY ARTIFICIAL NEURAL NETWORK DESIGN AND
TRAINING FOR WOOD VENEER CLASSIFICATION
Marco Castellani a,1 and Hefin Rowlands b
a Centro de Inteligência Artificial (CENTRIA), Departamento de Informática, Universidade
Nova de Lisboa, 2829-516 Caparica, Portugal
b Research & Enterprise Department, University of Wales, Newport, Allt-yr-yn
Campus, PO Box 180, NP20 5XR Newport, UK
ABSTRACT
This study addresses the design and the training of a multi-layer perceptron classifier for
identification of wood veneer defects from statistical features of wood sub-images.
Previous research utilised a neural network structure manually optimised using the
Taguchi method with the connection weights trained using the backpropagation rule.
The proposed approach uses the evolutionary ANNGaT algorithm to generate the neural
network system. The algorithm simultaneously evolves the neural network topology and
the weights. ANNGaT optimises the size of the hidden layer(s) of the neural network
structure through genetic mutations of the individuals. The number of hidden layers is a
system parameter. Experimental tests show that ANNGaT produces highly compact
neural network structures capable of accurate and robust learning. The tests show no
differences in accuracy between neural network architectures using one and two hidden
layers of processing units. Compared to the manual approach, the evolutionary
algorithm generates equally performing solutions using considerably smaller
architectures. Moreover, the proposed algorithm requires a lower design effort, since the
process is fully automated.

1 Corresponding author: Tel. +351 212948536, Fax +351 212948541, Email [email protected]
Keywords: Artificial Neural Networks, Evolutionary Algorithms, Artificial Neural
Network Design, Pattern Classification, Automated Visual Inspection
NOTATION
ANNGaT artificial neural network generation and training
ANN artificial neural network
MLP multi-layer perceptron
EA evolutionary algorithm
GA genetic algorithm
EP evolutionary programming
BP backpropagation
ANNT artificial neural network training
1 INTRODUCTION
Plywood is made of thin layers of wood, called veneers, joined together using an
adhesive. Defects of the veneer are identified by human inspectors as the sheets are
transported to assembly on a conveyor. The task is extremely stressful and demanding
and short disturbances or lapses of attention may result in misclassification. Two distinct
studies conducted on human inspectors in wood mills reported inspection accuracies
ranging from an optimistic estimate of 68% (Huber et al., 1985) to a more
conservative measure of 55% (Polzleitner and Schwingshakl, 1992).
An automatic visual inspection system (Pham and Alcock, 1996; Pham and Alcock,
1999a) was developed for this application by the Intelligent Systems Lab of the School
of Engineering at the University of Wales, Cardiff, UK and the Wood Research Institute
of Kuopio, Finland. Fig. 1 outlines the system. Monochrome images of the veneer are
pre-processed by automated algorithms that locate defect areas (Pham and Alcock,
1999b) where a set of numerical descriptors is extracted for further analysis. Seventeen
statistical attributes of the local grey level distribution were identified as relevant for
defect identification (Lappalainen et al., 1994; Pham and Alcock, 1999c). Twelve
possible defects of the veneer can be distinguished in addition to clear wood, giving 13
possible classes. For each data sample, a classifier takes the 17-dimensional vector of
image features and decides to which of the thirteen classes the pattern belongs.
Several algorithms were evaluated on their ability to correctly recognise wood veneer
defects. The best results were obtained using an Artificial Neural Network (ANN)
(Pham and Liu, 1995) classifier. In particular, Packianather (Packianather, 1997;
Packianather et al., 2000) reported 85% identification rates using a three-layered Multi-
Layer Perceptron (MLP) (Pham and Liu, 1995). The accuracy result was substantially
confirmed in an independent study by Pham and Sagiroglu (2000) using a four-layered
MLP classifier. Despite the similar classification accuracies obtained, the conclusions of
the two studies differed on the best ANN configuration.
This paper addresses the design of the MLP classifier system. To date, ANN
structure optimisation is still mainly a human expert’s job (Yao, 1999). Different ANN
architectures are usually trained according to some pre-defined induction algorithm and
their merit evaluated on the accuracy achieved. Unfortunately, training the frequently
large set of parameters (i.e., the connection weights) is one of the major problems in the
implementation of ANN systems. Since most ANN training procedures are based on
gradient descent of the error surface, they are prone to sub-optimal convergence to local
minima. This limitation in turn undermines the precise evaluation of candidate ANN
structures. A typical example of a gradient-based learning algorithm is the
Backpropagation (BP) rule (Rumelhart and McClelland, 1986) that is used to train the
MLP classifier of the automatic visual inspection system.
A growing body of literature reports efforts toward the automatic design of ANN
architectures (Branke, 1995; Yao, 1999). Constructive and destructive algorithms
(LeCun et al., 1990; Reed, 1993; Smieja, 1993) such as the Cascade Correlation
Learning Architecture (Fahlman and Lebiere, 1990) trim or enlarge the ANN structure
while parameter learning proceeds. The decision whether to add or delete further nodes
is based on greedy hill climbing of the ANN performance, thus leaving open the
problem of sub-optimal convergence to local structural optima (Angeline et al., 1994).
Thanks to their global search strategy, Evolutionary Algorithms (EAs) (Eiben and
Smith, 2003) are able to avoid being trapped into secondary peaks of performance and
can therefore provide an effective and robust solution to the problem of automated ANN
design and training (Balakrishnan and Honavar, 1995; Branke, 1995; Whitley, 1995;
Yao, 1999; Nikolaev, 2003). Three approaches have emerged: using EAs to generate the
ANN structure, using EAs for learning the parameters, and using EAs for concurrent
optimisation of both the ANN structure and the weights. The last approach presents the
most advantages in terms of reduced design effort and quality of the solutions.
However, the simultaneous evolution of the whole ANN system is not straightforward
due to the complexity of the learning task, which requires the optimisation of a large
number of mutually related parameters and variables.
This paper presents the application results of the algorithm ANNGaT, an EA for
concurrent structure design and training of ANN systems. ANNGaT is used to
automatically generate the MLP classifier for the wood veneer visual inspection system.
Section 2 introduces the problem domain. Section 3 surveys the literature on ANN
training and structure design algorithms. Section 4 describes the proposed algorithm.
Section 5 presents the experimental results of its application to the wood veneer defect
classification task. Conclusions and indications for further work are given in Section 6.
2 PROBLEM DOMAIN
The goal of this study is to design an MLP classifier that correctly recognises instances
of wood veneer defects. For this purpose, a set of 232 data samples representing
statistical features extracted from images of plywood defect areas is available. Each
datum corresponds to a 17-dimensional feature vector. There are 13 classes
corresponding to 12 possible defects and clear wood.
The data distribution is unbalanced, with two classes containing as few as 8 examples,
one class containing 16 examples and the remaining classes containing 20 examples.
There are no missing attributes. Table 1 details the class distribution of the data set.
Packianather et al. (2000) applied the Taguchi method (Roy, 2001) to optimise the MLP
architecture and the learning parameters of the BP training rule. The authors suggested
that one hidden layer of 45 neurons is sufficient for the task and reported a classification
accuracy estimated at 84.16%, with a confidence interval of ±1.52%.
Pham and Sagiroglu (2000) tried four different algorithms to train the MLP classifier to
identify the veneer defects. Different ANN topologies were also tested. The best results
were achieved using a manually designed MLP architecture comprising two hidden
layers, each containing 17 neurons, and training this solution using the BP rule. The
optimised classifier achieved 86.96% recognition accuracy. However, the results were
inconclusive, since many solutions obtained similar recognition accuracies and the
paper does not provide a confidence interval estimate for the measurements.
In both studies, the data set was randomly partitioned into a training set of examples
containing 80% of the instances and a test set containing the remaining 20%.
Packianather et al. (2000) trained the solutions until the performance stopped
improving. The final estimate for the classification accuracy refers to the average of 9
learning trials on 3 different random partitions of the data set. Pham and Sagiroglu
(2000) trained the solution for an experimentally optimised number of iterations. The
conclusions of the two studies differ on the structure of the ANN classifier, while the
performances of the two solutions are roughly in agreement and the differences are
likely to be due to statistical fluctuations.
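Where a confidence interval is reported, as in Packianather et al. (2000), it can be estimated from repeated learning trials. A minimal sketch, with purely illustrative accuracy values and an assumed 95% Student's t multiplier:

```python
import statistics

def confidence_interval(accuracies, t_value=2.306):
    """Mean accuracy and half-width of a 95% confidence interval.

    t_value is Student's t for the chosen confidence level and
    len(accuracies) - 1 degrees of freedom (2.306 for 8 d.o.f.).
    """
    mean = statistics.mean(accuracies)
    sem = statistics.stdev(accuracies) / len(accuracies) ** 0.5
    return mean, t_value * sem

# Hypothetical accuracies from 9 learning trials (illustrative only).
trials = [83.1, 85.0, 84.4, 82.9, 84.8, 85.2, 83.7, 84.1, 84.5]
mean, half_width = confidence_interval(trials)
```

Reporting the half-width alongside the mean is what makes comparisons such as 84.16% versus 86.96% interpretable.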
Given the current disagreement on the topology of the MLP classifier, a more
systematic search is required to determine the optimal ANN structure. For this task, a
machine learning approach allows a more exhaustive exploration of the space of the
possible MLP configurations. Furthermore, the automatic design of the ANN
architecture makes the system more easily re-configurable, since it removes the need for
time-consuming manual generation and testing of the candidate solutions.
The next section reviews the application of EAs for automatic design and training of
ANN structures.
3 EVOLUTIONARY GENERATION OF ANN SYSTEMS
The implementation of ANN systems requires the solution of two complex optimisation
tasks, that is, the design of the ANN architecture and the training of the frequently large
set of parameters.
The two tasks are closely related. On the one hand, since the worth of a candidate ANN
structure can only be assessed on the trained solution, the accuracy and the reliability of
the training procedure affects the outcome of the design process. On the other hand, the
choice of architecture has a considerable impact on the ANN processing power and
learning capabilities. Too small a topology may not possess enough computational
power to fully learn the desired input-output relationship, whereas a topology that is too
large may result in the ANN response modelling the training data too closely. The latter
case usually produces a solution with poor generalisation capabilities (Branke, 1995).
Many algorithms for ANN design and training use gradient-based search techniques,
such as constructive and destructive algorithms (Rychetsky et al., 1998; Parekh et al.,
2000) for structure optimisation and the BP rule and conjugate gradients (Johansson et
al., 1991) for weight training. Unfortunately, local gradient-based search methods can
easily get trapped by local optima or flat areas of the optimisation surface, and
time-consuming experimentation is required before a satisfactory solution is found. The
remainder of this section reviews the application of EAs, a popular class of global
search algorithms, to the automatic design and training of ANNs.
3.1 Evolutionary algorithms
EAs are stochastic search algorithms that aim to find an acceptable solution when time
or computational requirements make it impractical to find the best one. EAs are best
suited for search spaces that are multimodal, and include flat regions and points of
discontinuity where gradient-based methods would easily get stuck. EAs search
globally, whereas gradient-based algorithms can only find the optimum lying at the
end of the slope from the initial position. Being stochastic global optimisation
procedures, EAs are also robust to noisy fitness evaluations.
EAs are modelled on Darwin’s theory of natural evolution, where a species improves its
adaptation to the environment by means of a selection mechanism that encourages
individuals of higher fitness to reproduce more often than those of lower fitness. The
individuals improve until a stopping criterion is met. At the end of the process, the best
exemplar is chosen as the solution to the problem.
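The selection and reproduction cycle described above can be sketched generically; the init, fitness and mutate callables and all parameter values are assumptions standing in for a concrete problem:

```python
import random

def evolve(init, fitness, mutate, pop_size=20, generations=100):
    """Minimal evolutionary loop: rank-based selection plus mutation.

    init() creates a random candidate, fitness() scores it (higher is
    better) and mutate() returns a perturbed copy; all three are
    problem-specific and supplied by the caller.
    """
    population = [init() for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]   # fitter half reproduces
        population = parents + [mutate(random.choice(parents))
                                for _ in range(pop_size - len(parents))]
    return max(population, key=fitness)

# Toy usage: maximise -(x - 3)^2, whose optimum is at x = 3.
best = evolve(init=lambda: random.uniform(-10.0, 10.0),
              fitness=lambda x: -(x - 3.0) ** 2,
              mutate=lambda x: x + random.gauss(0.0, 0.5))
```

Keeping the fitter half unchanged makes the loop elitist, so the best candidate found so far is never lost.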
In EAs, the adaptation of an individual to the environment is defined by its ability to
perform the required task. A problem-specific fitness function is used for assessing the
quality of candidate solutions. The population is driven towards the optimal point(s) of
the search space by means of stochastic search operators inspired by the biological
mechanisms of selection, mutation and recombination.
Following biological terminology, in EAs each data cluster defining a solution is called
a chromosome, and each basic component of a chromosome is called a gene.
EAs originated in the mid-sixties with the creation of Evolution Strategies (Rechenberg,
1965) and Evolutionary Programming (EP) (Fogel, L. J. et al., 1966). Ten years later the
creation of Genetic Algorithms (GAs) by Holland (1975) made EAs popular. Evolution
Strategies, EP and GAs represent different metaphors of biological evolution with
different representations of the candidate solutions and different genetic operators.
However, recent research developments in each field and the mutual exchange of ideas
blurred the boundaries between the three main branches of EAs.
3.2 Evolutionary ANN training
The first applications of EAs to the training of ANNs date back to the end of the 80s
with the work of Montana and Davis (1989) and Whitley and Hanson (1989) in the field
of GAs and L. J. Fogel and his co-workers (Fogel, D. B. et al., 1990; Saravanan and
Fogel, D. B., 1994) in the area of EP. The common approach is to encode the
connection weights into genes that are then concatenated to build the genotype. Much
debated is the representation of the solutions. The popular GA practice of binary coding
(Whitley and Hanson, 1989; Srinivas and Patnaik, 1991; Haussler et al., 1995; Seiffert,
2001) gives rise to long bit-strings for any non-trivial ANN architecture, leading to the
dual problem of a large search space and increased disruptiveness of the crossover
operator. Moreover, the larger the strings are, the longer the processing time is.
For the above reasons, standard GAs are often modified to allow more compact and
efficient encodings (Montana and Davis, 1989; Menczer and Parisi, 1992) and they are
hybridised with other search algorithms (e.g., the BP rule) to speed up the learning
process (Montana and Davis, 1989; Skinner and Broughton, 1995; Yan et al., 1997).
Much debated is also the use of the crossover operator since there is no consensus on
which are the functional units to swap. Indeed, the distributed nature of the knowledge
base in connectionist systems favours the argument against point-to-point exchanges of
genetic material amongst solutions. Relevant to the efficiency of the crossover operator
is also the competing convention problem (Thierens et al., 1993), namely the many-to-
one mapping from the representation of the solutions (the genotype) to the actual ANN
(the phenotype). This problem leads to high disruption of the solutions' behaviour after
genetic recombination. A way to prevent competing conventions is to match pairs of
neurons of mated solutions according to their similarity prior to the crossover operation
(Thierens et al., 1993). Alternatively, sub-populations (species) of neurons are evolved,
each species corresponding to a position on a pre-defined ANN structure (Gomez and
Miikkulainen, 2003). Unfortunately, these approaches do not scale well to large ANNs.
Because of its real-valued encoding and the lack of a crossover operator, EP is often
regarded as a better approach to ANN optimisation. Several successful implementations
are reported in the literature, mainly using Gaussian (Fogel, D. B. et al., 1990;
Saravanan and Fogel, D. B., 1994; Angeline and Fogel, D. B., 1997; Darwen, 2000;
Fogel and Chellapilla, 2005) or Cauchy (Yao and Liu, 1997a) mutation as the main
search operator. For further insights on the evolutionary training of ANNs the interested
reader can find broad surveys on the topic in (Branke, 1995; Whitley, 1995; Yao, 1999).
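An EP-style Gaussian weight mutation can be sketched in a few lines (self-adaptive step sizes are omitted for brevity, and the sigma value is an assumption):

```python
import random

def gaussian_mutation(weights, sigma=0.05):
    """Return a mutated copy of a real-valued weight vector.

    Each weight is perturbed by zero-mean Gaussian noise, the main
    search operator in EP-style ANN training.
    """
    return [w + random.gauss(0.0, sigma) for w in weights]

parent = [0.1, -0.3, 0.7]
child = gaussian_mutation(parent)
```

Cauchy mutation differs only in the sampling distribution, whose heavier tails allow occasional large jumps out of local optima.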
3.3 Evolution of the ANN structure
Several studies report applications of EAs to the design of ANN architectures coupled
to customary weight training algorithms, a typical example being the evolution of MLP
topologies with BP training of the ANN parameters (Miller et al., 1989; Stepniewski
and Keane, 1996; Brown and Card, 1999). Fitness evaluation is generally expressed as a
multi-optimisation criterion that takes into account different requirements such as ANN
performance, size, learning speed etc.
Two main approaches for encoding the candidate solutions have emerged, namely direct
encoding and indirect encoding (Yao, 1999). Direct encoding specifies every ANN
connection and node, usually representing individuals by means of connection matrices.
The architecture of the final solution is therefore fully determined by the evolution
process. Following this approach, chromosomes are easy to decode but the algorithm
does not scale well to large ANN structures.
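A direct encoding via a connection matrix might look as follows; the connection probability and weight range are illustrative assumptions:

```python
import random

def random_direct_encoding(n_nodes, p_connect=0.5):
    """Direct encoding of an ANN as a connection matrix.

    matrix[i][j] holds the weight of the connection from node i to
    node j, or None when the connection is absent. Every connection
    and node is represented explicitly, which is why the genotype
    grows quadratically with the number of nodes.
    """
    return [[random.uniform(-0.2, 0.2) if random.random() < p_connect else None
             for _ in range(n_nodes)]
            for _ in range(n_nodes)]

genotype = random_direct_encoding(5)
```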
Indirect encoding specifies only a compact representation of the structure of the
solutions, generally through parameters describing the network size and connectivity
(Harp et al., 1990; Castillo et al., 2000) or via ANN developmental rules (Kitano, 1990;
Schiffmann, 2000; Jung, 2005). While indirect encoding seems more biologically
plausible and does not suffer the problem of the competing conventions, the action of
the genetic operators on the actual phenotype becomes less clear and the decoding of the
chromosomes more difficult. Moreover, small changes in the genotype can produce
large changes in the phenotype, creating a rugged and more difficult search surface.
The use of EAs to design ANNs that are then trained using some parameter learning
algorithm allows compact and effective structures to be built. However, imprecision in
the evaluation of the candidate solutions must be taken into account due to possible
sub-optimal convergence of the weight training procedure. Furthermore, the training of the
ANN weights may be excessively slow for adequate exploration of the search space. For
the above reasons, it is preferable to simultaneously optimise both the ANN architecture
and the parameters. This can be done either by alternating steps of evolutionary
structure optimisation with steps of standard (e.g. BP-driven) training of the parameters
(Cangelosi and Elman, 1995) or by evolving at the same time both the connectivity and
the weights (Srinivas and Patnaik, 1991; Angeline et al., 1994; Yao and Liu, 1997b;
Hüsken and Igel, 2002).
In the first case, the standard learning technique behaves similarly to an additional
problem-specific mutation operator. The genetic propagation of learnt knowledge
introduces an element of “Lamarckism” (Aboitiz, 1992) into the search, that is, the
permanent storing in the genotype of acquired behaviours resulting from learning by the
phenotype. In the second case, a set of mutation operators is needed for modification of
the ANN structure and weights. Standard ANN weight training algorithms (e.g., BP for
MLPs) are often used to speed up the search through Lamarckian learning (Yao, 1999).
For the reasons discussed in Section 3.2, the use of crossover is not customary. Due to
the difficulty of encoding the connection weights, the use of indirect encoding becomes
problematic once the whole ANN system is evolved.
The next section presents an EA for the simultaneous design and training of the wood
veneer defect MLP classifier.
4 THE ALGORITHM
The Artificial Neural Network Generation and Training (ANNGaT) algorithm is
designed for concurrent optimisation of the structure and the connection weights for
ANN systems. The population is evolved through a mix of random genetic mutations
and Lamarckian gradient based learning. Since it is more suitable for transmitting the
setting of the connection weights, the direct encoding approach is used for representing
the candidate solutions. This section presents the implementation of the algorithm to the
evolution of MLP classifiers for any pre-defined number of layers.
4.1 General overview
The EA architecture is shown in fig. 2. The algorithm comprises two components,
namely, a structure design module and an ANN training module, that act concurrently
on the same pool of individuals. The system is designed with the purpose of obtaining
maximum modularity between the two learning tasks.
The co-occurrence of the two modules is expected to be beneficial for the effectiveness
and the speed of the evolution procedure. That is, the presence of similarly performing
structural mutations of an individual is likely to favour population diversity. Moreover,
the EA fitness function calculates the accuracy of a solution as the difference between
the ANN output and the desired output. Manipulation of the topology modifies the
ANN output and hence the error surface, thus helping the weight training algorithm to
escape local peaks or flat areas of fitness. Finally, parallel distributed processing
systems such as ANNs possess well-known fault tolerance to addition or removal of
processing units. This capability minimises the number of fatal structural mutations,
since moderate changes of the ANN architecture are not likely to cause major disruption
to the progress of the learning procedure.
The genotype of each solution is characterised by a real-valued variable-length string
that encodes the setting of the connection weights. Each generation, the fitness of the
population is assessed, then a cycle of the structure design module and a cycle of the
ANN training module are executed. Evolution is achieved via random genetic mutations
affecting the ANN architecture and the weights. Fitness ranking (Fogel, D. B., 2000) is
used to select the pool of reproducing individuals. The BP rule is included into the ANN
training module as a problem-specific operator. Experimental tests carried out during
the algorithm optimisation phase showed that the use of the BP rule enhances the speed
and the accuracy of the weight training procedure.
Because of the real-valued encoding of the solutions and the lack of genetic crossover
the ANNGaT algorithm is conceptually akin to EP. This paradigm allows the candidate
solutions to be represented in a more compact format and avoids the many problems
stemming from the use of crossover.
As a result of the action of the two modules, a new population is produced through
genetic mutation and BP training of the existing individuals. New solutions replace
current ones via generational replacement (Fogel, D. B., 2000). The procedure is
repeated until a pre-defined number of iterations has elapsed and the fittest solution of
the last generation is picked.
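The generational cycle described in this section might be outlined as follows; the evaluate, structure_step, training_step and select callables are stubs standing in for the fitness function, the two modules and fitness ranking:

```python
def anngat_outline(population, evaluate, structure_step, training_step,
                   select, iterations):
    """Skeleton of the generational cycle described in the text:
    evaluate fitness, run one structure-design cycle and one training
    cycle, then replace the population generationally."""
    for _ in range(iterations):
        scores = [evaluate(ind) for ind in population]
        parents = select(population, scores)
        offspring = [training_step(structure_step(p)) for p in parents]
        population = offspring          # generational replacement
    scores = [evaluate(ind) for ind in population]
    return population[scores.index(max(scores))]

# Toy usage: individuals are numbers, "training" pulls them towards 2.
best = anngat_outline(population=[0.0, 5.0, -3.0],
                      evaluate=lambda x: -(x - 2.0) ** 2,
                      structure_step=lambda x: x,
                      training_step=lambda x: x + 0.5 * (2.0 - x),
                      select=lambda pop, scores: pop,
                      iterations=10)
```

The real algorithm replaces each stub with the operators of Sections 4.2 to 4.4; only the control flow is shown here.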
4.2 ANN structure design module
The proposed algorithm evolves the size (i.e. number of nodes) of the hidden layer(s) of
ANN classifier systems. The number of hidden layers is at present fixed a priori and
each layer is fully connected to the neighbouring ones.
Two genetic mutation operators of node addition and node deletion respectively add one
neuron to and delete one neuron from the ANN structure. When a new node is added, its
connection weights are initialised to small random values in order to avoid major
disruption of the ANN behaviour. Each weight of a new node is initialised to a random
value sampled with uniform probability from the interval [-0.2, 0.2]. To bias the search
towards compact ANN structures, node deletion is given a slightly higher probability
than node addition. If node deletion is chosen, the algorithm picks a node from
a randomly selected hidden layer. Different heuristic criteria were tested for selecting
the unit for removal: a randomly selected node, the node with the lowest average firing
strength, the node with the lowest maximum firing strength, the node yielding the
highest average error over a set of training patterns, and the node with the weakest
incoming connections. Experimental tests indicate that the latter choice produces the
best learning results.
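The two structural mutations can be sketched as follows, assuming a hidden layer stored as a list of per-node incoming-weight lists (a simplification of the full encoding):

```python
import random

def add_node(hidden_layer, n_inputs, init_range=0.2):
    """Node addition: append a neuron whose incoming weights are small
    random values in [-0.2, 0.2], limiting disruption to the ANN."""
    hidden_layer.append([random.uniform(-init_range, init_range)
                         for _ in range(n_inputs)])

def delete_node(hidden_layer):
    """Node deletion: remove the neuron with the weakest incoming
    connections (smallest sum of absolute incoming weights), the
    heuristic reported to give the best learning results."""
    weakest = min(range(len(hidden_layer)),
                  key=lambda i: sum(abs(w) for w in hidden_layer[i]))
    hidden_layer.pop(weakest)

layer = [[0.5, -0.4], [0.01, 0.02], [0.9, 0.3]]
delete_node(layer)            # removes the second node, whose weights are weakest
add_node(layer, n_inputs=2)   # appends a freshly initialised node
```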
Despite the complexity of assessing the contribution of a processing unit in a parallel
distributed architecture, further work in this direction could produce less disruptive
node deletion operators and consequently more balanced structure mutation procedures.
At present, there seems to be an evolutionary bias toward additive mutation operations,
as the addition of a new unit with small connection weights appears to be less disruptive
than the deletion of an existing node. This bias may favour the creation of non-minimal
ANN structures. Larger structures may also be favoured during the evolutionary
process, since they can better fit the training set point-to-point. Unfortunately, large
ANNs that closely fit the training set have poor generalisation capabilities. The
evolutionary bias toward larger structures is partly balanced by the superior learning
speed of smaller ANNs. A stronger bias towards minimal ANN representations is
produced by the fitness evaluation procedure (see Section 4.4).
4.3 ANN training module
This module evolves the ANN weights in order to minimise classification error.
Evolution is achieved via two genetic operators, namely mutation and the BP algorithm.
Genetic mutations slightly modify the weights of each node of a solution. For each
weight, the perturbation is randomly sampled with uniform probability from an interval
of pre-defined width.
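The weight mutation operator can be sketched as follows; the interval half-width is an assumed setting, not a value taken from the paper:

```python
import random

def mutate_weights(weights, half_width=0.1):
    """Perturb every weight of a node with noise sampled uniformly
    from a pre-defined interval [-half_width, half_width]."""
    return [w + random.uniform(-half_width, half_width) for w in weights]

node_weights = [0.4, -0.1, 0.25]
mutated = mutate_weights(node_weights)
```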
The BP rule is introduced as a deterministic mutation operator with the purpose of
speeding up the learning process. Individuals are randomly picked from the pool of
offspring for BP learning. Selected solutions undergo one cycle of BP learning over the
whole training set. Since BP learning is computationally expensive, the operator is used
with a moderate rate of occurrence. Other weight training procedures may be used as an
alternative to the BP method. The deterministic weight training operator is the only part
where the ANNGaT algorithm is specific to the ANN paradigm. If other ANN models
are to be trained, the BP rule can be substituted by other parameter learning procedures.
The ANN training module can be run independently of the structure optimisation
procedure as an algorithm on its own and it can be used as an alternative to the standard
ANN training techniques. It will henceforth be referred to as the ANN Training (ANNT)
algorithm.
4.4 Fitness evaluation function
The fitness of the candidate solutions is evaluated on their capability of accurately
classifying the training set of examples. To encourage the creation of compact and high
performing solutions, whenever the fitness score of two individuals is equal, preference
is given to the solution having the smallest structure. ANN optimisation therefore
follows a hierarchical criterion where accuracy has priority over compactness.
In general, there are cases where the difference in accuracy between some of the
solutions is very small (i.e., a few training examples) in comparison to the spread of the
population. In such cases, it is more efficient to consider those solutions to be equally
performing and give preference to the ones having the most economical structure.
The proposed algorithm considers the accuracy of two individuals to be equal when the
difference is less than one standard deviation of the average population accuracy. That
is, the population is divided into a number of bins of width equal to
width = max( std_dva · (1 − gen / duration), (best − worst) / popsize )    (1)
where width is the width of the bin, std_dva is the standard deviation of the fitness of
the population, gen is the current evolutionary cycle, duration is the duration of the
learning procedure, best and worst are the classification accuracies of respectively the
best and the worst individual and popsize is the population size.
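Under this reading of the bin-width rule (the exact functional form is inferred from the surrounding description), the computation is:

```python
def bin_width(std_dva, gen, duration, best, worst, popsize):
    """Bin width from equation (1): shrinks linearly over the run but
    never drops below (best - worst)/popsize, so the number of bins
    cannot exceed the population size."""
    return max(std_dva * (1.0 - gen / duration), (best - worst) / popsize)

# Illustrative values: wide bins early in the run, narrow bins late.
early = bin_width(std_dva=4.0, gen=0, duration=100,
                  best=90.0, worst=60.0, popsize=20)
late = bin_width(std_dva=4.0, gen=100, duration=100,
                 best=90.0, worst=60.0, popsize=20)
```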
The first bin is centred around the best performing solution while the centres of the
remaining bins are calculated according to the following formula:
centrei = best − i · width    (2)
where centrei is the centre of the ith bin and i is an integer number (i=1,…,n) that is
progressively increased until all the population is grouped.
The proposed procedure aims to filter out part of the noise that affects the evaluation
of the candidate solutions. As the algorithm proceeds, the width of the bins is
progressively shrunk to shift the emphasis on finer differences of accuracy. For each
evaluation of the EA population, equation (1) limits the number of bins to a value that is
no greater than the population size.
Solutions are awarded the following pair of measures as fitness score:
fitnessj = { n − i , sizej }    (3)
where fitnessj is the fitness score of the jth member of the population, i is the bin where
the jth solution lies, n is the total number of bins and sizej expresses the size of the MLP
architecture as the total number of connection weights. Since the ANN is fully
connected and the input and the output layers are fixed, sizej is determined by the size of
the hidden layer(s).
The first fitness measure is proportionally related to the classification accuracy. That is,
the best performing solution (grouped into the first bin) has an accuracy score equal to
n-1. All the solutions within half bin width from the accuracy of the best individual
obtain the same score. The solutions grouped into the second bin obtain an accuracy
score equal to n-2, and so forth until the last bin, where solutions achieve an
accuracy score equal to 0. Solutions having the same accuracy score (i.e., belonging to
the same bin) are further ranked according to the measure of their size by the fitness
ranking procedure.
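A simplified sketch of the binning and scoring scheme (the half-width centring of the first bin is ignored for brevity, and all values are illustrative):

```python
def fitness_scores(accuracies, sizes, width):
    """Hierarchical fitness: solutions are grouped into accuracy bins
    of the given width (bin 1 holds the best solution) and scored
    n - i, where i is the bin index and n the number of bins; ties on
    accuracy score are broken in favour of the smaller architecture,
    here by negating the size so that tuple comparison prefers it."""
    best = max(accuracies)
    bins = [int((best - a) // width) + 1 for a in accuracies]   # bin index i
    n = max(bins)
    return [(n - b, -s) for b, s in zip(bins, sizes)]

scores = fitness_scores(accuracies=[90.0, 89.8, 84.0, 70.0],
                        sizes=[500, 300, 400, 200], width=2.0)
ranked = sorted(range(4), key=lambda j: scores[j], reverse=True)
```

In this example the first two solutions fall into the same bin, so the smaller one (300 weights) outranks the slightly more accurate but much larger one.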
5 EXPERIMENTAL SETTINGS AND RESULTS
This section presents the experimental settings and the results of the application of the
ANNGaT algorithm to the generation of the MLP classifier for the wood veneer defect
recognition task discussed in Section 2.
5.1 Experimental set up
This section presents the results of five experiments: two concerning the BP training
of manually optimised ANN structures, one concerning the training of a manually
optimised structure using the ANNT algorithm, and two concerning the full ANNGaT
algorithm. The result reported for each of the experiments corresponds to
the average of 20 independent learning trials.
To simplify the training of the individuals, input data are normalised according to the
Mean-Variance procedure. A data balancing procedure is used. For each learning trial,
the size of the classes in the training set is made even by duplicating randomly picked
members of the smaller categories.
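The two pre-processing steps might be sketched as follows; the class labels are illustrative only:

```python
import random
import statistics

def mean_variance_normalise(column):
    """Scale a feature column to zero mean and unit standard deviation."""
    mu, sigma = statistics.mean(column), statistics.pstdev(column)
    return [(x - mu) / sigma for x in column]

def balance_classes(data_by_class):
    """Even out class sizes by duplicating randomly picked members of
    the smaller classes, as done before each learning trial."""
    target = max(len(samples) for samples in data_by_class.values())
    return {label: samples + [random.choice(samples)
                              for _ in range(target - len(samples))]
            for label, samples in data_by_class.items()}

col = mean_variance_normalise([1.0, 2.0, 3.0])
balanced = balance_classes({"defect_a": [[0.1], [0.2]],
                            "clear_wood": [[0.3]] * 20})
```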
For each learning trial, the data set is randomly split into a training set including 80% of
the examples and a test set including the other 20%. The classifier is trained on the
former and the final learning result is evaluated on the latter. To reduce the danger of
overfitting, the order of presentation of the training samples is randomly reshuffled for
every learning cycle of the algorithm under evaluation.
The first two experiments replicate the learning trials of Packianather et al. (2000) and
Pham and Sagiroglu (2000) and train the classifier using the BP rule with momentum
term. The first test is carried out using the ANN topology that Packianather et al. (2000)
optimised through the Taguchi method. This configuration is characterised by a hidden
layer of 45 processing units. Each neuron of the hidden layer receives 17+1 incoming
connections from the 17 input neurons and the bias neuron. Each neuron of the output
layer receives 45+1 incoming connections from the 45 hidden neurons and the bias
neuron. The MLP architecture is therefore composed of a total of 45x18+13x46=1408
connection weights. The second test uses the ANN configuration that Pham and
Sagiroglu (2000) have experimentally determined. This configuration is characterised
by two hidden layers of 17 processing units each. As in the previous case, the
total connectivity of the MLP architecture amounts to
17x18+17x18+13x18=846 connection weights. The duration of the BP learning
procedure is experimentally set to a fixed number of iterations. The purpose of the first
two experiments is to provide a baseline for the comparison of the results.
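The connectivity figures quoted above follow from counting, for each pair of consecutive layers, one weight per incoming connection plus one for the bias unit. A small helper (the function name is ours) reproduces both totals:

```python
def mlp_weight_count(layer_sizes):
    """Total connection weights of a fully connected MLP where every
    non-input layer also receives a connection from a bias neuron."""
    return sum((n_in + 1) * n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

# 17 inputs, one hidden layer of 45, 13 outputs: 45x18 + 13x46 = 1408
assert mlp_weight_count([17, 45, 13]) == 1408
# 17 inputs, two hidden layers of 17, 13 outputs: 846 weights
assert mlp_weight_count([17, 17, 17, 13]) == 846
```

The same count applied to the authors' manually optimised 35-node configuration of the third experiment gives 35x18 + 13x36 = 1098 weights.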
The third experiment uses the ANNT algorithm to train a manually designed MLP
classifier. The duration of the ANNT procedure and the size of the ANN architecture are
experimentally set to maximise the learning accuracy. The best performing solution
consists of one hidden layer of 35 processing units. The purpose of this test is to assess the performance
of the ANNT procedure, which is the evolutionary MLP training module of the
ANNGaT algorithm.
Finally, the last two experiments apply the full ANNGaT algorithm to generate and
train the wood veneer defect classifier. In the first test, the EA is used to design and
train a three-layer (one hidden layer) MLP classifier. In the second test, the EA is used
to design and train a four-layer (two hidden layers) MLP classifier. In both cases, the
size of the hidden layer(s) of the starting population is randomly initialised within the
interval of integer numbers [15, 25].
The parameters of the learning algorithms are tuned by experimental
trial and error. Table 2 shows the main MLP settings and the learning parameters used
in the five tests.
5.2 Experimental results
The results of the five experiments are reported in Table 3 and Table 4. Table 3 details
the design method used for the MLP classifier, the mean and the standard deviation of
the classification accuracy over the 20 learning trials, and the number of learning cycles.
Table 3 also reports the upper and lower bounds of the 95% confidence interval for the
accuracy results produced by the automatic and manual design methods. All accuracy
results refer to the percentage of successfully classified examples of the test set. The
number of learning cycles refers to the manually optimised fixed duration of the
algorithm. Table 4 shows the average, maximum and minimum size of the hidden
layer(s) evolved by ANNGaT.
The first two experiments substantially confirm the results reported in the literature.
That is, the average classification accuracy obtained by the three-layered MLP is within
one standard deviation from the 84.16% classification accuracy reported for the same
architecture by Packianather et al. (2000). The average classification accuracy obtained
by the four-layered MLP is within two standard deviations from the 86.96%
classification accuracy reported for the same architecture by Pham and Sagiroglu
(2000). The spread of the classification accuracy for the MLP configuration having 45
hidden neurons is higher than the 1.52% estimate of Packianather et al. (2000).
However, the latter estimate is calculated from 9 learning trials equally distributed
over 3 randomly initialised training and test set partitions, while the figure in Table 3 is
computed from 20 learning trials, each of them on a different randomly initialised data
set partition. The larger variability of the initial conditions may therefore explain the
increased spread of the data distribution. Overall, the two MLP configurations obtain
comparable learning results in terms of accuracy and robustness.
The learning trials involving the ANNT algorithm (third experiment) bring no
further improvement to the learning results. The average classification accuracy of the
solutions evolved by the ANNT procedure is within one standard deviation from the
average accuracy of the solutions trained using the BP rule. The spread of the
classification accuracy of the solutions trained using the ANNT algorithm is also
comparable to the spread of the BP trained solutions. This result may indicate that the
search capability of the BP algorithm is already adequate for the problem domain.
Nevertheless, the test demonstrates that the proposed procedure can successfully
train high-performing MLP solutions and provides a baseline for the evaluation of
the ANNGaT algorithm.
Finally, the learning trials involving the full ANNGaT algorithm generate solutions that
attain classification results comparable within one standard deviation to the results
obtained using the BP or the ANNT algorithm. However, the average size of the
evolutionary generated MLP structures is considerably smaller than the size of the
manually generated counterparts.
The average hidden layer of the evolved three-layer solutions is one-third the
size of the hidden layer of the solution suggested by Packianather et al.
(2000). The hidden layer of the evolutionary three-layer MLP is also less than half the
size of the solution that was manually optimised by the authors.
The four-layer evolutionary ANN is more compact than the four-layer solution
suggested by Pham and Sagiroglu (2000). On average, the two hidden layers generated
by the ANNGaT algorithm are two-thirds the size of the manually optimised
counterparts. This reduction in size entails a drastic reduction in the number of
connection weights. In fact, the automatically generated neural networks contain
on average half as many weights as the manually optimised structures.
The introduction of the second hidden layer brings no appreciable improvements to the
learning accuracy of the evolved solutions. This result confirms the conclusion of
Packianather et al. (2000) that a three-layer ANN configuration is fully adequate for the
task. Given the size of the hidden layers evolved by the ANNGaT algorithm for the two
configurations, a three-layer ANN structure represents the most compact choice.
The size of the minimal solutions generated during the 20 learning trials emphasises the
capability of the proposed algorithm to generate extremely compact structures.
The maximal solutions generated during the 20 learning trials are in both cases
(3-layered and 4-layered MLP) smaller than the manually generated classifiers. The use of
smaller MLP architectures gives advantages in terms of faster processing times, and
cheaper implementation on hardware boards.
On average, one run of the ANNGaT algorithm takes about 71 minutes for the
optimisation of the three-layer configuration and about 103 minutes for the
optimisation of the four-layer configuration. BP training of the manually
optimised solutions takes about 10 minutes. The latter figure, however, does not take into
account the time needed for the lengthy manual trial-and-error design of the solutions.
The manual optimisation of the MLP took the authors about one day of work. A
Pentium III 1 GHz processor with 512 MB of RAM was used for the tests.
Fig. 3a-b shows the learning curves for the three- and four-layer MLP configurations.
The figure refers to 20 independent learning trials that were run during the parameter
optimisation phase of the algorithm. The plot monitors the evolution of the average
classification accuracy of the EA population and of its fittest individual on the training
set of examples. The evolution of the classification accuracy of the fittest individual on
the test set is also reported. The plot refers to average values over the 20 learning trials
and monitors the learning process over 10000 generations. In both cases, the curves
show the standard learning pattern of EAs with a brisk initial improvement of the
population fitness followed by slow convergence to the maximum values.
The smooth, approximately exponential shape of the learning curves is the result of
the collective learning process of a population of individuals. While part of the
population may occasionally get stuck in local optima or flat regions of the fitness
landscape, other solutions improve their fitness and gradually monopolise the
population pool. This gradual spread of successful solutions gives rise to
learning curves of the kind shown in Fig. 3.
The classification accuracies usually reach their peak well before the last generation.
Due to some overfitting of the training set, the generalisation capability (i.e. test set
accuracy) of the solutions slightly deteriorates towards the end of the evolution span.
Fig. 4a-b illustrates the evolution of the average size of the hidden layer of the three-
and four-layer MLP configurations. The figure refers to the same 20 learning trials
monitored in Fig. 3 and shows the structure evolution process over 10000 generations.
The size of the MLP structures soon starts decreasing, and increasingly more
compact solutions are found. This process continues even after the training accuracy
reaches its peak, driven by the selection procedure that favours compactness when equal
accuracy is attained. In the second part of the evolution period, the population
converges to the final solution and the evolution curves flatten.
6 CONCLUSIONS AND FURTHER WORK
This study expands previous work on the classification of wood veneer defects from
statistical features of wood sub-images using an MLP classifier system. The generation
of the MLP classifier is automated following the introduction of ANNGaT, an
evolutionary procedure for concurrent structure design and training of ANN systems.
The proposed algorithm evolves the size of the ANN hidden layer(s) while training the
connection weights.
Experimental evidence shows that the ANNGaT algorithm builds highly compact MLP
structures capable of accurate and robust learning, at a reasonable computational cost.
Compared to the current approach based on the Taguchi method for the manual
optimisation of the MLP structure and on the BP rule for the training of the connection
weights, the proposed algorithm generates equally performing ANN solutions using
considerably smaller architectures. On average, the hidden layer of the
three-layer MLP structures created by the ANNGaT algorithm is one-third the
size of the empirically optimised three-layer configuration. Compared to an alternative
approach based on trial and error optimisation of the ANN structure, the ANNGaT
algorithm generates four-layer ANN structures that are on average two-thirds the size of
the manually generated counterparts. In both cases, the proposed algorithm clearly
requires lower design costs since the process is fully automated.
Further investigation on the impact of the structure modification operations on the ANN
behaviour could help to produce less disruptive mutation procedures. Improvement of
the proposed algorithm may also result from submitting the EA learning parameters,
namely the weight mutation step size and the BP learning rate, to the evolutionary
process. Finally, since different ANN architectures and learning parameters are
characterised by different learning curves, niching techniques could be used to let
differently parametrised sub-populations evolve separately before submitting them to
competition. Such an approach would allow more informed evaluations of the candidate
solutions.
REFERENCES
Aboitiz, F. (1992), Mechanisms of adaptive evolution - Darwinism and Lamarckism
restated, Medical Hypotheses, vol. 38, no. 3, pp. 194-202.
Angeline, P. J. and Fogel, D. B. (1997), An evolutionary program for the
identification of dynamical systems, in Proceedings of SPIE Volume 3077 : Application
and Science of Artificial Neural Networks III, Rogers, S. editor, SPIE The International
Society for Optical Engineering, Bellingham, WA, pp. 409-417.
Angeline, P. J., Saunders, G. M. and Pollack, J. B. (1994), An Evolutionary Algorithm
That Constructs Recurrent Neural Networks, IEEE Transactions Neural Networks, vol.
5, no. 1, pp. 54--65.
Balakrishnan, K. and Honavar, V. (1995), Evolutionary Design of Neural Architectures
- a Preliminary Taxonomy and Guide to Literature, Technical Report CS TR95-01.
Department of Computer Science, Iowa State University, Ames.
Branke, J. (1995), Evolutionary Algorithms for Neural Network Design and Training,
Technical Report no. 322, Institute AIFB, University Karlsruhe.
Brown, A. D. and Card, H. C. (1999), Cooperative-Competitive Algorithms for
Evolutionary Networks Classifying Noisy Digital Images, Neural Processing Letters, vol.
10, no. 3, pp. 223-229.
Cangelosi, A. and Elman, J. L. (1995), Gene Regulation and Biological Development in
Neural Networks: an Exploratory Model, Technical Report, CRL-UCSD, University of
California San Diego.
Castillo, P. A., Carpio, J., Merelo, J. J., Prieto, A., Rivas, V. and Romero, G. (2000),
Evolving Multilayer Perceptrons. Neural Processing Letters, vol. 12, no. 2.
Darwen, P. J. (2000), Black Magic: Interdependence Prevents Principled Parameter
Setting, Self-Adapting Costs Too Much Computation, In Applied Complexity: From
Neural Nets to Managed Landscapes, pages 227-237
Eiben, A. E. and Smith, J. E. (2003), Introduction to evolutionary computing, Springer,
New York.
Fahlman, S. E. and Lebiere, C. (1990), The Cascade-Correlation Learning Architecture,
in Advances in Neural Information Processing Systems 2, Touretzky, D. S. editor,
Morgan Kaufmann, San Mateo, CA, pp. 524-532.
Fogel, D. B. (2000), Evolutionary Computation: Toward a New Philosophy of Machine
Intelligence, 2nd ed., IEEE Press, New York.
Fogel, D. B. and Chellapilla, K. (2002), Verifying Anaconda’s Expert Rating by
Competing Against Chinook: Experiments in Co-Evolving a Neural Checkers Player,
Neurocomputing, no. 42, pp. 69-86.
Fogel, D. B., Fogel L. J. and Porto, V. W. (1990), Evolutionary programming for
training neural networks, Proceedings International Joint Conference on NNs, S. Diego
CA - June 1990, pp. 601-605.
Fogel, L. J., Owens A. J. and Walsh, M. J. (1966), Artificial intelligence through
simulated evolution, J. Wiley, New York.
Gomez, F. J. and Miikkulainen, R. (2003), Active Guidance for a Finless Rocket
through Neuroevolution, Proceedings 2003 Genetic and Evolutionary Computation
Conference (GECCO), Chicago IL, pp. 2084-2095.
Hancock, P. J. B. (1992), Genetic Algorithms and Permutation Problems: a Comparison
of Recombination Operators for Neural Structure Specification, Combinations of
Genetic Algorithms and Neural Networks, Whitely, D. and Schaffer, J. D. editors IEEE
Computer Society Press
Harp, S. A., Samad, T. and Guha, A. (1990), Designing Application-Specific Neural
Networks Using the Genetic Algorithm, in Advances in Neural Information Processing
Systems 2, D. S. Touretzky, editor, Morgan Kaufmann, San Mateo, CA, pp. 447-454.
Haussler, A., Li, Y., Ng, K. C., Murray-Smith, D. J. and Sharman, K. C. (1995),
Neurocontrollers Designed by a Genetic Algorithm, Proceedings GALESIA First
IEE/IEEE International Conference on GAs in Engineering Systems: Innovations and
Applications, Sheffield UK - 1995, pp. 536-542.
Holland, J. H. (1975), Adaptation in Natural and Artificial Systems, Ann Arbor, MI:
University of Michigan Press.
Huber, H. A., McMillin, C. W. and McKinney, J. P. (1985), Lumber Defect Detection
Abilities of Furniture Rough Mill Employees. Forest Products Journal, vol. 35, no.
11/12, pp 79 - 82.
Hüsken, M. and Igel, C. (2002), Balancing Learning And Evolution, Proceedings
Genetic and Evolutionary Computation Conference (GECCO-2002), San Francisco
CA, USA, pp. 391-398.
Johansson, E. M., Dowla, F. U. and Goodman, D. M. (1991), Backpropagation Learning
for Multilayer Feed-Forward Neural Networks Using the Conjugate Gradient Method,
International Journal Neural Systems, vol. 2, no. 4, pp. 291-301.
Jung, S. Y. (2005), A Topographical Method for the Development of Neural Networks
for Artificial Brain Evolution, Artificial Life, no. 11, pp. 293-316.
Kitano, H. (1990), Designing Neural Networks Using Genetic Algorithms with Graph
Generation System, Complex Systems, vol. 4, no. 4, pp. 461-476.
Lappalainen, T., Alcock, R. J. and Wani, M. A. (1994). Plywood Feature Definition and
Extraction. Report 3.1.2, QUAINT, BRITE/EURAM project 5560, Intelligent Systems
Laboratory, School of Engineering, University of Wales, Cardiff.
LeCun, Y., Denker, J. S. and Solla, S. A. (1990), Optimal Brain Damage, In Advances
in Neural Information Processing Systems 2, Touretzky, D. S. editor, Morgan
Kaufmann, San Mateo, CA, pp. 598-605.
Menczer, F. and Parisi, D. (1992), Evidence of Hyperplanes in the Genetic Learning of
Neural Networks, Biological Cybernetics, vol. 66, pp. 283-289
Miller, G. F., Todd, P. M. and Hegde, S.U. (1989), Designing neural networks using
genetic algorithms, Proceedings 3rd International Conference on GAs and Applications,
Arlington VA - 1989, pp. 379-384.
Montana, D. and Davis, L. (1989), Training feedforward neural networks using genetic
algorithms, Proceedings 11th International Joint Conference on AI, Detroit MI - 1989,
pp. 762-767.
Nikolaev, N. Y. (2003), Learning polynomial feedforward neural networks by genetic
programming and backpropagation, IEEE Transactions on Neural Networks, vol. 14,
no. 2, pp. 337-350
Packianather, M. (1997), Design and Optimisation of Neural Network Classifiers for
Automatic Visual Inspection of Wood Veneer, Ph.D. Thesis, University of Wales,
College of Cardiff (UWCC) – UK
Packianather, M. S., Drake, P. R. and Rowlands, H. (2000), Optimising the Parameters
of Multilayered Feedforward Neural Networks through Taguchi Design of Experiments.
Quality and Reliability Engineering International, vol. 16, pp. 461-473
Parekh, R., Yang, J. H., Honavar, V. (2000), Constructive neural-network learning
algorithms for pattern classification, IEEE Transactions Neural Networks, vol. 11, no. 2,
pp. 436-451
Pham, D. T. and Alcock, R. J. (1996). Automatic Detection of Defects on Birch Wood
Boards. Proceedings of Institution of Mechanical Engineers, part E, Journal of Process
Mechanical Engineering, vol. 210, pp. 45-52.
Pham, D. T. and Alcock, R. J. (1999a), Recent Developments in Automated Visual
Inspection of Wood Boards, Advances in Manufacturing: Decision, Control and
Information Technology, Tzafestas ed., Springer - London, pp. 80-88
Pham, D. T. and Alcock, R. J. (1999b), Plywood Image Segmentation Using Hardware-
Based Image Processing Functions, Proceedings of Institution of Mechanical
Engineers, part B, vol. 213, pp. 431-434.
Pham, D. T. and Alcock, R. J. (1999c), Automated Visual Inspection of Wood Boards:
Selection of Features for Defect Classification by a Neural Network, Proceedings of
Institution of Mechanical Engineers, part E, vol. 213, pp. 231-245.
Pham, D. T. and Liu, X. (1995), Neural Networks for Identification, Prediction and
Control, Springer-Verlag Ltd., London.
Pham, D. T. and Sagiroglu, S. (2000), Neural Network Classification of Defects in
Veneer Boards, Proceedings of Institution of Mechanical Engineers, part B, vol. 214,
pp. 255-258.
Polzleitner, W. and Schwingshakl, G. (1992), Real-Time Surface Grading of Profiled
Wooden Boards, Industrial Metrology, vol. 2, pp. 283 -298.
Rechenberg, I. (1965), Cybernetic Solution Path of an Experimental Problem, Library
Translation no. 1122, Ministry of Aviation, Royal Aircraft Establishment, Farnborough,
Hants UK.
Reed, R. (1993), Pruning algorithms - A survey, IEEE Transactions Neural Networks,
vol. 4, pp. 740-747.
Roy, R. K. (2001), Design of Experiments Using The Taguchi Approach : 16 Steps to
Product and Process Improvement, John Wiley and Sons Ltd, New York.
Rumelhart, D. E. and McClelland, J. L. (1986), Parallel distributed processing:
exploration in the micro-structure of cognition, vol. 1-2, Cambridge, MIT Press.
Rychetsky, M., Ortmann, S. and Glesner, M. (1998), Correlation and Regression Based
Neuron Pruning Strategies, In Fuzzy-Neuro-Systems '98, 5th Int Workshop, Munich D.
Saravanan, N. and Fogel, D. B. (1994), Evolving Neurocontrollers using evolutionary
programming, Proceedings First IEEE Conference on Evolutionary Computation
(ICEC), Orlando FL - 1994, pp. 217-222.
Schiffmann, W. (2000), Encoding feedforward networks for topology optimization by
simulated evolution, Proceedings 4th International Conference on Knowledge-Based
Intelligent Engineering Systems & Allied Technologies (KES '2000), vol. 1, pp. 361-
364.
Seiffert, U. (2001), Multiple Layer Perceptron Training Using Genetic Algorithms,
Proceedings 9th European Symposium on Artificial Neural Networks (ESANN 2001),
Bruges B, pp. 159-164.
Skinner, A. and Broughton, J. Q. (1995), Neural Networks in Computational Materials
Science: Training Algorithms, Modelling and Simulation in Materials Science and
Engineering, vol. 3, pp. 371-390.
Smieja, F. J. (1993), Neural Network Constructive Algorithms: Trading Generalization
for Learning Efficiency?, Circuits, Systems and Signal Processing, vol. 12, no. 2, pp.
331-374
Srinivas, M. and Patnaik, L. M. (1991), Learning Neural Network Weights Using
Genetic Algorithms - Improving Performance by Search-Space Reduction, in
Proceedings of 1991 IEEE International Joint Conference on Neural Networks
IJCNN'91, Singapore, vol. 3, IEEE Press, New York, NY, pp. 2331--2336.
Stepniewski, S. W. and Keane, A. J. (1996), Topology design of feedforward neural
networks by genetic algorithms, in Proceedings of the 4th Int Conference on Parallel
Problem Solving from Nature (PPSN IV), pp. 771-780.
Thierens, D., Suykens, J., Vanderwalle, J., and De Moor, B. (1993), Genetic Weight
Optimisation of a Feedforward Neural Network Controller, in Artificial Neural
Networks and Genetic Algorithms, Albrecht, R.F., Reeves, C.R., and Steele, N.C.
editors, Springer-Verlag Wien, pp. 658-663.
Whitley, D. (1995), Genetic Algorithms and Neural Networks, Genetic Algorithms in
Engineering and Computer Science, Winter, G., Periaux, J., Galan M. and Cuesta, P.
editors, John Wiley, pp. 203-216.
Whitley, D. and Hanson, T. (1989), Optimising neural networks using faster, more
accurate genetic search, Proceedings 3rd International Conference on GAs and
Applications, Arlington VA - 1989, pp. 391-396.
Yao, X. (1999), Evolving Artificial Neural Networks, Proceedings IEEE, vol. 87, no. 9,
pp. 1423-1447.
Yao, X. and Liu, Y. (1997a), Fast evolution strategies, Proceedings of the 6th Annual
Conference on Evolutionary Programming (EP97), Lecture Notes in Computer Science,
vol. 1213, Springer-Verlag, Berlin, pp. 151-161.
Yao, X. and Liu, Y. (1997b), A New Evolutionary System for Evolving Artificial
Neural Networks, IEEE Transactions on Neural Networks, vol. 8, no. 3, pp. 694-713.
Yan, W., Zhu, Z. and Hu, R. (1997), Hybrid Genetic/BP Algorithm and its Application
for Radar Target Classification, Proceedings 1997 IEEE National Aerospace and
Electronics Conference, NAECON part 2, pp. 981-984.
BIOGRAPHY OF AUTHORS

Doctor Marco Castellani obtained his Ph.D. degree in 2000 from the University of Wales, Cardiff, with a thesis on intelligent control of the manufacturing of fibre optic components. Between 2001 and 2002 he worked for a private company in Germany on machine learning applications to natural language processing. Between 2002 and 2005 he was at the New University of Lisbon, where his research work included machine learning, machine vision, remote sensing, pattern recognition and time series prediction.

Professor Hefin Rowlands is Director of Research & Enterprise at the University of Wales, Newport, with responsibility for developing the research culture and environment for staff and postgraduate students across the University. His doctoral thesis was on optimum design using the Taguchi method with neural networks and genetic algorithms. He was awarded a University of Wales Personal Chair in 2002, and his current research interests concern the benefits companies achieve from deploying business improvement techniques such as six sigma.

LIST OF FIGURES

Fig. 1: The AVI System.
Fig. 2: ANNGaT Architecture.
Fig. 3: ANNGaT Evolution Curves – Classification accuracy.
Fig. 4: ANNGaT Evolution Curves – ANN structure.

LIST OF TABLES

Table 1: Class Distribution of Wood Veneer Data Set.
Table 2: Parameter Setting of Multi-Layer Perceptron and Learning Algorithms.
Table 3: Experimental Results – Classification Accuracy.
Table 4: Experimental Results – Structure Optimisation.
Fig. 1: The AVI System.
[Block diagram: a CCD camera acquires images of the wood veneer; the processing pipeline comprises image acquisition, image segmentation, feature extraction and the classifier.]
[Block diagram of the ANNGaT architecture: the ANN design module applies random node addition/deletion to the genotype; the ANN training module applies random weights mutation and BP learning, with Lamarckian encoding of the learned weights back into the genotype. The evolutionary loop comprises random initialisation, fitness evaluation, selection, a reproduction pool and creation of the new population, repeated until the stop criterion is met, after which the solution is output.]
Fig. 2: ANNGaT Architecture.
[Two plots of classification accuracy (0-100%) versus generations (0-10000): a) Three-layer configuration; b) Four-layer configuration. Each plot shows three curves: Population Average - Training Set, Best Solution - Training Set, Best Solution - Test Set.]
Fig. 3: ANNGaT Evolution Curves – Classification accuracy.
[Two plots of hidden layer size versus generations (0-10000): a) Three-layer configuration, showing the size of the best solution's hidden layer; b) Four-layer configuration, showing the sizes of the best solution's first and second hidden layers.]
Fig. 4: ANNGaT Evolution Curves – ANN structure.
class                     number of examples
 1  bark                          20
 2  clear wood                    20
 3  coloured streaks              20
 4  curly grain                   16
 5  discoloration                 20
 6  holes                          8
 7  pin knots                     20
 8  rotten knots                  20
 9  roughness                     20
10  sound knots                   20
11  splits                        20
12  streaks                       20
13  worm holes                     8
total examples                   232
Table 1: Class Distribution of Wood Veneer Data Set.
Multi-Layer Perceptron Settings
Input nodes 17
Output nodes 13
Hidden layers *
MLP hidden nodes *
Activation function of hidden layer nodes Hyperbolic tangent
Activation function of output layer nodes Sigmoidal
Evolutionary Algorithm Settings BP ANNT ANNGaT
Trials 20 20 20
Learning Cycles * * *
Population size - 100 100
MLP structure mutation rate (node addition) - - 0.0225
MLP structure mutation rate (node deletion) - - 0.0275
MLP weights mutation rate - 0.25 0.25
Amplitude MLP weights mut. - 0.2 0.2
BP Lamarckian operator rate - 0.6 0.6
BP Learning coefficient 0.01 0.01 0.01
BP Momentum term 0.1 - -
Initialisation range for MLP weights [-0.05, 0.05] [-0.05, 0.05] [-0.05, 0.05]
Initialisation range for MLP hidden nodes - - [15, 25]
* depends on the test
Table 2: Parameter Setting of Multi-Layer Perceptron and Learning Algorithms.
                     BP                    ANNT        ANNGaT
MLP design method    Manual    Taguchi     manual      evolutionary   evolutionary
                     (P&S)     (P&O)       (authors)   (3 layers)     (4 layers)
Accuracy             78.09     82.02       82.34       80.00          80.64
Standard deviation   5.17      4.86        4.53        3.41           4.97
Upper bound*         80.50     84.29       84.46       81.60          82.97
Lower bound*         75.67     79.75       80.22       78.40          78.31
Learning cycles      4000      1500        2000        7000           2000

Manual (P&S): see Pham and Sagiroglu (2000); Taguchi (P&O): see Packianather et al. (2000); * 95% confidence interval
Table 3: Experimental Results – Classification Accuracy.
                  Manual   Taguchi   Manual      ANNGaT - 3 layers       ANNGaT - 4 layers
                  (P&S)    (P&O)     (authors)   Average   Max   Min     Average   Max   Min
hidden layer 1    17       45        35          14.15     17    8       12.31     14    9
hidden layer 2    17       -         -           -         -     -       12.25     16    9

Manual (P&S): see Pham and Sagiroglu (2000); Taguchi (P&O): see Packianather et al. (2000)
Table 4: Experimental Results – Structure Optimisation.