International Journal of Neural Systems, Vol. 9, No. 2 (April, 1999) 129–151
© World Scientific Publishing Company

MODULAR NEURAL NETWORKS: A SURVEY

GASSER AUDA* and MOHAMED KAMEL†
Pattern Analysis and Machine Intelligence Lab, Systems Design Engineering Department,
University of Waterloo, ON, N2L 3G1, Canada

Received 10 December 1998
Revised 2 February 1999
Accepted 28 May 1999

Modular Neural Networks (MNNs) are a rapidly growing field in artificial Neural Networks (NNs) research. This paper surveys the different motivations for creating MNNs: biological, psychological, hardware, and computational. The general stages of MNN design are then outlined and surveyed as well, viz., task decomposition techniques, learning schemes, and multi-module decision-making strategies. Advantages and disadvantages of the surveyed methods are pointed out, and an assessment of their practical potential is provided. Finally, some general recommendations for future designs are presented.

1. Introduction

A Modular Neural Network (MNN) is a Neural Network (NN) that consists of several modules, each module carrying out one sub-task of the NN's global task, with all modules functionally integrated. A module can be a sub-structure or a learning sub-procedure of the whole network. The network's global task can be any neural network application, e.g., mapping, function approximation, clustering, or an associative memory application.

MNNs are a rapidly growing field in NN research. Researchers with a variety of backgrounds and objectives are contributing to its growth. For example, motivated by the "non-neuromorphic" nature of the current generation of artificial NNs, some researchers with a biology background are suggesting modular structures. Their goal is either to model the biological NN itself, i.e., a reverse-engineering study, or to try to build artificial NNs that achieve the high capabilities of the biological system. Motivated by the psychology of learning in the human system, other researchers modularize the NN's learning in an attempt to achieve a clearer representation of information and less internal interference. Another group of researchers develops modular NNs to meet the constraints imposed by current hardware-implementation technology. Nevertheless, most of the work in the MNN field aims to enhance the computational capabilities of the nonmodular alternatives, e.g., by improving the networks' generalization, scalability, representation, and learning speed.

Figure 1 shows the growth of the MNN field in a profile very similar to the growth of the NN field as a whole. Notice that 44% of the MNN research was done in the last two years, which illustrates the recent high interest in the field.

Biologists have studied the modularization of the natural brain for a long time (e.g., Ref. 1). However, the first two attempts we are aware of to build artificial modular neural networks were in November 1987. E. Micheli-Tzanakou2 briefly outlined a model she built of the vertebrate retina using an artificial MNN. She designed a collection of modules connected in series and in parallel and used them to study the effects of lateral connectivity. In the same month, the Third Conference of Artificial

*E-mail: [email protected]
†To whom requests are to be sent. E-mail: [email protected], Fax (519) 746-4791.
clustering,140,166–168 control engineering,3,123,169–172 and mathematical modeling.51,136,173,174 Figure 9 shows a comparison between two kinds of MNN research: application-oriented and theory-oriented.
Fig. 9. The percentage of application-oriented MNNs relative to the theory-oriented MNNs (cumulative percentage of application-oriented MNN papers plotted against year, 1990–1996).
5. Conclusions

In this paper, modular neural network structures and algorithms have been surveyed. Based on the presented information, the following recommendations can be offered to future designers.
(a) Other related fields can be utilized in designing MNNs. One example is using multi-participant decision-making techniques to integrate the local decisions of the different modules. Another is studying biological discoveries to extract fresh ideas for modularization, e.g., the specialized modules of the visual cortex. However, for building engineering solutions to real applications, the MNN design does not have to be completely "biologically plausible."
(b) The task decomposition technique should put a constraint on the modules' sizes. However, modules should not be forced to be equal-sized, because this would limit the system's flexibility. It is also recommended to design an automatic task-decomposition technique rather than depend on human experience. The fixed distance criterion used in unsupervised NNs, however, is unsuitable for complex problems; changing this criterion according to different levels of data complexity may be more efficient. This can be achieved, for example, by using a hierarchy of unsupervised NNs for task decomposition.
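The hierarchy idea in (b) can be sketched roughly as follows. This is an illustrative sketch only, not a method from the survey: a plain 2-means routine stands in for each unsupervised NN in the hierarchy, and the threshold and depth limit are made-up parameters.

```python
import random

def kmeans2(points, iters=20, seed=0):
    """Plain 2-means on tuples; a stand-in for one unsupervised NN in the hierarchy."""
    rng = random.Random(seed)
    centers = rng.sample(points, 2)
    for _ in range(iters):
        groups = ([], [])
        for p in points:
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            groups[dists.index(min(dists))].append(p)
        centers = [tuple(sum(col) / len(g) for col in zip(*g)) if g else c
                   for g, c in zip(groups, centers)]
    return groups

def spread(points):
    """Mean squared distance to the centroid: a crude data-complexity measure."""
    c = tuple(sum(col) / len(points) for col in zip(*points))
    return sum(sum((a - b) ** 2 for a, b in zip(p, c)) for p in points) / len(points)

def decompose(points, threshold, depth=0, max_depth=3):
    """Split recursively until every sub-task is simple enough for one module."""
    if depth >= max_depth or len(points) < 4 or spread(points) <= threshold:
        return [points]                    # one sub-task -> one module
    left, right = kmeans2(points)
    if not left or not right:              # degenerate split: stop here
        return [points]
    return (decompose(left, threshold, depth + 1, max_depth) +
            decompose(right, threshold, depth + 1, max_depth))

# Example: two well-separated blobs should come out as separate sub-tasks.
blob_a = [(0, 0), (0, 1), (1, 0), (1, 1)]
blob_b = [(10, 10), (10, 11), (11, 10), (11, 11)]
subtasks = decompose(blob_a + blob_b, threshold=2.0)
```

The splitting criterion changes with depth only in the sense recommended above: each level re-examines the spread of its own subset, so dense regions stop splitting early while complex regions split further.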
(c) The task-decomposition technique should be able to deal with the wide range of overlaps in the input space, i.e., to create more "homogeneous" sub-tasks for the modules.
(d) Decoupling the learning equations of the different modules allows them to learn in parallel. This gives faster and less complex learning.
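As a toy illustration of point (d) (not taken from any surveyed system), the two perceptron modules below share no weights: neither module's update rule mentions the other's parameters, so the two fits can run concurrently. The sub-task data are made up.

```python
from concurrent.futures import ThreadPoolExecutor

def train_perceptron(samples, epochs=50, lr=0.1):
    """Train one module on its own sub-task. No other module's weights appear
    in the update rule, i.e., the learning equations are fully decoupled."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for x, target in samples:
            out = 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0
            err = target - out
            w[0] += lr * err * x[0]
            w[1] += lr * err * x[1]
            b += lr * err
    return w, b

def predict(module, x):
    w, b = module
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0

# Two hypothetical sub-tasks produced by some prior task decomposition.
subtask_a = [((0, 0), 0), ((0, 1), 0), ((1, 0), 1), ((1, 1), 1)]  # label = x[0]
subtask_b = [((0, 0), 0), ((0, 1), 1), ((1, 0), 0), ((1, 1), 1)]  # label = x[1]

# Because the updates are decoupled, the two fits can run in parallel.
with ThreadPoolExecutor(max_workers=2) as pool:
    module_a, module_b = pool.map(train_perceptron, [subtask_a, subtask_b])
```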
(e) Multimodule decision-making should efficiently integrate the local "experiences" of all the modules.
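A minimal form of such integration, one of many possible strategies, is a confidence-weighted vote over the modules' local outputs (the function and its inputs here are illustrative, not a scheme prescribed by the survey):

```python
def integrate(local_decisions):
    """Each module reports a (label, confidence) pair; sum the confidence each
    label receives across modules and return the best-supported label."""
    scores = {}
    for label, confidence in local_decisions:
        scores[label] = scores.get(label, 0.0) + confidence
    return max(scores, key=scores.get)

# One very confident module can outvote two hesitant ones:
global_decision = integrate([("A", 0.3), ("A", 0.3), ("B", 0.9)])  # -> "B"
```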
(f) It appears from the survey that MNN designers usually do not balance the simplification of the sub-tasks against the efficiency of the multimodule decision-making strategy. In other words, the task-decomposition algorithm should produce sub-tasks that are as simple as possible, while the modules must still give the multimodule decision-making strategy enough information to make an accurate global decision.
(g) The performance of a new MNN should be compared with other state-of-the-art MNNs rather than with the traditional models, which are usually less efficient. Also, the performance criteria should include, in addition to accuracy and speed, aspects like the feasibility of hardware implementation, computational "work," and the memory requirements of the model.
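Recommendation (g) can be followed with a small evaluation harness that reports several criteria side by side rather than accuracy alone. The function below is purely illustrative; its fields (a parameter count as a memory proxy, per-sample latency as a speed proxy) are our assumptions, not metrics defined in the survey.

```python
import time

def evaluate(model_fn, n_params, samples):
    """Measure accuracy and per-sample latency, and carry a parameter count
    as a rough proxy for the model's memory requirements."""
    start = time.perf_counter()
    correct = sum(model_fn(x) == y for x, y in samples)
    elapsed = time.perf_counter() - start
    return {
        "accuracy": correct / len(samples),
        "seconds_per_sample": elapsed / len(samples),
        "parameters": n_params,
    }

# Example with a trivial threshold "model" (one parameter, made-up data).
report = evaluate(lambda x: x > 0, n_params=1,
                  samples=[(1, True), (-1, False), (2, True)])
```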
References
1. J. Hochberg 1964, Perception (Prentice Hall, New Jersey).
2. E. Micheli-Tzanakou 1987, "A neural network model of the vertebrate retina," in 9th Ann. Conf. IEEE Engineering in Medicine and Biology Society (Boston, MA, USA), November 13–16.
3. E. Fiesler and A. Choudry 1987, "Neural networks as a possible architecture for the distributed control of space systems," in 3rd Conf. Artificial Intell. Space Appl. (Washington D.C., USA), p. 401.
4. D. Hartline 1988, "Models for simulation of real neural networks," in Int. Neural Network Soc. (INNS) First Ann. Meeting (Boston, USA), p. 256.
5. D. Gray, A. Michel and W. Porod 1988, "Application of neural networks to sorting problems," in Int. Neural Network Soc. (INNS) First Ann. Meeting (Boston, USA), p. 441.
6. Y. Anzai and T. Shimada 1988, "Modular neural networks for shape and/or location recognition," in Int. Neural Network Soc. (INNS) First Ann. Meeting (Boston, USA), p. 158.
8. D. Rumelhart, G. Hinton and R. Williams 1986, "Learning internal representations by error propagation," in Parallel Distributed Processing, eds. D. Rumelhart and J. McClelland (MIT Press), pp. 318–362.
9. S. Rogers and M. Kabrisky 1991, An Introduction to Biological and Artificial Neural Networks for Pattern Recognition, Vol. TT4 (SPIE Optical Engineering Press).
10. G. Auda 1992, "Performance criteria for neural network evaluation," Master's thesis, Cairo University, Egypt.
11. D. Gardner 1993, "Introduction: Toward neural networks," in The Neurobiology of Neural Networks, ed. D. Gardner (MIT Press), pp. 1–20.
12. A. Selverston 1993, "Modeling of neural circuits: What have we learned?" Ann. Rev. Neurosci. 16, 531–546.
13. M. Ito 1993, "What can we expect from neural network models?" in Int. Joint Conf. Neural Networks, Vol. 1, Nagoya, Japan.
14. T. Bullock 1993, "Integrative systems research in the brain: Resurgence and new opportunities," Ann. Rev. Neurosci. 16, 1–15.
15. M. Serna and L. Baird 1996, "Comparison of brain structure to backpropagation-learned structure," in IEEE Int. Conf. Neural Networks (ICNN96), Vol. 2 (Washington, D.C., USA), pp. 706–711.
16. P. Baldi and N. Toomarian 1993, "Learning trajectories with a hierarchy of oscillatory modules," in Int. Conf. Neural Networks (USA), pp. 1172–1176.
17. T. Lee 1991, Structure Level Adaptation for Artificial Neural Networks (Kluwer Academic Publishers).
18. J. Murre 1992, Learning and Categorization in Modular Neural Networks (Harvester Wheatsheaf).
19. E. Marder 1988, "Modulating a neuronal network," Nature 335(22), 296–297.
20. P. Hudson and R. Phaf 1991, "Orders of approximation of neural networks to brain structure: Levels, modules and computing power," in Artificial Neural Networks, eds. T. Kohonen, K. Makisara, O. Simula and J. Kangas (Elsevier Science Publishers), pp. 1025–1028.
21. E. Mingolla, H. Neumann and L. Pessoa 1994, "A multi-scale network model of brightness perception," in World Congress on Neural Networks, Vol. 4 (San Diego, USA), pp. 299–306.
22. B. Graham and D. Willshaw 1994, "Capacity and information efficiency of a brain-like associative net," in Neural Info. Proc. Syst. – Nat. Synth. (Denver, USA).
23. K. Cave, J. Rueckl and S. Kosslyn 1989, "Why are what and where processed by separate cortical visual systems? A computational investigation," J. Cognitive Neuroscience 1(2), 171–186.
24. R. Tsai, B. Sheu and T. Berger 1996, "VLSI design for real-time signal processing based on biologically realistic neural models," in IEEE Int. Conf. Neural Networks (ICNN96), Vol. 2 (Washington, D.C., USA), pp. 676–679.
25. F. Guyot, F. Alexandre and J. Halton 1991, "The cortical column: A new processing unit for multilayered networks," Neural Networks 4, 15–25.
26. T. Hrycej 1992, Modular Learning in Neural Networks: A Modularized Approach to Classification (Wiley).
27. S. Johannes, B. Wieringa, M. Matzke and T. Munte 1996, "Hierarchical visual stimuli: Electrophysiological evidence for separating left hemispheric global and local processing mechanisms in humans," Neuroscience Lett. 210(2), 111–114.
28. A. Chernjavsky and J. Moody 1990, "Note on development of modularity in simple cortical models," in Adv. Neural Info. Proc. Syst., Vol. 2, ed. D. Touretzky (Morgan Kaufmann), pp. 133–140.
29. S. Zeki and S. Shipp 1988, "The functional logic of cortical connections," Nature 335(22), 311–317.
30. R. Jacobs 1990, Task Decomposition Through Competition in a Modular Connectionist Architecture, PhD thesis, University of Massachusetts, Amherst, MA, USA.
31. R. Jacobs, M. Jordan and A. Barto 1991, "Task decomposition through competition in a modular connectionist architecture: The what and where vision tasks," Cognitive Science 15(2), 219–250.
32. R. Jacobs, M. Jordan and A. Barto 1991, "Task decomposition through competition in a modular connectionist architecture: The what and where vision tasks," Neural Computation 3, 79–87.
33. S. Aglioti, A. Beltramello, A. Bonazzi and M. Corbetta 1996, "Thumb-pointing in humans after damage to somatic sensory cortex," Exp. Brain Res. 109(1), 92–100.
34. A. Cowey 1981, "Why are there so many visual areas?" in The Organization of the Cerebral Cortex, ed. F. Schmitt (MIT Press), pp. 395–413.
35. W. Lincoln and J. Skrzypek 1990, "Synergy of clustering back propagation networks," in Adv. Neural Info. Proc. Syst., Vol. 2, ed. D. Touretzky (Morgan Kaufmann), pp. 650–657.
36. A. Gupta, L. Mui, A. Agarwal and P. Wang 1994, "An adaptive modular neural network with application to unconstrained character recognition," Int. J. Pattern Recognition and Artificial Intell. 8(5), 1189–1222.
37. M. Schwarz, B. Hosticka, R. Hauschild, W. Mokwa, M. Scholles and H. Trieu 1996, "Hardware architecture of a neural net based retina implant for patients suffering from retinitis pigmentosa," in IEEE Int. Conf. Neural Networks (ICNN96), Vol. 2 (Washington, D.C., USA), pp. 653–658.
38. G. Edelman 1981, "Group selection as the basis for higher brain function," in The Organization of the Cerebral Cortex, ed. F. Schmitt (MIT Press), pp. 538–563.
39. G. Auda, M. Kamel and H. Raafat 1994, "A new neural network structure with cooperative modules," in World Cong. Comput. Intell., Vol. 3 (Florida, USA), pp. 1301–1306.
40. W. Tsai, H. Tai and A. Reynolds 1994, "An ART2-BP supervised neural net," in World Cong. Neural Networks, Vol. 3 (San Diego, USA), pp. 619–624.
41. N. Franks 1989, "Army ants: A collective intelligence," American Scientist 77, 139–145.
42. G. Auda, M. Kamel and H. Raafat 1995, "Voting schemes for cooperative neural network classifiers," in IEEE Int. Conf. Neural Networks, ICNN'95, Vol. 3 (Perth, Australia), pp. 1240–1243.
43. C. Stevens 1989, "How cortical interconnectedness varies with network size," Neural Computation 1, 473–479.
44. H. Ayestaran and R. Prager 1993, "Towards continuously learning neural networks," in Int. Joint Conf. Neural Networks, Vol. 3 (Nagoya, Japan), pp. 2280–2283.
45. R. Phaf, J. Murre and G. Wolters 1992, "CALM: Categorizing and learning module," Neural Networks 5, 55–82.
46. M. Moya, L. Hostetler, M. Koch and R. Fogler 1995, "Cueing, feature discovery, and one-class learning for synthetic aperture radar automatic target recognition," Neural Networks 8(7/8), 1081–1102.
47. M. Patel 1995, "Constraints on task and search complexity in GA+NN models of learning and adaptive behaviour," in Evolutionary Computing, AISB Workshop (Sheffield, UK), pp. 200–223.
48. R. Phaf, J. Murre and G. Wolters 1991, "CALM networks: A modular approach to supervised and unsupervised learning," in 1991 IEEE Int. Joint Conf. Neural Networks, Vol. 1 (Singapore), pp. 649–656.
49. S. Abidi 1996, "Neural networks and child development: A simulation using a modular neural network architecture," in IEEE Int. Conf. Neural Networks (ICNN96), Vol. 2 (Washington, D.C., USA), pp. 840–845.
50. B. Happel and J. Murre 1994, "Design and evolution of modular neural network architectures," Neural Networks 7(6/7), 985–1004.
51. Y. Foo and H. Szu 1989, "Solving large-scale optimization problems by divide-and-conquer neural networks," in Int. Joint Conf. Neural Networks, Vol. 1 (New York, USA), pp. 507–511.
52. P. Werbos 1994, "Neural nets, consciousness, ethics and the soul," in World Cong. Neural Networks, Vol. 1 (San Diego, USA), pp. 221–228.
53. R. Shadmehr, T. Brashers-Krug and E. Todorov 1994, "Catastrophic interference in human motor learning," in Adv. Neural Info. Proc. Syst., Vol. 7, eds. M. Mozer, D. Touretzky and M. Hasselmo (MIT Press).
54. T. Brashers-Krug, R. Shadmehr and E. Todorov 1994, "Catastrophic interference in human motor learning," in Neural Info. Proc. Syst. – Nat. Synth. (Denver, USA), pp. 18–26.
55. H. Hackbarth and J. Mantel 1991, "Modular connectionist structure for 100-word recognition," in Int. Joint Conf. Neural Networks, Vol. 2 (Seattle, USA), pp. 845–849.
56. K. Joe, Y. Mori and S. Miyake 1990, "Construction of a large scale neural network: Simulation of handwritten Japanese character recognition on nCUBE," Concurrency 2(2), 79–107.
57. R. Webb 1993, "Optoelectronic implementation of neural networks," Int. J. Neural Systems 4(4), 435–444.
58. J. Ouali, G. Saucier and J. Trilhe 1991, "Distributed large neural networks on silicon," in Silicon Architectures for Neural Nets (Elsevier Science Publishers), pp. 11–29.
59. S. Kleynenberg, J. Murre, J. Heemskerk and P. Hudson 1991, "Integrating the BSP400 neurocomputer with the MetaNet network environment," in 1991 IEEE Int. Joint Conf. Neural Networks (Singapore), pp. 1476–1480.
60. J. Murre 1991, "Transputer implementations of neural networks: An analysis," in Artificial Neural Networks, eds. T. Kohonen, K. Makisara, O. Simula and J. Kangas (Elsevier Science Publishers), pp. 1537–1541.
61. R. Means 1994, "High speed parallel hardware performance issues for neural network applications," in World Cong. Comput. Intell., Vol. 1 (Florida, USA), pp. 10–16.
62. R. Liu, C. Wu and I. Jou 1995, "CMOS current-mode implementation of spatiotemporal probabilistic neural networks for speech recognition," J. VLSI Signal Proc. 10, 67–84.
63. M. Costa, E. Fillipi and E. Pasero 1994, "A modular cyclic neural network for character recognition," in World Cong. Neural Networks, Vol. 3 (San Diego, USA), pp. 204–209.
64. H. Djahanshahi, M. Ahmadi, G. Jullien and W. Miller 1996, "A modular architecture for hybrid VLSI neural networks and its application in a smart photosensor," in IEEE Int. Conf. Neural Networks (ICNN96), Vol. 2 (Washington, D.C., USA), pp. 868–873.
65. Z. Xu and C. Kwong 1996, "A decomposition principle for complexity reduction of artificial neural networks," Neural Networks 9(6), 999–1016.
66. A. Blum and R. Rivest 1988, "Training a 3-node neural network is NP-complete," in 1st ACM Workshop Comput. Learning Theory (Cambridge, MA, USA), pp. 9–18.
67. D. Hush and B. Horne 1993, "Progress in supervised neural networks: What's new since Lippmann?" IEEE Signal Proc. Mag. 10(2), 8–39.
68. G. Auda, A. El-Dessouki and A. Darwish 1993, "A comparative study of some supervised neural networks," The Egyptian Computer Journal 21(1), 97–104.
69. M. de Bollivier, P. Gallinari and S. Thiria 1991, "Cooperation of neural nets and task decomposition," in Int. Joint Conf. Neural Networks, Vol. 2 (Seattle, USA), pp. 573–576.
70. K. Hornik, M. Stinchcombe and H. White 1989, "Multilayer feedforward networks are universal approximators," Neural Networks 2, 359–366.
71. E. Baum and D. Haussler 1989, "What size net gives valid generalization?" Neural Computation 1(1), 151–160.
72. S. Holden 1994, "How practical are VC dimension bounds?" in World Cong. Comput. Intell., Vol. 1 (Florida, USA), pp. 327–332.
73. V. Vapnik and A. Chervonenkis 1971, "On the uniform convergence of relative frequencies of events to their probabilities," Theory of Probability and its Applications 16, 264–280.
74. Y. Abu-Mostafa 1989, "The Vapnik–Chervonenkis dimension: Information versus complexity in learning," Neural Computation 1, 312–317.
75. T. Rognvaldsson 1993, "Pattern discrimination using feedforward networks: A benchmark study of scaling behavior," Neural Computation 5, 483–491.
76. P. Bartlett 1993, "Vapnik–Chervonenkis dimension bounds for two- and three-layer networks," Neural Computation 5, 371–373.
77. S. Ahmad 1988, "A study of scaling and generalization in neural networks," Technical Report UIUCDCS-R-88-1454, Department of Computer Science.
78. Y. Bennani 1995, "A modular and hybrid connectionist system for speaker identification," Neural Computation 7, 791–798.
79. M. Smith 1993, Neural Networks for Statistical Modeling (Van Nostrand Reinhold).
80. G. Tesauro and B. Janssens 1988, "Scaling relationships in back-propagation learning," Complex Systems 2, 39–44.
81. Y. Le Cun, J. Denker and S. Solla 1990, "Optimal brain damage," in Adv. Neural Info. Proc. Syst., Vol. 2, ed. D. Touretzky (Morgan Kaufmann), pp. 598–605.
82. P. Werbos 1991, "Links between artificial neural networks (ANN) and statistical pattern recognition," in Artificial Neural Networks and Statistical Pattern Recognition: Old and New Connections (Elsevier Science Publishers), pp. 11–31.
83. F. Wang and Q. Zhang 1996, "An adaptive and fully sparse training approach for multilayer perceptrons," in IEEE Int. Conf. Neural Networks (ICNN96), Vol. 3 (Washington, D.C., USA), pp. 102–107.
84. C. Mohan, R. Anand, K. Mehrotra and S. Ranka 1995, "Efficient classification for multiclass problems using modular neural networks," IEEE Trans. Neural Networks 6(1), 117–124.
85. D. Zipser 1990, "Subgrouping reduces complexity and speeds up learning in recurrent networks," in Adv. Neural Info. Proc. Syst., Vol. 2, ed. D. Touretzky (Morgan Kaufmann), pp. 638–641.
86. N. Gagarin, I. Flood and P. Albrecht 1994, "Computing truck attributes with artificial neural networks," J. Comput. Civil Eng. 8(2), 179–200.
87. S. Nakamura and H. Sawaki 1992, "Performance comparison of neural network architectures," Systems and Computers in Japan 23(14), 72–83.
88. W. Baxt 1992, "Improving the accuracy of an artificial neural network using multiple differently trained networks," Neural Computation 4, 772–780.
89. P. Mundkur and U. Desai 1991, "Automatic target recognition using image and network decomposition," in 1991 IEEE Int. Joint Conf. Neural Networks (Singapore), pp. 281–286.
90. A. Waibel 1989, "Modular construction of time-delay neural networks for speech recognition," Neural Computation 1, 39–46.
91. G. Poo and Y. Ou 1994, "A large phonemic time-delay neural network technique for all Mandarin consonants recognition," in Proc. 1994 IEEE Region 10's 9th Ann. Int. Conf. (USA), pp. 521–525.
92. P. Murphy and D. Aha 1994, "UCI repository of machine learning databases," Technical Report, University of California, Irvine, USA.
93. S. Haykin 1994, Neural Networks: A Comprehensive Foundation (IEEE Press).
94. T. Hrycej 1993, "A modular architecture for efficient learning," in Int. Conf. Neural Networks, Vol. 1 (USA), pp. 557–562.
95. D. Ballard 1987, "Modular learning in neural networks," in 6th Nat. Conf. Artificial Intell. (USA), pp. 279–284.
96. M. de Bollivier, P. Gallinari and S. Thiria 1990, "Cooperation of neural nets for robust classification," in Int. Joint Conf. Neural Networks, Vol. 1 (San Diego, USA), pp. 113–120.
97. C. Chiang and H. Fu 1994, "A divide-and-conquer methodology for modular supervised neural network design," in World Cong. Comput. Intell., Vol. 1 (Florida, USA), pp. 119–124.
98. G. Auda and M. Kamel 1997, "Modular neural network classifiers: A comparative study," in 3rd Int. Conf. Neural Networks Appl. (NEURAP'97) (Marseilles, France), pp. 41–47.
99. C. Lursinsap and V. Coowanitwong 1994, "Analysis of input vector space for speeding up learning in feedforward neural networks," in World Cong. Neural Networks, Vol. 3 (San Diego, USA), pp. 414–419.
100. H. Raafat and M. Rashwan 1993, "A tree structured neural network," in Int. Conf. Document Analysis and Recognition ICDAR93 (Japan), pp. 939–942.
101. T. Drabe and W. Bressgott 1997, "Task allocation for multiple-network architectures," in IEEE Int. Conf. Neural Networks (ICNN97), Vol. 1 (Houston, Texas, USA), pp. 491–494.
102. Y. Nishikawa 1990, "NN/I: A new neural network which divides and learns environments," in Int. Joint Conf. Neural Networks, Vol. 1 (USA), pp. 684–687.
103. H. Kita, H. Masataki and Y. Nishikawa 1991, "Improved version of a network for large-scale pattern recognition tasks," in 1991 IEEE Int. Joint Conf. Neural Networks (Singapore), pp. 2080–2085.
104. E. Corwin, S. Greni, A. Logar and K. Whitehead 1994, "A multi-stage neural network classifier," in World Cong. Neural Networks, Vol. 3 (San Diego, USA), pp. 198–203.
105. G. Bartfai 1994, "Hierarchical clustering with ART neural networks," in World Cong. Comput. Intell., Vol. 2 (Florida, USA), pp. 940–944.
106. N. Griffith 1994, "Using ART2 and BP cooperatively to classify musical sequences," in Int. Conf. Artificial Neural Networks ICANN94, Vol. 1, Italy.
107. L. Burke 1991, "Clustering characterization of adaptive resonance," Neural Networks 4, 485–491.
108. Y. Mori and K. Joe 1990, "A large-scale neural network which recognizes handwritten Kanji characters," in Adv. Neural Info. Proc. Syst., Vol. 2, ed. D. Touretzky (Morgan Kaufmann), pp. 415–422.
109. J. Huang and A. Kuh 1993, "A neural network isolated word recognition system for moderate sized databases," in Int. Conf. Neural Networks, Vol. 1 (USA), pp. 387–391.
110. A. Iwata, H. Kawajiri and N. Suzumura 1991, "Classification of hand-written digits by a large scale neural network CombNET-II," in 1991 IEEE Int. Joint Conf. Neural Networks (Singapore), pp. 1021–1026.
111. M. Jordan and L. Xu 1993, "Convergence properties of the EM approach to learning in mixture-of-experts architectures," Technical Report 9302, Department of Brain and Cognitive Science, MIT.
112. R. Battiti and A. Colla 1994, "Democracy in neural nets: Voting schemes for classification," Neural Networks 7(4), 691–707.
113. R. Shadafan and M. Niranjan 1994, "A dynamic neural network architecture by sequential partitioning of the input space," Neural Computation 6, 1202–1222.
114. T. Salome and H. Bersini 1994, "An algorithm for self-structuring neural net classifiers," in Int. Conf. Neural Networks (ICNN94), Vol. 3 (San Diego, USA), pp. 1307–1311.
115. W. Jansen, M. Diepenhorst, J. Nijhuis and L. Spaanenburg 1997, "Assembling engineering knowledge in a modular multi-layer perceptron neural network," in IEEE Int. Conf. Neural Networks (ICNN97), Vol. 1 (Houston, Texas, USA), pp. 232–237.
116. L. Wang, N. Nasrabadi and S. Der 1997, “Asymp-totical analysis of a modular neural network,” inIEEE Int. Conf. Neural Networks (ICNN97), Vol. 2(Houston, Texas, USA), pp. 1019–1022.
117. S. Schaal and C. Atkeson 1995, “From isolation tocooperation: An alternative view of a system of ex-perts,” in Adv. Neural Info. Proc. Syst., Vol. 8, eds.M. Mozer, D. Touretzky and M. Hasselmo (MITPress).
118. V. Ramamurti and J. Gosh 1997, “Regularizationand error bars for the mixture of experts network,” inIEEE Int. Conf. Neural Networks (ICNN97), Vol. 1(Houston, Texas, USA), pp. 221–225.
119. V. Tresp and M. Taniguchi 1994, “Combining es-timators using non-constant weighting functions,”in Neural Info. Proc. Syst. – Nat. Synth. (Denver,USA), pp. 419–426.
120. K. Chen and H. Chi 1997, “A modified mixturesof experts architecture for classification with di-verse features,” in IEEE Int. Conf. Neural Net-works (ICNN97), Vol. 1 (Houston, Texas, USA),pp. 215–220.
121. L. Xu, M. Jordan and J. Hinton 1994, “An al-ternative model for mixtures of experts,” in Neu-ral Info. Proc. Syst. – Nat. Synth. (Denver, USA),pp. 633–640.
122. S. Medasani and R. Krishnapuram 1997, “De-termination of the number of components inGaussian mixtures using agglomerative clustering,”in IEEE Int. Conf. Neural Networks (ICNN97),Vol. 3 (Houston, Texas, USA), pp. 1412–1417.
123. B. Lu, H. Kita and Y. Nishikawa 1994, “A multi-sieving neural network architecture that decomposeslearning tasks automatically,” in World Cong. Com-put. Intell., Vol. 3 (Florida, USA), pp. 1319–1324.
124. K. Mohiuddin, J. Mao and T. Fujisaka 1995, “Atwo-stage multi-network ocr system with a soft pre-classifier and a network selector,” in Int. Conf. Docu-ment Analysis and Recognition ICDAR95 (Montreal,Canada), pp. 78–81.
125. R. Venkateswaran and Z. Obradovic 1994, “Efficient learning through cooperation,” in World Cong. Neural Networks, Vol. 3 (San Diego, USA), pp. 390–395.
126. N. Ueda and R. Nakano 1996, “Generalization error of ensemble estimators,” in IEEE Int. Conf. Neural Networks (ICNN96), Vol. 3 (Washington, D.C., USA), pp. 90–95.
127. L. Hansen and P. Salamon 1990, “Neural network ensembles,” IEEE Transactions on Pattern Analysis and Machine Intelligence 12(10), 993–1003.
128. S. Mukherjee and T. Fine 1996, “Ensemble pruning algorithms for accelerated training,” in IEEE Int. Conf. Neural Networks (ICNN96), Vol. 3 (Washington, D.C., USA), pp. 96–101.
129. J. Ahn, J. Kim and S. Cho 1995, “Ensemble competitive learning neural networks with reduced input dimension,” Int. J. Neural Systems 6(2), 132–142.
130. S. Hashem 1997, “Algorithms for optimal linear combinations of neural networks,” in IEEE Int. Conf. Neural Networks (ICNN97), Vol. 1 (Houston, Texas, USA), pp. 242–247.
131. M. Sonmez, M. Sun, C. Li and R. Sclabassi 1997, “A hierarchical decision module based on multiple neural networks,” in IEEE Int. Conf. Neural Networks (ICNN97), Vol. 1 (Houston, Texas, USA), pp. 238–241.
132. A. Krogh and J. Vedelsby 1994, “Neural network ensembles, cross validation and active learning,” in Neural Info. Proc. Syst. – Nat. Synth. (Denver, USA), pp. 231–238.
133. H. Elsherif and M. Hambaba 1994, “On modifying the weights in a modular recurrent connectionist system,” in World Cong. Comput. Intell. (Florida, USA), pp. 535–539.
134. H. Elsherif and M. Hambaba 1993, “A modular neural network architecture for pattern classification,” in IEEE Workshop Neural Networks Signal Proc. (Baltimore, USA), pp. 232–239.
135. S. Zein-Sabatto, W. Hwang and D. Marpaka 1994, “Neural networks sharing knowledge and experience,” in World Cong. Neural Networks, Vol. 3 (San Diego, USA), pp. 613–618.
136. K. Rohani and M. Manry 1991, “The design of multi-layer perceptrons using building blocks,” in Int. Joint Conf. Neural Networks, Vol. 2 (Seattle, USA), pp. 497–501.
137. A. Maren and R. Pap 1994, “The cooperative-competitive network: A neural network method for making assignments,” in World Cong. Neural Networks, Vol. 1 (San Diego, USA), pp. 456–461.
138. G. Rogova 1994, “Combining the results of several neural network classifiers,” Neural Networks 7(5), 777–781.
139. H. Nurmi 1987, Comparing Voting Systems (D. Reidel Publishing Company).
140. P. Watta, M. Akkal and M. Hassoun 1997, “Decoupled-voting hamming associative memory networks,” in IEEE Int. Conf. Neural Networks (ICNN97), Vol. 2 (Houston, Texas, USA), pp. 1188–1193.
141. E. Alpaydin 1993, “Multiple networks for function learning,” in Int. Conf. Neural Networks, Vol. 1 (California, USA), pp. 9–14.
142. G. Auda 1996, “Cooperative modular neural network classifiers,” PhD thesis, University of Waterloo, Systems Design Engineering Department, Canada.
143. G. Auda, M. Kamel and H. Raafat 1996, “Modular neural network architectures for classification,” in Int. Conf. Neural Networks (ICNN96), Vol. 2 (Washington, D.C., USA), pp. 1279–1284.
144. S. Cho 1995, “A neuro-fuzzy architecture for high performance classification,” in Adv. Fuzzy Logic Neural Networks Genetic Algorithms, IEEE/Nagoya Univ. World Wisepersons Workshop (Japan), pp. 66–71.
145. D. Lee and S. Srihari 1995, “A theory of classifier combination: The neural network approach,” in Int. Conf. Document Analysis and Recognition ICDAR95 (Montreal, Canada), pp. 42–46.
146. C. Ornes and J. Sklansky 1997, “A visual multi-expert neural classifier,” in IEEE Int. Conf. Neural Networks (ICNN97), Vol. 3 (Houston, Texas, USA), pp. 1448–1453.
147. I. Otto, E. Guigon and Y. Burnod 1990, “Cooperation between temporal and parietal networks for invariant recognition,” in Proc. Int. Neural Network Conf., Vol. 1 (Paris, France), pp. 480–483.
148. D. Jimenez 1997, “Locating anatomical landmarks for prosthetics design using ensemble neural networks,” in IEEE Int. Conf. Neural Networks (ICNN97), Vol. 1 (Houston, Texas, USA), pp. 81–85.
149. V. Petridis and A. Kehagias 1996, “A recurrent network implementation of time series classification,” Neural Computation 8, 357–372.
150. P. Blonda, V. la Forgia, G. Pasquariello and G. Stalino 1994, “A modular neural system for the classification of remote sensing images,” in Int. Conf. Artificial Neural Networks ICANN94, Vol. 2 (Italy), pp. 1173–1176.
151. L. Devillers and C. Dugast 1994, “Hybrid system combining expert-tdnns and hmms for continuous speech recognition,” in Int. Conf. Acoustics Speech Signal Proc. ICASSP94, Vol. 1 (Australia), pp. 165–168.
152. T. Lefebvre, J. Nicolas and P. Degoul 1990, “Numerical to symbolic conversion for acoustic signal classification using a two-stage neural architecture,” in Proc. Int. Neural Network Conf., Vol. 1 (Paris, France), pp. 119–122.
153. E. Allen, M. Menon and P. Dicaprio 1990, “A modular architecture for object recognition using neural networks,” in Proc. Int. Neural Network Conf., Vol. 1 (Paris, France), pp. 35–37.
154. J. Lampinen and E. Oja 1995, “Distortion tolerant pattern recognition based on self-organizing feature extraction,” IEEE Trans. on Neural Networks 6(3), 539–547.
155. J. Diaz-Verdejo, J. Segura-Luna, P. Garcia-Teodoro and A. Rubio-Ayuso 1994, “Slhmm: A continuous speech recognition system based on alphanet-hmm,” in Int. Conf. Acoustics Speech Signal Proc. ICASSP94, Vol. 1 (Australia), pp. 213–216.
156. Y. Zhao, R. Schwartz, J. Sroka and J. Makhoul 1994, “Hierarchical mixtures of experts methodology applied to continuous speech recognition,” in Neural Info. Proc. Syst. – Nat. Synth. (Denver, USA), pp. 859–865.
157. A. de Carvalho, M. Fairhurst and D. Baisset 1994, “A modular boolean architecture for pattern recognition,” in Int. Conf. Neural Networks (ICNN94), Vol. 7 (San Diego, USA), pp. 4349–4352.
158. E. Wan 1997, “Combining fossil and sunspot data: Committee predictions,” in IEEE Int. Conf. Neural Networks (ICNN97), Vol. 4 (Houston, Texas, USA), pp. 2176–2180.
159. R. Schwaerzel and B. Rosen 1997, “Improving the accuracy of financial time series prediction using ensemble networks and high order statistics,” in IEEE Int. Conf. Neural Networks (ICNN97), Vol. 4 (Houston, Texas, USA), pp. 2045–2050.
160. S. Cho, Y. Cho and S. Yoon 1997, “Reliable roll force prediction in cold mill using multiple neural networks,” IEEE Trans. Neural Networks 8(4), 874–882.
161. C. Tham 1996, “A hierarchical cmac architecture for context dependent function approximation,” in Int. Conf. Neural Networks (ICNN96), Vol. 1 (Washington, D.C., USA), pp. 629–634.
162. H. Lee, K. Mehrotra, C. Mohan and S. Ranka 1994, “An incremental network construction algorithm for approximating discontinuous functions,” in Int. Conf. Neural Networks (ICNN94), Vol. 4 (San Diego, USA), pp. 2191–2196.
163. M. Moussa and M. Kamel 1994, “Efficient learning of generic grasping functions using a set of local experts,” in World Cong. Comput. Intell., Vol. 2 (Florida, USA), pp. 183–190.
164. S. Fels and G. Hinton 1994, “Glove-talkii: Mapping hand gestures to speech using neural networks,” in Neural Info. Proc. Syst. – Nat. Synth. (Denver, USA), pp. 843–850.
165. A. Nelson and E. Wan 1997, “Neural speech enhancement using dual extended kalman filtering,” in IEEE Int. Conf. Neural Networks (ICNN97), Vol. 4 (Houston, Texas, USA), pp. 2170–2175.
166. S. Suzuki and H. Ando 1994, “Unsupervised classification of 3d objects from 2d views,” in Neural Info. Proc. Syst. – Nat. Synth. (Denver, USA), pp. 1340–1345.
167. B. Kosko 1989, “Unsupervised learning in noise,” in Int. Joint Conf. Neural Networks IJCNN-89, Vol. 1 (Washington, D.C., USA), pp. 7–17.
168. S. Ozawa, K. Tsutsumi and N. Baba 1994, “The estimation of cross-coupled hopfield nets as an interactive modular neural network,” in Int. Conf. Neural Networks (ICNN94), Vol. 2 (San Diego, USA), pp. 1340–1345.
169. R. Khosla and T. Dillon 1997, “Intelligent hybrid multi-agent architecture for engineering complex systems,” in IEEE Int. Conf. Neural Networks (ICNN97), Vol. 4 (Houston, Texas, USA), pp. 2449–2454.
170. A. Driss, V. Rodrigues and P. Cohen 1994, “Parking a vision-guided automobile vehicle,” in 3rd IEEE Conf. Control Appl., Vol. 1 (Glasgow, UK), pp. 59–64.
171. F. Zanella and M. Corvi 1994, “A neural reflexive navigation system,” in Int. Conf. Artificial Neural Networks ICANN94, Vol. 2 (Italy), pp. 1290–1294.
172. M. Copeland 1995, “Modular network control for robot manipulator,” in Proc. IEEE Southeastcon’95. Visualize the Future, Vol. 1 (NC, USA), pp. 183–187.
173. F. Casilari, A. Jurado, G. Pansard, A. Diaz-Estrella and F. Sandoval 1996, “Modelling aggregate heterogeneous atm sources using neural networks,” Electronics Letters 32(4), 363–365.
174. P. Tino and J. Sajda 1995, “Learning and extracting initial mealy automata with a modular neural network model,” Neural Computation 7, 822–844.