Negnevitsky, Pearson Education, 2002

Chapter 6. Artificial neural networks:
- Introduction, or how the brain works
- The neuron as a simple computing element
- The perceptron
- Multilayer neural networks
Neural Networks and the Brain

A neural network is a model of reasoning inspired by the human brain.
The brain consists of a densely interconnected set of nerve cells, or basic information-processing units, called neurons.
The human brain incorporates nearly 10 billion neurons and 60 trillion connections, synapses, between them.
By using multiple neurons simultaneously, the brain can perform its functions much faster than the fastest computers in existence today.
Each neuron has a very simple structure, but an army of such elements constitutes tremendous processing power.
A neuron consists of a cell body, soma, a number of fibers called dendrites, and a single long fiber called the axon.
The neuron computes the weighted sum of the input signals and compares the result with a threshold value, θ.
If the net input is less than the threshold, the neuron output is −1; if the net input is greater than or equal to the threshold, the neuron becomes activated and its output is +1.
The neuron uses the following transfer, or activation, function:

  X = x1·w1 + x2·w2 + ... + xn·wn
  Y = +1 if X ≥ θ,  Y = −1 if X < θ

This type of activation function is called a sign function (McCulloch and Pitts, 1943).
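The sign-function neuron above can be sketched in a few lines. The weights, inputs, and threshold below are illustrative values, not taken from the slides.

```python
def sign_neuron(inputs, weights, theta):
    """Return +1 if the weighted sum reaches the threshold, else -1."""
    x = sum(xi * wi for xi, wi in zip(inputs, weights))  # net input X
    return 1 if x >= theta else -1

# With weights 0.5 each and threshold 0.7, this neuron fires only when
# both 0/1 inputs are on, i.e. it behaves like a logical AND.
print(sign_neuron([1, 1], [0.5, 0.5], 0.7))  # +1: net input 1.0 >= 0.7
print(sign_neuron([1, 0], [0.5, 0.5], 0.7))  # -1: net input 0.5 < 0.7
```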
Can a single neuron learn a task?
Start off with the earliest and simplest. In 1958, Frank Rosenblatt introduced a training algorithm that provided the first procedure for training a simple ANN: a perceptron.
The perceptron is the simplest form of a neural network. It consists of a single neuron with adjustable synaptic weights and a hard limiter.
The operation of Rosenblatt's perceptron is based on the McCulloch and Pitts neuron model. The model consists of a linear combiner followed by a hard limiter.
The weighted sum of the inputs is applied to the hard limiter, which produces an output equal to +1 if its input is positive and −1 if it is negative.
The aim of the perceptron is to classify inputs, x1, x2, ..., xn, into one of two classes, say A1 and A2.
In the case of an elementary perceptron, the n-dimensional space is divided by a hyperplane into two decision regions. The hyperplane is defined by the linearly separable function:

  x1·w1 + x2·w2 + ... + xn·wn − θ = 0
How does the perceptron learn its classification tasks?

This is done by making small adjustments in the weights to reduce the difference between the actual and desired outputs of the perceptron. The perceptron learns weights such that the output is consistent with the training examples.
The initial weights are randomly assigned, usually in the range [−0.5, 0.5].
If at iteration p, the actual output is Y(p) and the desired output is Yd(p), then the error is given by:

  e(p) = Yd(p) − Y(p)    where p = 1, 2, 3, ...

Iteration p here refers to the p-th training example presented to the perceptron.
If the error, e(p), is positive, we need to increase perceptron output Y(p), but if it is negative, we need to decrease Y(p).
The perceptron learning rule

  w_i(p+1) = w_i(p) + α · x_i(p) · e(p)

where p = 1, 2, 3, ... is the iteration number and α is the learning rate, a positive constant less than unity (1).
Intuition:
- The weight at the next iteration is based on an adjustment from the current weight.
- The adjustment amount is influenced by the amount of the error, the size of the input, and the learning rate.
- The learning rate is a free parameter that must be "tuned".
The perceptron learning rule was first proposed by Rosenblatt in 1960. Using this rule we can derive the perceptron training algorithm for classification tasks.
Perceptron's training algorithm

Step 1: Initialisation
Set initial weights w1, w2, ..., wn and threshold θ to random numbers in the range [−0.5, 0.5].
(Recall that during training, if the error, e(p), is positive, we need to increase perceptron output Y(p), but if it is negative, we need to decrease Y(p).)
Step 2: Activation
Activate the perceptron by applying inputs x1(p), x2(p), ..., xn(p) and desired output Yd(p). Calculate the actual output at iteration p = 1:

  Y(p) = step[ Σ_{i=1..n} x_i(p) · w_i(p) − θ ]

where n is the number of the perceptron inputs, and step is a step activation function.
Perceptron's training algorithm (continued)
Step 3: Weight training
Update the weights of the perceptron:

  w_i(p+1) = w_i(p) + Δw_i(p)

where Δw_i(p) is the weight correction for weight i at iteration p.
The weight correction is computed by the delta rule:

  Δw_i(p) = α · x_i(p) · e(p)

Step 4: Iteration
Increase iteration p by one, go back to Step 2 and repeat the process until convergence.
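The four steps above can be sketched end to end, training a perceptron to learn the logical AND operation. Here the step function outputs 1 when the net input reaches the threshold and 0 otherwise, the threshold is held fixed at 0.2 (an assumed value for this sketch), and only the weights are trained.

```python
import random

def step(x):
    return 1 if x >= 0 else 0

def train_perceptron(examples, alpha=0.1, theta=0.2, epochs=100, seed=0):
    rng = random.Random(seed)
    # Step 1: initialise weights in the range [-0.5, 0.5]
    w = [rng.uniform(-0.5, 0.5), rng.uniform(-0.5, 0.5)]
    for _ in range(epochs):
        converged = True
        for (x1, x2), y_d in examples:
            # Step 2: compute the actual output Y(p)
            y = step(x1 * w[0] + x2 * w[1] - theta)
            e = y_d - y                     # e(p) = Yd(p) - Y(p)
            if e != 0:
                converged = False
                # Step 3: delta rule, w_i(p+1) = w_i(p) + alpha*x_i(p)*e(p)
                w[0] += alpha * x1 * e
                w[1] += alpha * x2 * e
        if converged:                       # Step 4: stop at convergence
            break
    return w

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w = train_perceptron(AND)
for (x1, x2), y_d in AND:
    print((x1, x2), "->", step(x1 * w[0] + x2 * w[1] - 0.2))
```

Because AND is linearly separable for this fixed threshold, the updates settle on weights that reproduce the truth table within a few epochs.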
Two-dimensional plots of basic logical operations

[Figure: three plots in the (x1, x2) plane for (a) AND (x1 ∧ x2), (b) OR (x1 ∨ x2), and (c) Exclusive-OR (x1 ⊕ x2). In (a) and (b) a straight line separates the points with output 1 from those with output 0; in (c) no single straight line can.]
A perceptron can learn the operations AND and OR, but not Exclusive-OR.
Exclusive-OR is NOT linearly separable. This limitation stalled neural network research for more than a decade.
A multilayer perceptron is a feedforward neural network with one or more hidden layers.
The network consists of an input layer of source neurons, at least one middle or hidden layer of computational neurons, and an output layer of computational neurons.
The input signals are propagated in a forward direction on a layer-by-layer basis.
Most popular of 100+ ANN learning algorithms. Learning in a multilayer network proceeds the same way as for a perceptron.
A training set of input patterns is presented to the network.
The network computes its output pattern, and if there is an error, or in other words a difference between actual and desired output patterns, the weights are adjusted to reduce this error.
The difference is in the number of weights and the architecture ...
In a back-propagation neural network, the learning algorithm has two phases.
First, a training input pattern is presented to the network input layer. The network propagates the input pattern from layer to layer until the output pattern is generated by the output layer. The activation function is generally a sigmoid.
If this pattern is different from the desired output, an error is calculated and then propagated backwards through the network from the output layer to the input layer. The weights are modified as the error is propagated.
See next slide for picture ...
The back-propagation training algorithm

Step 1: Initialisation
Set all the weights and threshold levels of the network to random numbers uniformly distributed inside a small range:

  (−2.4 / F_i, +2.4 / F_i)

where F_i is the total number of inputs of neuron i in the network. The weight initialisation is done on a neuron-by-neuron basis.
Step 2: Activation
Activate the back-propagation neural network by applying inputs x1(p), x2(p), ..., xn(p) and the desired outputs.
(a) Calculate the actual outputs of the neurons in the hidden layer:

  y_j(p) = sigmoid[ Σ_{i=1..n} x_i(p) · w_ij(p) − θ_j ]

where n is the number of inputs of neuron j in the hidden layer, and sigmoid is the sigmoid activation function.
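Step 2(a) above can be sketched for a single hidden neuron: its output is sigmoid(x1·w13 + x2·w23 − θ3). The weights w13 = 0.5 and w23 = 0.4 are the initial values quoted later in these slides; θ3 = 0.8 is an assumed threshold for this illustration.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def hidden_output(x, w, theta):
    net = sum(xi * wi for xi, wi in zip(x, w)) - theta  # net input to neuron j
    return sigmoid(net)

y3 = hidden_output([1, 1], [0.5, 0.4], 0.8)  # net = 0.5 + 0.4 - 0.8 = 0.1
print(round(y3, 4))                          # sigmoid(0.1) ≈ 0.525
```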
Step 3: Weight training
Update the weights in the back-propagation network, propagating backward the errors associated with output neurons.
(a) Calculate the error gradient for the neurons in the output layer:

  δ_k(p) = y_k(p) · [1 − y_k(p)] · e_k(p)

where e_k(p) = y_{d,k}(p) − y_k(p) is the error at output unit k.
Calculate the weight corrections:

  Δw_jk(p) = α · y_j(p) · δ_k(p)    (weight change for the j-to-k link)

Update the weights at the output neurons:

  w_jk(p+1) = w_jk(p) + Δw_jk(p)
Step 4: Iteration
Increase iteration p by one, go back to Step 2 and repeat the process until the selected error criterion is satisfied.
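Steps 1-4 can be sketched end to end on the Exclusive-OR problem, using a 2-2-1 network with sigmoid activations. The initial weights here are random, not the specific values quoted in these slides, and the learning rate α = 0.5 is an assumed value; plain gradient descent can occasionally stall in a local minimum, so the run is capped at 5000 epochs.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

rng = random.Random(1)
# Step 1: initialise weights and thresholds in a small range
w_ih = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(2)]  # w_ih[i][j]
w_ho = [rng.uniform(-1, 1) for _ in range(2)]
th_h = [rng.uniform(-1, 1) for _ in range(2)]
th_o = rng.uniform(-1, 1)

XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
alpha = 0.5
sse_first = None

for epoch in range(5000):
    sse = 0.0
    for x, y_d in XOR:
        # Step 2: forward pass, hidden layer then output layer
        y_h = [sigmoid(sum(x[i] * w_ih[i][j] for i in range(2)) - th_h[j])
               for j in range(2)]
        y_o = sigmoid(sum(y_h[j] * w_ho[j] for j in range(2)) - th_o)
        e = y_d - y_o
        sse += e * e
        # Step 3: error gradients for the output and hidden layers
        d_o = y_o * (1 - y_o) * e
        d_h = [y_h[j] * (1 - y_h[j]) * d_o * w_ho[j] for j in range(2)]
        # delta-rule updates; each threshold acts as a weight on input -1
        for j in range(2):
            w_ho[j] += alpha * y_h[j] * d_o
            for i in range(2):
                w_ih[i][j] += alpha * x[i] * d_h[j]
            th_h[j] += alpha * -1 * d_h[j]
        th_o += alpha * -1 * d_o
    if sse_first is None:
        sse_first = sse
    if sse < 0.001:          # Step 4: stop when the error criterion is met
        break

print("SSE: %.3f -> %.4f after %d epochs" % (sse_first, sse, epoch + 1))
```

With one output neuron the hidden-layer gradient reduces to δ_j = y_j(1 − y_j)·δ_k·w_jk; with several outputs the δ_k·w_jk term becomes a sum over the output neurons.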
The effect of the threshold applied to a neuron in the hidden or output layer is represented by its weight, θ, connected to a fixed input equal to −1.
The initial weights and threshold levels are set randomly as follows: w13 = 0.5, w14 = 0.9, w23 = 0.4, w24 = 1.0, w35 = −1.2, ...
Network represented by the McCulloch-Pitts model for solving the Exclusive-OR operation.
Accelerated learning in multilayer neural networks

A multilayer network learns much faster when the sigmoidal activation function is represented by a hyperbolic tangent:

  Y = 2a / (1 + e^(−bX)) − a

where a and b are constants. Suitable values for a and b are: a = 1.716 and b = 0.667.
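The hyperbolic-tangent activation above can be sketched directly, using the constants a = 1.716 and b = 0.667 from the slides. Unlike the (0, 1) sigmoid, its outputs are symmetric about zero and range over (−a, +a).

```python
import math

def tanh_activation(x, a=1.716, b=0.667):
    # Y = 2a / (1 + exp(-b*X)) - a
    return 2.0 * a / (1.0 + math.exp(-b * x)) - a

print(tanh_activation(0.0))    # 0 at the origin
print(tanh_activation(10.0))   # approaches +a = 1.716
print(tanh_activation(-10.0))  # approaches -a = -1.716
```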
We can also accelerate training by including a momentum term in the delta rule:

  Δw_jk(p) = β · Δw_jk(p−1) + α · y_j(p) · δ_k(p)

where β is a positive number (0 ≤ β < 1) called the momentum constant. Typically, the momentum constant is set to 0.95.
This iteration's change in weight is influenced by the last iteration's change in weight.
This equation is called the generalised delta rule.
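The generalised delta rule above can be sketched as a small update function: the weight change at iteration p is β times the previous change plus the usual delta-rule term α · y_j(p) · δ_k(p). The y_j and δ_k values below are illustrative.

```python
def momentum_update(prev_dw, y_j, delta_k, alpha=0.1, beta=0.95):
    # delta_w(p) = beta * delta_w(p-1) + alpha * y_j(p) * delta_k(p)
    return beta * prev_dw + alpha * y_j * delta_k

dw1 = momentum_update(0.0, 0.5, 0.2)  # no history yet: ≈ 0.01
dw2 = momentum_update(dw1, 0.5, 0.2)  # momentum carries dw1: ≈ 0.0195
print(dw1, dw2)
```

If the same gradient keeps appearing, the effective step grows toward α·y_j·δ_k / (1 − β), which is why momentum speeds descent along consistent directions.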
Learning with adaptive learning rate

To accelerate the convergence and yet avoid the danger of instability, we can apply two heuristics:
Heuristic 1: If the change of the sum of squared errors has the same algebraic sign for several consecutive epochs, then the learning rate parameter, α, should be increased.
Heuristic 2: If the algebraic sign of the change of the sum of squared errors alternates for several consecutive epochs, then the learning rate parameter, α, should be decreased.
If the sum of squared errors at the current epoch exceeds the previous value by more than a predefined ratio (typically 1.04), the learning rate parameter is decreased (typically by multiplying by 0.7) and new weights and thresholds are calculated.
If the error is less than the previous one, the learning rate is increased (typically by multiplying by 1.05).
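The rate-adjustment part of the two heuristics above can be sketched as follows, using the ratio 1.04 and the factors 0.7 and 1.05 from the slides. The function shape is an illustration of the rate update only; the slides also recompute the weights and thresholds when the error jumps.

```python
def adapt_learning_rate(alpha, sse_prev, sse_curr,
                        ratio=1.04, down=0.7, up=1.05):
    if sse_curr > ratio * sse_prev:
        return alpha * down   # error grew too much: decrease alpha
    if sse_curr < sse_prev:
        return alpha * up     # error fell: increase alpha
    return alpha              # otherwise leave alpha unchanged

print(adapt_learning_rate(0.1, 1.0, 1.10))  # error jumped: ≈ 0.07
print(adapt_learning_rate(0.1, 1.0, 0.90))  # error fell: ≈ 0.105
```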