ERROR BACKPROPAGATION ALGORITHM

Why is the error backpropagation algorithm required?

The lack of suitable training methods for multilayer perceptrons (MLPs) led to a waning of interest in neural networks in the 1960s and 1970s. This changed with the reformulation of the backpropagation training method for MLPs in the mid-1980s by Rumelhart et al. Backpropagation was created by generalizing the Widrow-Hoff learning rule to multiple-layer networks and nonlinear differentiable transfer functions. Standard backpropagation is a gradient descent algorithm, as is the Widrow-Hoff learning rule, in which the network weights are moved along the negative of the gradient of the performance function. The term backpropagation refers to the manner in which the gradient is computed for nonlinear multilayer networks.

As in the simple cases of delta learning rule training studied before, input patterns are submitted sequentially during backpropagation training. If a pattern is submitted and its classification or association is determined to be erroneous, the synaptic weights as well as the thresholds are adjusted so that the current least mean square classification error is reduced. The input/output mapping, the comparison of target and actual values, and the adjustment, if needed, continue until all mapping examples from the training set are learned within an acceptable overall error. Usually, the mapping error is cumulative and computed over the full training set.

During the association or classification phase, the trained neural network itself operates in a feedforward manner. However, the weight adjustments enforced by the learning rules propagate exactly backward from the output layer through the so-called "hidden layers" toward the input layer.
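To make this feedforward/backward flow concrete, here is a minimal sketch in Python/NumPy, assuming a tiny 2-3-1 network with unipolar sigmoid units and the XOR mapping as training data (the architecture, the data, and all variable names are illustrative assumptions, not taken from the notes):

```python
import numpy as np

# Minimal backpropagation sketch (illustrative; see lead-in above).
def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(1)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # input patterns
D = np.array([[0], [1], [1], [0]], dtype=float)              # target outputs

W1 = rng.normal(0.0, 1.0, (2, 3))   # input -> hidden weights
b1 = np.zeros((1, 3))               # hidden thresholds (biases)
W2 = rng.normal(0.0, 1.0, (3, 1))   # hidden -> output weights
b2 = np.zeros((1, 1))               # output threshold (bias)
eta = 0.5                           # learning constant

for epoch in range(20000):
    # Feedforward phase: signals flow from the input toward the output layer.
    Y = sigmoid(X @ W1 + b1)        # hidden-layer activations
    O = sigmoid(Y @ W2 + b2)        # output-layer activations

    # Backward phase: error signals propagate from the output layer
    # back through the hidden layer.
    delta_o = (D - O) * O * (1 - O)            # output-layer deltas
    delta_h = (delta_o @ W2.T) * Y * (1 - Y)   # hidden-layer deltas

    # Each adjustment is proportional to the delta and the incoming activation.
    W2 += eta * (Y.T @ delta_o)
    b2 += eta * delta_o.sum(axis=0, keepdims=True)
    W1 += eta * (X.T @ delta_h)
    b1 += eta * delta_h.sum(axis=0, keepdims=True)

print(np.round(O, 2))   # outputs should approach the XOR targets
```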
The weight adjustment formula of Eqn 3 can accordingly be rewritten as

\Delta w_{kj} = c \, \delta_k \, y_j        (Eqn 10)

where c is the learning constant and \delta_k is the error signal of the kth neuron. Eqn 10 represents the general formula for delta training/learning weight adjustments for a single-layer network. It also follows that the adjustment of weight w_kj is proportional to the input activation y_j and to the error signal value \delta_k at the kth neuron's output. The delta value needs to be explicitly computed for each specifically chosen activation function.
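For example, with the unipolar sigmoid activation f(net) = 1/(1 + e^{-net}) (a standard case, worked out here for illustration rather than quoted from the notes), the derivative is f'(net_k) = o_k (1 - o_k), so the output-layer delta becomes

\delta_k = (d_k - o_k) \, o_k (1 - o_k)

where d_k is the desired output and o_k the actual output of the kth neuron.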
There are two modes of updating the weights:
1. Batch mode
2. Incremental mode

When the weights are changed immediately after each training pattern is presented, the procedure is called the incremental mode. When the weights are changed only after all the training patterns have been presented, it is called the batch mode. The batch mode requires additional local storage for each connection in order to accumulate the individual weight changes before they are applied.
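The difference between the two modes can be sketched as follows (illustrative Python; the single-neuron delta-rule trainer, its data, and its variable names are assumptions made for this example):

```python
import numpy as np

# Contrast between incremental and batch updating for one sigmoid
# neuron trained with the delta rule (illustrative sketch).

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def incremental_epoch(w, X, d, eta):
    # Incremental mode: w changes immediately after each pattern.
    for x_j, d_k in zip(X, d):
        o_k = sigmoid(w @ x_j)
        delta_k = (d_k - o_k) * o_k * (1 - o_k)
        w = w + eta * delta_k * x_j          # apply right away
    return w

def batch_epoch(w, X, d, eta):
    # Batch mode: accumulate the changes, apply once per epoch.
    dw = np.zeros_like(w)                    # extra storage per connection
    for x_j, d_k in zip(X, d):
        o_k = sigmoid(w @ x_j)
        delta_k = (d_k - o_k) * o_k * (1 - o_k)
        dw += eta * delta_k * x_j            # store, do not apply yet
    return w + dw

X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # input patterns
d = np.array([0.0, 1.0, 1.0])                       # target outputs
w = np.zeros(2)
for _ in range(1000):
    w = batch_epoch(w, X, d, eta=0.5)        # or incremental_epoch(...)
print(np.round(w, 3))
```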
The BP learning algorithm is an example of an optimization problem. [Note: an optimization problem is the problem of finding the best solution from all feasible solutions.] The essence of the error back-propagation algorithm is the evaluation of the contribution of each particular weight to the output error, as made explicit below. There are many difficulties that arise in the implementation of the algorithm. One of them is that the error minimization procedure may produce only a local minimum of the error function.
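The "contribution of each particular weight to the output error" can be written out as a chain-rule expansion (a standard derivation, added here for illustration and consistent with the delta notation above). With net_k = \sum_j w_{kj} y_j and \delta_k defined as -\partial E / \partial net_k,

\frac{\partial E}{\partial w_{kj}} = \frac{\partial E}{\partial net_k} \cdot \frac{\partial net_k}{\partial w_{kj}} = -\delta_k \, y_j

so the gradient-descent step \Delta w_{kj} = -c \, \partial E / \partial w_{kj} = c \, \delta_k \, y_j recovers Eqn 10.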
One technique that helps to improve convergence is the momentum method, which supplements the current weight adjustment with a fraction of the most recent weight adjustment:

\Delta w(t) = -\eta \nabla E(t) + \alpha \, \Delta w(t-1)

where t and t-1 denote the current and the most recent training step, respectively, and \alpha is a user-selected positive momentum constant. The second term is called the momentum term. Unrolling this recursion over N steps of the momentum method, the current weight adjustment is expressed as the exponentially weighted sum

\Delta w(t) = -\eta \sum_{n=0}^{N-1} \alpha^{n} \nabla E(t-n)

Typically, \alpha is chosen between 0.1 and 0.8.
What is the significance of this momentum term?
From the above figure it is seen that in the case of A' and A'' the signs of the gradient components are the same, so combining the gradient components of adjacent steps results in a convergence speed-up. In the case of B' and B'', however, the signs are different. This shows that if the gradient component changes sign in two consecutive iterations, the learning rate along this axis should be decreased.
This indicates that the momentum term typically helps to speed up convergence and to achieve a more efficient and more reliable learning profile. The momentum technique can be recommended for problems where convergence occurs too slowly, or for cases where learning is difficult to achieve.
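A minimal sketch of the momentum update rule in practice (illustrative Python; the elongated quadratic error surface and all constants are assumptions chosen only to make the speed-up visible):

```python
import numpy as np

# Gradient descent with and without momentum on an elongated quadratic
# error surface E(w) = 0.5 * (20 * w1^2 + 1 * w2^2)  (assumed example).

def grad_E(w):
    return np.array([20.0 * w[0], 1.0 * w[1]])   # gradient of E

def descend(eta, alpha, steps=60):
    w = np.array([1.0, 1.0])                     # initial weights
    dw_prev = np.zeros(2)                        # most recent adjustment
    for _ in range(steps):
        dw = -eta * grad_E(w) + alpha * dw_prev  # momentum update rule
        w = w + dw
        dw_prev = dw
    return w

print("plain    :", descend(eta=0.05, alpha=0.0))   # slow along the w2 axis
print("momentum :", descend(eta=0.05, alpha=0.5))   # markedly faster
```

Along the shallow w2 axis the gradient keeps the same sign from step to step, so the momentum term accumulates and speeds up progress, exactly as argued above.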