Artificial Neural Network NeuroSolutions Dong C. Park
Dec 03, 2015
Contents
1. Introduction
  1.1 What's an Artificial Neural Network?
    1.1.1 Basic computational elements in brain
    1.1.2 Basic computational elements in ANN
    1.1.3 Why ANNs?
    1.1.4 Historical Perspective
    1.1.5 Implementation of ANN
    1.1.6 Application of ANN
2. Early Artificial Neural Networks
  2.1 Perceptron
    2.1.1 Operation
    2.1.2 Perceptron Training
    2.1.3 EX-OR Problem
    2.1.4 Linear Separability
  2.2 Perceptron Training
  2.3 The Delta Rule
3. Multi-Layer Perceptron Type ANN
  3.1 Structure of Multi-Layered Perceptron Type ANN
  3.2 Error Back Propagation Algorithm
  3.3 ANN Training with EBP
  3.4 Training and Testing EBP
  3.5 Problems in Using EBP
    3.5.1 Convergence
    3.5.2 Local Minima Problem
    3.5.3 Non-stationary Data
    3.5.4 Generalization vs. Memorization Issue
4. Kohonen's Self-Organizing Neural Networks
  4.1 Basic Structure
  4.2 Operation and Training
Introduction
What’s an Artificial Neural Network?
• A loose definition: a highly interconnected array of elementary processors (neurons).
• Resembles the functional architecture of the brain.
Basic computational elements in brain
• Neuron: Basic unit of brain structure. Functions: signal amplification and processing.
• Axon: Electrochemical signals are transmitted from one neuron to another through axons.
• Synapse: Junction between neurons' axons. The strength of a synapse controls the amount of signal transferred from the signal-sending neuron to the target neuron across the synapse.
Basic computational elements in ANN
• Neuron: Integration and processing. Integrates all incoming signals and applies the integrated signal to a transfer function to produce the output signal.
• Weight: Plays the role of the axon and synapse. Controls the strength of the signal propagated through the connection.
Identifying the optimal weights for a given problem is the subject of training an artificial neural network.
Why ANNs?
• Massively parallel processing: good for real-time applications.
• Self-trainable: learns by itself, but needs training data.
• Fault tolerant: information is distributed, so a partially damaged network can still function.
• Generalization capability: once trained, a network's response can be insensitive to minor variations in its input. In pattern recognition, however, noise or distortion in the pattern should be rejected by pre-processing.
Historical Perspective
• D. Hebb (1949): Hebbian learning, the starting point for ANN training.
• 1950s and 1960s: biological and psychological insights led to the first ANNs.
• Minsky, Rosenblatt, and Widrow developed the perceptron, applied to weather prediction, electrocardiogram analysis, and artificial vision.
• In 1969, Minsky identified problems with the perceptron: a perceptron cannot solve the EX-OR problem.
• They predicted a possibility of overcoming this problem in the future: a "learning algorithm" for the multi-layered machine would be found.
Historical Perspective
• 1970s to mid-1980s: the dark age of ANN research. Teuvo Kohonen, Stephen Grossberg, and James Anderson continued work during this period.
• 1986: Rumelhart, Hinton, and Williams invented a training algorithm for multi-layered networks, called the Error Backpropagation algorithm, which overcomes the limitations presented by Minsky in 1969. (Similar earlier work: Werbos (1974), Parker (1982), LeCun (1985).)
Implementation of ANN
• Computer simulation with accelerators
• VLSI implementation : Adaptive Solutions
• Optical computer approach
Application of ANN: most pattern classification and regression areas.
• Speech synthesis and recognition: text-to-speech translation
• Image processing and analysis: hand-written character recognition
• Power systems: electric load forecasting
• Medical engineering: diagnosis
• Adaptive control: robotics
• Communication systems: equalizer design
• Optimization tasks: communication routing
• Financial market prediction: stock price prediction / FDS
Early Artificial Neural Network
Perceptron (1): Operation
Figure 1.1: A Schematic Diagram of Neuron in Perceptron
A schematic description of a typical perceptron is shown in Fig. 1.1. With a given input signal, the neuron first integrates all the incoming signals multiplied by their corresponding weights as follows:

S = Σi Wi xi
Perceptron (2)
The neuron then passes the integrated signal through a given nonlinearity.
If we assume a simple step function with a threshold θ as the nonlinearity, then the output can be calculated as:

y = 1 if S > θ, y = 0 otherwise
Of course, the output will be binary in this case.
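The integration and step nonlinearity just described can be sketched in a few lines of Python. The weight and threshold values below are illustrative only, not taken from Fig. 1.1:

```python
# A minimal sketch of the perceptron's forward pass described above.
# Weights and threshold (theta) are illustrative values, not from the text.

def perceptron_output(inputs, weights, theta):
    """Integrate weighted inputs, then apply a step nonlinearity."""
    s = sum(w * x for w, x in zip(weights, inputs))  # S = sum of W_i * x_i
    return 1 if s > theta else 0                     # binary output

# Example with two inputs and illustrative weights
print(perceptron_output([1, 0], [0.5, 0.3], 0.55))  # S = 0.5 <= 0.55 -> 0
print(perceptron_output([1, 1], [0.5, 0.3], 0.55))  # S = 0.8 >  0.55 -> 1
```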
Perceptron Training (1)
Rosenblatt states in his book, Principles of Neurodynamics, Washington D.C.: Spartan Books (1962), that
“A Perceptron could learn anything it could represent.”
Here, the concepts of representation and learning can be summarized as:
• Representation: the ability of a network to simulate a specified function (concept).
• Learning: the existence of a systematic procedure for adjusting the network weights to produce that function.
Perceptron Training (2)
The following example illustrates the above concepts.
Example: Assume that we have a machine that responds to the face of a die shown to it. The machine outputs one if the face of the die is an even number and zero if the face is an odd number. The machine can be constructed by a perceptron with a certain set of weights.
Perceptron Training (3)
If the weights of the perceptron can be found (regardless of how they are found) so that it performs this function, the perceptron can represent the desired machine.
If the perceptron's weights can be found by a systematic procedure, it is learnable. Usually, if a concept is learnable, then it can also be represented. However, the reverse is not true.
In the following sections, we will investigate a problem that cannot be represented, and therefore cannot be learned, by a single-layer perceptron.
EX-OR Problem (1)
The EX-OR problem is given by the table below:

x1  x2  |  y
 0   0  |  0
 0   1  |  1
 1   0  |  1
 1   1  |  0

When we want a perceptron that represents the EX-OR problem, we can use the perceptron shown in Fig. 1.2.
EX-OR Problem (2)
As we defined, the neuron first integrates all the inputs given to the perceptron as follows:

S = W1x1 + W2x2
EX-OR Problem (3)
Assume that θ = 0.5. Then, the output of the perceptron will be:

y = 1 if W1x1 + W2x2 > 0.5, y = 0 otherwise

Therefore, the classification boundary will be given by:

W1x1 + W2x2 = 0.5

or

W1x1 + W2x2 - 0.5 = 0
EX-OR Problem (4)
From the above equation, we can change W1 and W2 in order to classify the EX-OR outputs. As we can see from Fig. 1.3, lines of
W1x1+W2x2-0.5=0 with all the possible cases of W1 and W2
pass through (0, 0.5).
Figure 1.3: Classification boundaries for W1x1+W2x2-0.5=0
EX-OR Problem (5)
If we draw the regions for y=0 and y=1 by solving the inequalities W1x1+W2x2-0.5 > 0 (for y=1) and W1x1+W2x2-0.5 ≤ 0 (for y=0), as shown in Fig. 1.3, we can see that it is impossible to classify the zeros and ones of the EX-OR problem with these inequalities (which come from the perceptron). (Note: the EX-OR problem needs at least two classification boundaries.) In general, there is no way to represent the EX-OR problem with this (single-layer) perceptron.
Linear Separability (1)
As we can see from the EX-OR problem, a perceptron with 2-dimensional inputs (the two-input case) gives a straight line as its classification boundary (W1x1 + W2x2 - θ = 0).
If we think about other cases with different numbers of input variables geometrically, the classification boundary(or separator) is given in Table 1.1.
Table 1.1: Number of inputs vs. its separator in a perceptron

1 input  → a point
2 inputs → a straight line
3 inputs → a plane
n inputs → a hyperplane
Linear Separability(2)
As the number of dimensions increases, the separator remains linear (a hyperplane), so it becomes harder to realize more complex classification boundaries.
♠ One way to overcome the linear separability problem in the perceptron → add more layers!
Linear Separability (3)
The following example shows how the perceptron can avoid its linear separability problem by adding more layers. Even though adding layers properly and finding proper weights is not an easy task, it is possible to obtain more complex classification boundaries by using a multi-layer perceptron.
Example: Let's assume that we have two perceptrons, P1 and P2, as shown in Figures 1.4 and 1.5. Each of P1 and P2 creates a straight line as its own classification boundary. → Find the classification boundaries for P1 and P2 yourself and draw the regions for zero and one in the (x1, x2) domain.
Linear Separability(4)
Figure 1.4: The Perceptron P1 and its classification boundary.
Figure 1.5: The Perceptron P2 and its classification boundary. Note that the regions for 0 and 1 are changed compared with P1.
Linear Separability (5)
If we use the outputs of P1 and P2 as inputs of another perceptron with the weights shown in Figure 1.6, then the two-layer perceptron of Figure 1.6 can solve the EX-OR problem.
Figure 1.6: A Two-Layer Perceptron
Linear Separability (6)
The intermediate values in the two-layer perceptron shown in Fig. 1.6 are summarized in Table 1.2. Figure 1.8 shows the resulting classification of the EX-OR problem by this two-layer perceptron. The output layer performs an "AND" operation! The layer after the input layer is often called the hidden layer.
Table 1.2: Input-output relations in the two-layer perceptron with P1 and P2
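The two-layer construction can be checked in code. Since the exact weights of P1 and P2 appear only in Figures 1.4-1.6, which are not reproduced here, the values below are one plausible choice realizing the same idea: an OR-like P1, a NAND-like P2, and an AND output layer.

```python
# A sketch of the two-layer perceptron for EX-OR (Fig. 1.6).
# The P1/P2 weights below are assumed, not taken from the figures.

def step(s, theta):
    """Step nonlinearity with threshold theta."""
    return 1 if s > theta else 0

def xor_two_layer(x1, x2):
    p1 = step(1.0 * x1 + 1.0 * x2, 0.5)    # P1 fires when x1 + x2 > 0.5 (OR-like)
    p2 = step(-1.0 * x1 - 1.0 * x2, -1.5)  # P2 fires when x1 + x2 < 1.5 (NAND-like)
    return step(p1 + p2, 1.5)              # output layer: logical AND of P1, P2

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", xor_two_layer(a, b))  # reproduces the EX-OR truth table
```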
Linear Separability(7)
Generally, a perceptron with 2 or more layers can represent any classification boundary formed by straight lines. (Note: some books say "3-layer perceptron" instead of "2-layer". The "3 layers" count the input layer, one hidden layer, and one output layer of neurons. Since "single-layer perceptron" means a perceptron with input and output layers, it is more consistent to call the perceptron with an input layer, one hidden layer, and an output layer a two-layer perceptron.)
Linear Separability (8)
Homework No. 1: Find a two-layer perceptron that has the classification boundary shown in Figure 1.7 for the EX-OR problem.
Figure 1.7: Classification boundaries for Homework No.1
Linear Separability(9)
Figure 1.8: The regions classified by a two-layer perceptron with P1 and P2 as the hidden layer and an output layer. The output layer performs the logical AND operation. (♁: training data points for 1. ⊙: training data points for 0.)
Linear Separability (10)
Homework No. 2: Find a three-layer perceptron that has the classification boundary shown in Figure 1.9.
* Problems with the above approach of finding a perceptron for a given classification boundary:
Figure 1.9: Classification boundaries for Homework No.2
Linear Separability(11)
• A complex classification boundary requires many neurons!
• For practical problems, we do not know the classification boundary; finding the classification boundary from the data is the problem!
• Our brain has millions of neurons, and our problem is how to find the classification boundary from data.
Everything we discussed and solved in this section concerns the representation of a problem. What we need is not representation but learning; that is, our task is to find the weights from given data. In the next section we will discuss how to find the weights of a perceptron automatically for a given set of data.
Perceptron Training Algorithm (1)
The perceptron training algorithm can be described as follows:
0. Begin by initializing all weights to some different numbers (usually between 0.0 and 10.0).
1. Apply an input and calculate the output y.
2. a. If the output is correct (y = t, where t is the target), go to step 3.
   b. Else if y = 0 (t = 1 and y = 0), increase by half (η = 0.5) each weight whose input is one.
      Else if y = 1 (t = 0 and y = 1), decrease by half (η = 0.5) each weight whose input is one.
3. If all the inputs produce correct outputs, go to step 4; else go to step 1.
4. End.
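The steps above can be sketched in Python (Homework No. 3 asks for C; this is only an illustration). One assumption: "increase/decrease the weights by half" is read as multiplying the weight by 1.5 or 0.5, which matches the worked AND example later, where w2 goes from 0.6 to 0.3.

```python
# A sketch of the perceptron training algorithm above.
# Assumption: "increase/decrease by half" means multiplying by 1.5 or 0.5.

def train_perceptron(data, weights, theta, max_iterations=1000):
    weights = list(weights)
    iteration = 0
    while iteration < max_iterations:
        all_correct = True
        for inputs, t in data:                    # one pass over the data = one epoch
            s = sum(w * x for w, x in zip(weights, inputs))
            y = 1 if s > theta else 0             # step 1: compute output
            iteration += 1
            if y != t:                            # step 2.b: adjust weights
                all_correct = False
                factor = 1.5 if t == 1 else 0.5
                for i, x in enumerate(inputs):
                    if x == 1:                    # only weights whose input is one
                        weights[i] *= factor
        if all_correct:                           # step 3: stopping criterion
            break
    return weights

# Logical AND data with the initial weights and bias of the worked example
and_data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
print(train_perceptron(and_data, [0.5, 0.6], 0.55))  # -> [0.5, 0.3]
```

Running this reproduces the hand trace later in the section: only w2 changes (0.6 → 0.3), and training stops after the second epoch.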
Perceptron Training Algorithm (2)
When we use an iterative algorithm like this, we use the following terminology:
• Iteration: the iteration number is the total number of data presentations to the perceptron.
• Epoch: the epoch number is the number of presentations of a specific data point to the perceptron.
For example, suppose we have 3 data points presented to the perceptron in the following sequence: data 1 → data 2 → data 3 → data 1 → data 2 → data 3 → data 1 → data 2 → data 3 → data 1 → data 2 → data 3 → data 1 → data 2 → data 3. Then the iteration number is 15 and the epoch number is 5, since the total number of data presentations is 15 and data 1 has been shown to the perceptron 5 times.
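The bookkeeping can be made concrete with a tiny sketch (the variable names are ours, not from the text):

```python
# Counting iterations and epochs for the 3-data example above.

data = ["data 1", "data 2", "data 3"]
epochs = 5

iteration = 0
presentations = {d: 0 for d in data}
for _ in range(epochs):
    for d in data:              # one full pass over the data = one epoch
        iteration += 1          # every single presentation is one iteration
        presentations[d] += 1

print(iteration)                # 15 presentations in total
print(presentations["data 1"])  # data 1 shown 5 times -> epoch number 5
```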
Perceptron Training Algorithm (3)
The data need to be given to the perceptron one by one until the proper weights are found. Usually, we need stopping criteria to prevent the algorithm from running indefinitely. (For example, if we are training a single-layer perceptron on the EX-OR problem, it will never converge to proper weights, since a single-layer perceptron cannot solve the EX-OR problem. In this case, we need to stop training when the total number of iterations reaches a certain number.) The most widely used stopping criteria are:
• Maximum number of iterations for the algorithm
• Minimum error
That is, the algorithm stops when it either reaches the maximum number of iterations or reaches the minimum error.
Note: the increment/decrement factor of the weights (η) in step 2.b is called the learning gain. We can use different values of the learning gain and the bias. The usual range of the learning gain is zero to one: 0 < η < 1.
Perceptron Training Algorithm (4): Example of Running the Perceptron Training Algorithm
With the bias (threshold) given as θ = 0.55 for the following logical AND problem, we can train the perceptron shown below to respond properly to the training data.
This example shows how to find the weights of the perceptron shown above for the given training data.
Perceptron Training Algorithm (5)
Step 0: Set w1 = 0.5 and w2 = 0.6.
Step 1: Put the input and target pair (x1, x2 ; t) = (0, 0 ; 0).
S = w1x1 + w2x2 = (0.5)(0) + (0.6)(0) = 0
y = 0 since S ≤ 0.55 (θ = 0.55).
Step 2: Since y = t = 0, go to Step 3.
Step 3: Since we did not try the other data, go to Step 1.
Step 1: Put the next input and target pair (x1, x2 ; t) = (0, 1 ; 0).
S = w1x1 + w2x2 = (0.5)(0) + (0.6)(1) = 0.6
y = 1 since S > 0.55 (θ = 0.55).
Step 2: Since y = 1 and t = 0, go to Step 2-b.
Decrease w2 by half (since w2 is the weight whose input is one).
Therefore, w2 = 0.3 ← new weight for w2.
Perceptron Training Algorithm (6)
Step 3: Since we did not try the other data (the 3rd and the 4th), go to Step 1.
Step 1: Put the next input and target pair (x1, x2 ; t) = (1, 0 ; 0).
S = w1x1 + w2x2 = (0.5)(1) + (0.3)(0) = 0.5
y = 0 since S ≤ 0.55.
Step 2: Since y = t = 0, go to Step 3.
Step 3: Since we did not try the 4th data, go to Step 1.
Step 1: Put the next input and target pair (x1, x2 ; t) = (1, 1 ; 1).
S = w1x1 + w2x2 = (0.5)(1) + (0.3)(1) = 0.8
y = 1 since S > 0.55 (θ = 0.55).
Step 2: Since y = t = 1, go to Step 3.
Step 3: We finish the first epoch (all four data). However, since the second data point did not give the right answer, we need to do it again (the second epoch). Go to Step 1.
Perceptron Training Algorithm (7)
Step 1: Put the first input and target pair (x1, x2 ; t) = (0, 0 ; 0).
S = w1x1 + w2x2 = (0.5)(0) + (0.3)(0) = 0
y = 0 since S ≤ 0.55 (θ = 0.55).
Step 2: Since y = t = 0, go to Step 3.
Step 3: We need to show the 2nd data. Go to Step 1.
Step 1: Put the second input and target pair (x1, x2 ; t) = (0, 1 ; 0).
S = w1x1 + w2x2 = (0.5)(0) + (0.3)(1) = 0.3
y = 0 since S ≤ 0.55 (θ = 0.55).
Step 2: Since y = t = 0, go to Step 3.
Step 3: We need to show the 3rd data. Go to Step 1.
Step 1: Put the 3rd input and target pair (x1, x2 ; t) = (1, 0 ; 0).
S = w1x1 + w2x2 = (0.5)(1) + (0.3)(0) = 0.5
y = 0 since S ≤ 0.55.
Perceptron Training Algorithm (8)
Step 2: Since y = t = 0, go to Step 3.
Step 3: We need to show the 4th data. Go to Step 1.
Step 1: Put the 4th input and target pair (x1, x2 ; t) = (1, 1 ; 1).
S = w1x1 + w2x2 = (0.5)(1) + (0.3)(1) = 0.8
y = 1 since S > 0.55 (θ = 0.55).
Step 2: Since y = t = 1, go to Step 3.
Step 3: Now all four data give correct answers, y = t, so we can stop the training. If all four data had not given correct answers, we would need to go back to the first data point and repeat the process until all four data give correct answers.
The classification boundary found in this training is: 0.5x1 + 0.3x2 = 0.55
Perceptron Training Algorithm(9)
Table 1.3 shows the change in the classification boundary of the perceptron before and after training on the AND data. Note that the output of the perceptron for the input (x1=0, x2=1) was 1 before training, but the perceptron gives 0 for that input after training.
Table 1.3: The classification boundaries of perceptron before and after the training for AND data.
Perceptron Training Algorithm(10)
Homework No. 3: Program the perceptron training algorithm in the C language.
Specification of the program:
• Input to program: maximum-number-of-data = 4, number-of-inputs = 2, number-of-outputs = 1.
• Output of program: the output value for a given input data.
For example, after you train the perceptron on the AND problem, if you put 1.1 and 1.3 for x1 and x2, the perceptron should give the output 1.0 (since x1 and x2 are both close to 1.0).
The Delta Rule (1)
This is an extension of the perceptron training algorithm to continuous inputs/outputs.
New concept: the error between the target and the actual output of the perceptron for a given input:

δp = tp - yp

where tp is the target value for input pattern p and yp is the actual output of the perceptron after showing input pattern p.
The Delta Rule (2)
The delta rule can be expressed by the following equation:

ΔWi = η δ xi

where ΔWi is the amount of change in weight Wi, η is the learning gain (or learning rate), and xi is the input connected to weight Wi.
Using the delta rule, the training equation is:

Wi(n+1) = Wi(n) + ΔWi(n)

where n is the iteration index.
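A minimal sketch of the delta rule for the linear neuron of this section, assuming the usual reading ΔWi = η(t - y)xi:

```python
# One delta-rule update for a linear neuron (no nonlinearity).
# The training pattern and initial weights below are illustrative.

def delta_rule_step(weights, inputs, target, eta):
    y = sum(w * x for w, x in zip(weights, inputs))      # linear output y = sum W_i x_i
    delta = target - y                                   # delta = t - y
    return [w + eta * delta * x for w, x in zip(weights, inputs)]

w = [0.5, 0.3]
x = [1.0, 1.0]
t = 1.0
for _ in range(20):                                      # repeated updates on one pattern
    w = delta_rule_step(w, x, t, eta=0.2)
y = sum(wi * xi for wi, xi in zip(w, x))
print(round(y, 4))                                       # output approaches t = 1.0
```

Each update shrinks the error (t - y) by a constant factor here, so the output converges to the target.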
The Delta Rule (3)
Note: choosing η is not easy. If η is too small, learning takes too long; if it is too large, the weights oscillate and do not converge.
Normally, we use 0 < η < 1.
The Delta Rule (4)
Derivation of the Delta Rule
Assume that we are given the simple neuron shown in Fig. 1.10. Note that the neuron has no nonlinearity.
Figure 1.10: A Single-Layer Perceptron without nonlinearity
The Delta Rule (5)
The output of the neuron can be written as follows:

y = S = Σi Wi xi

where i runs over all inputs to the neuron.
The Delta Rule (6)
The delta rule can be derived by using the chain rule from calculus, with the error (or energy) defined as:

E = (1/2)(t - y)²
The Delta Rule (7)
What we want is a set of weights that minimizes the error or energy E. We can accomplish this with the steepest-descent algorithm, also known as the gradient-descent algorithm in numerical analysis.
The steepest-descent algorithm computes the first derivative of the objective (energy) function with respect to the variable, then changes the variable in the direction of the negative of that derivative in order to decrease the energy.
Please refer to any numerical analysis book for the steepest-descent (gradient-descent) algorithm.
The Delta Rule (8)
If we take the first derivative of the energy with respect to the weight Wi:

∂E/∂Wi = -(t - y) xi

Therefore, the negative of the first derivative of the energy is:

-∂E/∂Wi = (t - y) xi = δ xi

and the change of weight in each iteration is:

ΔWi = η δ xi
The Delta Rule (9)Gradient-Descent (Steepest-Descent) Algorithm
Problem: Given an objective (error) function of a variable, find the value of the variable that minimizes the objective function.
Example: Given an objective function y=(x-1)2, find the value of x that minimizes y.
Solution: Analytically, we can do this by setting dy/dx = 0, which gives x = 1. However, the gradient-descent algorithm works a little differently: we start from an arbitrary initial value of x, called x0, and iteratively move toward the optimal value x* that minimizes y.
The Delta Rule (10)
The gradient-descent algorithm says:

x(n+1) = x(n) - η (dy/dx) at x = x(n)

Let x(0) = 3 and η = 0.2. In our case, dy/dx = 2(x - 1).
Therefore, if we apply the algorithm, we get:

x(1) = 3 - 0.2·2·(3 - 1) = 2.2
x(2) = 2.2 - 0.2·2·(2.2 - 1) = 1.72
x(3) = 1.72 - 0.2·2·(1.72 - 1) = 1.432
…
Finally, it approaches the optimal value of x=1.0.
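The worked example can be reproduced with a short sketch:

```python
# Gradient descent on y = (x - 1)^2 with x(0) = 3 and eta = 0.2,
# i.e., x(n+1) = x(n) - eta * 2 * (x(n) - 1), as in the example above.

def gradient_descent(x0, eta, steps):
    xs = [x0]
    x = x0
    for _ in range(steps):
        x = x - eta * 2 * (x - 1)   # dy/dx = 2(x - 1)
        xs.append(x)
    return xs

xs = gradient_descent(3.0, 0.2, 30)
print(round(xs[1], 4), round(xs[2], 4))  # 2.2 1.72 -- each step shrinks (x - 1) by 0.6
print(round(xs[-1], 4))                  # approaches the optimum x = 1.0
```

Note that each iteration multiplies the distance to the optimum, (x - 1), by the constant factor 1 - 2η = 0.6, which is why the sequence converges geometrically to x = 1.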
The Delta Rule (11)
Homework No. 4: Do the same example with a different initial value of x, that is, x0 = -8.0. Leave the other parameters the same as above. Find x1, x2, …, x9.
Reference: David Luenberger, Linear and Nonlinear Programming, 2nd Ed, Addison-Wesley Pub. Co., Reading, MA, 1984
The Delta Rule (12)
Figure 1.11 shows an example of an error surface in a 2-dimensional weight space. It illustrates what a simple error surface looks like. Most error surfaces are not this simple; they are very complex, with many hills and valleys. Moreover, the error surface of a practical problem cannot be known, and the weight space usually has more than two dimensions, so it is almost impossible to express the error surface graphically even if we could compute it.
The gradient-descent algorithm is one way to iteratively find a weight value that gives minimum error, starting from arbitrary initial weights. Figure 1.12 illustrates the gradient-descent algorithm on a simple problem with an error surface like that in Fig. 1.11. Starting from the initial weight w0, the gradient-descent algorithm iteratively finds a path to the weight w* with minimum error.
The Delta Rule (13)
Figure 1.11: An Example of Error Surface in 2-Dimensional Weight Space
The Delta Rule (14)
Figure 1.12: Weight change in Gradient-Descent Algorithm(2-Dimensional Weight Space)
Multi-Layer Perceptron Type ANN
Structure of Multi-Layered Perceptron Type ANN (1)
Multi-layered perceptron (MLP) type artificial neural networks consist of several layers: one input layer, one or more hidden layers, and one output layer, as depicted in Fig. 2.1. Neurons in a layer are generally interconnected to all the neurons in the adjacent layers.
Each neuron receives its inputs from the neurons in the layer above through the interconnections and propagates its activation to the neurons in the layer below. When h hidden layers exist, layer 0 and layer (h+1) denote the input and output layers, respectively. The activation of a neuron j in layer k is then defined as
Structure of Multi-Layered Perceptron Type ANN (2) where
and i covers all the neurons in the layer (k-1). Note that the activation of the