Document Analysis: Artificial Neural Networks
Prof. Rolf Ingold, University of Fribourg
Master course, spring semester 2008
Outline
Biological vs. artificial neural networks
Artificial neuron model
Artificial neural networks
Multi-layer perceptron
Feed-forward activation
Learning approach
Back-propagation method
Optimal learning
Illustration of JavaNNS
Biological neurons
Artificial neural networks are inspired by the biological neurons of the central nervous system:
each neuron is connected to many other neurons
information is transmitted via synapses (an electro-chemical process)
a neuron receives input from its dendrites, and transmits output via the axon to synapses
Biological vs artificial networks
                      biological neural network   artificial neural network
number of synapses    approx. 10¹³                up to 10⁸
number of neurons     approx. 10¹⁰                up to 10⁶
transmission time     relatively slow             very fast
processing            chemical                    mathematical function
Artificial neuron model
A neuron receives input signals $x_1, \ldots, x_n$
These signals are multiplied by synaptic weights $w_1, \ldots, w_n$, which can be positive or negative
The activation of the neuron, $a = \sum_i w_i x_i$, is transmitted to a non-linear function $f$ with threshold $w_0$
The output signal $y = f(a - w_0)$ is then propagated to other neurons
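As an illustration, here is a minimal sketch of this neuron model in Python (NumPy and the tanh transfer function are assumptions for the example; the input and weight values are made up):

```python
import numpy as np

def neuron_output(x, w, w0, f=np.tanh):
    """y = f(a - w0), where a = sum_i w_i * x_i is the activation."""
    a = np.dot(w, x)        # weighted sum of the input signals
    return f(a - w0)        # non-linear transfer function with threshold w0

# Made-up example with three inputs
x = np.array([0.5, -1.0, 2.0])    # input signals x1, x2, x3
w = np.array([0.8, 0.2, -0.5])    # synaptic weights (positive or negative)
print(neuron_output(x, w, w0=0.1))
```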
Characteristics of artificial neural networks
Artificial neural networks may vary in different aspects:
the topology of the network, i.e. the number of neurons, possibly organized in layers or classes
how each neuron (of a given layer/class) is connected to its neighbors
the transfer function used in each neuron
The use and the learning strategy have to be adapted accordingly
Topology of the neural network
The synaptic connections have a major influence on the behavior of the neural network
Two main categories can be considered:
feed-forward networks, where each neuron propagates its output signal only to neurons that have not yet been used
as a special case, the multi-layer perceptron has a sequence of layers such that a neuron from one layer is connected only to neurons of the next layer
dynamic networks, where neurons are connected without restrictions, possibly in a cyclic way
Multi-layer perceptron
The multi-layer perceptron (MLP) has 3 (or more) layers:
an input layer with one input neuron per feature
one or several hidden layers, each having an arbitrary number of neurons connected to the previous layer
an output layer with one neuron per class, each neuron being connected to the previous layer
Hidden and output layers can be completely or only partly connected
The decision is in favor of the class corresponding to the highest output activation, as sketched below
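A sketch of this layout in Python, assuming NumPy, made-up layer sizes (4 features, 9 hidden units, 3 classes), and complete connections; the decision rule is simply the argmax over the output activations:

```python
import numpy as np

rng = np.random.default_rng(0)

d, n_hidden, c = 4, 9, 3   # features, hidden neurons, classes (made-up sizes)
# One row per neuron; column 0 holds the bias weight w0 of that neuron
W_hidden = rng.normal(scale=0.1, size=(n_hidden, d + 1))
W_output = rng.normal(scale=0.1, size=(c, n_hidden + 1))

# Decision rule: the class with the highest output activation
z = np.array([0.2, 0.9, -0.4])       # hypothetical output activations
predicted_class = int(np.argmax(z))  # -> class 1
```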
Impact of the hidden layer(s)
Networks with hidden layers can generate arbitrary decision boundaries; the number of hidden layers, however, has no impact on this expressive power!
Feed-forward activation
As for the single perceptron, the feature space is augmented with a feature $x_0 = 1$ to take the bias $w_0$ into account.
Each neuron $j$ of a hidden layer computes an activation
$y_j = f(net_j)$  with  $net_j = \sum_{i=0}^{d} w_{ji} x_i = \mathbf{w}_j^t \mathbf{x}$
Each neuron $k$ of the output layer computes an activation
$z_k = f(net_k)$  with  $net_k = \sum_{j=0}^{n_H} w_{kj} y_j = \mathbf{w}_k^t \mathbf{y}$
(where $d$ is the number of features and $n_H$ the number of hidden neurons)
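A sketch of this feed-forward pass (NumPy assumed; each weight matrix has one row per neuron, with column 0 holding the bias weight, matching the sums starting at i = 0; tanh stands in for f):

```python
import numpy as np

def feed_forward(x, W_hidden, W_output, f=np.tanh):
    """Return hidden activations y_j = f(net_j) and output activations z_k = f(net_k)."""
    x_aug = np.concatenate(([1.0], x))    # augment with x0 = 1 for the bias
    y = f(W_hidden @ x_aug)               # net_j = sum_i w_ji * x_i
    y_aug = np.concatenate(([1.0], y))    # augment with y0 = 1 for the bias
    z = f(W_output @ y_aug)               # net_k = sum_j w_kj * y_j
    return y, z

# Made-up network: 2 features, 3 hidden neurons, 2 classes
rng = np.random.default_rng(0)
W_hidden = rng.normal(scale=0.1, size=(3, 2 + 1))
W_output = rng.normal(scale=0.1, size=(2, 3 + 1))
y, z = feed_forward(np.array([0.5, -1.0]), W_hidden, W_output)
```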
Transfer function
The transfer function f is supposed to be:
monotonically increasing, within the range [-1, +1]
antisymmetric, i.e. f(-net) = -f(net)
continuous and differentiable (for back-propagation)
Typical functions are:
step (simple threshold):
$f(a - w_0) = \begin{cases} +1 & \text{if } a - w_0 \ge 0 \\ -1 & \text{otherwise} \end{cases}$
ramp:
$f(a - w_0) = \begin{cases} -1 & \text{if } a - w_0 \le -T \\ (a - w_0)/T & \text{if } -T < a - w_0 < T \\ +1 & \text{if } a - w_0 \ge T \end{cases}$
sigmoid:
$f(a - w_0) = \dfrac{1 - e^{-(a - w_0)/T}}{1 + e^{-(a - w_0)/T}} = \tanh\!\left(\dfrac{a - w_0}{2T}\right)$
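The three functions written out in Python (vectorized with NumPy; T is the slope parameter):

```python
import numpy as np

def step(u):
    """Simple threshold: +1 if u >= 0, -1 otherwise."""
    return np.where(u >= 0.0, 1.0, -1.0)

def ramp(u, T=1.0):
    """Linear slope u/T between -T and +T, saturated at -1 and +1 outside."""
    return np.clip(u / T, -1.0, 1.0)

def sigmoid(u, T=1.0):
    """Antisymmetric sigmoid in (-1, +1); identical to tanh(u / (2*T))."""
    return (1.0 - np.exp(-u / T)) / (1.0 + np.exp(-u / T))
```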
Learning in a multi-layer perceptron
Learning consists of setting the weights w, based on training samples
The method is called back-propagation, because the training error is propagated recursively from the output layer back to the hidden and input layers
The training error on a given pattern is defined as half the squared difference between the desired output and the observed output, i.e.
$J(\mathbf{w}) = \frac{1}{2} \sum_{k=1}^{c} (t_k - z_k)^2 = \frac{1}{2} \, \lVert \mathbf{t} - \mathbf{z} \rVert^2$
In practice, the desired output is +1 for the correct class and -1 (or sometimes 0) for all other classes
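For instance, with made-up target and output vectors (NumPy assumed):

```python
import numpy as np

t = np.array([-1.0, 1.0, -1.0])   # desired outputs: +1 for the correct class, -1 elsewhere
z = np.array([-0.8, 0.6, 0.1])    # observed outputs of the network
J = 0.5 * np.sum((t - z) ** 2)    # J(w) = 0.5 * ||t - z||^2  ->  0.705
```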
Back-propagation of errors
The weight vectors are changed in the direction opposite to the gradient of the error
$\Delta \mathbf{w} = -\eta \, \dfrac{\partial J}{\partial \mathbf{w}}$
where $\eta$ is the learning rate
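A minimal numeric illustration of one such step (the toy error function J(w) = ||w||², whose gradient is 2w, is an assumption for the example):

```python
import numpy as np

def gradient_step(w, grad, eta=0.1):
    """Move the weights against the gradient: w <- w - eta * dJ/dw."""
    return w - eta * grad

w = np.array([1.0, -2.0])
w = gradient_step(w, grad=2 * w)   # J shrinks: w becomes [0.8, -1.6]
```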
Error correction on the output layer
Since the error does not directly depend upon $w_{kj}$, we apply the differential chain rule
$\dfrac{\partial J}{\partial w_{kj}} = \dfrac{\partial J}{\partial net_k} \, \dfrac{\partial net_k}{\partial w_{kj}}$
with
$\delta_k = -\dfrac{\partial J}{\partial net_k} = -\dfrac{\partial J}{\partial z_k} \, \dfrac{\partial z_k}{\partial net_k} = (t_k - z_k) \, f'(net_k)$
and
$\dfrac{\partial net_k}{\partial w_{kj}} = y_j$
Thus the update rule becomes
$\Delta w_{kj} = \eta \, \delta_k \, y_j = \eta \, (t_k - z_k) \, f'(net_k) \, y_j$
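In code, the output-layer correction might look as follows (a sketch with tanh as the transfer function, so that f'(net) = 1 - tanh²(net); y_aug is the hidden output vector augmented with y0 = 1):

```python
import numpy as np

def f_prime(net):
    return 1.0 - np.tanh(net) ** 2          # derivative of the tanh transfer function

def output_update(W_output, t, z, net_output, y_aug, eta=0.1):
    """Delta w_kj = eta * delta_k * y_j with delta_k = (t_k - z_k) * f'(net_k)."""
    delta_out = (t - z) * f_prime(net_output)
    return W_output + eta * np.outer(delta_out, y_aug), delta_out
```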
Error correction on the hidden layer(s)
Applying the following chain rule
$\dfrac{\partial J}{\partial w_{ji}} = \dfrac{\partial J}{\partial y_j} \, \dfrac{\partial y_j}{\partial w_{ji}}$
with
$\dfrac{\partial J}{\partial y_j} = \dfrac{\partial}{\partial y_j} \left[ \dfrac{1}{2} \sum_{k=1}^{c} (t_k - z_k)^2 \right] = -\sum_{k=1}^{c} (t_k - z_k) \, \dfrac{\partial z_k}{\partial y_j} = -\sum_{k=1}^{c} (t_k - z_k) \, f'(net_k) \, w_{kj}$
since
$\dfrac{\partial z_k}{\partial y_j} = \dfrac{\partial z_k}{\partial net_k} \, \dfrac{\partial net_k}{\partial y_j} = f'(net_k) \, w_{kj}$
and
$\dfrac{\partial y_j}{\partial w_{ji}} = \dfrac{\partial y_j}{\partial net_j} \, \dfrac{\partial net_j}{\partial w_{ji}} = f'(net_j) \, x_i$
Finally the update rule becomes
$\Delta w_{ji} = \eta \left[ \sum_{k=1}^{c} (t_k - z_k) \, f'(net_k) \, w_{kj} \right] f'(net_j) \, x_i$
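Continuing the sketch for the hidden layer (the bias column of W_output is skipped when back-propagating, since y0 = 1 is not a real neuron; names mirror the previous snippet):

```python
import numpy as np

def f_prime(net):
    return 1.0 - np.tanh(net) ** 2

def hidden_update(W_hidden, W_output, delta_out, net_hidden, x_aug, eta=0.1):
    """Delta w_ji = eta * delta_j * x_i with delta_j = f'(net_j) * sum_k w_kj * delta_k."""
    delta_hid = f_prime(net_hidden) * (W_output[:, 1:].T @ delta_out)  # skip bias column
    return W_hidden + eta * np.outer(delta_hid, x_aug)
```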
Learning algorithm
The learning process starts with randomly initialized weights
The weights are adjusted iteratively using patterns from the training set:
the pattern is presented to the network and the feed-forward activation is computed
the output error is computed
the error is used to update the weights reversely, from the output layer to the hidden layers
The process is repeated until a quality criterion is reached (see the sketch below)
[Figure: the error signals δ[q+1][1], ..., δ[q+1][n] of layer q+1 are propagated back through the weights w[1], ..., w[n] and scaled by f'(z) to form the error signal of layer q]
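Putting the previous sketches together, a minimal training loop could look as follows (all sizes, the learning rate, and the fixed epoch budget standing in for the quality criterion are assumptions):

```python
import numpy as np

def f(net):        return np.tanh(net)
def f_prime(net):  return 1.0 - np.tanh(net) ** 2

def train(X, T, n_hidden=5, eta=0.1, n_epochs=100, seed=0):
    """X: one pattern per row; T: one target row per pattern (+1 / -1 coding)."""
    rng = np.random.default_rng(seed)
    d, c = X.shape[1], T.shape[1]
    W_hid = rng.normal(scale=0.1, size=(n_hidden, d + 1))   # random initialization
    W_out = rng.normal(scale=0.1, size=(c, n_hidden + 1))
    for _ in range(n_epochs):                 # epoch budget stands in for a quality criterion
        for x, t in zip(X, T):
            x_aug = np.concatenate(([1.0], x))
            net_h = W_hid @ x_aug                     # feed-forward activation
            y_aug = np.concatenate(([1.0], f(net_h)))
            net_o = W_out @ y_aug
            z = f(net_o)
            d_out = (t - z) * f_prime(net_o)          # output error
            d_hid = f_prime(net_h) * (W_out[:, 1:].T @ d_out)  # back-propagated error
            W_out += eta * np.outer(d_out, y_aug)     # update output layer, then hidden
            W_hid += eta * np.outer(d_hid, x_aug)
    return W_hid, W_out
```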
Risk of overfitting
Minimizing the global error over all training samples tends to produce overfitting
To avoid overfitting, the best strategy is to minimize the global error on a validation set that is independent of the training set, as sketched below
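A sketch of this strategy as a generic loop (the two callbacks `train_one_epoch` and `validation_error` are placeholders for the routines of the previous slides):

```python
def train_with_early_stopping(train_one_epoch, validation_error, max_epochs=1000):
    """Keep the weights that minimize the error on an independent validation set."""
    best_error, best_weights, weights = float("inf"), None, None
    for _ in range(max_epochs):
        weights = train_one_epoch(weights)     # one pass over the training set
        error = validation_error(weights)      # error on the held-out validation set
        if error < best_error:                 # remember the best weights seen so far
            best_error, best_weights = error, weights
    return best_weights
```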
JavaNNS
JavaNNS is an interactive software framework for experimenting with artificial neural networks:
it has been developed at the University of Tübingen
it is based on SNNS, an efficient ANN kernel written in C
It supports the following features:
multiple topologies (MLP, dynamic networks, ...)
various transfer functions
various learning strategies
network pruning
...
Font recognition with JavaNNS
Original neural network with 9 hidden units
Pruned neural network for font recognition
Neural network obtained after pruning