Document Analysis: Artificial Neural Networks
Prof. Rolf Ingold, University of Fribourg
Master course, spring semester 2008
Outline
Biological vs. artificial neural networks
Artificial neuron model
Artificial neural networks
Multi-layer perceptron
Feed-forward activation
Learning approach
Back-propagation method
Optimal learning
Illustration of JavaNNS
Biological neurons
Artificial neural networks are inspired by the biological neurons of the central nervous system:
each neuron is connected to many other neurons
information is transmitted via synapses (an electro-chemical process)
a neuron receives input from its dendrites, and transmits output via the axon to synapses
Biological vs artificial networks
                      biological neural network   artificial neural network
number of synapses    approx. 10¹³                up to 10⁸
number of neurons     approx. 10¹⁰                up to 10⁶
transmission time     relatively slow             very fast
processing            chemical                    mathematical function
Artificial neuron model
A neuron receives input signals $x_1, \ldots, x_n$
These signals are multiplied by synaptic weights $w_1, \ldots, w_n$, which can be positive or negative
The activation of the neuron, $a = \sum_i w_i x_i$, is transmitted to a non-linear function $f$ with threshold $w_0$
The output signal $y = f(a - w_0)$ is then propagated to other neurons
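As an illustration, here is a minimal sketch of this neuron model in Python (NumPy and the tanh transfer function are assumptions for the example; the input and weight values are made up):

```python
import numpy as np

def neuron_output(x, w, w0, f=np.tanh):
    """y = f(a - w0), where a = sum_i w_i * x_i is the activation."""
    a = np.dot(w, x)        # weighted sum of the input signals
    return f(a - w0)        # non-linear transfer function with threshold w0

# Made-up example with three inputs
x = np.array([0.5, -1.0, 2.0])    # input signals x1, x2, x3
w = np.array([0.8, 0.2, -0.5])    # synaptic weights (positive or negative)
print(neuron_output(x, w, w0=0.1))
```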
Characteristics of artificial neural networks
Artificial neural networks may vary in different aspects:
the topology of the network, i.e. the number of neurons, possibly organized in layers or classes
how each neuron (of a given layer/class) is connected to its neighbors
the transfer function used in each neuron
The use and the learning strategy have to be adapted accordingly
Topology of the neural network
The synaptic connections have a major influence on the behavior of the neural network
Two main categories can be considered:
feed-forward networks, where each neuron propagates its output signal only to neurons that have not yet been used
as a special case, the multi-layer perceptron has a sequence of layers such that a neuron from one layer is connected only to neurons of the next layer
dynamic networks, where neurons are connected without restrictions, possibly in a cyclic way
Multi-layer perceptron
The multi-layer perceptron (MLP) has 3 (or more) layers:
an input layer with one input neuron per feature
one or several hidden layers, each having an arbitrary number of neurons connected to the previous layer
an output layer with one neuron per class, each neuron being connected to the previous layer
Hidden and output layers can be completely or only partly connected
The decision is in favor of the class corresponding to the highest output activation, as sketched below
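A sketch of this layout in Python, assuming NumPy, made-up layer sizes (4 features, 9 hidden units, 3 classes), and complete connections; the decision rule is simply the argmax over the output activations:

```python
import numpy as np

rng = np.random.default_rng(0)

d, n_hidden, c = 4, 9, 3   # features, hidden neurons, classes (made-up sizes)
# One row per neuron; column 0 holds the bias weight w0 of that neuron
W_hidden = rng.normal(scale=0.1, size=(n_hidden, d + 1))
W_output = rng.normal(scale=0.1, size=(c, n_hidden + 1))

# Decision rule: the class with the highest output activation
z = np.array([0.2, 0.9, -0.4])       # hypothetical output activations
predicted_class = int(np.argmax(z))  # -> class 1
```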
Impact of the hidden layer(s)
Networks with hidden layers can generate arbitrary decision boundaries; the number of hidden layers, however, has no impact on this expressive power!
Feed-forward activation
As for the single perceptron, the feature space is augmented with a feature $x_0 = 1$ to take the bias $w_0$ into account.
Each neuron $j$ of a hidden layer computes an activation
$y_j = f(net_j)$  with  $net_j = \sum_{i=0}^{d} w_{ji} x_i = \mathbf{w}_j^t \mathbf{x}$
Each neuron $k$ of the output layer computes an activation
$z_k = f(net_k)$  with  $net_k = \sum_{j=0}^{n_H} w_{kj} y_j = \mathbf{w}_k^t \mathbf{y}$
(where $d$ is the number of features and $n_H$ the number of hidden neurons)
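A sketch of this feed-forward pass (NumPy assumed; each weight matrix has one row per neuron, with column 0 holding the bias weight, matching the sums starting at i = 0; tanh stands in for f):

```python
import numpy as np

def feed_forward(x, W_hidden, W_output, f=np.tanh):
    """Return hidden activations y_j = f(net_j) and output activations z_k = f(net_k)."""
    x_aug = np.concatenate(([1.0], x))    # augment with x0 = 1 for the bias
    y = f(W_hidden @ x_aug)               # net_j = sum_i w_ji * x_i
    y_aug = np.concatenate(([1.0], y))    # augment with y0 = 1 for the bias
    z = f(W_output @ y_aug)               # net_k = sum_j w_kj * y_j
    return y, z

# Made-up network: 2 features, 3 hidden neurons, 2 classes
rng = np.random.default_rng(0)
W_hidden = rng.normal(scale=0.1, size=(3, 2 + 1))
W_output = rng.normal(scale=0.1, size=(2, 3 + 1))
y, z = feed_forward(np.array([0.5, -1.0]), W_hidden, W_output)
```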
Transfer function
The transfer function f is supposed to be:
monotonically increasing, within the range [-1, +1]
antisymmetric, i.e. f(-net) = -f(net)
continuous and differentiable (for back-propagation)
Typical functions are:
step (simple threshold):
$f(a - w_0) = \begin{cases} +1 & \text{if } a - w_0 \ge 0 \\ -1 & \text{otherwise} \end{cases}$
ramp:
$f(a - w_0) = \begin{cases} -1 & \text{if } a - w_0 \le -T \\ (a - w_0)/T & \text{if } -T < a - w_0 < T \\ +1 & \text{if } a - w_0 \ge T \end{cases}$
sigmoid:
$f(a - w_0) = \dfrac{1 - e^{-(a - w_0)/T}}{1 + e^{-(a - w_0)/T}} = \tanh\!\left(\dfrac{a - w_0}{2T}\right)$
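The three functions written out in Python (vectorized with NumPy; T is the slope parameter):

```python
import numpy as np

def step(u):
    """Simple threshold: +1 if u >= 0, -1 otherwise."""
    return np.where(u >= 0.0, 1.0, -1.0)

def ramp(u, T=1.0):
    """Linear slope u/T between -T and +T, saturated at -1 and +1 outside."""
    return np.clip(u / T, -1.0, 1.0)

def sigmoid(u, T=1.0):
    """Antisymmetric sigmoid in (-1, +1); identical to tanh(u / (2*T))."""
    return (1.0 - np.exp(-u / T)) / (1.0 + np.exp(-u / T))
```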
Learning in a multi-layer perceptron
Learning consists of setting the weights w, based on training samples
The method is called back-propagation, because the training error is propagated recursively from the output layer back to the hidden and input layers
The training error on a given pattern is defined as half the squared difference between the desired output and the observed output, i.e.
$J(\mathbf{w}) = \frac{1}{2} \sum_{k=1}^{c} (t_k - z_k)^2 = \frac{1}{2} \, \lVert \mathbf{t} - \mathbf{z} \rVert^2$
In practice, the desired output is +1 for the correct class and -1 (or sometimes 0) for all other classes
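For instance, with made-up target and output vectors (NumPy assumed):

```python
import numpy as np

t = np.array([-1.0, 1.0, -1.0])   # desired outputs: +1 for the correct class, -1 elsewhere
z = np.array([-0.8, 0.6, 0.1])    # observed outputs of the network
J = 0.5 * np.sum((t - z) ** 2)    # J(w) = 0.5 * ||t - z||^2  ->  0.705
```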
Back-propagation of errors
The weight vectors are changed in the direction opposite to the gradient of the error
$\Delta \mathbf{w} = -\eta \, \dfrac{\partial J}{\partial \mathbf{w}}$
where $\eta$ is the learning rate
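A minimal numeric illustration of one such step (the toy error function J(w) = ||w||², whose gradient is 2w, is an assumption for the example):

```python
import numpy as np

def gradient_step(w, grad, eta=0.1):
    """Move the weights against the gradient: w <- w - eta * dJ/dw."""
    return w - eta * grad

w = np.array([1.0, -2.0])
w = gradient_step(w, grad=2 * w)   # J shrinks: w becomes [0.8, -1.6]
```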
Error correction on the output layer
Since the error does not directly depend upon $w_{kj}$, we apply the differential chain rule
$\dfrac{\partial J}{\partial w_{kj}} = \dfrac{\partial J}{\partial net_k} \, \dfrac{\partial net_k}{\partial w_{kj}}$
with
$\delta_k = -\dfrac{\partial J}{\partial net_k} = -\dfrac{\partial J}{\partial z_k} \, \dfrac{\partial z_k}{\partial net_k} = (t_k - z_k) \, f'(net_k)$
and
$\dfrac{\partial net_k}{\partial w_{kj}} = y_j$
Thus the update rule becomes
$\Delta w_{kj} = \eta \, \delta_k \, y_j = \eta \, (t_k - z_k) \, f'(net_k) \, y_j$
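In code, the output-layer correction might look as follows (a sketch with tanh as the transfer function, so that f'(net) = 1 - tanh²(net); y_aug is the hidden output vector augmented with y0 = 1):

```python
import numpy as np

def f_prime(net):
    return 1.0 - np.tanh(net) ** 2          # derivative of the tanh transfer function

def output_update(W_output, t, z, net_output, y_aug, eta=0.1):
    """Delta w_kj = eta * delta_k * y_j with delta_k = (t_k - z_k) * f'(net_k)."""
    delta_out = (t - z) * f_prime(net_output)
    return W_output + eta * np.outer(delta_out, y_aug), delta_out
```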
Error correction on the hidden layer(s)
Applying the following chain rule
$\dfrac{\partial J}{\partial w_{ji}} = \dfrac{\partial J}{\partial y_j} \, \dfrac{\partial y_j}{\partial w_{ji}}$
with
$\dfrac{\partial J}{\partial y_j} = \dfrac{\partial}{\partial y_j} \left[ \dfrac{1}{2} \sum_{k=1}^{c} (t_k - z_k)^2 \right] = -\sum_{k=1}^{c} (t_k - z_k) \, \dfrac{\partial z_k}{\partial y_j} = -\sum_{k=1}^{c} (t_k - z_k) \, f'(net_k) \, w_{kj}$
since
$\dfrac{\partial z_k}{\partial y_j} = \dfrac{\partial z_k}{\partial net_k} \, \dfrac{\partial net_k}{\partial y_j} = f'(net_k) \, w_{kj}$
and
$\dfrac{\partial y_j}{\partial w_{ji}} = \dfrac{\partial y_j}{\partial net_j} \, \dfrac{\partial net_j}{\partial w_{ji}} = f'(net_j) \, x_i$
Finally the update rule becomes
$\Delta w_{ji} = \eta \left[ \sum_{k=1}^{c} (t_k - z_k) \, f'(net_k) \, w_{kj} \right] f'(net_j) \, x_i$
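Continuing the sketch for the hidden layer (the bias column of W_output is skipped when back-propagating, since y0 = 1 is not a real neuron; names mirror the previous snippet):

```python
import numpy as np

def f_prime(net):
    return 1.0 - np.tanh(net) ** 2

def hidden_update(W_hidden, W_output, delta_out, net_hidden, x_aug, eta=0.1):
    """Delta w_ji = eta * delta_j * x_i with delta_j = f'(net_j) * sum_k w_kj * delta_k."""
    delta_hid = f_prime(net_hidden) * (W_output[:, 1:].T @ delta_out)  # skip bias column
    return W_hidden + eta * np.outer(delta_hid, x_aug)
```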
Learning algorithm
The learning process starts with randomly initialized weights
The weights are adjusted iteratively using patterns from the training set:
the pattern is presented to the network and the feed-forward activation is computed
the output error is computed
the error is used to update the weights reversely, from the output layer to the hidden layers
The process is repeated until a quality criterion is reached (see the sketch below)
[Figure: the error signals δ[q+1][1], ..., δ[q+1][n] of layer q+1 are propagated back through the weights w[1], ..., w[n] and scaled by f'(z) to form the error signal of layer q]
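Putting the previous sketches together, a minimal training loop could look as follows (all sizes, the learning rate, and the fixed epoch budget standing in for the quality criterion are assumptions):

```python
import numpy as np

def f(net):        return np.tanh(net)
def f_prime(net):  return 1.0 - np.tanh(net) ** 2

def train(X, T, n_hidden=5, eta=0.1, n_epochs=100, seed=0):
    """X: one pattern per row; T: one target row per pattern (+1 / -1 coding)."""
    rng = np.random.default_rng(seed)
    d, c = X.shape[1], T.shape[1]
    W_hid = rng.normal(scale=0.1, size=(n_hidden, d + 1))   # random initialization
    W_out = rng.normal(scale=0.1, size=(c, n_hidden + 1))
    for _ in range(n_epochs):                 # epoch budget stands in for a quality criterion
        for x, t in zip(X, T):
            x_aug = np.concatenate(([1.0], x))
            net_h = W_hid @ x_aug                     # feed-forward activation
            y_aug = np.concatenate(([1.0], f(net_h)))
            net_o = W_out @ y_aug
            z = f(net_o)
            d_out = (t - z) * f_prime(net_o)          # output error
            d_hid = f_prime(net_h) * (W_out[:, 1:].T @ d_out)  # back-propagated error
            W_out += eta * np.outer(d_out, y_aug)     # update output layer, then hidden
            W_hid += eta * np.outer(d_hid, x_aug)
    return W_hid, W_out
```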
Risk of overfitting
Minimizing the global error over all training samples tends to produce overfitting
To avoid overfitting, the best strategy is to minimize the global error on a validation set that is independent of the training set, as sketched below
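A sketch of this strategy as a generic loop (the two callbacks `train_one_epoch` and `validation_error` are placeholders for the routines of the previous slides):

```python
def train_with_early_stopping(train_one_epoch, validation_error, max_epochs=1000):
    """Keep the weights that minimize the error on an independent validation set."""
    best_error, best_weights, weights = float("inf"), None, None
    for _ in range(max_epochs):
        weights = train_one_epoch(weights)     # one pass over the training set
        error = validation_error(weights)      # error on the held-out validation set
        if error < best_error:                 # remember the best weights seen so far
            best_error, best_weights = error, weights
    return best_weights
```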
JavaNNS
JavaNNS is an interactive software framework for experimenting with artificial neural networks:
it has been developed at the University of Tübingen
it is based on SNNS, an efficient ANN kernel written in C
It supports the following features:
multiple topologies (MLP, dynamic networks, ...)
various transfer functions
various learning strategies
network pruning
...
Font recognition with JavaNNS
Original neural network with 9 hidden units
Pruned neural network for font recognition
Neural network obtained after pruning