Page 1


Document Analysis: Artificial Neural Networks

Prof. Rolf Ingold, University of Fribourg

Master course, spring semester 2008

Page 2

Outline

- Biological vs. artificial neural networks
- Artificial neuron model
- Artificial neural networks
- Multi-layer perceptron
- Feed-forward activation
- Learning approach
- Back-propagation method
- Optimal learning
- Illustration of JavaNNS

Page 3

Biological neurons

Artificial neural networks are inspired by biological neurons of the central nervous system:

- each neuron is connected to many other neurons
- information is transmitted via synapses (an electro-chemical process)
- a neuron receives input from its dendrites, and transmits output via the axon to synapses

Page 4

Biological vs. artificial networks

                     biological neural network   artificial neural network
number of neurons    approx. 10^10               up to 10^6
number of synapses   approx. 10^13               up to 10^8
transmission time    relatively slow             very fast
processing           chemical                    mathematical function

Page 5

Artificial neuron model

A neuron receives input signals x_1, ..., x_n

These signals are multiplied by synaptic weights w_1, ..., w_n, which can be positive or negative

The activation of the neuron

$a = \sum_i w_i x_i$

is transmitted to a non-linear function f with threshold $w_0$

The output signal

$y = f(a - w_0)$

is then propagated to other neurons
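As a concrete illustration, here is a minimal sketch of this neuron model in Python (NumPy and the tanh transfer function are assumptions made for the example, not prescribed by the slide):

```python
import numpy as np

def neuron_output(x, w, w0, f=np.tanh):
    """Single artificial neuron: activation a = sum_i w_i * x_i, output y = f(a - w0)."""
    a = np.dot(w, x)   # weighted sum of the input signals
    return f(a - w0)   # non-linear transfer function with threshold w0

# example with 3 inputs, arbitrary weights and threshold
y = neuron_output(np.array([0.5, -1.0, 2.0]), np.array([0.1, 0.4, -0.2]), w0=0.3)
```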

Page 6

Characteristics of artificial neural networks

Artificial neural networks may vary in different aspects:

- the topology of the network, i.e. the number of neurons, possibly organized in layers or classes
- how each neuron (of a given layer/class) is connected to its neighbors
- the transfer function used in each neuron

The use and the learning strategy have to be adapted accordingly

Page 7

Topology of the neural network

The synaptic connections have a major influence on the behavior of the neural network

Two main categories can be considered:

- feed-forward networks, where each neuron propagates its output signal only to neurons that have not yet been used; as a special case, the multi-layer perceptron has a sequence of layers such that a neuron from one layer is connected only to neurons of the next layer
- dynamic networks, where neurons are connected without restrictions, in a cyclic way

Page 8

Multi-layer perceptron

The multi-layer perceptron (MLP) has 3 (or more) layers:

- an input layer with one input neuron per feature
- one or several hidden layers, each having an arbitrary number of neurons connected to the previous layer
- an output layer with one neuron per class, each neuron being connected to the previous layer

Hidden and output layers can be completely or only partly connected

The decision is in favor of the class corresponding to the highest output activation

Page 9

Impact of the hidden layer(s)

Networks with hidden layers can generate arbitrary decision boundaries; the number of hidden layers, however, has no impact on this expressive power: a single hidden layer is already sufficient!

Page 10

Feed-forward activation

As for the single perceptron, the feature space is augmented with a feature $x_0 = 1$ to take into account the bias $w_0$.

Each neuron j of a hidden layer computes an activation

$y_j = f(net_j)$  with  $net_j = \sum_{i=0}^{d} w_{ji} x_i = \mathbf{w}_j^t \mathbf{x}$

Each neuron k of the output layer computes an activation

$z_k = f(net_k)$  with  $net_k = \sum_{j=0}^{n_H} w_{kj} y_j = \mathbf{w}_k^t \mathbf{y}$

where $n_H$ denotes the number of hidden neurons
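A compact sketch of this feed-forward pass in Python (the names W_hidden and W_output are hypothetical; their first column is assumed to hold the bias weights, matching the augmented feature x_0 = 1, and tanh is assumed as transfer function):

```python
import numpy as np

def forward(x, W_hidden, W_output, f=np.tanh):
    """Feed-forward activation of a 3-layer MLP.
    W_hidden: (n_H, d+1), W_output: (C, n_H+1); column 0 holds the bias weights."""
    x_aug = np.concatenate(([1.0], x))   # augment input with x0 = 1
    y = f(W_hidden @ x_aug)              # hidden activations y_j = f(net_j)
    y_aug = np.concatenate(([1.0], y))   # augment hidden output with y0 = 1
    z = f(W_output @ y_aug)              # output activations z_k = f(net_k)
    return y, z

# decision rule from slide 8: choose the class with the highest output activation
# k_star = np.argmax(z)
```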

Page 11

Transfer function

The transfer function f is supposed to be:

- monotonically increasing, within the range [-1,+1]
- antisymmetric, i.e. f(-net) = -f(net)
- continuous and differentiable (for back-propagation)

Typical functions are:

step (simple threshold):

$f(a - w_0) = \begin{cases} +1 & \text{if } a - w_0 \ge 0 \\ -1 & \text{otherwise} \end{cases}$

ramp:

$f(a - w_0) = \begin{cases} +1 & \text{if } a - w_0 \ge T \\ (a - w_0)/T & \text{if } -T < a - w_0 < T \\ -1 & \text{if } a - w_0 \le -T \end{cases}$

sigmoid:

$f(a - w_0) = \dfrac{1 - e^{-(a - w_0)/T}}{1 + e^{-(a - w_0)/T}} = \tanh\dfrac{a - w_0}{2T}$
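A sketch of the three functions in Python, written over net = a - w_0 (T is the slope parameter from the formulas above):

```python
import numpy as np

def step(net):
    """Simple threshold: +1 if net >= 0, -1 otherwise."""
    return np.where(net >= 0, 1.0, -1.0)

def ramp(net, T=1.0):
    """Linear between -T and +T, saturated at +/-1 outside this range."""
    return np.clip(net / T, -1.0, 1.0)

def sigmoid(net, T=1.0):
    """Antisymmetric sigmoid: (1 - exp(-net/T)) / (1 + exp(-net/T)) = tanh(net / (2T))."""
    return np.tanh(net / (2.0 * T))
```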

Page 12

Learning in a multi-layer perceptron

Learning consists of setting the weights w, based on training samples

The method is called back-propagation, because the training error is propagated recursively from the output layer back to the hidden and input layers

The training error on a given pattern is defined as the squared difference between the desired output and the observed output, i.e.

$J(\mathbf{w}) = \dfrac{1}{2}\sum_{k=1}^{C}(t_k - z_k)^2 = \dfrac{1}{2}\,\lVert\mathbf{t} - \mathbf{z}\rVert^2$

In practice, the desired output $t_k$ is +1 for the correct class and -1 (or sometimes 0) for all other classes
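In code, the per-pattern error is a one-liner (a sketch; t is the target vector with +1 for the correct class and -1 elsewhere):

```python
import numpy as np

def pattern_error(t, z):
    """Training error J(w) = 1/2 * ||t - z||^2 on a single pattern."""
    return 0.5 * np.sum((t - z) ** 2)

# building a target vector for C classes with correct class k:
# t = -np.ones(C); t[k] = 1.0
```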

Page 13

Back-propagation of errors

The weight vectors are changed in the direction of the negative gradient of the error

$\Delta\mathbf{w} = -\eta\,\dfrac{\partial J}{\partial \mathbf{w}}$

where $\eta$ is the learning rate

Page 14

Error correction on the output layer

Since the error does not directly depend upon $w_{kj}$, we apply the chain rule

$\dfrac{\partial J}{\partial w_{kj}} = \dfrac{\partial J}{\partial net_k}\,\dfrac{\partial net_k}{\partial w_{kj}}$

with

$\dfrac{\partial J}{\partial net_k} = \dfrac{\partial J}{\partial z_k}\,\dfrac{\partial z_k}{\partial net_k} = -(t_k - z_k)\,f'(net_k) = -\delta_k$

and

$\dfrac{\partial net_k}{\partial w_{kj}} = y_j$

Thus the update rule becomes

$\Delta w_{kj} = \eta\,\delta_k\,y_j = \eta\,(t_k - z_k)\,f'(net_k)\,y_j$
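A sketch of this rule applied to the whole output layer at once (tanh is assumed as transfer function, so f'(net_k) can be expressed as 1 - z_k^2; y_aug is the bias-augmented hidden output from the forward-pass sketch):

```python
import numpy as np

def output_layer_update(W_output, t, z, y_aug, eta=0.1):
    """Delta rule for the output layer:
    delta_k = (t_k - z_k) * f'(net_k),  Delta w_kj = eta * delta_k * y_j."""
    f_prime = 1.0 - z ** 2                         # tanh derivative, expressed via z = f(net)
    delta_out = (t - z) * f_prime                  # error term delta_k per output neuron
    W_output += eta * np.outer(delta_out, y_aug)   # rank-1 update of all weights w_kj
    return delta_out
```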

Page 15

Error correction on the hidden layer(s)

Applying the following chain rule

$\dfrac{\partial J}{\partial w_{ji}} = \dfrac{\partial J}{\partial y_j}\,\dfrac{\partial y_j}{\partial w_{ji}}$

with

$\dfrac{\partial J}{\partial y_j} = \dfrac{\partial}{\partial y_j}\left[\dfrac{1}{2}\sum_{k=1}^{c}(t_k - z_k)^2\right] = -\sum_{k=1}^{c}(t_k - z_k)\,\dfrac{\partial z_k}{\partial y_j} = -\sum_{k=1}^{c}(t_k - z_k)\,f'(net_k)\,w_{kj}$

and

$\dfrac{\partial y_j}{\partial w_{ji}} = f'(net_j)\,x_i$

Finally the update rule becomes

$\Delta w_{ji} = \eta\left[\sum_{k=1}^{c}(t_k - z_k)\,f'(net_k)\,w_{kj}\right] f'(net_j)\,x_i$
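The corresponding sketch for the hidden layer: the output error terms delta_k are propagated back through the weights w_kj (the bias column of W_output is skipped, since no connection leads back to the constant y_0 = 1; tanh is again assumed):

```python
import numpy as np

def hidden_layer_update(W_hidden, W_output, delta_out, y, x_aug, eta=0.1):
    """Back-propagated delta rule for the hidden layer:
    delta_j = f'(net_j) * sum_k w_kj * delta_k,  Delta w_ji = eta * delta_j * x_i."""
    f_prime = 1.0 - y ** 2                                    # tanh derivative via y = f(net)
    delta_hidden = f_prime * (W_output[:, 1:].T @ delta_out)  # back-propagate, skipping the bias column
    W_hidden += eta * np.outer(delta_hidden, x_aug)           # rank-1 update of all weights w_ji
    return delta_hidden
```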

Page 16

Learning algorithm

The learning process starts with randomly initialized weights

The weights are adjusted iteratively with patterns from the training set:

- the pattern is presented to the network and the feed-forward activation is computed
- the output error is computed
- the error is used to update the weights in reverse order, from the output layer back to the hidden layers

The process is repeated until a quality criterion is reached; a sketch of the full loop follows the figure below

[Figure: the error term δ[q] of a neuron in layer q is obtained from the error terms δ[q+1][1], ..., δ[q+1][n] of the next layer, weighted by the connections w[1], ..., w[n] and multiplied by f'.]
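Putting the pieces together, a sketch of the complete loop, reusing the forward, pattern_error and update helpers sketched on the previous slides (the toy data, network sizes, learning rate, and stopping threshold are all illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_H, C = 4, 9, 3                                    # illustrative sizes
W_hidden = rng.normal(0.0, 0.1, (n_H, d + 1))          # random initialization
W_output = rng.normal(0.0, 0.1, (C, n_H + 1))

# toy training set: random features, random class labels (illustration only)
training_set = []
for _ in range(20):
    x = rng.normal(size=d)
    t = -np.ones(C)
    t[rng.integers(C)] = 1.0
    training_set.append((x, t))

for epoch in range(1000):
    total_error = 0.0
    for x, t in training_set:
        x_aug = np.concatenate(([1.0], x))
        y, z = forward(x, W_hidden, W_output)          # feed-forward activation
        y_aug = np.concatenate(([1.0], y))
        total_error += pattern_error(t, z)             # output error
        W_out_old = W_output.copy()                    # back-propagate through pre-update weights
        delta_out = output_layer_update(W_output, t, z, y_aug)
        hidden_layer_update(W_hidden, W_out_old, delta_out, y, x_aug)
    if total_error < 0.01:                             # quality criterion
        break
```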

Page 17

Risk of overfitting

Minimizing the global error over all training samples tends to produce overfitting

To avoid overfitting, the best strategy is to minimize the global error on a validation set which is independent of the training set; one way to realize this is sketched below
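A sketch of this strategy, assuming the forward and pattern_error helpers from the earlier slides and a held-out validation_set: evaluate the network after every training epoch and keep the weights that minimize the validation error:

```python
def set_error(W_hidden, W_output, samples):
    """Global error over a sample set, using the earlier forward/pattern_error sketches."""
    return sum(pattern_error(t, forward(x, W_hidden, W_output)[1]) for x, t in samples)

best_error, best_weights = float("inf"), None
for epoch in range(1000):
    # ... one training pass over training_set, as on the previous slide ...
    val_error = set_error(W_hidden, W_output, validation_set)
    if val_error < best_error:                         # new minimum on the validation set
        best_error = val_error
        best_weights = (W_hidden.copy(), W_output.copy())

W_hidden, W_output = best_weights                      # keep the best weights, not the last ones
```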

Page 18

JavaNNS

JavaNNS is an interactive software framework for experimenting with artificial neural networks:

- it has been developed at the University of Tübingen
- it is based on SNNS, an efficient ANN kernel written in C

It supports the following features:

- multiple topologies (MLP, dynamic networks, ...)
- various transfer functions
- various learning strategies
- network pruning
- ...

Page 19

Font recognition with JavaNNS

Original neural network with 9 hidden units

Page 20

Pruned neural network for font recognition

Neural network obtained after pruning