Page 1: Neural networks

Rutgers CS440, Fall 2003

Neural networks

Reading: Ch. 20, Sec. 5, AIMA 2nd Ed.

Page 2: Outline

• Human learning, brain, neurons
• Artificial neural networks
• Perceptron
• Learning in NNs
• Multi-layer NNs
• Hopfield networks

Page 3: Brains & neurons

• The brain is composed of over a billion nerve cells that communicate with each other through specialized contacts called synapses. With roughly 1,000 synapses per neuron, this yields extensive and elaborate neural circuits.

• Some synapses excite neurons and cause them to generate signals called action potentials: large, transient voltage changes that propagate down their axons. Other synapses are inhibitory and prevent the neuron from generating action potentials. The action potential propagates down the axon to the sites where the axon has made synapses with the dendrites of other nerve cells.

http://www.medicine.mcgill.ca/physio/cooperlab/cooper.htm

Page 4: Artificial neurons

• Computational neuroscience
• McCulloch-Pitts model

[Figure: McCulloch-Pitts unit. Input links carry x_0 = -1 (offset/bias), x_1, ..., x_i, ..., x_N, weighted by w_j0, w_j1, ..., w_ji, ..., w_jN. The input function sums them into in_j, the activation function g_j is applied, and the output x_j is sent along the output links.]

$x_j = g_j(\mathrm{in}_j) = g_j\!\left(\sum_{i=0}^{N} w_{ji} x_i\right) = g_j\!\left(\sum_{i=1}^{N} w_{ji} x_i - w_{j0}\right) = g_j(W_j'\,\mathbf{x})$

where $W_j = [w_{j0}\ w_{j1}\ \ldots\ w_{jN}]'$ and $\mathbf{x} = [x_0\ x_1\ \ldots\ x_N]'$ with $x_0 = -1$.
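A minimal sketch of this unit in Python (the function name `mcp_unit` and the example weights are illustrative, not from the slides); the bias is folded in by prepending the fixed input x_0 = -1:

```python
import numpy as np

def mcp_unit(x, w, g):
    """McCulloch-Pitts style unit: weighted sum of inputs, then activation g.

    x: inputs x_1..x_N; w: weights [w_0, w_1, ..., w_N], where w_0 is the bias.
    The bias is handled by prepending the fixed input x_0 = -1.
    """
    x_aug = np.concatenate(([-1.0], np.asarray(x, dtype=float)))
    in_j = np.dot(w, x_aug)          # input function: in_j = sum_i w_i * x_i
    return g(in_j)                   # activation function applied to the sum

# Example: a hard-threshold unit with illustrative weights.
step = lambda a: 1.0 if a >= 0 else 0.0
print(mcp_unit([1, 0], w=[0.5, 1.0, 1.0], g=step))  # weighted sum 0.5 >= 0, so output 1.0
```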

Page 5: Activation function

• Threshold activation function (hard threshold, step function)
• Sigmoid activation function (soft threshold)
• Bias / offset shifts the activation function left/right

[Figure: plots of g_j versus in_j for the threshold and sigmoid activation functions.]

Threshold: $g(\mathrm{in}) = \begin{cases} 1, & \mathrm{in} \ge 0 \\ 0, & \mathrm{in} < 0 \end{cases}$

Sigmoid: $g(\mathrm{in}) = \dfrac{1}{1 + e^{-\mathrm{in}}}$
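A small sketch of these two activation functions (function names are my own), including how a bias w_0 shifts where the soft threshold crosses 0.5:

```python
import numpy as np

def step(a):
    """Hard threshold: 1 if the input is non-negative, else 0."""
    return np.where(a >= 0, 1.0, 0.0)

def sigmoid(a):
    """Soft threshold: squashes any real input into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-a))

# The bias w0 shifts the activation left/right: g(x - w0) crosses 0.5 at x = w0.
x = np.linspace(-4, 4, 9)
for w0 in (0.0, 2.0):
    print(f"w0={w0}:", np.round(sigmoid(x - w0), 2))
```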

Page 6: Implementing functions

• Artificial neurons can implement different (Boolean) functions

[Figure: threshold units implementing Boolean gates. Each unit has a bias input of -1 weighted by w_0.]

OR:  inputs X, Y with w_x = 1, w_y = 1, bias w_0 = 0.5, output X ∨ Y
AND: inputs X, Y with w_x = 1, w_y = 1, bias w_0 = 1.5, output X ∧ Y
NOT: input X with w_x = -1, bias w_0 = -0.5, output ¬X
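To make these concrete, here is a short check in Python. The OR and AND weights come from the slide; the negative NOT weights follow the standard single-unit construction, so treat those values as my reading of the garbled figure:

```python
def threshold_unit(inputs, weights, w0):
    """Single threshold unit: fire (1) iff the weighted sum reaches the bias w0."""
    s = sum(w * x for w, x in zip(weights, inputs)) - w0
    return 1 if s >= 0 else 0

OR  = lambda x, y: threshold_unit([x, y], [1, 1],  0.5)
AND = lambda x, y: threshold_unit([x, y], [1, 1],  1.5)
NOT = lambda x:    threshold_unit([x],    [-1],   -0.5)

for x in (0, 1):
    for y in (0, 1):
        print(x, y, "OR:", OR(x, y), "AND:", AND(x, y))
    print("NOT", x, "=", NOT(x))
```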

Page 7: Networks of neurons

• Feed-forward networks:
  – single-layer networks
  – multi-layer networks
  – Feed-forward networks implement functions and have no internal state

• Recurrent networks:
  – Hopfield networks have symmetric weights (W_ij = W_ji)
  – Boltzmann machines use stochastic activation functions
  – Recurrent neural nets have directed cycles with delays

Page 8: Perceptron

• Rosenblatt, 1958
• Single-layer, feed-forward network

[Figure: perceptron with input layer x_1, ..., x_5 fully connected by weights w_ji to output layer y_1, y_2, y_3; inset shows a single output unit y_1 with inputs x_1, x_2.]

Page 9: Expressiveness of perceptron

• With the threshold activation function, can represent AND, OR, NOT, majority
• Linear separator (linear discriminant classifier)

$X^+ : \left\{ X : \sum_{i=0}^{N} w_i x_i \ge 0 \right\} \qquad X^- : \left\{ X : \sum_{i=0}^{N} w_i x_i < 0 \right\}$

i.e. $X^+ : \{ X : W \cdot X \ge 0 \} \qquad X^- : \{ X : W \cdot X < 0 \}$

Line: $W \cdot X = 0$
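A tiny illustration of the linear-separator view (the weight vector below is my own choice for the example): classify a point by which side of the line W·X = 0 it falls on, with the bias folded into W via x_0 = -1.

```python
import numpy as np

# Illustrative weights W = [w0, w1, w2]; these happen to match the OR unit above.
W = np.array([0.5, 1.0, 1.0])

def classify(x1, x2):
    X = np.array([-1.0, x1, x2])              # prepend the fixed bias input x0 = -1
    return "X+" if W @ X >= 0 else "X-"       # side of the separating line W.X = 0

print(classify(0, 0), classify(1, 0), classify(0.2, 0.1))  # X- X+ X-
```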

Page 10: Expressiveness of perceptron (cont'd)

• Can only separate linearly separable classes

[Figure: unit squares in the (x_1, x_2) plane for AND, OR, and XOR; a single line separates the positive and negative points for AND and OR, but no line can do so for XOR. A further panel shows two general classes C1 and C2 separated by a line.]

Page 11: Perceptron example: geometric interpretation

• Two inputs, one output

$y = g(w_1 x_1 + w_2 x_2 - w_0) = g(\mathbf{w}'\mathbf{x} - w_0)$, with $\mathbf{w} = \begin{bmatrix} w_1 \\ w_2 \end{bmatrix}$, $\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$, $w_1^2 + w_2^2 = 1$ (i.e., $w_1 = \cos\theta$, $w_2 = \sin\theta$)

The decision boundary is the line $\mathbf{w}'\mathbf{x} - w_0 = 0$; for a point $\mathbf{x}_p$, the quantity $\mathbf{w}'\mathbf{x}_p - w_0$ is its signed distance from that line.

[Figure: the line w'x - w_0 = 0 in the (x_1, x_2) plane, with the unit normal w at angle θ and a point x_p at signed distance w'x_p - w_0.]

Page 12: Perceptron learning

• Given a set of (linearly separable) points, X+ and X- (or a set of labeled points, D = { (x, y)_k }, y ∈ { 0, 1 }), find a perceptron that separates (classifies) the two
• Algorithm:
  – Minimize the error of classification. How?
  – Start with some random weights and adjust them (iteratively) so that the error is minimal

$E = \frac{1}{2}\sum_{k=1}^{K} \mathrm{err}_k^2 = \frac{1}{2}\sum_{k=1}^{K}\left(y_k - g\!\left(\sum_{i=0}^{N} w_i x_{ki}\right)\right)^2, \qquad w^* = \arg\min_w E(w)$

Gradient descent on the weights (α is the learning rate):

$w_i^{(l)} = w_i^{(l-1)} - \alpha\,\frac{\partial E}{\partial w_i}, \qquad \frac{\partial E}{\partial w_i} = -\sum_{k=1}^{K} \mathrm{err}_k\, g'(\mathrm{in}_k)\, x_{ki}$

where $g'(\mathrm{in})$ is the derivative of the activation function.

Page 13: Analysis of the perceptron learning algorithm

• Consider the 1D case: 1 input, 1 output, learning only the bias weight w_0:

$y = g(x - w_0), \qquad \frac{\partial E}{\partial w_0} = -\sum_{k=1}^{K} \mathrm{err}_k\, g'(\mathrm{in}_k)\,(-1), \qquad w_0^{(l)} = w_0^{(l-1)} - \alpha\,\frac{\partial E}{\partial w_0}$

• Suppose the errors on the K samples are err = (0, 0, -1, 0, 0), i.e. only sample x_3 is misclassified. Then

$\frac{\partial E}{\partial w_0} = -\left[0 + 0 + (-1)\,g'(x_3 - w_0^{(l-1)})\,(-1) + 0 + 0\right] = -g'(x_3 - w_0^{(l-1)})$

$w_0^{(l)} = w_0^{(l-1)} - \alpha\,\frac{\partial E}{\partial w_0} = w_0^{(l-1)} + \alpha\, g'(x_3 - w_0^{(l-1)})$

so the threshold w_0 increases toward x_3, lowering the output g(x_3 - w_0) and hence the error on that sample.

[Figure: 1D axis showing the threshold moving from w_0^{(l-1)} to w_0^{(l)}, in the direction of the misclassified sample x_3; the step size is proportional to g'(x_3 - w_0^{(l-1)}).]

Page 14: Perceptron learning (cont'd)

• Converges to a consistent classifier if the data are linearly separable

Page 15: Multilayer perceptrons

• Number of hidden layers
• Usually fully connected
• 2-layer networks can represent all continuous functions
• 3-layer networks can represent all functions
• Learning: backpropagation algorithm, an extension of perceptron learning
  – Backpropagate the error from the output layer into the hidden layer(s), as sketched in the code below

[Figure: multilayer perceptron with input layer x_1, ..., x_5, one hidden layer, and output layer y_1, y_2; weights w(1)_ji connect the input layer to the hidden layer and w(2)_ji connect the hidden layer to the output layer.]
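A compact backpropagation sketch for a 2-layer network of sigmoid units (one hidden layer), here learning XOR, which a single perceptron cannot represent. The layer sizes, learning rate, random seed, and iteration count are my own illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

# XOR is not linearly separable, so it needs a hidden layer.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# 2 inputs -> 4 hidden units -> 1 output; biases kept as separate vectors here.
W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)
alpha = 0.5

for _ in range(10000):
    # Forward pass
    h = sigmoid(X @ W1 + b1)                      # hidden layer activations
    out = sigmoid(h @ W2 + b2)                    # network outputs

    # Backward pass: propagate the output error into the hidden layer
    delta_out = (out - y) * out * (1 - out)       # dE/d(in) at the output units
    delta_h = (delta_out @ W2.T) * h * (1 - h)    # dE/d(in) at the hidden units

    # Gradient-descent weight updates
    W2 -= alpha * h.T @ delta_out;  b2 -= alpha * delta_out.sum(axis=0)
    W1 -= alpha * X.T @ delta_h;    b1 -= alpha * delta_h.sum(axis=0)

print(np.round(out.ravel(), 2))   # should approach [0, 1, 1, 0]
```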

Page 16: Application: handwritten digit recognition

Page 17: Probabilistic interpretation

• Two classes, X+ and X-, whose elements are distributed randomly according to densities p(x|+) and p(x|-)
• The classes have prior probabilities p+ and p-
• Given an element x, decide which class it belongs to.

Decide class + if the log-odds are non-negative:

$\log\frac{p(+\mid x)}{p(-\mid x)} = \log\frac{p(x\mid +)\,p_+}{p(x\mid -)\,p_-} \ge 0, \qquad \mathrm{Class}(x) = g\!\left(\log\frac{p(+\mid x)}{p(-\mid x)}\right)$

$P(\text{Class is} + \mid x) = p(+\mid x) = \frac{p(+\mid x)}{p(+\mid x) + p(-\mid x)} = \frac{1}{1 + e^{-\log\frac{p(+\mid x)}{p(-\mid x)}}}$

• Output of NN is the posterior probability of a data class
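A quick numerical check of this identity (the likelihoods and priors below are made-up numbers, chosen only to exercise the formula):

```python
import math

# Made-up likelihoods and priors for a single observation x.
p_x_plus, p_x_minus = 0.30, 0.10       # p(x|+), p(x|-)
prior_plus, prior_minus = 0.40, 0.60   # p+, p-

# Posterior via Bayes' rule.
posterior = (p_x_plus * prior_plus) / (p_x_plus * prior_plus + p_x_minus * prior_minus)

# Posterior via the sigmoid of the log-odds, as a sigmoid output unit would compute it.
log_odds = math.log((p_x_plus * prior_plus) / (p_x_minus * prior_minus))
sigmoid_of_log_odds = 1.0 / (1.0 + math.exp(-log_odds))

print(round(posterior, 4), round(sigmoid_of_log_odds, 4))  # the two values agree
```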

Page 18: Hopfield NN

• "Associative memory" – associate an unknown input with entities encoded in the network

$\mathbf{x}^{(t)} = g\!\left(W \mathbf{x}^{(t-1)}\right)$

where $\mathbf{x}^{(t-1)}$ is the activation potential (state of the network) at "time" t-1 and $\mathbf{x}^{(t)}$ the state at "time" t.

• Learning: use a set of examples X = { x_1, ..., x_K } and set W to

$W = \frac{1}{K}\sum_{k=1}^{K} \mathbf{x}_k \mathbf{x}_k' - I$

(for ±1 patterns, subtracting the identity zeroes the diagonal, so no unit feeds directly back to itself)

• Association: to associate an unknown input x_u with one of the K memorized elements, set x(0) = x_u and let the network converge
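A minimal sketch of storing and recalling ±1 patterns with this rule (the pattern size, the stored patterns, and the synchronous update schedule are illustrative assumptions):

```python
import numpy as np

def train_hopfield(patterns):
    """Hebbian storage: W = (1/K) * sum_k x_k x_k', with the diagonal zeroed."""
    patterns = np.asarray(patterns, dtype=float)   # each row is a +/-1 pattern
    K, _ = patterns.shape
    W = patterns.T @ patterns / K
    np.fill_diagonal(W, 0.0)                       # no self-connections
    return W

def recall(W, x, steps=10):
    """Association: start from the unknown input and iterate x <- sign(W x) until it settles."""
    x = np.asarray(x, dtype=float)
    for _ in range(steps):
        x_new = np.where(W @ x >= 0, 1.0, -1.0)    # synchronous threshold update
        if np.array_equal(x_new, x):
            break
        x = x_new
    return x

# Two stored 8-unit patterns and a corrupted probe of the first one.
memories = [[1, 1, 1, 1, -1, -1, -1, -1],
            [1, -1, 1, -1, 1, -1, 1, -1]]
W = train_hopfield(memories)
probe = [1, 1, 1, -1, -1, -1, -1, -1]              # first memory with one bit flipped
print(recall(W, probe))                            # converges back to the first memory
```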

Page 19: Hopfield NN example

[Figure: a training set of stored patterns, a test example, and the network state at t = 0, 1, 2, 3 as it converges.]