Transcript
Page 1: CS344: Introduction to Artificial Intelligence (associated lab: CS386)

CS344: Introduction to Artificial Intelligence

(associated lab: CS386)

Pushpak Bhattacharyya, CSE Dept., IIT Bombay

Lecture 34: Backpropagation; need for multiple layers and non-linearity

5th April, 2011

Page 2: CS344: Introduction to Artificial Intelligence (associated lab: CS386)

Backpropagation algorithm

Fully connected feed-forward network

Pure FF network (no jumping of connections over layers)

[Figure: layered network with an input layer (n i/p neurons), hidden layers, and an output layer (m o/p neurons); weight $w_{ji}$ connects neuron $i$ to neuron $j$ of the next layer.]

Page 3: CS344: Introduction to Artificial Intelligence (associated lab: CS386)

Gradient Descent Equations

$\Delta w_{ji} = -\eta \dfrac{\partial E}{\partial w_{ji}} \qquad (\eta = \text{learning rate},\ 0 \le \eta \le 1)$

$\dfrac{\partial E}{\partial w_{ji}} = \dfrac{\partial E}{\partial net_j} \times \dfrac{\partial net_j}{\partial w_{ji}} \qquad (net_j = \text{input at the } j^{th}\text{ layer})$

Defining $\delta_j = -\dfrac{\partial E}{\partial net_j}$ and using $\dfrac{\partial net_j}{\partial w_{ji}} = o_i$,

$\Delta w_{ji} = \eta\, \delta_j\, o_i$
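The chain-rule factorisation above can be checked numerically for a single sigmoid neuron. Below is a minimal sketch (not from the lecture; the inputs, weights, and target are made-up illustrative values) comparing the analytic gradient with a finite-difference estimate:

```python
import math

# One sigmoid neuron with two inputs; squared-error E = 1/2 * (t - o)^2.
# Check numerically that dE/dw_ji = (dE/dnet_j) * o_i  (illustrative values).
o = [0.8, 0.3]            # inputs o_i
w = [0.5, -0.4]           # weights w_ji
t = 1.0                   # target

def forward(weights):
    net = sum(wi * oi for wi, oi in zip(weights, o))
    out = 1.0 / (1.0 + math.exp(-net))
    return net, out, 0.5 * (t - out) ** 2

net, out, E = forward(w)
dE_dnet = -(t - out) * out * (1 - out)       # dE/dnet_j for sigmoid + squared error
analytic = dE_dnet * o[0]                    # dE/dw_ji = (dE/dnet_j) * o_i

eps = 1e-6
numeric = (forward([w[0] + eps, w[1]])[2] - E) / eps   # finite-difference dE/dw_ji
print(analytic, numeric)                     # the two values agree closely
```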

Page 4: CS344: Introduction to Artificial Intelligence (associated lab: CS386)

Backpropagation – for outermost layer

$\delta_j = -\dfrac{\partial E}{\partial net_j} = -\dfrac{\partial E}{\partial o_j} \times \dfrac{\partial o_j}{\partial net_j} \qquad (net_j = \text{input at the } j^{th}\text{ layer})$

$E = \dfrac{1}{2} \sum_{p=1}^{m} (t_p - o_p)^2$

$\dfrac{\partial E}{\partial o_j} = -(t_j - o_j), \qquad \dfrac{\partial o_j}{\partial net_j} = o_j (1 - o_j)$

Hence, $\delta_j = (t_j - o_j)\, o_j\, (1 - o_j)$

$\Delta w_{ji} = \eta\, (t_j - o_j)\, o_j (1 - o_j)\, o_i$
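As a quick sanity check, the output-layer formula can be evaluated for sample numbers. The values of $t_j$, $o_j$, $o_i$ and $\eta$ below are made up for illustration, not taken from the lecture:

```python
# Illustrative numbers (not from the lecture): target t_j, output o_j,
# input o_i from the previous layer, and learning rate eta.
t_j, o_j, o_i, eta = 1.0, 0.6, 0.8, 0.5

delta_j = (t_j - o_j) * o_j * (1 - o_j)   # (t - o) * o * (1 - o)
dw_ji = eta * delta_j * o_i               # eta * delta_j * o_i
print(delta_j, dw_ji)                     # 0.096, 0.0384
```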

Page 5: CS344: Introduction to Artificial Intelligence (associated lab: CS386)

Backpropagation for hidden layers

[Figure: layered network with an input layer (n i/p neurons), hidden layers, and an output layer (m o/p neurons); a hidden neuron $j$ feeds neurons $k$ of the next layer.]

$\delta_k$ is propagated backwards to find the value of $\delta_j$.

Page 6: CS344: Introduction to Artificial Intelligence (associated lab: CS386)

Backpropagation – for hidden layers

$\Delta w_{ji} = \eta\, \delta_j\, o_i$

$\delta_j = -\dfrac{\partial E}{\partial net_j} = -\dfrac{\partial E}{\partial o_j} \times \dfrac{\partial o_j}{\partial net_j} = -\dfrac{\partial E}{\partial o_j} \times o_j (1 - o_j)$

$\dfrac{\partial E}{\partial o_j} = \sum_{k \in \text{next layer}} \dfrac{\partial E}{\partial net_k} \times \dfrac{\partial net_k}{\partial o_j} = -\sum_{k \in \text{next layer}} \delta_k\, w_{kj}$

Hence, $\delta_j = o_j (1 - o_j) \sum_{k \in \text{next layer}} \delta_k\, w_{kj}$

$\Delta w_{ji} = \eta\, o_j (1 - o_j) \left( \sum_{k \in \text{next layer}} \delta_k\, w_{kj} \right) o_i$
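For a hidden neuron, the same computation takes the deltas of the next layer as input. A small sketch with made-up numbers (none of the values below come from the lecture):

```python
import numpy as np

# Illustrative numbers (not from the lecture): hidden output o_j, input o_i,
# the deltas delta_k of the next layer, and the weights w_kj leaving neuron j.
o_j, o_i, eta = 0.7, 0.4, 0.5
delta_next = np.array([0.096, -0.02])     # delta_k for k in the next layer
w_kj = np.array([0.3, -0.5])              # weights from neuron j to those k

delta_j = o_j * (1 - o_j) * np.sum(delta_next * w_kj)
dw_ji = eta * delta_j * o_i
print(delta_j, dw_ji)                     # roughly 0.00815 and 0.00163
```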

Page 7: CS344: Introduction to Artificial Intelligence (associated lab: CS386)

General Backpropagation Rule

• General weight updating rule: $\Delta w_{ji} = \eta\, \delta_j\, o_i$

• Where

$\delta_j = (t_j - o_j)\, o_j\, (1 - o_j)$ for the outermost layer

$\delta_j = o_j (1 - o_j) \sum_{k \in \text{next layer}} \delta_k\, w_{kj}$ for hidden layers
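The rule translates directly into code. Below is a minimal sketch, not from the slides, of a 2-2-1 sigmoid network trained on XOR with exactly these update rules; the layer sizes, learning rate, random initialisation, and the use of bias terms (in place of the slides' threshold input $x_0 = -1$) are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([0, 1, 1, 0], dtype=float)

# Assumed sizes and learning rate (not specified in the lecture).
W1 = rng.normal(size=(2, 2))   # hidden weights w_ji
b1 = np.zeros(2)
W2 = rng.normal(size=(2,))     # output weights w_kj
b2 = 0.0
eta = 0.5

for epoch in range(20000):
    for x, t in zip(X, T):
        # Forward pass
        h = sigmoid(W1 @ x + b1)          # hidden outputs o_j
        o = sigmoid(W2 @ h + b2)          # network output o

        # Output-layer delta: (t - o) * o * (1 - o)
        delta_o = (t - o) * o * (1 - o)
        # Hidden-layer deltas: o_j * (1 - o_j) * sum_k delta_k * w_kj
        delta_h = h * (1 - h) * (delta_o * W2)

        # Weight updates: delta_w_ji = eta * delta_j * o_i
        W2 += eta * delta_o * h
        b2 += eta * delta_o
        W1 += eta * np.outer(delta_h, x)
        b1 += eta * delta_h

print(np.round([sigmoid(W2 @ sigmoid(W1 @ x + b1) + b2) for x in X], 3))
# Expected to approach [0, 1, 1, 0] on most seeds; some runs may need
# more epochs or a different initialisation.
```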

Page 8: CS344: Introduction to Artificial Intelligence (associated lab: CS386)

Observations on weight change rules

Does the training technique support our intuition?

The larger the xi, the larger is ∆wi. The error burden is borne by the weight values corresponding to large input values.

Page 9: CS344: Introduction to Artificial Intelligence (associated lab: CS386)

Observations contd.

∆wi is proportional to the departure from target

Saturation behaviour when o is 0 or 1

If o < t, ∆wi > 0, and if o > t, ∆wi < 0, which is consistent with Hebb's law

Page 10: CS344: Introduction to Artificial Intelligence (associated lab: CS386)

Hebb’s law

If nj and ni are both in the excitatory state (+1), then the change in weight must be such that it enhances the excitation. The change is proportional to both levels of excitation:

$\Delta w_{ji} \propto e(n_j)\, e(n_i)$

If ni and nj are in a mutual state of inhibition (one is +1 and the other is -1), then the change in weight is such that the inhibition is enhanced (the change in weight is negative).

[Figure: neuron $n_i$ connected to neuron $n_j$ through weight $w_{ji}$.]

Page 11: CS344: Introduction to Artificial Intelligence (associated lab: CS386)

Saturation behavior

The algorithm is iterative and incremental

If the weight values or the number of inputs is very large, the net input becomes large and the output lies in the saturation region of the sigmoid.

The weight values hardly change in the saturation region
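The reason is the derivative factor $o(1-o)$ that multiplies every $\delta_j$. A quick illustrative check (the output values below are made up):

```python
# The sigmoid derivative factor o*(1 - o) that multiplies every delta.
for o in [0.5, 0.9, 0.99, 0.999, 0.01]:
    print(f"o = {o:6.3f}   o*(1-o) = {o * (1 - o):.6f}")
# Near o = 0 or o = 1 the factor is almost zero, so delta_w = eta*delta*o_i
# barely changes the weights: the saturation region.
```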

Page 12: CS344: Introduction to Artificial Intelligence (associated lab: CS386)

How does it work?

Input propagation forward and error propagation backward (e.g. XOR)

[Figure: two-layer threshold network computing XOR. The output neuron has weights w1 = 1, w2 = 1 and threshold θ = 0.5, and combines two hidden neurons computing x1·x̄2 and x̄1·x2 from the inputs x1 and x2 via connection weights 1.5 and -1.]
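A small sketch of such a threshold network is given below. The output-unit parameters (w1 = w2 = 1, θ = 0.5) are from the slide; the hidden-unit weights and thresholds are one consistent choice assumed for illustration, since the slide's exact values are only partly recoverable:

```python
def step(net, theta):
    """Threshold neuron: fires (1) iff the net input reaches the threshold."""
    return 1 if net >= theta else 0

def xor_net(x1, x2):
    # Hidden units: h1 = x1 AND NOT x2, h2 = NOT x1 AND x2
    # (weights 1.5 / -1 and threshold 1 are assumed, illustrative values).
    h1 = step(1.5 * x1 - 1.0 * x2, 1.0)
    h2 = step(-1.0 * x1 + 1.5 * x2, 1.0)
    # Output unit: w1 = 1, w2 = 1, theta = 0.5 (as on the slide) -> OR of h1, h2.
    return step(1.0 * h1 + 1.0 * h2, 0.5)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", xor_net(a, b))   # prints the XOR truth table
```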

Page 13: CS344: Introduction to Artificial Intelligence (associated lab: CS386)

If Sigmoid Neurons Are Used, Do We Need MLP?

Does sigmoid have the power of separating non-linearly separable data?

Can sigmoid solve the X-OR problem?

Page 14: CS344: Introduction to Artificial Intelligence (associated lab: CS386)

$O = \dfrac{1}{1 + e^{-net}}$

Interpret the output as 1 if $O > y_u$ and as 0 if $O < y_l$.

Typically $y_l \ll 0.5$ and $y_u \gg 0.5$.

[Figure: sigmoid curve of $O$ versus $net$, with the levels $y_u$ (near 1) and $y_l$ (near 0) marked.]

Page 15: CS344: Introduction to Artificial Intelligence (associated lab: CS386)

Inequalities

$O = \dfrac{1}{1 + e^{-net}}, \qquad net = w_1 x_1 + w_2 x_2 - w_0$

[Figure: single sigmoid neuron with inputs $x_1$ (weight $w_1$), $x_2$ (weight $w_2$) and bias input $x_0 = -1$ (weight $w_0$).]

Page 16: CS344: Introduction to Artificial Intelligence (associated lab: CS386)

<0, 0>

$O = 0$, i.e., $O < y_l$

$\dfrac{1}{1 + e^{-w_1 x_1 - w_2 x_2 + w_0}} < y_l$

i.e., $\dfrac{1}{1 + e^{w_0}} < y_l \qquad (1)$

Page 17: CS344: Introduction to Artificial Intelligence (associated lab: CS386)

<0, 1>

$O = 1$, i.e., $O > y_u$

$\dfrac{1}{1 + e^{-w_1 x_1 - w_2 x_2 + w_0}} > y_u$

i.e., $\dfrac{1}{1 + e^{-w_2 + w_0}} > y_u \qquad (2)$

Page 18: CS344: Introduction to Artificial Intelligence (associated lab: CS386)

<1, 0>

$O = 1$, i.e., $\dfrac{1}{1 + e^{-w_1 + w_0}} > y_u \qquad (3)$

<1, 1>

$O = 0$, i.e., $\dfrac{1}{1 + e^{-w_1 - w_2 + w_0}} < y_l \qquad (4)$

Page 19: CS344: Introduction to Artificial Intelligence (associated lab: CS386)

Rearranging, (1) gives

$\dfrac{1}{1 + e^{w_0}} < y_l$

i.e., $1 + e^{w_0} > \dfrac{1}{y_l}$

i.e., $w_0 > \ln\!\left(\dfrac{1 - y_l}{y_l}\right) \qquad (5)$

Page 20: CS344: Introduction to Artificial Intelligence (associated lab: CS386)

(2) gives

$\dfrac{1}{1 + e^{-w_2 + w_0}} > y_u$

i.e., $1 + e^{-w_2 + w_0} < \dfrac{1}{y_u}$

i.e., $e^{-w_2 + w_0} < \dfrac{1 - y_u}{y_u}$

i.e., $-w_2 + w_0 < \ln\!\left(\dfrac{1 - y_u}{y_u}\right)$

i.e., $w_2 - w_0 > \ln\!\left(\dfrac{y_u}{1 - y_u}\right) \qquad (6)$

Page 21: CS344: Introduction to Artificial Intelligence (associated lab: CS386)

(3) gives

$w_1 - w_0 > \ln\!\left(\dfrac{y_u}{1 - y_u}\right) \qquad (7)$

(4) gives

$-w_1 - w_2 + w_0 > \ln\!\left(\dfrac{1 - y_l}{y_l}\right) \qquad (8)$

Page 22: CS344: Introduction to Artificial Intelligence (associated lab: CS386)

Adding (5), (6), (7) and (8), the weight terms on the left cancel, giving

$0 > 2\ln\!\left(\dfrac{1 - y_l}{y_l}\right) + 2\ln\!\left(\dfrac{y_u}{1 - y_u}\right)$

i.e., $0 > \ln\!\left[\dfrac{1 - y_l}{y_l} \cdot \dfrac{y_u}{1 - y_u}\right]$

i.e., $\dfrac{1 - y_l}{y_l} \cdot \dfrac{y_u}{1 - y_u} < 1$

Page 23: CS344: Introduction to Artificial Intelligence (associated lab: CS386)

i. $\dfrac{1 - y_l}{1 - y_u} \cdot \dfrac{y_u}{y_l} < 1$ (rewriting the inequality above)

ii. $y_u \gg 0.5$

iii. $y_l \ll 0.5$

From ii and iii, $y_u > y_l$ and $1 - y_l > 1 - y_u$, so both factors in i exceed 1 and their product exceeds 1. Contradiction; hence a single sigmoid neuron cannot compute X-OR.
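A quick numerical check of the contradiction (the thresholds $y_l = 0.1$ and $y_u = 0.9$ are assumed, illustrative values):

```python
import math

# Illustrative thresholds (assumed): y_l well below 0.5, y_u well above 0.5.
y_l, y_u = 0.1, 0.9

product = ((1 - y_l) / y_l) * (y_u / (1 - y_u))
print(product)                 # 81.0 -- far greater than 1
print(math.log(product) < 0)   # False: the inequality from (5)+(6)+(7)+(8) fails
```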

Page 24: CS344: Introduction to Artificial Intelligence (associated lab: CS386)

Can Linear Neurons Work?

[Figure: inputs $x_1, x_2$ feed two hidden neurons $h_1, h_2$, which feed the output neuron; each neuron has linear I-O behaviour $y = m_i x + c_i$ ($i = 1, 2, 3$).]

$h_1 = m_1 (w_{11} x_1 + w_{12} x_2) + c_1$

$h_2 = m_2 (w_{21} x_1 + w_{22} x_2) + c_2$

$Out = m_3 (w_5 h_1 + w_6 h_2) + c_3 = k_1 x_1 + k_2 x_2 + k_3$

Page 25: CS344: Introduction to Artificial Intelligence (associated lab: CS386)

Note: The whole structure shown in the earlier slide is reducible to a single neuron with the given behaviour:

$Out = k_1 x_1 + k_2 x_2 + k_3$

Claim: A neuron with linear I-O behaviour can't compute X-OR.

Proof: Considering all possible cases [assuming 0.1 and 0.9 as the lower and upper thresholds], with $Out = m(w_1 x_1 + w_2 x_2) + c$:

For (0,0), zero class:

$m(w_1 \cdot 0 + w_2 \cdot 0) + c \le 0.1$, i.e., $c \le 0.1$

For (0,1), one class:

$m(w_1 \cdot 0 + w_2 \cdot 1) + c \ge 0.9$, i.e., $m \cdot w_2 + c \ge 0.9$

Page 26: CS344: Introduction to Artificial Intelligence (associated lab: CS386)

For (1,0), one class:

$m(w_1 \cdot 1 + w_2 \cdot 0) + c \ge 0.9$, i.e., $m \cdot w_1 + c \ge 0.9$

For (1,1), zero class:

$m(w_1 \cdot 1 + w_2 \cdot 1) + c \le 0.1$, i.e., $m \cdot w_1 + m \cdot w_2 + c \le 0.1$

These equations are inconsistent: adding the two "one class" inequalities gives $m \cdot w_1 + m \cdot w_2 + 2c \ge 1.8$, while adding the two "zero class" inequalities gives $m \cdot w_1 + m \cdot w_2 + 2c \le 0.2$. Hence X-OR can't be computed.

Observations:

1. A linear neuron can't compute X-OR.

2. A multilayer FFN with linear neurons is collapsible to a single linear neuron; hence there is no additional power due to the hidden layer.

3. Non-linearity is essential for power.
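A quick sanity check that the four constraints cannot hold simultaneously. The brute-force grid below is an illustrative assumption (a finite grid is only a check, not a proof; the proof is the algebraic contradiction above):

```python
# Brute-force check (illustrative grid, assumed for demonstration) that no
# linear neuron Out = m*(w1*x1 + w2*x2) + c satisfies the four XOR constraints.
grid = [v / 10.0 for v in range(-50, 51)]   # coarse grid: -5.0 .. 5.0

found = False
for mw1 in grid:                            # mw1 stands for m*w1
    for mw2 in grid:                        # mw2 stands for m*w2
        for c in grid:
            if (c <= 0.1 and                # (0,0) -> zero class
                mw2 + c >= 0.9 and          # (0,1) -> one class
                mw1 + c >= 0.9 and          # (1,0) -> one class
                mw1 + mw2 + c <= 0.1):      # (1,1) -> zero class
                found = True
print(found)    # False: the constraints are jointly infeasible
```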