
Intelligent Control

Module I - Neural Networks, Lecture 7

Adaptive Learning Rate

Laxmidhar Behera

Department of Electrical Engineering, Indian Institute of Technology, Kanpur


Subjects to be covered

Motivation for adaptive learning rate

Lyapunov Stability Theory

Training Algorithm based on Lyapunov Stability Theory

Simulations and discussion

Conclusion


Training of a Feed Forward Network

Figure 1: A feed-forward network [inputs $x_1, x_2$; weights $W$; output $y$]

Here, $W \in \mathbb{R}^M$ is the weight vector. The training data consists of, say, $N$ patterns $(x^p, y^p)$, $p = 1, 2, \ldots, N$.

Weight update law:

$$W(t+1) = W(t) - \eta \frac{\partial E}{\partial W}, \qquad \eta: \text{learning rate}$$
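As a minimal sketch (not part of the lecture), one fixed-rate gradient-descent step on a weight vector could be written as follows; the gradient is assumed to come from a backpropagation pass, and the names and sizes are illustrative only:

```python
import numpy as np

def gd_step(W, grad_E, eta=0.1):
    """One fixed-learning-rate step: W(t+1) = W(t) - eta * dE/dW."""
    return W - eta * grad_E

# Illustrative usage: M = 10 weights, gradient supplied by backpropagation.
W = np.random.randn(10)        # weight vector W in R^M
grad_E = np.random.randn(10)   # placeholder for dE/dW
W = gd_step(W, grad_E, eta=0.1)
```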


Motivation for adaptive learning rate

Figure 2: Convergence to global minimum. [Plot of $f(x)$ versus $x$; legend: Actual, Adaptive learning rate; starting point $x_0 = -6.7$.]

With an adaptive learning rate, one can employ a higher learning rate when the error is far from the global minimum and a smaller learning rate when it is near it.


Adaptive Learning Rate

The objective is to achieve global convergence for a non-quadratic, non-convex nonlinear function without increasing the computational complexity.

In gradient descent (GD), the learning rate is fixed. If one could have a larger learning rate for a point far away from the global minimum and a smaller learning rate for a point closer to it, then it would be possible to avoid local minima and ensure global convergence. This necessitates an adaptive learning rate.


Lyapunov Stability Theory

Used extensively in control system problems.

If we choose a Lyapunov function candidate $V(x(t), t)$ such that

$V(x(t), t)$ is positive definite

$\dot{V}(x(t), t)$ is negative definite

then the system is asymptotically stable.
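As a standard illustration (not from the slides), take the scalar system $\dot{x} = -x^3$ with the candidate $V(x) = \frac{1}{2}x^2$:

$$V(x) = \tfrac{1}{2}x^2 > 0 \ \ (x \neq 0), \qquad \dot{V}(x) = x\,\dot{x} = -x^4 < 0 \ \ (x \neq 0),$$

so both conditions hold and the origin is asymptotically stable.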

Local Invariant Set Theorem (La Salle): Consider an autonomous system of the form $\dot{x} = f(x)$ with $f$ continuous, and let $V(x)$ be a scalar function with continuous partial derivatives. Assume that

* for some $l > 0$, the region $\Omega_l$ defined by $V(x) < l$ is bounded;


Lyapunov stability theory: contd...

* $\dot{V}(x) \le 0$ for all $x$ in $\Omega_l$.

Let $R$ be the set of all points within $\Omega_l$ where $\dot{V}(x) = 0$, and $M$ be the largest invariant set in $R$. Then, every solution $x(t)$ originating in $\Omega_l$ tends to $M$ as $t \to \infty$.

The problem lies in choosing a proper Lyapunov function candidate.


Weight update law using Lyapunov based approach

The network output is given by

$$\hat{y}^p = f(W, x^p), \quad p = 1, 2, \ldots, N \qquad (1)$$

The usual quadratic cost function is given as:

$$E = \frac{1}{2}\sum_{p=1}^{N}\left(y^p - \hat{y}^p\right)^2 \qquad (2)$$

Let's choose a Lyapunov function candidate for the system as below:

$$V = \frac{1}{2}\,\tilde{y}^T \tilde{y} \qquad (3)$$

where $\tilde{y} = [\,y^1 - \hat{y}^1, \ldots, y^p - \hat{y}^p, \ldots, y^N - \hat{y}^N\,]^T$.


LF I Algorithm

The time derivative of the Lyapunov function $V$ is given by

$$\dot{V} = -\tilde{y}^T \frac{\partial \hat{y}}{\partial W}\,\dot{W} = -\tilde{y}^T J \dot{W} \qquad (4)$$

where $J = \frac{\partial \hat{y}}{\partial W} \in \mathbb{R}^{N \times M}$.

Theorem 1. If an arbitrary initial weight $W(0)$ is updated by

$$W(t') = W(0) + \int_0^{t'} \dot{W}\, dt \qquad (5)$$

where

$$\dot{W} = \frac{\|\tilde{y}\|^2}{\|J^T \tilde{y}\|^2 + \varepsilon}\, J^T \tilde{y} \qquad (6)$$

and $\varepsilon$ is a small positive constant, then $\tilde{y}$ converges to zero under the condition that $\dot{W}$ exists along the convergence trajectory.


Proof of LF - I Algorithm

Proof. Substitution of Eq. (6) into Eq. (4) yields

$$\dot{V}_1 = -\frac{\|\tilde{y}\|^2\, \|J^T \tilde{y}\|^2}{\|J^T \tilde{y}\|^2 + \varepsilon} \le 0 \qquad (7)$$

where $\dot{V}_1 < 0$ for all $\tilde{y} \neq 0$ (here $V_1$ denotes the Lyapunov function of Eq. (3)). If $\dot{V}_1$ is uniformly continuous and bounded, then according to Barbalat's lemma, as $t \to \infty$, $\dot{V}_1 \to 0$ and $\tilde{y} \to 0$.


LF - I Algorithm: contd...

The weight update law above is a batch update law. The instantaneous LF I learning algorithm can be derived as:

$$\dot{W} = \frac{\|\tilde{y}\|^2}{\|J_i^T \tilde{y}\|^2}\, J_i^T \tilde{y} \qquad (8)$$

where $\tilde{y} = y^p - \hat{y}^p \in \mathbb{R}$ and $J_i = \frac{\partial \hat{y}^p}{\partial W} \in \mathbb{R}^{1 \times M}$. The difference equation representation of the weight update equation is given by

$$W(t+1) = W(t) + \mu \dot{W}(t) \qquad (9)$$

Here $\mu$ is a constant.
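A minimal NumPy sketch of one instantaneous LF-I step, Eqs. (8)-(9) above, assuming the per-pattern output $\hat{y}^p$ and the row Jacobian $J_i$ are available from a forward/backward pass; the small eps added to the denominator (as in the batch form, Eq. (6)) and the variable names are assumptions of this sketch:

```python
import numpy as np

def lf1_step(W, y_target, y_hat, Ji, mu=0.55, eps=1e-8):
    """One instantaneous LF-I update: W(t+1) = W(t) + mu * W_dot, where
    W_dot = (|y_err|^2 / (||Ji^T y_err||^2 + eps)) * Ji^T y_err  (Eq. 8)."""
    y_err = y_target - y_hat            # scalar error for the current pattern
    g = Ji * y_err                      # Ji^T * y_err, shape (M,)
    W_dot = (y_err ** 2 / (g @ g + eps)) * g
    return W + mu * W_dot               # Eq. (9)

# Illustrative usage: M = 5 weights, scalar output for the current pattern.
W = np.random.randn(5)
Ji = np.random.randn(5)                 # d y_hat / d W for this pattern
W = lf1_step(W, y_target=1.0, y_hat=0.3, Ji=Ji)
```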


Comparison with BP Algorithm

In the gradient descent method we have

$$\Delta W = -\eta \frac{\partial E}{\partial W} = \eta J_i^T \tilde{y}$$

$$W(t+1) = W(t) + \eta J_i^T \tilde{y} \qquad (10)$$

The update equation for the LF-I algorithm:

$$W(t+1) = W(t) + \left(\mu\, \frac{\|\tilde{y}\|^2}{\|J_i^T \tilde{y}\|^2}\right) J_i^T \tilde{y}$$

Comparing the above two equations, we find that the fixed learning rate $\eta$ in the BP algorithm is replaced by its adaptive version $\eta_a$:

$$\eta_a = \mu\, \frac{\|\tilde{y}\|^2}{\|J_i^T \tilde{y}\|^2} \qquad (11)$$
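As a quick numerical illustration (values chosen only for illustration): if $\|\tilde{y}\| = 0.5$ and $\|J_i^T \tilde{y}\| = 0.1$, then $\eta_a = \mu \cdot 0.25/0.01 = 25\mu$; with $\mu = 0.55$ (the XOR setting reported later) this gives $\eta_a = 13.75$, far larger than a typical fixed BP rate, while $\eta_a$ shrinks when $\|\tilde{y}\|$ becomes small relative to $\|J_i^T \tilde{y}\|$.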


Adaptive Learning rate of LF-I

[Plot: adaptive learning rate (0-50) versus number of iterations (4 x no. of epochs, 0-400) for LF-I on the XOR problem.]

The learning rate is not fixed, unlike in the BP algorithm.

The learning rate goes to zero as the error goes to zero.


Convergence of LF-I

The theorem states that global convergence of LF-I is guaranteed provided $\dot{W}$ exists along the convergence trajectory. This, in turn, necessitates $\left\|\frac{\partial V_1}{\partial W}\right\| = \|J^T \tilde{y}\| \neq 0$.

$\left\|\frac{\partial V_1}{\partial W}\right\| = 0$ indicates a local minimum of the error function.

Thus, the theorem only says that the global minimum is reached when local minima are avoided during training.

Since the instantaneous update rule introduces noise, it may be possible to reach the global minimum in some cases; however, global convergence is not guaranteed.


LF II Algorithm

We consider the following Lyapunov function

$$V_2 = \frac{1}{2}\left(\tilde{y}^T \tilde{y} + \lambda \dot{W}^T \dot{W}\right) = V_1 + \frac{\lambda}{2}\dot{W}^T \dot{W} \qquad (12)$$

where $\lambda$ is a positive constant. The time derivative of the above equation is given by

$$\dot{V}_2 = -\tilde{y}^T \frac{\partial \hat{y}}{\partial W}\dot{W} + \lambda \dot{W}^T \ddot{W} = -\tilde{y}^T (J - D)\dot{W} \qquad (13)$$

where $J = \frac{\partial \hat{y}}{\partial W} \in \mathbb{R}^{N \times m}$ is the Jacobian matrix, and $D = \lambda\,\frac{1}{\|\tilde{y}\|^2}\,\tilde{y}\,\ddot{W}^T \in \mathbb{R}^{N \times m}$.


LF II Algorithm: contd...

Theorem 2. If the update law for the weight vector $W$ follows a dynamics given by the following nonlinear differential equation

$$\dot{W} = \alpha(W) J^T \tilde{y} - \lambda\,\alpha(W)\,\ddot{W} \qquad (14)$$

where $\alpha(W) = \frac{\|\tilde{y}\|^2}{\|J^T \tilde{y}\|^2 + \varepsilon}$ is a scalar function of the weight vector $W$ and $\varepsilon$ is a small positive constant, then $\tilde{y}$ converges to zero under the condition that $(J - D)^T \tilde{y}$ is non-zero along the convergence trajectory.


Proof of LF II algorithm

Proof. $\dot{W} = \alpha(W) J^T \tilde{y} - \lambda\,\alpha(W)\,\ddot{W}$ may be rewritten as

$$\dot{W} = \frac{\|\tilde{y}\|^2}{\|J^T \tilde{y}\|^2 + \varepsilon}\,(J - D)^T \tilde{y} \qquad (15)$$

Substituting for $\dot{W}$ from the above equation into $\dot{V}_2 = -\tilde{y}^T (J - D)\dot{W}$, we get

$$\dot{V}_2 = -\frac{\|\tilde{y}\|^2}{\|J^T \tilde{y}\|^2 + \varepsilon}\,\|(J - D)^T \tilde{y}\|^2 \le 0 \qquad (16)$$

Since $(J - D)^T \tilde{y}$ is non-zero, $\dot{V}_2 < 0$ for all $\tilde{y} \neq 0$ and $\dot{V}_2 = 0$ iff $\tilde{y} = 0$. If $\dot{V}_2$ is uniformly continuous and bounded, then according to Barbalat's lemma, as $t \to \infty$, $\dot{V}_2 \to 0$ and $\tilde{y} \to 0$.


Proof of LF II algorithm: contd...

The instantaneous weight update equation using the LF II algorithm can be finally expressed in difference equation form as follows:

$$W(t+1) = W(t) + \left(\mu\,\frac{\|\tilde{y}\|^2}{\|J_p^T \tilde{y}\|^2 + \varepsilon}\right)(J_p - D)^T \tilde{y}
= W(t) + \mu\,\frac{\|\tilde{y}\|^2}{\|J_p^T \tilde{y}\|^2 + \varepsilon}\,J_p^T \tilde{y} - \mu_1\,\frac{\|\tilde{y}\|^2}{\|J_p^T \tilde{y}\|^2 + \varepsilon}\,\ddot{W}(t) \qquad (17)$$

where $\mu_1 = \mu\lambda$ and the acceleration $\ddot{W}(t)$ is computed as:

$$\ddot{W}(t) = \frac{1}{(\Delta t)^2}\left[W(t) - 2W(t-1) + W(t-2)\right]$$

and $\Delta t$ is taken to be one time unit for simulation.
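A minimal NumPy sketch of the instantaneous LF-II update following the form of Eq. (17) as reconstructed above, with the acceleration computed from the last three weight vectors and $\Delta t = 1$; the variable names, the eps guard, and the buffering of past weights are assumptions of this sketch, not prescribed by the slides:

```python
import numpy as np

def lf2_step(W, W_prev, W_prev2, y_target, y_hat, Jp, mu=0.65, lam=0.01, eps=1e-8):
    """One instantaneous LF-II update (form of Eq. 17):
    W(t+1) = W(t) + eta_a * Jp^T y_err - mu_a * W_ddot(t),
    with W_ddot(t) = W(t) - 2 W(t-1) + W(t-2)  (delta_t = 1)."""
    y_err = y_target - y_hat                 # scalar error for the current pattern
    g = Jp * y_err                           # Jp^T * y_err, shape (M,)
    scale = y_err ** 2 / (g @ g + eps)       # ||y_err||^2 / (||Jp^T y_err||^2 + eps)
    W_ddot = W - 2.0 * W_prev + W_prev2      # finite-difference acceleration
    return W + mu * scale * g - mu * lam * scale * W_ddot

# Illustrative usage: keep the two previous weight vectors as state.
W2, W1, W0 = np.zeros(5), np.zeros(5), np.random.randn(5)
Jp = np.random.randn(5)                      # d y_hat / d W for this pattern
W_new = lf2_step(W0, W1, W2, y_target=1.0, y_hat=0.2, Jp=Jp)
```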


Comparison with BP Algorithm

Applying gradient descent to $V_2 = V_1 + \frac{\lambda}{2}\dot{W}^T \dot{W}$,

$$\Delta W = -\eta\left(\frac{\partial V_2}{\partial W}\right)^T
= -\eta\left(\frac{\partial V_1}{\partial W}\right)^T - \eta\left[\frac{d}{dW}\left(\frac{\lambda}{2}\dot{W}^T \dot{W}\right)\right]^T
= \eta\left(\frac{\partial \hat{y}}{\partial W}\right)^T \tilde{y} - \eta\lambda\ddot{W}$$

Thus, the weight update equation for the gradient descent method may be written as

$$W(t+1) = W(t) + \eta' J_p^T \tilde{y}\;\underbrace{-\;\mu'\ddot{W}}_{\text{acceleration term}} \qquad (18)$$


Adaptive learning rate and adaptive acceleration

Comparing the two update laws, the adaptive learning rate in this case is given by

$$\eta'_a = \mu\,\frac{\|\tilde{y}\|^2}{\|J_p^T \tilde{y}\|^2 + \varepsilon} \qquad (19)$$

and the adaptive acceleration rate is given by

$$\mu'_a = \mu_1\,\frac{\|\tilde{y}\|^2}{\|J_p^T \tilde{y}\|^2 + \varepsilon} = \frac{\mu\lambda\,\|\tilde{y}\|^2}{\|J_p^T \tilde{y}\|^2 + \varepsilon} \qquad (20)$$


Convergence of LF II

The global minimum of $V_2$ is given by

$$\tilde{y} = 0, \quad \dot{W} = 0 \qquad (\tilde{y} \in \mathbb{R}^n,\ \dot{W} \in \mathbb{R}^m)$$

The global minimum can be reached provided $\dot{W}$ does not vanish along the convergence trajectory.

Analyzing local minima conditions: $\dot{W}$ vanishes under the following conditions.

1. First condition: $J = D$ $(J, D \in \mathbb{R}^{n \times m})$. In the case of neural networks, it is very unlikely that each element of $J$ would be equal to that of $D$; thus this possibility can easily be ruled out for a multi-layer perceptron network.


Convergence of LF II: contd...

2. Second condition: $\dot{W}$ vanishes whenever

$$(J - D)^T \tilde{y} = 0$$

Assuming $J \neq D$, rank $\rho(J - D) = n$ ensures global convergence.

3. Third condition:

$$J^T \tilde{y} = D^T \tilde{y} = \lambda \ddot{W}$$

Solutions of the above equation represent local minima. A solution to the above equation exists for every vector $\ddot{W} \in \mathbb{R}^m$ whenever rank $\rho(J) = m$.


Convergence of LF II: contd...

For a neural network, $n \le m \Rightarrow \rho(J) \le n$. Hence there are at least $m - n$ vectors $\ddot{W} \in \mathbb{R}^m$ for which solutions do not exist, and hence local minima do not occur.

Thus, by increasing the number of hidden layers or hidden neurons (i.e., increasing $m$), the chances of encountering local minima can be reduced.

Increasing the number of output neurons increases both $m$ and $n$, as well as $n/m$. Thus, for MIMO systems, there are more local minima (for a fixed number of weights) as compared to single-output systems.


Avoiding local minima

[Figure: sketch of $V_1$ versus $W$ showing a local minimum and the global minimum. Points C, B, A, D mark the weight at times $t-2$, $t-1$, $t$, $t+1$, with increments $\Delta W(t-1)$, $\Delta W(t)$, $\Delta W(t+1)$.]


Avoiding local minima: contd...

Rewrite the update law for LF-II as

$$W(t+1) = W(t) + \Delta W(t+1) = W(t) - \eta'\frac{\partial V_1}{\partial W}(t) - \mu'\ddot{W}(t)$$

Consider point B (at time $t-1$). The weight update for the interval $(t-1, t]$ computed at this instant is $\Delta W(t) = \Delta W_1(t-1) + \Delta W_2(t-1)$, where

$$\Delta W_1(t-1) = -\eta\frac{\partial V_1}{\partial W}(t-1) > 0$$

$$\Delta W_2(t-1) = -\mu\ddot{W}(t-1) = -\mu\left(\Delta W(t-1) - \Delta W(t-2)\right) > 0$$

It is to be noted that $\Delta W(t-1) < \Delta W(t-2)$, as the velocity is decreasing towards the point of local minimum. $\Delta W(t) > 0$, hence the speed increases.


Avoiding local minima: contd...

Consider point A (at time $t$). The weight increments are

$$\Delta W_1(t) = -\eta\frac{\partial V_1}{\partial W}(t) = 0$$

$$\Delta W_2(t) = -\mu\ddot{W}(t) = -\mu\left(\Delta W(t) - \Delta W(t-1)\right) > 0$$

$$\Delta W(t) < \Delta W(t-1) \Rightarrow \Delta W_2(t) > 0$$

$$\Delta W(t+1) = \Delta W_1(t) + \Delta W_2(t) > 0$$

This helps in avoiding the local minimum.


Avoiding local minima: contd...

Consider point D (at instant $t+1$). The weight contributions are

$$\Delta W_1(t+1) = -\eta\frac{\partial V_1}{\partial W}(t+1) < 0$$

$$\Delta W_2(t+1) = -\mu\ddot{W}(t+1) = -\mu\left(\Delta W(t+1) - \Delta W(t)\right) > 0$$

The contribution due to the BP term becomes negative, as the slope $\frac{\partial V_1}{\partial W} > 0$ on the right-hand side of the local minimum, and $\Delta W(t+1) < \Delta W(t)$.

$$\Delta W(t+2) = \Delta W_1(t+1) + \Delta W_2(t+1) > 0 \quad \text{if } \Delta W_2(t+1) > |\Delta W_1(t+1)|$$

Thus it is possible to avoid local minima by properly choosing $\mu$.


Simulation results - LF-I vs LF-II: XOR

[Plot: training epochs (50-300) versus runs (0-50) for LF I (λ=0.0, µ=0.55) and LF II (λ=0.015, µ=0.65) on XOR.]

Figure 3: performance comparison for XOR

Observation: LF II provides a tangible improvement over LF I both in terms of convergence time and training epochs.


LF I vs LF II: 3-bit parity

[Plot: training epochs (0-3000) versus runs (0-50) for LF I (λ=0.0, µ=0.47) and LF II (λ=0.03, µ=0.47) on 3-bit parity.]

Figure 4: performance comparison for 3-bit parity

Observation: LF II performs better than LF I both in terms of computation time and training epochs.


LF I vs LF II: 8-3 Encoder

[Plot: training epochs (0-150) versus runs (0-50) for LF I (λ=0.0, µ=0.46) and LF II (λ=0.01, µ=0.465) on the 8-3 encoder.]

Figure 5: comparison for 8-3 encoder

Observation: LF II takes the minimum number of epochs in most of the runs.


LF I vs LF II: 2D Gabor function

[Plot: rms training error (0-0.5) versus iterations (training data points, 0-30000) for LF I (µ=0.8, λ=0.0) and LF II (µ=0.8, λ=0.6).]

Figure 6: performance comparison for 2D Gabor function

Observation: With increasing iterations, the performance of LF II improves as compared to LF I.


Simulation Results - Comparison: contd...

XOR

Algorithm   Epochs   Time (sec)   Parameters
BP          5620     0.0578       η = 0.5
BP          3769     0.0354       η = 0.95
EKF         3512     0.1662       λ = 0.9
LF-I        165      0.0062       µ = 0.55
LF-II       120      0.0038       µ = 0.65, λ = 0.01


Comparison among BP, EKF and LF-II

[Plot: convergence time in seconds (0-0.4) versus runs (0-50) for BP, EKF, and LF-II.]

Observation: LF takes almost the same time for any arbitrary initial condition.


Comparison among BP, EKF and LF: contd...

3-bit Parity

Algorithm   Epochs   Time (sec)   Parameters
BP          12032    0.483        η = 0.5
BP          5941     0.2408       η = 0.95
EKF         2186     0.4718       λ = 0.9
LF-I        1338     0.1176       µ = 0.47
LF-II       738      0.0676       µ = 0.47, λ = 0.03


Comparison among BP, EKF and LF: contd...

8-3 Encoder

Algorithm   Epochs   Time (sec)   Parameters
BP          326      0.044        η = 0.7
BP          255      0.0568       η = 0.9
LF-I        72       0.0582       µ = 0.46
LF-II       42       0.051        µ = 0.465, λ = 0.01


Comparison among BP, EKF and LF: contd...

2D Gabor function

Algorithm   No. of Centers   RMS error/run   Parameters
BP          40               0.0847241       η1,2 = 0.2
BP          80               0.0314169       η1,2 = 0.2
LF-I        40               0.0192033       µ = 0.8
LF-II       40               0.0186757       µ = 0.8, λ = 0.3


Discussion

Global convergence of Lyapunov based learning Algorithms

Consider the following Lyapunov function candidate:

$$V_2 = \mu V_1 + \frac{\sigma}{2}\left\|\frac{\partial V_1}{\partial W}\right\|^2, \quad \text{where } V_1 = \frac{1}{2}\tilde{y}^T \tilde{y} \qquad (21)$$

The objective is to select a weight update law $\dot{W}$ such that the global minimum ($V_1 = 0$ and $\frac{\partial V_1}{\partial W} = 0$) is reached.

The rate derivative of the Lyapunov function $V_2$ is given as:

$$\dot{V}_2 = \frac{\partial V_1}{\partial W}\left[\mu I + \sigma \frac{\partial^2 V_1}{\partial W\,\partial W^T}\right]\dot{W} \qquad (22)$$


If the weight update law $\dot{W}$ is selected as

$$\dot{W} = -\left[\mu I + \sigma \frac{\partial^2 V_1}{\partial W\,\partial W^T}\right]^{-1} \frac{\left(\frac{\partial V_1}{\partial W}\right)^T}{\left\|\frac{\partial V_1}{\partial W}\right\|^2}\left(\zeta\left\|\frac{\partial V_1}{\partial W}\right\|^2 + \eta\|V_1\|^2\right) \qquad (23)$$

with $\zeta > 0$ and $\eta > 0$, then

$$\dot{V}_2 = -\zeta\left\|\frac{\partial V_1}{\partial W}\right\|^2 - \eta\|V_1\|^2 \qquad (24)$$

which is negative definite with respect to $V_1$ and $\frac{\partial V_1}{\partial W}$. Thus, $V_2$ will finally converge to its equilibrium point given by $V_1 = 0$ and $\frac{\partial V_1}{\partial W} = 0$.


But the implementation of the weight update algorithm becomes very difficult due to the presence of the Hessian term $\frac{\partial^2 V_1}{\partial W\,\partial W^T}$.

Thus, the above algorithm is of theoretical interest.

The above weight update algorithm is similar to the BP learning algorithm with a fixed learning rate.


Conclusion

LF algorithms perform better than both the EKF and BP algorithms in terms of speed and accuracy.

LF II avoids local minima to a greater extent as compared to LF I.

It is seen that by choosing a proper network architecture, it is possible to reach the global minimum.

The LF-I algorithm has an interesting parallel with the conventional BP algorithm, where the fixed learning rate of BP is replaced by an adaptive learning rate.
