Transcript
Page 1: Neural Networks, 2nd Edition, Simon Haykin. 柯博昌. Chap 3. Single-Layer Perceptrons

Page 2: Adaptive Filtering Problem

Dynamic System: the external behavior of the system is described by T: {x(i), d(i); i = 1, 2, ..., n, ...}

where x(i) = [x1(i), x2(i), ..., xm(i)]^T

x(i) can arise from: Spatial: x(i) is a snapshot of data. Temporal: x(i) is uniformly spaced in time.

Signal-flow Graph of the Adaptive Filter

Filtering Process: y(i) is produced in response to x(i); e(i) = d(i) - y(i).

Adaptive Process: automatic adjustment of the synaptic weights in accordance with e(i).

y(i) = v(i) = Σ_{k=1}^{m} wk(i) xk(i) = x^T(i) w(i),  where w(i) = [w1(i), w2(i), ..., wm(i)]^T

e(i) = d(i) - y(i)
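As a worked illustration of the filtering process, here is a minimal NumPy sketch of one time step; the concrete values and the variable names x_i, w_i, d_i are illustrative assumptions, not from the slides.

```python
import numpy as np

# One time step of the filtering process (illustrative values).
x_i = np.array([0.5, -1.2, 0.3])     # input vector x(i) with m = 3 elements
w_i = np.array([0.1, 0.4, -0.2])     # current weight vector w(i)
d_i = 0.7                            # desired response d(i)

y_i = x_i @ w_i                      # filter output y(i) = x^T(i) w(i)
e_i = d_i - y_i                      # error signal e(i) = d(i) - y(i)
print(y_i, e_i)
```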

Page 3: Unconstrained Optimization Techniques

Let C(w) be a continuously differentiable function of some unknown weight (parameter) vector w.

C(w) maps w into real numbers. Goal: find an optimal solution w* that satisfies C(w*) ≤ C(w), i.e., minimize C(w) with respect to w. Necessary condition for optimality: ∇C(w*) = 0 (∇ is the gradient operator).

w = [w1, w2, ..., wm]^T,  ∇C(w) = [∂C/∂w1, ∂C/∂w2, ..., ∂C/∂wm]^T

A class of unconstrained optimization algorithms: starting with an initial guess denoted by w(0), generate a sequence of weight vectors w(1), w(2), ..., such that the cost function C(w) is reduced at each iteration of the algorithm, i.e., C(w(n+1)) < C(w(n)).

Page 4: Method of Steepest Descent

The successive adjustments applied to w are in the direction of steepest descent, that is, in a direction opposite to the gradient vector ∇C(w).

Let g = ∇C(w). The steepest descent algorithm: w(n+1) = w(n) - ηg(n)

η: a positive constant called the stepsize or learning-rate parameter. Δw(n) = w(n+1) - w(n) = -ηg(n)

Small η: overdamps the transient response.

Large η: underdamps the transient response.

If η exceeds a certain value, the algorithm becomes unstable.
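A minimal sketch of the steepest descent rule w(n+1) = w(n) - ηg(n), assuming a simple quadratic cost C(w) = (1/2)||w - w*||² whose gradient is w - w*; the cost and the names eta, w_star are illustrative assumptions.

```python
import numpy as np

w_star = np.array([1.0, -2.0])     # minimizer of the illustrative quadratic cost
w = np.zeros(2)                    # initial guess w(0)
eta = 0.1                          # learning-rate parameter (step size)

for n in range(100):
    g = w - w_star                 # gradient g(n) = ∇C(w(n)) for C(w) = (1/2)||w - w*||²
    w = w - eta * g                # steepest descent update w(n+1) = w(n) - η g(n)

print(w)                           # approaches w_star for a suitably small eta
```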

Page 5: Newton's Method

Applying second-order Taylor series expansion of C(w) around w(n).

ΔC(w(n)) = C(w(n+1)) - C(w(n)) ≈ g^T(n)Δw(n) + (1/2)Δw^T(n)H(n)Δw(n)

where H = ∇²C(w) is the m-by-m Hessian matrix of second partial derivatives of C:

H = [ ∂²C/∂w1²     ∂²C/∂w1∂w2   ...  ∂²C/∂w1∂wm
      ∂²C/∂w2∂w1   ∂²C/∂w2²     ...  ∂²C/∂w2∂wm
      ...
      ∂²C/∂wm∂w1   ∂²C/∂wm∂w2   ...  ∂²C/∂wm²   ]

C(w) is minimized when

g(n) + H(n)Δw(n) = 0,  i.e.,  Δw(n) = -H^{-1}(n)g(n)

w(n+1) = w(n) + Δw(n) = w(n) - H^{-1}(n)g(n)

Generally speaking, Newton's method converges quickly.

Newton's method minimizes the quadratic approximation of the cost function C(w) around the current point w(n).
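A sketch of one Newton step w(n+1) = w(n) - H^{-1}(n)g(n), applied to an assumed quadratic cost C(w) = (1/2)w^T A w - b^T w, whose gradient is Aw - b and whose Hessian is A; the matrices A and b are illustrative.

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])            # Hessian of the illustrative quadratic cost
b = np.array([1.0, -1.0])

w = np.zeros(2)                       # current point w(n)
g = A @ w - b                         # gradient g(n)
H = A                                 # Hessian H(n)

w_next = w - np.linalg.solve(H, g)    # Newton update w(n+1) = w(n) - H^{-1}(n) g(n)
print(w_next, np.linalg.solve(A, b))  # for a quadratic cost, one step reaches the minimizer
```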

Page 6: Gauss-Newton Method

The Gauss-Newton method is applicable to a cost function C(w) that is the sum of error squares. Let

C(w) = (1/2) Σ_{i=1}^{n} e²(i)

Linearizing the dependence of e(i) on w around the operating point w(n):

e'(i, w) = e(i) + [∂e(i)/∂w]^T (w - w(n)),  i = 1, 2, ..., n

or, in matrix form, e'(n, w) = e(n) + J(n)(w - w(n)),  where e(n) = [e(1), e(2), ..., e(n)]^T.

The Jacobian J(n) is [∇e(n)]^T, the n-by-m matrix of partial derivatives ∂e(i)/∂wk evaluated at w = w(n):

J(n) = [ ∂e(1)/∂w1   ∂e(1)/∂w2   ...  ∂e(1)/∂wm
         ∂e(2)/∂w1   ∂e(2)/∂w2   ...  ∂e(2)/∂wm
         ...
         ∂e(n)/∂w1   ∂e(n)/∂w2   ...  ∂e(n)/∂wm ]

Goal:

w(n+1) = arg min_w { (1/2) ||e'(n, w)||² }

Page 7: Gauss-Newton Method (Cont.)

(1/2)||e'(n, w)||² = (1/2)||e(n)||² + e^T(n)J(n)(w - w(n)) + (1/2)(w - w(n))^T J^T(n)J(n)(w - w(n))

(The cross terms e^T(n)J(n)(w - w(n)) and (w - w(n))^T J^T(n)e(n) are both scalars, hence equal.)

Differentiating this expression with respect to w and setting the result to zero:

J^T(n)e(n) + J^T(n)J(n)(w - w(n)) = 0

w(n+1) = w(n) - [J^T(n)J(n)]^{-1} J^T(n)e(n)

To guard against the possibility that J(n) is rank deficient, add the diagonal loading term δI:

w(n+1) = w(n) - [J^T(n)J(n) + δI]^{-1} J^T(n)e(n)
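A sketch of the regularized Gauss-Newton update above, applied here to an assumed linear error model e(w) = d - Xw, for which the Jacobian is simply -X; the data X, d and the constant delta are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))          # illustrative input matrix (20 samples, 3 weights)
d = rng.normal(size=20)               # illustrative desired responses
w = np.zeros(3)                       # current weight vector w(n)
delta = 1e-3                          # small diagonal loading against rank deficiency

e = d - X @ w                         # error vector e(n) for the linear model
J = -X                                # Jacobian of e with respect to w
step = np.linalg.solve(J.T @ J + delta * np.eye(3), J.T @ e)
w_next = w - step                     # w(n+1) = w(n) - [J^T J + δI]^{-1} J^T e(n)
print(w_next)
```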

Page 8: Linear Least-Squares Filter

Characteristics of Linear Least-Squares Filter:
– The single neuron around which it is built is linear.
– The cost function C(w) consists of the sum of error squares.

e(n) = d(n) - [x(1), x(2), ..., x(n)]^T w(n) = d(n) - X(n)w(n)

where d(n) = [d(1), d(2), ..., d(n)]^T and X(n) = [x(1), x(2), ..., x(n)]^T.

Differentiating, ∇e(n) = -X^T(n), so the Jacobian is J(n) = -X(n).

Substituting this into the update equation derived from the Gauss-Newton method:

w(n+1) = w(n) + [X^T(n)X(n)]^{-1} X^T(n) [d(n) - X(n)w(n)] = [X^T(n)X(n)]^{-1} X^T(n) d(n)

Let X^+(n) = [X^T(n)X(n)]^{-1} X^T(n) (the pseudoinverse of X(n)); then w(n+1) = X^+(n) d(n).
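A sketch of the closed-form solution w = X^+ d using NumPy's pseudoinverse; the synthetic data X, d and the name w_true are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 4))                  # X(n): one input vector x(i) per row
w_true = np.array([0.5, -1.0, 2.0, 0.1])       # illustrative "true" weights
d = X @ w_true + 0.01 * rng.normal(size=100)   # desired responses with a little noise

w_hat = np.linalg.pinv(X) @ d                  # w = X^+ d = (X^T X)^{-1} X^T d
print(w_hat)                                   # close to w_true
```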

Page 9: Wiener Filter (Limiting Form of the Linear Least-Squares Filter for an Ergodic Environment)

Let w0 denote the Wiener solution to the linear optimum filtering problem.

w0 = lim_{n→∞} w(n+1) = lim_{n→∞} [X^T(n)X(n)]^{-1} X^T(n)d(n)
   = [lim_{n→∞} (1/n) X^T(n)X(n)]^{-1} · lim_{n→∞} (1/n) X^T(n)d(n) = Rx^{-1} rxd

Let Rx denote the Correlation Matrix of input vector x(i).

Rx = E[x(i)x^T(i)] = lim_{n→∞} (1/n) Σ_{i=1}^{n} x(i)x^T(i) = lim_{n→∞} (1/n) X^T(n)X(n)

Let rxd denote the Cross-correlation Vector of x(i) and d(i).

rxd = E[x(i)d(i)] = lim_{n→∞} (1/n) Σ_{i=1}^{n} x(i)d(i) = lim_{n→∞} (1/n) X^T(n)d(n)
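A sketch showing that, for ergodic data, the time-averaged estimates of Rx and rxd yield the Wiener solution w0 = Rx^{-1} rxd; the synthetic data and the name w0_true are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10_000
X = rng.normal(size=(n, 3))                  # samples of the input vector x(i)
w0_true = np.array([1.0, -0.5, 0.25])        # illustrative optimum filter
d = X @ w0_true + 0.1 * rng.normal(size=n)   # desired response with additive noise

R_x = (X.T @ X) / n                          # sample correlation matrix ≈ E[x(i) x^T(i)]
r_xd = (X.T @ d) / n                         # sample cross-correlation ≈ E[x(i) d(i)]
w0 = np.linalg.solve(R_x, r_xd)              # Wiener solution w0 = Rx^{-1} rxd
print(w0)                                    # close to w0_true
```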

Page 10: Least-Mean-Square (LMS) Algorithm

LMS is based on instantaneous values for the cost function: C(w) = (1/2)e²(n), where e(n) is the error signal measured at time n.

∂C(w)/∂w = e(n) ∂e(n)/∂w

Because e(n) = d(n) - x^T(n)w(n), we have ∂e(n)/∂w(n) = -x(n), so the instantaneous gradient estimate is g(n) = -x(n)e(n), and

ŵ(n+1) = ŵ(n) + ηx(n)e(n)

ŵ(n) is used in place of w(n) to emphasize that LMS produces an estimate of the weight vector that would result from the method of steepest descent.

Summary of the LMS Algorithm
Training sample: input signal vector x(n), desired response d(n)
User-selected parameter: η
Initialization: set ŵ(0) = 0
Computation: for n = 1, 2, ..., compute
  e(n) = d(n) - ŵ^T(n)x(n)
  ŵ(n+1) = ŵ(n) + ηx(n)e(n)
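A sketch of the LMS loop in the summary above; the synthetic training data and the variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
n_samples, m = 500, 3
X = rng.normal(size=(n_samples, m))                  # input vectors x(n), one per row
w_true = np.array([0.3, -0.7, 1.2])
d = X @ w_true + 0.05 * rng.normal(size=n_samples)   # desired responses d(n)

eta = 0.05                                  # user-selected learning-rate parameter
w_hat = np.zeros(m)                         # initialization: ŵ(0) = 0
for x_n, d_n in zip(X, d):                  # for n = 1, 2, ...
    e_n = d_n - w_hat @ x_n                 # e(n) = d(n) - ŵ^T(n) x(n)
    w_hat = w_hat + eta * x_n * e_n         # ŵ(n+1) = ŵ(n) + η x(n) e(n)

print(w_hat)                                # approaches w_true for a suitably small eta
```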

Page 11: Virtues and Limitations of LMS

Virtues:
– Simplicity

Limitations:
– Slow rate of convergence
– Sensitivity to variations in the eigenstructure of the input

Page 12: Learning Curve

Page 13: Learning Rate Annealing

Normal approach: η(n) = η0 for all n

Stochastic approximation: η(n) = c/n, where c is a constant.

There is a danger of parameter blowup for small n when c is large.

Search-then-converge schedule: η(n) = η0 / (1 + n/τ), where η0 and τ are constants.
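A sketch of the three schedules as functions of the iteration index n; the constants eta0, c, and tau below are illustrative choices, not values from the slides.

```python
# Learning-rate annealing schedules (constants are illustrative choices).
eta0, c, tau = 0.1, 1.0, 100.0

def eta_constant(n):
    return eta0                      # normal approach: η(n) = η0 for all n

def eta_stochastic(n):
    return c / n                     # stochastic approximation: η(n) = c/n (large for small n if c is large)

def eta_search_then_converge(n):
    return eta0 / (1.0 + n / tau)    # search-then-converge: η(n) = η0 / (1 + n/τ)

for n in (1, 10, 100, 1000):
    print(n, eta_constant(n), eta_stochastic(n), eta_search_then_converge(n))
```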

Page 14: Perceptron

v = Σ_{i=1}^{m} wi xi + b

[Figure: signal-flow graph of the perceptron: inputs x1, x2, ..., xm with synaptic weights w1, w2, ..., wm, bias b, and a hard limiter producing the output y from the induced local field v]

Let x0=1 and b=w0

v(n) = Σ_{i=0}^{m} wi(n) xi(n) = w^T(n)x(n)

The perceptron is the simplest form of a neural network used for the classification of patterns said to be linearly separable.

Goal: Classify the set {x(1), x(2), …, x(n)} into one of two classes, C1 or C2.

Decision Rule: Assign x(i) to class C1 if y=+1 and to class C2 if y=-1.

w^T x > 0 for every input vector x belonging to class C1

w^T x ≤ 0 for every input vector x belonging to class C2
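A sketch of the induced local field and the hard-limiter decision rule, with the bias absorbed as w0 (the weights and the input below are illustrative values).

```python
import numpy as np

w = np.array([-0.5, 1.0, 2.0])       # [w0 (= bias b), w1, w2]
x = np.array([1.0, 0.3, 0.4])        # [x0 = 1, x1, x2]

v = w @ x                            # induced local field v = w^T x
y = 1 if v > 0 else -1               # hard limiter: assign to C1 if y = +1, to C2 if y = -1
print(v, y)
```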

Page 15: Perceptron (Cont.)

Algorithms:

1. w(n+1) = w(n) if w^T x(n) > 0 and x(n) belongs to class C1

   w(n+1) = w(n) if w^T x(n) ≤ 0 and x(n) belongs to class C2

2. w(n+1) = w(n) - η(n)x(n) if w^T x(n) > 0 and x(n) belongs to class C2

   w(n+1) = w(n) + η(n)x(n) if w^T x(n) ≤ 0 and x(n) belongs to class C1

Let d(n) = +1 if x(n) belongs to class C1, and d(n) = -1 if x(n) belongs to class C2. Then

w(n+1) = w(n) + η[d(n) - y(n)]x(n)  (error-correction learning rule form)

A smaller η provides stable weight estimates; a larger η provides fast adaptation.
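A sketch of the error-correction form of the perceptron rule, w(n+1) = w(n) + η[d(n) - y(n)]x(n), run on a small linearly separable toy set; the data and the constant eta are assumptions for illustration.

```python
import numpy as np

# Toy linearly separable data: x0 = 1 is prepended so the bias is learned as w0.
X = np.array([[1, 2.0, 1.0], [1, 1.5, 2.0],       # class C1, d = +1
              [1, -1.0, -0.5], [1, -2.0, -1.5]])  # class C2, d = -1
d = np.array([1, 1, -1, -1])

eta = 0.5                                         # learning-rate parameter
w = np.zeros(3)
for _ in range(10):                               # a few passes over the training set
    for x_n, d_n in zip(X, d):
        y_n = 1 if w @ x_n > 0 else -1            # perceptron output y(n)
        w = w + eta * (d_n - y_n) * x_n           # error-correction update

print(w, [1 if w @ x_n > 0 else -1 for x_n in X])  # all training points classified correctly
```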