Function Approximation
Fariba Sharifian, Somaye Kafi
Dec 26, 2015
Contents
- Introduction to Counterpropagation
- Full Counterpropagation
  - Architecture
  - Algorithm
  - Application example
- Forward-Only Counterpropagation
  - Architecture
  - Algorithm
  - Application example
Contents
- Function Approximation Using Neural Networks
  - Introduction
  - Development of Neural Network Weight Equations
  - Algebraic Training Algorithms
    - Exact Matching of Function Input-Output Data
    - Approximate Matching of Gradient Data in Algebraic Training
    - Approximate Matching of Function Input-Output Data
    - Exact Matching of Function Gradient Data
Introduction to Counterpropagation
Counterpropagation networks:
- are multilayer networks built from a combination of input, clustering, and output layers;
- can be used to compress data, to approximate functions, or to associate patterns;
- approximate their training input vector pairs by adaptively constructing a lookup table.
Introduction to Counterpropagation (cont.)
Training has two stages:
- clustering
- output weight updating
There are two types:
- full
- forward-only
Full Counterpropagation
Produces an approximation x*:y* based on:
- input of an x vector only,
- input of a y vector only, or
- input of an x:y pair, possibly with distorted or missing elements in either or both vectors.
Full Counterpropagation (cont.)
Phase 1: The units in the cluster layer compete. Only the winning cluster unit is allowed to learn; the weight updates for the winning unit J are (this is standard Kohonen learning):

$$w_{iJ}^{new} = w_{iJ}^{old} + \alpha\,(x_i - w_{iJ}^{old}), \quad i = 1, 2, \ldots, n$$
$$u_{kJ}^{new} = u_{kJ}^{old} + \beta\,(y_k - u_{kJ}^{old}), \quad k = 1, 2, \ldots, m$$
Full Counterpropagation (cont.)
Phase 2: The weights from the winning cluster unit J to the output units are adjusted so that the vector of activations of the Y output layer, y*, approximates the input vector y, and x* approximates the input vector x. The weight updates for the units in the Y output and X output layers are (this is known as Grossberg learning):

$$v_{Jk}^{new} = v_{Jk}^{old} + a\,(y_k - v_{Jk}^{old}), \quad k = 1, 2, \ldots, m$$
$$t_{Ji}^{new} = t_{Ji}^{old} + b\,(x_i - t_{Ji}^{old}), \quad i = 1, 2, \ldots, n$$
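For concreteness, here is a minimal NumPy sketch of both update rules for a single training pair, assuming the winning unit J has already been found (the names w, u, v, t, alpha, beta, a, b follow the slide notation; in actual training, Phase 1 runs to convergence before Phase 2 begins):

```python
import numpy as np

def full_cpn_step(J, x, y, w, u, v, t, alpha, beta, a, b):
    """One update for winning cluster unit J.

    w: (n, p) X-input -> cluster weights;   u: (m, p) Y-input -> cluster weights
    v: (p, m) cluster -> Y* output weights; t: (p, n) cluster -> X* output weights
    """
    # Phase 1 (Kohonen): pull unit J's incoming weights toward the training pair.
    w[:, J] += alpha * (x - w[:, J])
    u[:, J] += beta * (y - u[:, J])
    # Phase 2 (Grossberg): pull unit J's outgoing weights toward the targets.
    v[J, :] += a * (y - v[J, :])
    t[J, :] += b * (x - t[J, :])
```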
Architecture of Full Counterpropagation

[Figure: the X input layer (X1, ..., Xi, ..., Xn) and the Y input layer (Y1, ..., Yk, ..., Ym) both feed the cluster (hidden) layer (Z1, ..., Zj, ..., Zp) through weights w and u; the cluster layer feeds the Y* output layer (Y1*, ..., Yk*, ..., Ym*) through weights v and the X* output layer (X1*, ..., Xi*, ..., Xn*) through weights t.]
Full Counterpropagation Algorithm

Notation:
- x: training input vector, x = (x_1, ..., x_i, ..., x_n)
- y: target output corresponding to input x, y = (y_1, ..., y_k, ..., y_m)
- z_j: activation of cluster layer unit Z_j
- x*: computed approximation to vector x
- y*: computed approximation to vector y
- w_ij: weight from X input layer unit X_i to cluster layer unit Z_j
- u_kj: weight from Y input layer unit Y_k to cluster layer unit Z_j
- v_jk: weight from cluster layer unit Z_j to output layer unit Y_k*
- t_ji: weight from cluster layer unit Z_j to output layer unit X_i*
- alpha, beta: learning rates for weights into the cluster layer (Kohonen learning)
- a, b: learning rates for weights out of the cluster layer (Grossberg learning)
Full Counterpropagation Algorithm (Phase 1)
Step 1. Initialize weights, learning rates, etc.
Step 2. While the stopping condition for Phase 1 is false, do Steps 3-8.
Step 3. For each training input pair x:y, do Steps 4-6.
Step 4. Set X input layer activations to vector x; set Y input layer activations to vector y.
Step 5. Find the winning cluster unit; call its index J.
Step 6. Update the weights for unit Z_J (the Kohonen updates given above).
Step 7. Reduce the learning rates alpha and beta.
Step 8. Test the stopping condition for Phase 1 training.
Full Counterpropagation Algorithm (Phase 2)
Step 9. While the stopping condition for Phase 2 is false, do Steps 10-16.
(Note: alpha and beta are small, constant values during Phase 2.)
Step 10. For each training input pair x:y, do Steps 11-14.
Step 11. Set X input layer activations to vector x; set Y input layer activations to vector y.
Step 12. Find the winning cluster unit; call its index J.
Step 13. Update the weights for unit Z_J (the same Kohonen updates, with the small fixed rates).
Full Counterpropagation Algorithm (Phase 2, cont.)
Step 14. Update the weights from unit Z_J to the output layers (the Grossberg updates given above).
Step 15. Reduce the learning rates a and b.
Step 16. Test the stopping condition for Phase 2 training.
Which cluster unit is the winner?
- Dot product: find the cluster unit with the largest net input,
$$net_j = \sum_i x_i\,w_{ij} + \sum_k y_k\,u_{kj}$$
- Euclidean distance: find the cluster unit with the smallest squared distance from the input,
$$D_j = \sum_i (x_i - w_{ij})^2 + \sum_k (y_k - u_{kj})^2$$
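Both criteria are one-liners in NumPy; a sketch using the weight layout from the earlier snippet:

```python
import numpy as np

def winner_dot(x, y, w, u):
    # Largest net input: net_j = sum_i x_i * w_ij + sum_k y_k * u_kj
    return int(np.argmax(x @ w + y @ u))

def winner_euclidean(x, y, w, u):
    # Smallest squared distance: D_j = sum_i (x_i - w_ij)^2 + sum_k (y_k - u_kj)^2
    D = ((x[:, None] - w) ** 2).sum(axis=0) + ((y[:, None] - u) ** 2).sum(axis=0)
    return int(np.argmin(D))
```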
Full Counterpropagation Application
The application procedure for counterpropagation is as follows:
Step 0. Initialize weights (by training as above).
Step 1. For each input pair x:y, do Steps 2-4.
Step 2. Set X input layer activations to vector x; set Y input layer activations to vector y.
Full Counterpropagation Application (cont.)
Step 3. Find the cluster unit Z_J that is closest to the input pair.
Step 4. Compute the approximations to x and y:
x*_i = t_Ji
y*_k = v_Jk
Full Counterpropagation Example
Function approximation of y = 1/x. After the training phase we have:

Cluster unit   x weight (w)   y weight (u)
z1             0.11           9.0
z2             0.14           7.0
z3             0.20           5.0
z4             0.30           3.3
z5             0.60           1.6
z6             1.60           0.6
z7             3.30           0.3
z8             5.00           0.2
z9             7.00           0.14
z10            9.00           0.11

(After Phase 2 the output weights settle to essentially the same values, t_J1 ≈ w_1J and v_J1 ≈ u_1J, so the table also gives the network outputs.)
Full Counterpropagation Example (cont.)

[Figure: the trained network for y = 1/x, with one X input unit X1, cluster units Z1 through Z10, and output units Y1* and X1*. The weights match the table above, e.g. Z1 stores x weight 0.11 and y weight 9.0, while Z10 stores 9.0 and 0.11.]
Full Counterpropagation Example (cont.)
To approximate the value of y for x = 0.12: since nothing is known about y, compute D from x alone:
D1 = (0.12 - 0.11)^2 = 0.0001
D2 = 0.0004
D3 = 0.0064
D4 = 0.032
D5 = 0.23
D6 = 2.2
D7 = 10.1
D8 = 23.8
D9 = 47.3
D10 = 78.9
Z1 is therefore the winner, and the network returns y* = 9.0 (the true value is 1/0.12 ≈ 8.3).
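The lookup is easy to verify; a short script using the trained weights from the table above:

```python
import numpy as np

w = np.array([0.11, 0.14, 0.20, 0.30, 0.60, 1.60, 3.30, 5.00, 7.00, 9.00])  # x weights
v = np.array([9.0, 7.0, 5.0, 3.3, 1.6, 0.6, 0.3, 0.2, 0.14, 0.11])          # y-side weights

x = 0.12
D = (x - w) ** 2        # squared distances, x part only (y is unknown)
J = int(np.argmin(D))   # index of the winning cluster unit
print(D.round(4))       # [0.0001 0.0004 0.0064 0.0324 0.2304 2.1904 10.1124 23.8144 47.3344 78.8544]
print(f"winner: z{J + 1}, y* = {v[J]}")  # winner: z1, y* = 9.0
```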
Forward-Only Counterpropagation
- A simplified version of full counterpropagation.
- Intended to approximate a function y = f(x) that is not necessarily invertible.
- It may be used when the mapping from x to y is well defined but the mapping from y to x is not.
Forward-Only Counterpropagation Architecture

[Figure: the input layer (X1, ..., Xi, ..., Xn) is connected to the cluster layer (Z1, ..., Zj, ..., Zp) by weights w, and the cluster layer is connected to the output layer (Y1, ..., Yk, ..., Ym) by weights u.]
Forward-Only Counterpropagation Algorithm
Step 1. Initialize weights, learning rates, etc.
Step 2. While the stopping condition for Phase 1 is false, do Steps 3-8.
Step 3. For each training input x, do Steps 4-6.
Step 4. Set X input layer activations to vector x.
Step 5. Find the winning cluster unit; call its index J.
Step 6. Update the weights for unit Z_J:
$$w_{iJ}^{new} = w_{iJ}^{old} + \alpha\,(x_i - w_{iJ}^{old}), \quad i = 1, 2, \ldots, n$$
Step 7. Reduce the learning rate alpha.
Step 8. Test the stopping condition for Phase 1 training.
Step 9. While the stopping condition for Phase 2 is false, do Steps 10-16.
(Note: alpha is a small, constant value during Phase 2.)
Step 10. For each training input pair x:y, do Steps 11-14.
Step 11. Set X input layer activations to vector x; set Y input layer activations to vector y.
Step 12. Find the winning cluster unit; call its index J.
Step 13. Update the weights for unit Z_J (alpha is small):
$$w_{iJ}^{new} = w_{iJ}^{old} + \alpha\,(x_i - w_{iJ}^{old}), \quad i = 1, 2, \ldots, n$$
Step 14. Update the weights from unit Z_J to the output layer:
$$u_{Jk}^{new} = u_{Jk}^{old} + a\,(y_k - u_{Jk}^{old}), \quad k = 1, 2, \ldots, m$$
Step 15. Reduce the learning rate a.
Step 16. Test the stopping condition for Phase 2 training.
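Putting the two phases together, a compact training sketch (the epoch counts, initialization, and learning-rate schedule are illustrative choices, not prescribed by the slides):

```python
import numpy as np

def train_forward_only_cpn(X, Y, p, alpha=0.5, a=0.5, epochs=50, decay=0.95):
    """X: (N, n) training inputs; Y: (N, m) targets; p: number of cluster units."""
    rng = np.random.default_rng(0)
    w = X[rng.choice(len(X), p)].T.astype(float)   # (n, p) input -> cluster weights
    u = np.zeros((p, Y.shape[1]))                  # (p, m) cluster -> output weights
    for _ in range(epochs):                        # Phase 1: cluster the inputs (Kohonen)
        for x in X:
            J = np.argmin(((x[:, None] - w) ** 2).sum(axis=0))
            w[:, J] += alpha * (x - w[:, J])
        alpha *= decay                             # Step 7: reduce the learning rate
    alpha = 0.01                                   # small and constant during Phase 2
    for _ in range(epochs):                        # Phase 2: learn the outputs (Grossberg)
        for x, y in zip(X, Y):
            J = np.argmin(((x[:, None] - w) ** 2).sum(axis=0))
            w[:, J] += alpha * (x - w[:, J])
            u[J] += a * (y - u[J])
        a *= decay                                 # Step 15: reduce the learning rate
    return w, u
```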
Forward-Only Counterpropagation Application
Step 0. Initialize weights (by the training in the previous subsection).
Step 1. Present input vector x.
Step 2. Find the cluster unit J closest to vector x.
Step 3. Set the activations of the output units:
y_k = u_Jk
Forward-Only Counterpropagation Example
Function approximation of y = 1/x. After the training phase we have:

Cluster unit   w      u
z1             0.5    5.5
z2             1.5    0.75
z3             2.5    0.4
...            ...    ...
z10            9.5    0.1
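Using the rows shown above (a sketch with a hypothetical query point x = 2.3; z4 through z9 are omitted because their values are not listed on the slide):

```python
import numpy as np

w = np.array([0.5, 1.5, 2.5, 9.5])    # cluster centers shown on the slide
u = np.array([5.5, 0.75, 0.4, 0.1])   # corresponding output weights

x = 2.3
J = int(np.argmin((x - w) ** 2))      # z3 (center 2.5) is closest to 2.3
print(u[J])                           # 0.4, versus the true value 1/2.3 ≈ 0.43
```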
Function Approximation Using Neural Networks
- Introduction
- Development of Neural Network Weight Equations
- Algebraic Training Algorithms
  - Exact Matching of Function Input-Output Data
  - Approximate Matching of Gradient Data in Algebraic Training
  - Approximate Matching of Function Input-Output Data
  - Exact Matching of Function Gradient Data
Introduction
- Function approximation seeks an analytical description for a set of data.
- This task is also referred to as data modeling or system identification.
Standard Tools
- Splines
- Wavelets
- Neural networks
Why Use Neural Networks?
- Splines and wavelets do not generalize well to spaces of more than three dimensions.
- Neural networks are universal approximators.
- Their parallel architecture can be trained to map multidimensional nonlinear functions.
Why Use Neural Networks? (cont.)
- They are central to the solution of differential equations.
- They provide differentiable, closed-analytic-form solutions.
- They have very good generalization properties and are widely applicable.
- Training translates into a set of nonlinear, transcendental weight equations.
- Their cascade structure combines the nonlinearity of the hidden nodes with linear operations in the input and output layers.
Function Approximation Using Neural Networks
- The functions are not known analytically, but a set of precise input-output samples is available.
- The functions are modeled using an algebraic approach, with two design objectives:
  - exact matching
  - approximate matching
- The networks are feedforward neural networks.
- Data: inputs, outputs, and/or gradient information.
Objective
- Find exact solutions, using sufficient degrees of freedom while retaining good generalization properties.
- Synthesize a large data set with a parsimonious network.
Input-to-Node Values
- The basis of algebraic training: if the inputs of all sigmoidal functions are known, the weight equations become algebraic.
- The input-to-node values are the sigmoidal functions' inputs.
- They determine the saturation level of each sigmoid at a given data point.
Weight Equations Structure
- The weight equations make it possible to analyze and train a nonlinear neural network by means of linear algebra,
- by controlling the distribution of the input-to-node values and
- by controlling the saturation level of the active nodes.
Function Approximation Using Neural Networks
- Introduction
- Development of Neural Network Weight Equations
- Algebraic Training Algorithms
  - Exact Matching of Function Input-Output Data
  - Approximate Matching of Gradient Data in Algebraic Training
  - Approximate Matching of Function Input-Output Data
  - Exact Matching of Function Gradient Data
Development of Neural Network Weight Equations
Objective: approximate a smooth scalar function of q inputs using a feedforward sigmoidal network.
Derivative Information
- Derivative information can improve the network's generalization properties.
- The partial derivatives with respect to the inputs can be incorporated in the training set.
Network Output
- z: network output, computed as a nonlinear transformation of the input
- w: input weights
- p: input vector
- d: input bias
- b: output bias
- v: output weights
- sigma: sigmoid functions (for example, a tanh-type sigmoid)
- n: input-to-node variables
Scalar Output of the Network
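The slide's equation image did not survive extraction; under the notation above, the scalar output takes the standard form (a reconstruction, assuming s hidden nodes and an output bias b):

$$z = \sum_{j=1}^{s} v_j\,\sigma(n_j) + b, \qquad n_j = \sum_{i=1}^{q} w_{ji}\,p_i + d_j .$$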
Exactly Match of the Function’s Outputs
output weighted equation
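A plausible reconstruction of the missing equation: stacking the p training points, with S the p x s matrix of sigmoid values, S_kj = sigma(n_j^(k)), exact matching of the sampled outputs u^(k) = z(p^(k)) gives the linear output weight equation

$$S\,v = u ,$$

with the output bias b absorbed into u or handled by an extra unity column of S.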
Gradient Equations
The derivative of the network output with respect to its inputs.
Exact Matching of the Function’s Derivatives
gradient weight equations
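A reconstruction of the missing relations: differentiating the network output with respect to the inputs gives, at training point k,

$$\left.\frac{\partial z}{\partial \mathbf{p}}\right|_{\mathbf{p}^{(k)}} = \sum_{j=1}^{s} v_j\,\sigma'(n_j^{(k)})\,\mathbf{w}_j = W^{T}\!\left(v \odot \sigma'(n^{(k)})\right),$$

so exact matching of a known gradient c^(k) requires W^T (v ⊙ sigma'(n^(k))) = c^(k) at every training point.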
Input-to-Node Weight Equations
Obtained by rewriting (12).
Four Algebraic Algorithms
A. Exact Matching of Function Input-Output Data
B. Approximate Matching of Gradient Data in Algebraic Training
C. Approximate Matching of Function Input-Output Data
D. Exact Matching of Function Gradient Data
Function Approximation Using Neural Networks
- Introduction
- Development of Neural Network Weight Equations
- Algebraic Training Algorithms
  - Exact Matching of Function Input-Output Data
  - Approximate Matching of Gradient Data in Algebraic Training
  - Approximate Matching of Function Input-Output Data
  - Exact Matching of Function Gradient Data
A. Exact Matching of Function Input-Output Data
- The matrix S of sigmoid values is a known p x s matrix.
- Strategy for producing a well-conditioned S:
  - the input weights w are generated as random numbers from N(0,1), multiplied by a scaling factor L;
  - L is a user-defined scalar chosen so that the input-to-node values do not saturate the sigmoids.
Input Bias
The input bias d is computed to center each sigmoid at one of the training pairs.
Finally, the linear system in (9) is solved for v by inverting S.
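A compact sketch of the whole procedure (a minimal interpretation, with s = p nodes, tanh standing in for the sigmoid, the output bias omitted, and an illustrative condition-number threshold):

```python
import numpy as np

def exact_match_train(P, u, L=1.0, seed=0):
    """P: (p, q) training inputs; u: (p,) sampled outputs. Uses s = p nodes."""
    p, q = P.shape
    rng = np.random.default_rng(seed)
    while True:
        W = L * rng.standard_normal((p, q))   # random N(0,1) input weights scaled by L
        d = -np.einsum('ij,ij->i', W, P)      # center sigmoid i at training point i
        S = np.tanh(P @ W.T + d)              # sigmoid matrix from input-to-node values
        if np.linalg.cond(S) < 1e8:           # if S is ill-conditioned, repeat
            break
    v = np.linalg.solve(S, u)                 # solve the linear system S v = u
    return W, d, v

def evaluate(W, d, v, x):
    return np.tanh(W @ x + d) @ v             # network output for a new input x
```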
If (17) produces an ill-conditioned S, the computation is repeated.
Exact Input-Output-Based Algebraic Algorithm
[Fig. 2(a): flowchart of the exact input-output-based algebraic algorithm.]
Exact Input-Output-Based Algebraic Algorithm with Gradient Information
[Fig. 2(b): the exact input-output-based algebraic algorithm with added p steps for incorporating gradient information.]
Then exact matching of input-output and gradient information can be solved exactly and simultaneously for the neural parameters.
Function Approximation Using Neural Networks
- Introduction
- Development of Neural Network Weight Equations
- Algebraic Training Algorithms
  - Exact Matching of Function Input-Output Data
  - Approximate Matching of Gradient Data in Algebraic Training
  - Approximate Matching of Function Input-Output Data
  - Exact Matching of Function Gradient Data
B. Approximate Matching of Gradient Data in Algebraic Training
- Estimate the output weights and the input-to-node values.
- First solution: use a randomized W; all parameters are then refined by a p-step, node-by-node update algorithm.
Approximate Matching of Gradient Data in Algebraic Training (cont.)
- d and the output weights can then be computed solely from the input-to-node values.
Approximate Matching of Gradient Data in Algebraic Training (cont.)
- At each step, the kth gradient equations are solved for the input weights associated with the ith node.
Approximate Matching of Gradient Data in Algebraic Training (cont.)
- At the end of each step, solve the full system again.
- Terminate when a user-specified gradient tolerance is met.
- Error enters through v and through the input weights; it is adjusted in later steps.
- The basic idea: the ith node's input weights contribute mainly to the kth partial derivatives.
Function Approximation Using Neural Networks
- Introduction
- Development of Neural Network Weight Equations
- Algebraic Training Algorithms
  - Exact Matching of Function Input-Output Data
  - Approximate Matching of Gradient Data in Algebraic Training
  - Approximate Matching of Function Input-Output Data
  - Exact Matching of Function Gradient Data
C.Approximate Matching of Function Input-Output Data
algebraic approach approximate parsimonious network exact sulotion s<p satisfy rank(S|u)= rank(S)= s
example linear system in (9) not square sp inverse relationship between u and v (9) will be overdetermined
Approximate Matching of Function Input-Output Data (cont.)
- A superposition technique combines networks that individually map the nonlinear function over portions of its input space.
- The training set covers the entire input space, which is divided into m subsets.
Approximate Matching of Function Input-Output Data (cont.)
[Fig. 3: superposition of smaller networks into one s-node network.]
Approximate Matching of Function Input-Output Data (cont.)
- The gth neural network approximates its portion of the output vector by its own estimate.
Approximate Matching of Function Input-Output Data (cont.)
- The full network's matrix of input-to-node values has one element per node (ith column) and training point (kth row).
- Its main diagonal blocks are the input-to-node value matrices of the m sub-networks; the off-diagonal blocks are columnwise linearly dependent on the elements of the diagonal blocks.
Approximate Matching of Function Input-Output Data (cont.)
Output weights:
- S is constructed to be of rank s, and the rank of (S|u) is s or s+1,
- so the error introduced during the superposition is zero or small, and it does not increase with m.
Approximate Matching of Function Input-Output Data (cont.)
- The key to developing algebraic training techniques is to construct a matrix S, through the input-to-node values N, that displays the desired characteristics:
- S must be of rank s, and
- s is kept small to produce a parsimonious network.
Function Approximation Using Neural Networks
- Introduction
- Development of Neural Network Weight Equations
- Algebraic Training Algorithms
  - Exact Matching of Function Input-Output Data
  - Approximate Matching of Gradient Data in Algebraic Training
  - Approximate Matching of Function Input-Output Data
  - Exact Matching of Function Gradient Data
D. Exact Matching of Function Gradient Data
- Gradient-based training sets: at every training point, the gradient is known for e of the neural network inputs, denoted by x; the remaining (q - e) inputs are denoted by a.
- Input-output information is also available.
Exact Matching of Function Gradient Data (cont.)
The weight equations comprise:
- the input weights,
- the output weight equation,
- the gradient weight equations, and
- the input-to-node weight equation.
First Linear System (36)
- Obtained by reorganizing all input-to-node values.
- With s = p, the right-hand side is a known ps-dimensional column vector.
- Rewritten in linear form, A is a ps x (q - e + 1)s matrix computed from all a-input vectors.
Second Linear System (34)
- Once the input-to-node values are known, system (34) becomes linear.
- It can always be solved for v provided s = p and S is nonsingular; v can then be treated as a constant.
Third Linear System (35)
- (35) also becomes linear; the unknowns are the x-input weights.
- The gradients in the training set are known, and X is a known ep x es matrix.
Exact Matching of Function Gradient Data (cont.)
- The algorithm's goal is to determine an effective distribution for the elements of N, so that the weight equations can be solved in one step.
- The strategy solved first generates input-to-node values that, with probability 1, produce a well-conditioned S.
Input-to-Node Values
These are substituted in (38).
Input-to-Output Values (cont)
sigmoids are very nearly centered desirable one sigmoid be centered for a given
input prevent ill-conditioning S
same sigmoid should close to saturation for any other known input
need a factor absolute value of the largest element in
Exact Matching of Function Gradient Data (cont)
Function Approximation spring 2006
76
Example: Neural Network Modeling of the Sine Function
- A sigmoidal neural network is trained to approximate the sine function u = sin(y) over the domain 0 ≤ y ≤ π.
- The training set comprises the gradient and output information shown in Table 1: {y_k, u_k, c_k}, k = 1, 2, 3.
- q = e = 1.
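With q = e = 1 and the two-node network discussed below, the model and its derivative take the form (a reconstruction from the notation above; the actual numbers come from Table 1, which is not reproduced here):

$$z(y) = v_1\,\sigma(w_1 y + d_1) + v_2\,\sigma(w_2 y + d_2),$$
$$\frac{\partial z}{\partial y} = v_1\,\sigma'(w_1 y + d_1)\,w_1 + v_2\,\sigma'(w_2 y + d_2)\,w_2,$$

and exact matching requires z(y_k) = u_k and ∂z/∂y(y_k) = c_k for k = 1, 2, 3.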
It is shown that the data is matched exactly by a network with two nodes. Suppose the input-to-node values of the two nodes are chosen so that the weight equations remain consistent.
In this example, the input-to-node matrix N is chosen to make the above weight equations consistent and to meet the assumptions in (57) and (60)-(61). It can easily be shown that this corresponds to computing the elements of N from the corresponding equation.
Conclusion: Algebraic Training vs. Optimization-Based Techniques
Algebraic training offers:
- faster execution speeds,
- better generalization properties,
- reduced computational complexity, and
- a direct correlation between the number of network nodes needed to model a given data set and the desired accuracy of representation.
Function Approximation
Fariba Sharifian, Somaye Kafi