Page 1: Last lecture summary

Page 2: Multilayer perceptron

• MLP, the most famous type of neural network

(Figure: input layer – hidden layer – output layer.)

Page 3: Processing by one neuron

The neuron computes a weighted sum of its inputs plus a bias and passes it through an activation function f:

\( y = f\Big(w_0 + \sum_{j=1}^{n} w_j x_j\Big) = f(w_0 + w_1 x_1 + w_2 x_2 + \dots + w_n x_n) \)

(Figure: inputs x_j, weights w_j, bias w_0, activation function f, output y.)
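As a sketch, the computation of a single neuron could look like this in Python (the function and variable names are illustrative, not from the lecture):

import numpy as np

def neuron_output(x, w, w0, f):
    """Weighted sum of the inputs plus bias w0, passed through activation f."""
    return f(w0 + np.dot(w, x))

# example: three inputs with a logistic activation
x = np.array([1.0, 2.0, 3.0])                      # inputs
w = np.array([0.5, -0.2, 0.1])                     # weights
y = neuron_output(x, w, w0=0.1, f=lambda a: 1.0 / (1.0 + np.exp(-a)))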

Page 4: Linear activation functions

linear: \( y = w_0 + \sum_j w_j x_j = \mathbf{w} \cdot \mathbf{x} \)

threshold: \( y = 1 \) if \( \mathbf{w} \cdot \mathbf{x} > 0 \), and \( y = 0 \) if \( \mathbf{w} \cdot \mathbf{x} \le 0 \)

Page 5: Nonlinear activation functions

logistic (sigmoid, unipolar): \( \sigma(x) = \dfrac{1}{1 + e^{-x}} \)

tanh (bipolar): \( \tanh(x) = \dfrac{e^{x} - e^{-x}}{e^{x} + e^{-x}} \)
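All four activation functions mentioned so far could be sketched as follows (vectorized with NumPy; names are illustrative):

import numpy as np

def linear(a):
    return a                                # identity

def threshold(a):
    return np.where(a > 0, 1.0, 0.0)        # 1 if w·x > 0, else 0

def logistic(a):
    return 1.0 / (1.0 + np.exp(-a))         # unipolar, output in (0, 1)

def bipolar_tanh(a):
    return np.tanh(a)                       # bipolar, output in (-1, 1)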

Page 6: Backpropagation training algorithm

• MLP is trained by backpropagation.
• forward pass
  – present a training sample to the neural network
  – calculate the error (MSE) in each output neuron
• backward pass
  – first calculate the gradient for the hidden-to-output weights
  – then calculate the gradient for the input-to-hidden weights
  – the knowledge of grad_hidden-output is necessary to calculate grad_input-hidden
  – update the weights in the network (one such step is sketched below): \( w^{m+1} = w^{m} + \Delta w^{m}, \quad \Delta w^{m} = \eta\, d^{m} \), where η is the learning rate and d^m the descent direction
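A minimal sketch of one backpropagation step for a single-hidden-layer MLP with logistic activations and MSE loss (biases omitted for brevity; all names are illustrative):

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def backprop_step(x, t, W1, W2, eta=0.1):
    # forward pass: present the sample and compute the outputs
    h = sigmoid(W1 @ x)                        # hidden activations
    y = sigmoid(W2 @ h)                        # output activations
    # backward pass: hidden-to-output gradient first ...
    delta_out = (y - t) * y * (1 - y)          # error times sigmoid derivative
    grad_W2 = np.outer(delta_out, h)
    # ... which is needed for the input-to-hidden gradient
    delta_hid = (W2.T @ delta_out) * h * (1 - h)
    grad_W1 = np.outer(delta_hid, x)
    # update the weights: new w = old w + eta * descent direction
    return W1 - eta * grad_W1, W2 - eta * grad_W2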

Page 7

(Figure: the input signal propagates forward; the error propagates backward.)

Page 8: Momentum

• Online learning vs. batch learning
  – Batch learning improves stability by averaging.
• Another averaging approach providing stability is the momentum (μ).
  – μ (between 0 and 1) indicates the relative importance of the past weight change Δw^{m-1} for the new weight increment Δw^m:

\( \Delta w^{m} = \mu\, \Delta w^{m-1} + (1 - \mu)\, \eta\, d^{m} \)
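As a sketch, the momentum update above could be implemented like this (here d is the descent direction computed by backpropagation; names are illustrative):

def momentum_update(w, d, dw_prev, eta=0.1, mu=0.9):
    """New increment = mu * past change + (1 - mu) * plain gradient step."""
    dw = mu * dw_prev + (1.0 - mu) * eta * d
    return w + dw, dw        # updated weights and the change to remember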

Page 9: Other improvements

• Delta-Bar-Delta (Turboprop)
  – Each weight has its own learning rate β.
• Second-order methods
  – Hessian matrix (how fast does the rate of increase of the function change in a small neighborhood? i.e. curvature)
  – QuickProp, Gauss-Newton, Levenberg-Marquardt
  – fewer epochs, but computationally expensive (Hessian inverse, storage)

Page 10: Improving generalization of MLP

• Flexibility comes from hidden neurons.
• Choose such a number of hidden neurons that neither underfitting nor overfitting occurs.
• Three most common approaches:
  – exhaustive search
    • stop training after MSE < small_threshold (e.g. 0.001)
  – early stopping
    • large number of hidden neurons
  – regularization
    • weight decay (sketched below): \( W = MSE + \lambda \sum_{j=1}^{m} w_j^{2} \)
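A sketch of the weight-decay objective; the name of the regularization strength (lam below) is an assumption, not from the slides:

import numpy as np

def weight_decay_cost(mse, weights, lam=1e-3):
    """Penalized cost: MSE plus lam times the sum of squared weights."""
    return mse + lam * np.sum(np.asarray(weights) ** 2)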

Page 11

(Figure: under- and overfitting as a function of the number of neurons.)

Sandhya Samarasinghe, Neural Networks for Applied Sciences and Engineering, 2006

Page 12: Network pruning

• Keep only essential weights/neurons.
• Optimal Brain Damage (OBD)
  – If the saliency s_i of a weight is small, remove the weight.
  – Train a flexible network (e.g. with early stopping), then remove weights, retrain the network, etc.

\( s_i = \dfrac{H_{ii}\, w_i^{2}}{2} \)
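The OBD saliency could be computed as follows, assuming the diagonal of the Hessian is available (a sketch; names are illustrative):

import numpy as np

def obd_saliency(weights, hessian_diag):
    """s_i = H_ii * w_i^2 / 2; weights with small saliency are pruned."""
    return hessian_diag * weights ** 2 / 2.0

# keep only weights whose saliency reaches some threshold
def keep_mask(weights, hessian_diag, threshold=1e-4):
    return obd_saliency(weights, hessian_diag) >= threshold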

Page 13: Radial Basis Function Networks (new stuff)

Page 14: Radial Basis Function (RBF) Network

• Becoming an increasingly popular neural network.
• It is probably the main rival to the MLP.
• Completely different approach: it views the design of a neural network as an approximation problem in a high-dimensional space.
• Uses radial functions as activation functions.

Page 15: Gaussian RBF

• The typical radial function is the Gaussian RBF (it monotonically decreases with distance from the center).
• Its response decreases with distance from a central point.
• Parameters:
  – center c
  – width (radius r)

\( h(\mathbf{x}) = \exp\Big(-\dfrac{\|\mathbf{x} - \mathbf{c}\|^{2}}{2 r^{2}}\Big) \)

(Figure: a Gaussian bump with center c and radius r.)
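A sketch of the Gaussian RBF as written above:

import numpy as np

def gaussian_rbf(x, c, r):
    """h(x) = exp(-||x - c||^2 / (2 r^2)): 1 at the center, decaying with distance."""
    return np.exp(-np.sum((np.asarray(x) - np.asarray(c)) ** 2) / (2.0 * r ** 2))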

Page 16: Local vs. global units

• Local
  – they cover just a certain part of the space
  – i.e. they are nonzero only in a certain part of the space
  – example: Gaussian
• Global
  – example: sigmoid, linear

Page 17

(Figure: side-by-side comparison of an MLP and an RBF network.)

Pavel Kordík, Data Mining lecture, FEL, ČVUT, 2009

Page 18: RBFN architecture

(Figure: input layer x1 … xn fans out to hidden-layer RBF units h1 … hm; their outputs are combined through weights W1 … Wm into the output f(x). There are no weights between the input and hidden layers.)

Each of the n components of the input vector x feeds forward to m basis functions whose outputs are linearly combined with weights w (i.e. dot product x∙w) into the network output f(x).

Pavel Kordík, Data Mining lecture, FEL, ČVUT, 2009
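The forward pass of this architecture could be sketched as follows (Gaussian hidden units assumed, as on page 15; names are illustrative):

import numpy as np

def rbfn_forward(x, centers, radii, w):
    """centers: (m, n), radii: (m,), w: (m,); returns the scalar output f(x)."""
    # hidden layer: one Gaussian response per basis function
    h = np.exp(-np.sum((centers - x) ** 2, axis=1) / (2.0 * radii ** 2))
    # output layer: plain linear combination (dot product w·h)
    return np.dot(w, h)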

Page 19

(Figure: RBF network diagram with summation (Σ) output units.)

Pavel Kordík, Data Mining lecture, FEL, ČVUT, 2009

Page 20

• The basic architecture of an RBF network is a 3-layer network.
• The input layer is simply a fan-out layer and does no processing.
• The hidden layer performs a non-linear mapping from the input space into a (usually) higher-dimensional space in which the patterns become linearly separable.
• The output layer performs a simple weighted sum (i.e. w∙x).
  – If the RBFN is used for regression, then this output is fine.
  – However, if pattern classification is required, then a hard-limiter or sigmoid function could be placed on the output neurons to give 0/1 output values, as sketched below.
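For classification, placing a hard-limiter or sigmoid on the raw output could look like this (a sketch; names are illustrative):

import math

def classify(fx, hard=True):
    """Map the raw RBFN output fx to a 0/1 class label."""
    if hard:
        return 1 if fx > 0 else 0                             # hard-limiter
    return 1 if 1.0 / (1.0 + math.exp(-fx)) > 0.5 else 0      # sigmoid then threshold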

Page 21: Clustering

• The unique feature of the RBF network is the process performed in the hidden layer.

• The idea is that the patterns in the input space form clusters.

• If the centres of these clusters are known, then the distance from the cluster centre can be measured.

Page 22

• Furthermore, this distance measure is made non-linear, so that a pattern in an area close to a cluster centre gives a value close to 1.
• Beyond this area, the value drops dramatically.
• The notion is that this area is radially symmetrical around the cluster centre, so the non-linear function becomes known as the radial-basis function.

(Figure: non-linearly transformed distance vs. distance from the center of the cluster.)

Page 23: RBFN for classification

(Figure: an RBF network with two summation (Σ) output units separating Category 1 from Category 2.)

Page 24: RBFN for regression

http://diwww.epfl.ch/mantra/tutorial/english/rbf/html/

Page 25: XOR problem

(Figure: the four XOR points in the (x1, x2) plane.)

Page 26: XOR problem

• 2 inputs x1, x2; 2 hidden units (with outputs φ1, φ2); one output
• The parameters of the two hidden units are set as
  – c1 = <0,0>, c2 = <1,1>
  – the value of the radius r is chosen such that 2r² = 1

x1  x2  φ1   φ2
0   0   1    0.1
0   1   0.4  0.4
1   0   0.4  0.4
1   1   0.1  1

(Figure: network with inputs x1, x2 feeding hidden units h1, h2, which output φ1, φ2.)
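The table values can be reproduced with the Gaussian RBF from page 15 and the choice 2r² = 1 (a sketch):

import numpy as np

c1, c2 = np.array([0.0, 0.0]), np.array([1.0, 1.0])

def phi(x, c):
    return np.exp(-np.sum((x - c) ** 2))   # exp(-||x - c||^2 / (2 r^2)) with 2 r^2 = 1

for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    x = np.array(x, dtype=float)
    print(x, round(float(phi(x, c1)), 1), round(float(phi(x, c2)), 1))
# prints 1.0/0.1, 0.4/0.4, 0.4/0.4, 0.1/1.0, matching the table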

Page 27

(Figure: left, the XOR points <0,0>, <0,1>, <1,0>, <1,1> in the input space; right, the same points mapped into the feature space, where the two classes become linearly separable.)

When mapped into the feature space <h1, h2>, the two classes become linearly separable, so a linear classifier with h1(x) and h2(x) as inputs can be used to solve the XOR problem. The linear classifier is represented by the output layer.

x1  x2  φ1   φ2
0   0   1    0.1
0   1   0.4  0.4
1   0   0.4  0.4
1   1   0.1  1

Page 28: RBF Learning

• Design decision: the number of hidden neurons
  – max number of neurons = number of input patterns
  – min number of neurons = to be determined
  – more neurons: a more complex model, smaller tolerance
• Parameters to be learnt:
  – centers
  – radii
    • A hidden neuron is more sensitive to data points near its center. This sensitivity may be tuned by adjusting the radius.
    • a smaller radius fits the training data better (risk of overfitting)
    • a larger radius gives less sensitivity, less overfitting, a smaller network, and faster execution
  – weights between the hidden and output layers

Page 29

• Learning can be divided into two independent tasks:
  1. Center and radius determination
  2. Learning of the output-layer weights
• Learning strategies for the RBF parameters:
  – sample center positions randomly from the training data
  – self-organized selection of centers
  – both layers are learnt using supervised learning

Page 30: Select centers at random

• Choose centers randomly from the training set.
• The radius r is calculated as

\( r = \dfrac{d_{\max}}{\sqrt{2M}} \)

where d_max is the maximum distance between any 2 centers and M is the number of centers.

• The weights are found by means of a numerical linear-algebra approach (see the sketch below).
• Requires a large training set for a satisfactory level of performance.
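The whole procedure could be sketched as follows; the radius rule follows the formula above, and the pseudo-inverse stands in for the "numerical linear algebra approach" (all names are illustrative assumptions):

import numpy as np

def fit_rbfn_random(X, y, m, seed=0):
    """X: (N, n) training inputs, y: (N,) targets, m: number of centers."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=m, replace=False)]
    # r = d_max / sqrt(2 m), with d_max the max distance between any 2 centers
    d = np.sqrt(((centers[:, None, :] - centers[None, :, :]) ** 2).sum(-1))
    r = d.max() / np.sqrt(2.0 * m)
    # hidden-layer responses for every training sample
    H = np.exp(-((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1) / (2.0 * r ** 2))
    # least-squares output weights via the pseudo-inverse
    w = np.linalg.pinv(H) @ y
    return centers, r, w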

Page 31: Self-organized selection of centers

• Centers are selected using the k-means clustering algorithm.
• Radii are usually found using k-NN:
  – find the k nearest centers
  – The root-mean-squared distance between the current cluster centre and its k (typically 2) nearest neighbours is calculated, and this is the value chosen for r (sketched below):

\( r = \sqrt{\dfrac{1}{k} \sum_{i=1}^{k} \|c - c_i\|^{2}} \)

• The output layer is learnt using a gradient-descent technique.
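A sketch of the k-NN radius rule (k = 2 by default, as on the slide; names are illustrative):

import numpy as np

def knn_radii(centers, k=2):
    """r_j = root-mean-squared distance from center j to its k nearest neighbours."""
    centers = np.asarray(centers, dtype=float)
    d2 = ((centers[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)              # exclude each center itself
    nearest = np.sort(d2, axis=1)[:, :k]      # k smallest squared distances
    return np.sqrt(nearest.mean(axis=1))      # RMS distance per center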

Page 32: Supervised learning

• Supervised learning of all parameters (centers, radii, weights) using gradient descent.
• There are mathematical formulas for updating all of these parameters. They are not shown here; no need to scare you on such a “nice” day.
• A learning rate is used.

Page 33: Advantages/disadvantages

• An RBFN trains faster than an MLP.
• Although the RBFN is quick to train, once training is finished and it is being used, it is slower than an MLP.
• RBFNs are essentially well-tried statistical techniques presented as neural networks. Learning mechanisms in statistical neural networks are not biologically plausible.
• An RBFN can give an “I don’t know” answer.
• RBFNs construct local approximations to non-linear I/O mappings. MLPs construct global approximations to non-linear I/O mappings.