Page 1: Machine Learning pentru Aplicatii Vizuale

Corneliu Florea.

Machine Learning pentru Aplicatii Vizuale

Page 2: Machine Learning pentru Aplicatii Vizuale

Perceptron

Page 3: Machine Learning pentru Aplicatii Vizuale

Perceptron

• Historically, the first machine learning structure

• Inspired by the biological neuron

• Capable of linear separation only – this limitation led to a 15-year gap in research

• Its learning algorithm is a precursor to the powerful gradient descent

Page 4: Machine Learning pentru Aplicatii Vizuale

Biological Neuron

The Neuron - A Biological Information Processor

• dendrites - the receivers
• soma - the neuron cell body (sums the input signals)
• axon - the transmitter
• synapse - the point of transmission
• the neuron activates after a certain threshold is met

Learning occurs via electro-chemical changes in the effectiveness of the synaptic junction.

An Artificial Neuron - The Perceptron
The basic function of the neuron is to sum its inputs and produce an output when the sum is greater than a threshold.

Page 5: Machine Learning pentru Aplicatii Vizuale

Artificial Perceptron

• Model similar to logistic regression
• Artificial Neuron – Perceptron
• The model was introduced by McCulloch and Pitts in 1943 with a hard limiting activation function
• Typically trained by the Delta rule algorithm

Page 6: Machine Learning pentru Aplicatii Vizuale

Neuron Model

[Figure: inputs x1 … xN with weights w1 … wN feed a summation (with bias w0), followed by a threshold; the output o is 0 or 1]

o = f( w0 + Σ_{i=1..N} wi·xi )

Learning:
- During training the weights w0, w1, …, wN are found
- The perceptron works on linearly separable problems only!
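As a minimal sketch (the function name and the hard threshold at 0 are my own choices; other activation options appear on the next slide), the output of a single neuron can be computed as:

```python
import numpy as np

def perceptron_output(x, w, w0):
    """o = f(w0 + sum_i w_i * x_i) with a hard-threshold activation."""
    v = w0 + np.dot(w, x)        # weighted sum plus bias
    return 1 if v >= 0 else 0    # f(v): output is 0 or 1
```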

Page 7: Machine Learning pentru Aplicatii Vizuale

Activation Functions

1) Threshold Function
   f(v) = 1 if v ≥ 0
        = 0 otherwise

2) Piecewise-Linear Function
   f(v) = 1 if v ≥ ½
        = v if ½ > v > -½
        = 0 otherwise

3) Sigmoid Function
   f(v) = 1/(1 + exp(-a·v))

etc.

Neurons can use any differentiable transfer function f to generate their output
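A sketch of the three activations above as vectorized NumPy functions (the function names and the default slope a = 1 are my own choices; the formulas follow the slide):

```python
import numpy as np

def threshold(v):
    # 1) f(v) = 1 if v >= 0, else 0
    return np.where(v >= 0, 1.0, 0.0)

def piecewise_linear(v):
    # 2) f(v) = 1 if v >= 1/2, v if -1/2 < v < 1/2, else 0 (as on the slide)
    return np.where(v >= 0.5, 1.0, np.where(v > -0.5, v, 0.0))

def sigmoid(v, a=1.0):
    # 3) f(v) = 1 / (1 + exp(-a*v))
    return 1.0 / (1.0 + np.exp(-a * v))
```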

Page 8: Machine Learning pentru Aplicatii Vizuale

Separating hyperplane

Example

Page 9: Machine Learning pentru Aplicatii Vizuale

Example

Page 10: Machine Learning pentru Aplicatii Vizuale

Perceptron Learning Algorithm:

1. Initialize weights with small random values

2. Present a pattern xi and its target output yi

3. Compute output :   o = f( w0 + Σ_{i=1..N} wi·xi )

4. Update weights :   w(t+1) = w(t) + Δw(t)

Repeat starting at 2, until an acceptable level of error is reached
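A compact sketch of this training loop (the function and variable names, the initialization range and the 0.5 output threshold used later in the OR example are my own choices; the weight change itself is the delta rule given on the next slide):

```python
import numpy as np

def train_perceptron(X, y, eta=0.2, epochs=20, seed=0):
    """Perceptron learning: small random initialization, then delta-rule updates."""
    rng = np.random.default_rng(seed)
    w = rng.uniform(-0.2, 0.2, size=X.shape[1])   # 1. initialize weights with small random values
    w0 = rng.uniform(-0.2, 0.2)                   #    ... and the bias
    for _ in range(epochs):
        errors = 0
        for xi, yi in zip(X, y):                  # 2. present a pattern and its target
            o = 1 if w0 + np.dot(w, xi) > 0.5 else 0   # 3. compute output (threshold at 0.5)
            eps = yi - o                          # error signal epsilon(t)
            w = w + eta * eps * xi                # 4. update input weights
            w0 = w0 + eta * eps                   #    and the bias
            errors += abs(eps)
        if errors == 0:                           # repeat until no pattern is misclassified
            break
    return w, w0
```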

Page 11: Machine Learning pentru Aplicatii Vizuale

Learning in a Simple Neuron

Widrow-Hoff or Delta Rule for weight modification

w(t+1) = w(t) + Δw(t)

For input weights:   Δwi(t) = η·ε(t)·xi(t) = η·(y(t) − o(t))·xi(t)

For bias:            w0(t+1) = w0(t) + η·ε(t)

Where:
η = learning rate (0 < η ≤ 1), typically set to 0.1 or 0.2
ε(t) = error signal = desired output y(t) − network output o(t)

Page 12: Machine Learning pentru Aplicatii Vizuale

Weight Updates

• Binary classification – weight updates

Δw(t) = η·ε(t)·x(t) = η·(y(t) − o(t))·x(t)

o(t) y(t) ε(t) Δw(t)

0 0 0 0

0 1 +1 ηx(t)

1 0 -1 -ηx(t)

1 1 0 0

Page 13: Machine Learning pentru Aplicatii Vizuale

Perceptron learning

• If the perceptron is presented with enough training vectors, the weight vector w(n) will converge to the correct value w.

• Rosenblatt proved that if the input patterns are linearly separable, then the perceptron learning law converges and the hyperplane separating the two classes of input patterns can be determined.

Page 14: Machine Learning pentru Aplicatii Vizuale

Example

Logical OR Function

x1 x2 | y
 0  0 | 0
 0  1 | 1
 1  0 | 1
 1  1 | 1

[Figure: the four points (0,0), (0,1), (1,0), (1,1) in the (x1, x2) plane; (0,0) is separable from the other three by a line]

Simple Neural Network:  y = f(w0 + w1·x1 + w2·x2)

Line: ?·x1 + ?·x2 = ?  or  −?·x1 − ?·x2 + ? = 0

Page 15: Machine Learning pentru Aplicatii Vizuale

Training

o(t) y(t) ε(t) Δw(t)
0 0 0 0
0 1 +1 ηx(t)
1 0 -1 -ηx(t)
1 1 0 0

η = 0.2

Iter | w1    w2    w0   | x1 x2 y=(x1 or x2) | o'   o  ε | Δw1 Δw2 Δw0
  1  | 0.02 -0.15  0.09 |  1  0  1           | 0.11 0  1 | 0.2 0   0.2
  2  | 0.22 -0.15  0.29 |  0  1  1           | 0.14 0  1 | 0   0.2 0.2
  3  | 0.22  0.05  0.49 |  1  1  1           | 0.76 1  0 | 0   0   0
  4  | 0.22  0.05  0.49 |  1  0  1           | 0.71 1  0 | 0   0   0
  5  | 0.22  0.05  0.49 |  0  0  0           | 0.49 0  0 | 0   0   0
  6  | 0.22  0.05  0.49 |  1  1  1           | 0.76 1  0 | 0   0   0

o = f(w0 + w1·x1 + w2·x2) = f(o')

f(x) = 1 if x > 0.5
     = 0 if x ≤ 0.5

Page 16: Machine Learning pentru Aplicatii Vizuale

Example

Iter | w1    w2    w0   | x1 x2 y | o'   o  ε | Δw1 Δw2 Δw0
  1  | 0.02 -0.15  0.09 |  1  0 1 | 0.11 0  1 | 0.2 0   0.2
  2  | 0.22 -0.15  0.29 |  0  1 1 | 0.14 0  1 | 0   0.2 0.2
  3  | 0.22  0.05  0.49 |  1  1 1 | 0.76 1  0 | 0   0   0
  4  | 0.22  0.05  0.49 |  1  0 1 | 0.71 1  0 | 0   0   0
  5  | 0.22  0.05  0.49 |  0  0 0 | 0.49 0  0 | 0   0   0
  6  | 0.22  0.05  0.49 |  1  1 1 | 0.76 1  0 | 0   0   0
  7  | 0.22  0.05  0.49 |  0  0 0 | 0.49 0  0 | 0   0   0
  8  | 0.22  0.05  0.49 |  1  1 1 | 0.76 1  0 | 0   0   0
  9  | 0.22  0.05  0.49 |  1  0 1 | 0.71 1  0 | 0   0   0
 10  | 0.22  0.05  0.49 |  0  1 1 | 0.54 1  0 | 0   0   0
 11  | 0.22  0.05  0.49 |  0  0 0 | 0.49 0  0 | 0   0   0
 12  | 0.22  0.05  0.49 |  1  1 1 | 0.76 1  0 | 0   0   0
 13  | 0.22  0.05  0.49 |  1  0 1 | 0.71 1  0 | 0   0   0
 14  | 0.22  0.05  0.49 |  0  1 1 | 0.54 1  0 | 0   0   0
 15  | 0.22  0.05  0.49 |  1  0 1 | 0.71 1  0 | 0   0   0
 16  | 0.22  0.05  0.49 |  0  0 0 | 0.49 0  0 | 0   0   0

Line: 0.22·x1 + 0.05·x2 + 0.49 = 0.5, i.e. 0.22·x1 + 0.05·x2 = 0.01

Convergence
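The OR example can be reproduced with the hypothetical train_perceptron sketch from the earlier slide; the exact weights found depend on the random initialization, but they always define a line separating (0,0) from the other three points:

```python
import numpy as np

# OR truth table from the slide
X_or = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y_or = np.array([0, 1, 1, 1])

w, w0 = train_perceptron(X_or, y_or, eta=0.2)
print(w, w0)   # a separating line w1*x1 + w2*x2 + w0 = 0.5 is found within a few epochs
```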

Page 17: Machine Learning pentru Aplicatii Vizuale

FAILURE

Logical XOR Function

x1 x2 | y
 0  0 | 0
 0  1 | 1
 1  0 | 1
 1  1 | 0

[Figure: the four points (0,0), (0,1), (1,0), (1,1) in the (x1, x2) plane; no single line can separate the two classes]

Two neurons are needed! Their combined results can produce a good classification.

Page 18: Machine Learning pentru Aplicatii Vizuale

XOR - Failure

Iter | w1    w2    w0   | x1 x2 y | o'    o  ε  | Δw1  Δw2  Δw0
  1  |  0.02 -0.15 0.09 |  1  0 0 |  0.11 0  0  |  0    0    0
  2  |  0.02 -0.15 0.09 |  0  1 0 | -0.06 0  0  |  0    0    0
  3  |  0.02 -0.15 0.09 |  1  1 1 | -0.04 0  1  |  0.2  0.2  0.2
  4  |  0.22  0.05 0.29 |  1  0 0 |  0.51 1 -1  | -0.2  0   -0.2
  5  |  0.02  0.05 0.09 |  0  0 1 |  0.09 0  1  |  0    0    0.2
  6  |  0.02  0.05 0.29 |  1  1 1 |  0.36 0  1  |  0.2  0.2  0.2
  7  |  0.22  0.25 0.49 |  0  0 1 |  0.49 0  1  |  0    0    0.2
  8  |  0.22  0.25 0.69 |  1  1 1 |  1.16 1  0  |  0    0    0
  9  |  0.22  0.25 0.69 |  1  0 0 |  0.91 1 -1  | -0.2  0   -0.2
 10  |  0.02  0.25 0.49 |  0  1 0 |  0.74 1 -1  |  0   -0.2 -0.2
 11  |  0.02  0.05 0.29 |  0  0 1 |  0.29 0  1  |  0    0    0.2
 12  |  0.02  0.05 0.49 |  1  1 1 |  0.56 1  0  |  0    0    0
 13  |  0.02  0.05 0.49 |  1  0 0 |  0.51 1 -1  | -0.2  0   -0.2
 14  | -0.18  0.05 0.29 |  0  1 0 |  0.34 0  0  |  0    0    0
 15  | -0.18  0.05 0.29 |  1  0 0 |  0.11 0  0  |  0    0    0
 16  | -0.18  0.05 0.29 |  0  0 1 |  0.29 0  1  |  0    0    0.2
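Reusing the hypothetical train_perceptron sketch from the earlier slide, the failure can be observed directly: the training loop never reaches an error-free pass over the XOR patterns, no matter how many epochs are allowed.

```python
import numpy as np

# XOR truth table from the previous slide
X_xor = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y_xor = np.array([0, 1, 1, 0])

w, w0 = train_perceptron(X_xor, y_xor, eta=0.2, epochs=100)
# The weights keep oscillating: there is no single line that classifies
# all four XOR patterns correctly, so some pattern is always wrong.
```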

Page 19: Machine Learning pentru Aplicatii Vizuale

Geometric interpretation of the learning law

Page 20: Machine Learning pentru Aplicatii Vizuale

Multi-Layer Perceptron

Page 21: Machine Learning pentru Aplicatii Vizuale

Multi-Layer Perceptron

• Multi-Layer Perceptron = Artificial Neural Network = FeedForward Network = Fully Connected Network

Over the 15 years (1969-1984) some research continued ...
• a hidden layer of nodes allowed combinations of linear functions
• non-linear activation functions displayed properties closer to real neurons:
  – output varies continuously but not linearly
  – differentiable .... the sigmoid

A non-linear ANN classifier was possible:

f(a) = 1/(1 + e^(-a))

Page 22: Machine Learning pentru Aplicatii Vizuale

Collection of Artificial Neurons

[Figure: a 3-layer network with input nodes I1–I4, a layer of hidden nodes, and output nodes O1, O2 – "distributed processing and representation"]

A 3-layer network has 2 active layers.

Page 23: Machine Learning pentru Aplicatii Vizuale

Mathematical formulation

• The output of the i-th neuron in the j-th layer is computed as

o_ij = f( w_0,ij + Σ_{k=1..N} w_k,ij · x_k )

Where:
• f() is a non-linear function
• x_k are:
  o the inputs, for the first layer
  o the outputs of the previous layer, otherwise
• w_k,ij – the weights
• w_0,ij – the bias

Learning: find the values of all weights
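A minimal sketch of this layer-by-layer computation (the sigmoid choice and the matrix layout are my own assumptions):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def mlp_forward(x, layers):
    """Forward pass through a fully connected (feed-forward) network.
    `layers` is a list of (W, w0) pairs, one per active layer;
    W has shape (n_out, n_in) and w0 has shape (n_out,)."""
    o = np.asarray(x, dtype=float)
    for W, w0 in layers:
        o = sigmoid(W @ o + w0)   # o_j = f(w0_j + sum_i w_ji * o_i)
    return o
```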

Page 24: Machine Learning pentru Aplicatii Vizuale

Training MLPs

• Initially there was no learning algorithm to adjust the weights of a multi-layer network -

– weights had to be set by hand.

• How could the weights below the hidden layer be updated?

The Back-propagation Algorithm
• 1986: the solution to multi-layer ANN weight updating was rediscovered
• Conceptually simple - the global error is propagated backward to the network nodes, and weights are modified in proportion to their contribution
• The most important ANN learning algorithm
• It became known as back-propagation because the error is sent back through the network to correct all weights

Page 25: Machine Learning pentru Aplicatii Vizuale

The Back-Propagation Algorithm

• Like the Perceptron, the calculation of the error is based on the difference between the target and the actual output:

L = ½ · Σ_j (y_j − o_j)²

• However, in BP it is the rate of change of the error which is the important feedback through the network: the generalized delta rule

Δw_ij = −η · ∂L/∂w_ij

• Relies on the sigmoid activation function

Page 26: Machine Learning pentru Aplicatii Vizuale

The Back-Propagation Algorithm

Objective: compute ∂L/∂w_ij for all weights w_ij

Definitions:
• w_ij = weight from node i to node j
• x_j = total weighted input of node j:  x_j = Σ_{i=0..n} w_ij · o_i
• o_j = output of node j:  o_j = f(x_j) = 1/(1 + e^(-x_j))
• L = error (loss) for one pattern over all output nodes

Page 27: Machine Learning pentru Aplicatii Vizuale

The Back-Propagation Algorithm

Objective: compute the derivatives ∂L/∂w_ij for all weights w_ij

Four step process:

1. Compute how fast the error changes as the output of node j is changed

2. Compute how fast the error changes as the total input to node j is changed

3. Compute how fast the error changes as the weight w_ij coming into node j is changed

4. Compute how fast the error changes as the output of node i in the previous layer is changed

Page 28: Machine Learning pentru Aplicatii Vizuale

The Back-Propagation Algorithm

On-Line algorithm:

1. Initialize weights

2. Present a pattern (training example) xi and target output yi

3. Compute output :   o_j = f( Σ_{i=0..n} w_ij · o_i ),  i.e. per neuron  o_ij = f( w_0,ij + Σ_{k=1..N} w_k,ij · x_k )

4. Update weights :   w_ij(t+1) = w_ij(t) + Δw_ij

   where   Δw_ij = −η · ∂E/∂w_ij

Repeat starting at 2 until an acceptable level of error is reached

Page 29: Machine Learning pentru Aplicatii Vizuale

Back-Propagation

Where:

Δw_ij = η · o_i · ε_j = −η · ∂L/∂w_ij

For output nodes:

ε_j = (y_j − o_j) · o_j·(1 − o_j)

For hidden nodes:

ε_i = o_i·(1 − o_i) · Σ_j ε_j · w_ij

(the ε_j in the sum are the deltas of the nodes in the following layer)
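A minimal sketch of these update rules for a network with one hidden layer (the function name, matrix shapes and in-place updates are my own choices, not from the slides):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def backprop_step(x, y, W1, b1, W2, b2, eta=0.5):
    """One on-line back-propagation step for a 1-hidden-layer sigmoid network.
    Shapes: W1 (n_hidden, n_in), b1 (n_hidden,), W2 (n_out, n_hidden), b2 (n_out,),
    all float arrays."""
    # Forward pass
    h = sigmoid(W1 @ x + b1)                  # hidden outputs o_i
    o = sigmoid(W2 @ h + b2)                  # network outputs o_j
    # Output-node deltas: eps_j = (y_j - o_j) * o_j * (1 - o_j)
    eps_out = (y - o) * o * (1 - o)
    # Hidden-node deltas: eps_i = o_i * (1 - o_i) * sum_j eps_j * w_ij
    eps_hid = h * (1 - h) * (W2.T @ eps_out)
    # Weight updates: dw_ij = eta * o_i * eps_j
    W2 += eta * np.outer(eps_out, h)
    b2 += eta * eps_out
    W1 += eta * np.outer(eps_hid, x)
    b1 += eta * eps_hid
    return W1, b1, W2, b2
```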

Page 30: Machine Learning pentru Aplicatii Vizuale

BackPropagation

• All node outputs assume a sigmoid activation function:

f(a) = 1/(1 + e^(-a))

• We need to compute its derivative:

f'(a) = f(a)·(1 − f(a))
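For completeness, a short derivation of this identity (not shown on the slide):

f'(a) = e^(-a)/(1 + e^(-a))² = [1/(1 + e^(-a))] · [e^(-a)/(1 + e^(-a))] = f(a)·(1 − f(a)),

since e^(-a)/(1 + e^(-a)) = 1 − 1/(1 + e^(-a)) = 1 − f(a).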

Page 31: Machine Learning pentru Aplicatii Vizuale

The Back-propagation Algorithm

Visualizing the BackProp learning process:

The algorithm performs a gradient descent in weights space toward a minimum level of error using a fixed step size or learning rate

The gradient is given by ∂E/∂w_ij = the rate at which the error changes as the weights change; the fixed step size is the learning rate η.

Page 32: Machine Learning pentru Aplicatii Vizuale

The Back-propagation Algorithm

Momentum Descent:
Minimization can be sped up if an additional momentum term α·[w_ij(t) − w_ij(t−1)] = α·Δw_ij(t−1), with 0 < α < 1, is added to the update equation:

Δw_ij(t) = η·ε_j·o_i + α·Δw_ij(t−1)

Thus it:
- augments the effective learning rate η, varying the amount by which a weight is updated
- is analogous to the momentum of a ball - it maintains direction
- rolls through small local minima
- increases the weight update when on a stable gradient
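A minimal sketch of the momentum update for one weight array (the helper name and the value of alpha are my own choices):

```python
def momentum_update(W, dW, vW, alpha=0.9):
    """One momentum-descent step for a weight array W.
    dW is the plain gradient-based update eta * eps_j * o_i,
    vW holds the previous update dw(t-1); alpha is the momentum (0 < alpha < 1)."""
    vW = dW + alpha * vW        # dw(t) = eta*eps_j*o_i + alpha*dw(t-1)
    return W + vW, vW
```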

Page 33: Machine Learning pentru Aplicatii Vizuale

The Back-propagation Algorithm

Line Search Techniques:
Steepest and momentum descent use only the gradient of the error surface.
More advanced techniques explore the weight space using various heuristics.
The most common is to search ahead in the direction defined by the gradient.

Page 34: Machine Learning pentru Aplicatii Vizuale

The Back-propagation Algorithm

On-line vs. Batch algorithms:
The batch (or cumulative) method reviews a set of training examples known as an epoch and computes the global error:

E = ½ · Σ_p Σ_j (t_j − o_j)²

Weight updates are based on this cumulative error signal.
On-line is more stochastic and typically a little more accurate; batch is more efficient.

Page 35: Machine Learning pentru Aplicatii Vizuale

The Back-propagation Algorithm

Several Questions:

• What is BP's inductive bias?

• Can BP get stuck in local minimum?

• How does learning time scale with size of the network & number of training examples?

• Is it biologically plausible?

• Do we have to use the sigmoid activation function?

• How well does a trained network generalize to unseen test cases?

Page 36: Machine Learning pentru Aplicatii Vizuale

Example

• Taken from: https://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/
• Network: [Figure: the example network with two inputs, two hidden neurons (h1, h2) and two output neurons (o1, o2)]

Page 37: Machine Learning pentru Aplicatii Vizuale

Example

• Initial weights

The Forward Pass

Here's how we calculate the total net input for h1: the weighted sum of the inputs (through w1 and w2) plus the bias.

We use the logistic function to get the output of h1: out_h1 = 1/(1 + e^(-net_h1)).

Carrying out the same process for h2, we get its output as well.

Page 38: Machine Learning pentru Aplicatii Vizuale

Example

Continuing the forward pass
• We repeat this process for the output layer neurons, using the outputs from the hidden layer neurons as inputs.
• Here's the output for o1 (out_o1 = 0.75136507, quoted on the next slide).
• Carrying out the same process for o2, we get its output as well.

Page 39: Machine Learning pentru Aplicatii Vizuale

Example

Calculating the Total Error

We can now calculate the error for each output neuron using the squared error function and sum them to get the total error:

E_total = Σ ½·(target − output)²

For example, the target output for o1 is 0.01 but the neural network outputs 0.75136507, therefore its error is:

E_o1 = ½·(0.01 − 0.75136507)² ≈ 0.274811

Repeating this process for o2 (remembering that its target is 0.99) we get its error as well.

The total error for the neural network is the sum of these errors:

E_total = E_o1 + E_o2 = 0.298371109

Page 40: Machine Learning pentru Aplicatii Vizuale

Example

The Backwards Pass

Our goal with BackPropagation is to update each of the weights in the network so that they bring the actual output closer to the target output, thereby minimizing the error for each output neuron and for the network as a whole.

Output Layer

Consider w5. We want to know how much a change in w5 affects the total error, i.e. ∂E_total/∂w5.

By applying the chain rule we know that:

∂E_total/∂w5 = (∂E_total/∂out_o1) · (∂out_o1/∂net_o1) · (∂net_o1/∂w5)

Visually, here’s what we’re doing:

Page 41: Machine Learning pentru Aplicatii Vizuale

Example

We need to figure out each piece in this equation.

First, how much does the total error change with respect to the output?

∂E_total/∂out_o1 = −(target_o1 − out_o1)

When we take the partial derivative of the total error with respect to out_o1, the quantity ½·(target_o2 − out_o2)² becomes zero, because out_o1 does not affect it: we are taking the derivative of a constant, which is zero.

Page 42: Machine Learning pentru Aplicatii Vizuale

Example

Next, how much does the output of o1 change with respect to its total net input?

The partial derivative of the logistic function (a.k.a. sigmoid) is the output multiplied by 1 minus the output:

∂out_o1/∂net_o1 = out_o1·(1 − out_o1)

Finally, how much does the total net input of o1 change with respect to w5? Since the net input is a weighted sum of the hidden outputs, the derivative is simply the hidden output that w5 multiplies:

∂net_o1/∂w5 = out_h1

Putting it all together:

∂E_total/∂w5 = −(target_o1 − out_o1) · out_o1·(1 − out_o1) · out_h1

Page 43: Machine Learning pentru Aplicatii Vizuale

Example

You'll often see this calculation combined in the form of the delta rule.

Alternatively, we have ∂E_total/∂out_o1 and ∂out_o1/∂net_o1, which can be written together as ∂E_total/∂net_o1, aka δ_o1, aka the node delta. We can use this to rewrite the calculation above:

∂E_total/∂w5 = δ_o1 · out_h1

Some sources extract the negative sign from δ, so the same expression is written with the opposite sign.

To decrease the error, we then subtract this value from the current weight (multiplied by some learning rate, eta, which is set here to 0.5):

w5_new = w5 − η·∂E_total/∂w5

We can repeat this process to get the new weights w6, w7 and w8.

Page 44: Machine Learning pentru Aplicatii Vizuale

Example

Hidden Layer

Next, we'll continue the backwards pass by calculating new values for w1, w2, w3, w4. We need to figure out:

∂E_total/∂w1 = (∂E_total/∂out_h1) · (∂out_h1/∂net_h1) · (∂net_h1/∂w1)

The process is similar to the one for the output layer, but slightly different, to account for the fact that the output of each hidden layer neuron contributes to the output (and therefore to the error) of multiple output neurons. We know that out_h1 affects both out_o1 and out_o2, therefore ∂E_total/∂out_h1 needs to take into consideration its effect on both output neurons:

∂E_total/∂out_h1 = ∂E_o1/∂out_h1 + ∂E_o2/∂out_h1

We can calculate ∂E_o1/∂out_h1 using values we calculated earlier, and ∂net_o1/∂out_h1 is equal to w5:

∂E_o1/∂out_h1 = (∂E_o1/∂net_o1) · (∂net_o1/∂out_h1) = δ_o1 · w5

Page 45: Machine Learning pentru Aplicatii Vizuale

Example

Plugging them in:

Following the same process we get ∂E_o2/∂out_h1.

Therefore:

∂E_total/∂out_h1 = ∂E_o1/∂out_h1 + ∂E_o2/∂out_h1

Now that we have ∂E_total/∂out_h1, we need to figure out ∂out_h1/∂net_h1 and then ∂net_h1/∂w for each weight:

∂out_h1/∂net_h1 = out_h1·(1 − out_h1)

We calculate the partial derivative of the total net input to h1 with respect to w1 the same way as we did for the output neuron: ∂net_h1/∂w1 equals the input that w1 multiplies.

Putting it all together:

∂E_total/∂w1 = (∂E_total/∂out_h1) · out_h1·(1 − out_h1) · (∂net_h1/∂w1)

Page 46: Machine Learning pentru Aplicatii Vizuale

Example

We can now update w1:

w1_new = w1 − η·∂E_total/∂w1

Repeating this for w2, w3 and w4:

Finally, we’ve updated all of our weights!

When we fed forward the 0.05 and 0.1 inputs originally, the error on the network was 0.298371109. After this first round of backpropagation, the total error is down to 0.291027924. It might not seem like much, but after repeating this process 10,000 times, for example, the error plummets to 0.0000351085. At that point, when we feed forward 0.05 and 0.1, the two output neurons generate 0.015912196 (vs. the 0.01 target) and 0.984065734 (vs. the 0.99 target).
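A compact sketch that reproduces this worked example end to end. The inputs, targets, learning rate and quoted error values are from the slides; the initial weights and the choice to keep the biases fixed are taken from the linked post and should be treated as assumptions:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# Inputs and targets from the slides; initial weights/biases from the linked post
# (they reproduce the 0.298371109 initial error quoted above).
inp = np.array([0.05, 0.10])
target = np.array([0.01, 0.99])
W1 = np.array([[0.15, 0.20],      # w1, w2 (into h1)
               [0.25, 0.30]])     # w3, w4 (into h2)
b1 = 0.35
W2 = np.array([[0.40, 0.45],      # w5, w6 (into o1)
               [0.50, 0.55]])     # w7, w8 (into o2)
b2 = 0.60
eta = 0.5

for step in range(10000):
    # Forward pass
    out_h = sigmoid(W1 @ inp + b1)
    out_o = sigmoid(W2 @ out_h + b2)
    E_total = 0.5 * np.sum((target - out_o) ** 2)   # ~0.2984 on the first pass
    # Backward pass: node deltas
    delta_o = -(target - out_o) * out_o * (1 - out_o)
    delta_h = (W2.T @ delta_o) * out_h * (1 - out_h)
    # Gradient-descent updates of the eight weights
    # (biases kept fixed here, an assumption following the linked post)
    W2 -= eta * np.outer(delta_o, out_h)
    W1 -= eta * np.outer(delta_h, inp)

print(E_total, out_o)   # error drops toward ~3.5e-5; outputs approach (0.0159, 0.9841)
```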