Cooperating Intelligent Systems Statistical learning methods Chapter 20, AIMA (only ANNs & SVMs)
Page 1

Cooperating Intelligent Systems

Statistical learning methods, Chapter 20, AIMA

(only ANNs & SVMs)

Page 2

Artificial neural networks

The brain is a pretty intelligent system.

Can we "copy" it?

There are approx. 10^11 neurons in the brain.

There are approx. 23·10^9 neurons in the male cortex (females have about 15% fewer).

Page 3

The simple model

• The McCulloch-Pitts model (1943)

Image from Neuroscience: Exploring the brain by Bear, Connors, and Paradiso

y = g(w0 + w1x1 + w2x2 + w3x3)

(In the figure, w1, w2, w3 label the connection weights from the three inputs.)

Page 4

Transfer functions g(z)

The Heaviside (step) function and the logistic (sigmoid) function.
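Written out (standard definitions, for reference):

g_{\mathrm{Heaviside}}(z) = \begin{cases} 1, & z \ge 0 \\ 0, & z < 0 \end{cases}
\qquad
g_{\mathrm{logistic}}(z) = \frac{1}{1 + e^{-z}}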

Page 5

The simple perceptron

With {-1, +1} output representation.

Traditionally (early 1960s) trained with perceptron learning.

y(x) = sgn[w^T x] = { +1 if w^T x ≥ 0; -1 if w^T x < 0 }

where w^T x = w0 + w1x1 + w2x2

Page 6

Perceptron learning

Repeat until no errors are made anymore:

1. Pick a random example [x(n), f(n)].

2. If the classification is correct, i.e. if y(x(n)) = f(n), then do nothing.

3. If the classification is wrong, then do the following update to the parameters (η, the learning rate, is a small positive number):

Desired output:
f(n) = +1 if x(n) belongs to class A
f(n) = -1 if x(n) belongs to class B

Update rule:
wi ← wi + η·f(n)·xi(n)
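As an illustration, a minimal Java sketch of this algorithm (not part of the course's lab code; all names are made up, and it sweeps the examples in order instead of picking them at random). It uses the AND data and the initial values from the example that follows; with this in-order sweep it terminates at w = (-1.1, 0.7, 0.7), the final solution on Page 14.

public class PerceptronDemo {
    static double[] w = { -0.5, 1.0, 1.0 };   // w0 (bias weight), w1, w2
    static double eta = 0.3;                  // learning rate

    static int sgn(double z) { return z >= 0 ? +1 : -1; }

    // y(x) = sgn(w0 + w1*x1 + w2*x2)
    static int output(double x1, double x2) { return sgn(w[0] + w[1] * x1 + w[2] * x2); }

    public static void main(String[] args) {
        double[][] x = { {0, 0}, {0, 1}, {1, 0}, {1, 1} };   // inputs
        int[] f = { -1, -1, -1, +1 };                        // targets: the AND function
        boolean errors = true;
        while (errors) {                                     // repeat until no errors are made
            errors = false;
            for (int n = 0; n < x.length; n++) {
                if (output(x[n][0], x[n][1]) != f[n]) {      // misclassified?
                    errors = true;
                    w[0] += eta * f[n];                      // w_i <- w_i + eta*f(n)*x_i(n), with x_0 = 1
                    w[1] += eta * f[n] * x[n][0];
                    w[2] += eta * f[n] * x[n][1];
                }
            }
        }
        System.out.printf("w = (%.1f, %.1f, %.1f)%n", w[0], w[1], w[2]);
    }
}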

Page 7

Example: Perceptron learning

The AND function

x1  x2   f
0   0   -1
0   1   -1
1   0   -1
1   1   +1

Initial values:
η = 0.3
w = (w0, w1, w2) = (-0.5, 1, 1)

Page 8

Example: Perceptron learning

The AND function

x1  x2   f
0   0   -1
0   1   -1
1   0   -1
1   1   +1

w = (-0.5, 1, 1)

This one is correctly classified, no action.

Page 9

Example: Perceptron learning

The AND function

x1  x2   f
0   0   -1
0   1   -1
1   0   -1
1   1   +1

w = (-0.5, 1, 1)

This one is incorrectly classified (the example x = (0,1), f = -1), learning action:

w0 ← w0 + η·f·x0 = -0.5 + 0.3·(-1)·1 = -0.8
w1 ← w1 + η·f·x1 = 1 + 0.3·(-1)·0 = 1
w2 ← w2 + η·f·x2 = 1 + 0.3·(-1)·1 = 0.7

Page 10

Example: Perceptron learning

The AND function

x1  x2   f
0   0   -1
0   1   -1
1   0   -1
1   1   +1

w = (-0.8, 1, 0.7)   (after the update)

This one is incorrectly classified, learning action:

w0 ← w0 + η·f·x0 = -0.5 + 0.3·(-1)·1 = -0.8
w1 ← w1 + η·f·x1 = 1 + 0.3·(-1)·0 = 1
w2 ← w2 + η·f·x2 = 1 + 0.3·(-1)·1 = 0.7

Page 11

Example: Perceptron learning

The AND function

x1  x2   f
0   0   -1
0   1   -1
1   0   -1
1   1   +1

w = (-0.8, 1, 0.7)

This one is correctly classified, no action.

Page 12

Example: Perceptron learning

The AND function

x1  x2   f
0   0   -1
0   1   -1
1   0   -1
1   1   +1

w = (-0.8, 1, 0.7)

This one is incorrectly classified (the example x = (1,0), f = -1), learning action:

w0 ← w0 + η·f·x0 = -0.8 + 0.3·(-1)·1 = -1.1
w1 ← w1 + η·f·x1 = 1 + 0.3·(-1)·1 = 0.7
w2 ← w2 + η·f·x2 = 0.7 + 0.3·(-1)·0 = 0.7

Page 13

Example: Perceptron learning

The AND function

x1  x2   f
0   0   -1
0   1   -1
1   0   -1
1   1   +1

w = (-1.1, 0.7, 0.7)   (after the update)

This one is incorrectly classified, learning action:

w0 ← w0 + η·f·x0 = -0.8 + 0.3·(-1)·1 = -1.1
w1 ← w1 + η·f·x1 = 1 + 0.3·(-1)·1 = 0.7
w2 ← w2 + η·f·x2 = 0.7 + 0.3·(-1)·0 = 0.7

Page 14

Example: Perceptron learning

The AND function

x1  x2   f
0   0   -1
0   1   -1
1   0   -1
1   1   +1

w = (-1.1, 0.7, 0.7)

Final solution.

Page 15

Perceptron learning

• Perceptron learning is guaranteed to find a solution in finite time, if a solution exists.

• Perceptron learning cannot be generalized to more complex networks.

• It is better to use gradient descent, based on formulating a differentiable error function:

E(W) = Σ_{n=1}^{N} ( f(n) - y(n, W) )²

Page 16

Gradient search

ΔW = -η·∇_W E(W)

"Go downhill": the figure shows the error E(W) as a function of the weights W.

The "learning rate" (η) is set heuristically.

W(k+1) = W(k) + ΔW(k)

Page 17

The Multilayer Perceptron (MLP)

• Combine several single-layer perceptrons.

• Each single-layer perceptron uses a sigmoid transfer function σ(z), e.g.

σ(z) = 1 / (1 + exp(-z))   or   σ(z) = tanh(z)

(The figure shows a network with inputs x_k, hidden units h_j and h_i, and outputs y_l.)

• Can be trained using gradient descent.

Page 18

Example: One hidden layer

• Can approximate any continuous function

φ(z) = sigmoid or linear (output layer), σ(z) = sigmoid (hidden layer).

h_j(x) = σ( w_j0 + Σ_{k=1}^{D} w_jk·x_k )

y_i(x) = φ( v_i0 + Σ_{j=1}^{J} v_ij·h_j(x) )

(Inputs x_k, hidden units h_j, outputs y_i.)
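As a rough sketch (illustrative names, not the lab's NN class), the forward pass of such a one-hidden-layer network in Java, with a logistic hidden layer and a linear output:

public class MlpForward {
    static double sigma(double z) { return 1.0 / (1.0 + Math.exp(-z)); }   // logistic sigmoid

    // w[j][0] is the bias of hidden unit j, w[j][k+1] the weight from input k;
    // v[i][0] is the bias of output i,      v[i][j+1] the weight from hidden unit j.
    static double[] forward(double[] x, double[][] w, double[][] v) {
        double[] h = new double[w.length];
        for (int j = 0; j < w.length; j++) {
            double z = w[j][0];
            for (int k = 0; k < x.length; k++) z += w[j][k + 1] * x[k];
            h[j] = sigma(z);                  // h_j = sigma(w_j0 + sum_k w_jk * x_k)
        }
        double[] y = new double[v.length];
        for (int i = 0; i < v.length; i++) {
            double z = v[i][0];
            for (int j = 0; j < h.length; j++) z += v[i][j + 1] * h[j];
            y[i] = z;                         // linear output; use sigma(z) here for a sigmoid output
        }
        return y;
    }
}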

Page 19

Example of computing the gradient

ΔW = -η·∇_W E(W)

E(W) = MSE = (1/N) Σ_{n=1}^{N} ( ŷ(W, x(n)) - y(n) )² = (1/N) Σ_{n=1}^{N} e²(n)

∇_W E(W) = (1/N) Σ_{n=1}^{N} ∇_W e²(n) = (2/N) Σ_{n=1}^{N} e(n)·∇_W e(n) = (2/N) Σ_{n=1}^{N} e(n)·∇_W ŷ(x(n))

What we need to do is to compute ∇_W ŷ.

We have the complete equation for the network:

ŷ(x) = v_0 + Σ_{j=1}^{J} v_j·h( w_j0 + Σ_{k=1}^{K} w_jk·x_k )

Page 20

Example of computing the gradient

The network output is

ŷ = v_0 + Σ_j v_j·h( w_j0 + Σ_k w_jk·x_k )

so ∇_W ŷ collects the partial derivatives of ŷ with respect to all the weights: ∂ŷ/∂v_0, ∂ŷ/∂v_j, ∂ŷ/∂w_j0 and ∂ŷ/∂w_jk. For the hidden-layer weights, the chain rule gives

∂ŷ/∂w_jk = ∂/∂w_jk [ v_j·h( w_j0 + Σ_k w_jk·x_k ) ] = v_j·h′( w_j0 + Σ_k w_jk·x_k )·x_k

and with h(z) = tanh(z) the derivative is h′(z) = 1 - h²(z).

Page 21

When should you stop learning?

• After a set number of learning epochs
• When the change in the gradient becomes smaller than a certain number
• Validation data: "early stopping"

The figure plots classification error against training epochs: the training error keeps decreasing, while the validation (test) error reaches its minimum at the preferred model, the early-stopping point.
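A sketch of the early-stopping loop in Java (illustrative only; maxEpochs, trainOneEpoch, validationError and saveWeights are placeholders, not part of the lab code):

// Early stopping: keep the weights that gave the lowest validation error, and
// stop when the validation error has not improved for `patience` epochs.
double bestValError = Double.MAX_VALUE;
int epochsSinceImprovement = 0;
int patience = 20;
for (int e = 0; e < maxEpochs && epochsSinceImprovement < patience; e++) {
    trainOneEpoch();                      // placeholder: one pass over the training set
    double valError = validationError();  // placeholder: error on held-out validation data
    if (valError < bestValError) {
        bestValError = valError;
        saveWeights();                    // placeholder: remember the best model so far
        epochsSinceImprovement = 0;
    } else {
        epochsSinceImprovement++;
    }
}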

Page 22

RPROP (Resilient PROPagation)

Parameter update rule:

ΔW_i(t) = -η_i(t)·sign( ∂E(W)/∂W_i )

Learning-rate update rule:

η_i(t) = 1.2·η_i(t-1)   if [∂E/∂W_i](t) · [∂E/∂W_i](t-1) > 0
η_i(t) = 0.5·η_i(t-1)   if [∂E/∂W_i](t) · [∂E/∂W_i](t-1) < 0

No parameter tuning, unlike standard backpropagation!
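A sketch of these two rules in Java (illustrative; it assumes the gradient ∂E/∂W_i is available for every weight, and it omits the upper and lower bounds on η_i that practical RPROP implementations usually add):

// One RPROP step: the per-weight step size eta[i] adapts, and only the sign
// of the gradient is used in the weight update itself.
void rpropStep(double[] W, double[] grad, double[] prevGrad, double[] eta) {
    for (int i = 0; i < W.length; i++) {
        if (grad[i] * prevGrad[i] > 0) {
            eta[i] *= 1.2;   // gradient kept its sign: increase the step size
        } else if (grad[i] * prevGrad[i] < 0) {
            eta[i] *= 0.5;   // gradient changed sign (overshoot): decrease the step size
        }
        W[i] -= eta[i] * Math.signum(grad[i]);   // Delta W_i(t) = -eta_i(t) * sign(dE/dW_i)
        prevGrad[i] = grad[i];
    }
}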

Page 23

Model selection

The figure shows distributions (PDFs) of the classification error [%] for model type A and model type B, together with an error-bar plot comparing the two model types.

Use this to determine:
• Number of hidden nodes
• Which input signals to use
• If a pre-processing strategy is good or not
• Etc.

Variability is typically induced by:
• Varying train and test data sets
• Random initial model parameters
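One way to produce such an error-bar plot, sketched in Java (runExperiment is a placeholder that trains the model on a fresh random train/test split with random initial weights and returns the test-set classification error in percent):

int R = 10;                       // number of repetitions
double[] err = new double[R];
for (int r = 0; r < R; r++) {
    err[r] = runExperiment(r);    // placeholder, see above; r is used as the random seed
}
double mean = 0, var = 0;
for (double e : err) mean += e / R;
for (double e : err) var += (e - mean) * (e - mean) / (R - 1);
System.out.println("classification error = " + mean + " +/- " + Math.sqrt(var) + " %");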

Page 24

Support vector machines

Page 25

Linear classifier on a linearly separable problem

There are infinitely many lines that have zero training error.

Which line should we choose?

Page 26

Linear classifier on a linearly separable problem

There are infinitely many lines that have zero training error.

Which line should we choose?

Choose the line with the largest margin.

The "large margin classifier"

Page 27

Linear classifier on a linearly separable problem

There are infinitely many lines that have zero training error.

Which line should we choose?

Choose the line with the largest margin.

The "large margin classifier"

The data points that lie on the margin are the "support vectors".

Page 28

Computing the margin

The plane separating the two classes is defined by

w^T x = a

The dashed planes (through the closest examples of each class) are given by

w^T x = a + b
w^T x = a - b

Page 29

Computing the margin

Divide by b, and define a new w = w/b and θ = a/b:

w^T x / b = a/b + 1   and   w^T x / b = a/b - 1

i.e.

w^T x = θ + 1
w^T x = θ - 1

We have thereby defined a scale for w and θ.

Page 30

Computing the margin

We have

w^T x = θ - 1
w^T (x + λw) = θ + 1

which gives

λ·w^T w = λ·‖w‖² = 2,   i.e.   λ = 2 / ‖w‖²

The margin is the distance ‖λw‖ between the two dashed planes:

margin = 2 / ‖w‖

Page 31

Linear classifier on a linearly separable problem

Maximizing the margin is equal to minimizing

‖w‖

subject to the constraints

w^T x(n) - θ ≥ +1 for all x(n) in class A
w^T x(n) - θ ≤ -1 for all x(n) in class B

This is a quadratic programming problem; the constraints can be included with Lagrange multipliers.
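With class labels y(n) ∈ {-1, +1} (as used on the next page), the two constraints combine into a single one, and the problem can be stated as the quadratic program

\min_{\mathbf{w},\,\theta}\ \tfrac{1}{2}\|\mathbf{w}\|^{2}
\quad \text{subject to} \quad
y(n)\,\bigl(\mathbf{w}^{T}\mathbf{x}(n)-\theta\bigr) \ge 1, \qquad n = 1,\dots,N.

Minimizing ½‖w‖² is equivalent to minimizing ‖w‖ and is more convenient for the Lagrangian that follows.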

Page 32

Minimize the cost (Lagrangian):

L_p = ½·w^T w - Σ_{n=1}^{N} α_n·[ y(n)·(w^T x(n) - θ) - 1 ]

The minimum of L_p occurs at the maximum of L_D (the Wolfe dual):

L_D = Σ_{n=1}^{N} α_n - ½ Σ_{n=1}^{N} Σ_{m=1}^{N} α_n α_m y(n) y(m) x^T(n) x(m)

This is a quadratic programming problem.

Only scalar products in the cost. IMPORTANT!

Page 33

Linear Support Vector Machine

Test phase, the predicted output:

ŷ(x) = sgn( w^T x - θ ) = sgn( Σ_{n∈SV} α_n y(n) x^T(n) x - θ )

where the sum runs over the support vectors.

Still only scalar products in the expression.
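This is the key observation behind the kernel trick: because both the dual cost L_D and the prediction use the data only through scalar products x^T(n)·x(m), each scalar product can be replaced by a kernel function K(x(n), x(m)), which yields a non-linear SVM; for example the radial-basis-function kernel used in the robot color-vision example on Page 39.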

Page 34

Example: Robot color vision(Competition 1999)

Classify the Lego pieces into red, blue, and yellow. Classify white balls, black sideboard, and green carpet.

Page 35

What the camera sees (RGB space)

(The figure shows the pixel clusters for yellow, green, and red in RGB space.)

Page 36

Mapping RGB (3D) to rgb (2D)

r = R / (R + G + B)
g = G / (R + G + B)
b = B / (R + G + B)
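A sketch of this mapping in Java (illustrative, not taken from the competition code):

// Map an (R,G,B) pixel to the normalized (r,g) plane. Since r + g + b = 1,
// the b component is redundant, which is why 3D RGB reduces to a 2D input.
static double[] rgbToNormalized(double R, double G, double B) {
    double sum = R + G + B;
    if (sum == 0) return new double[] { 1.0/3, 1.0/3 };  // avoid division by zero for black pixels
    return new double[] { R / sum, G / sum };
}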

Page 37

Lego in normalized rgb space

x ∈ X ⊂ R², c ∈ C with |C| = 6

Input is 2D: (x1, x2). Output is 6D: {red, blue, yellow, green, black, white}.

Page 38

MLP classifier

E_train = 0.21%
E_test = 0.24%
2-3-1 MLP
Levenberg-Marquardt
Training time (150 epochs): 51 seconds

Page 39

SVM classifier

E_train = 0.19%
E_test = 0.20%
SVM with γ = 1000
K(x, y) = exp( -γ‖x - y‖² )
Training time: 22 seconds

Page 40

Lab 4: Digit recognition

• Inputs (digits) are provided as 32x32 bitmaps. The task is to investigate how well these handwritten digits can be recognized by neural networks.

• The assignment includes changing the program code to answer:

1. How good is the generalization performance? (test-data error)
2. Can pre-processing improve performance?
3. What is the best configuration of the network?

Page 41

public AppTrain() {
    // create a new network of given size
    nn = new NN(32*32, 10, seed);

    // create the matrix holding the data and read the data into it;
    // each row contains 32*32+1 integers (the bitmap plus the target digit)
    file = new TFile("digits.dat");
    System.out.println(file.rows() + " digits have been loaded");

    double[] input = new double[32*32];
    double[] target = new double[10];

    // the training session (below) is iterative
    for (int e = 0; e < nEpochs; e++) {
        // reset the error accumulated over each training epoch
        double err = 0;
        // in each epoch, go through all examples/tuples/digits
        // note: all examples are here used for training, consequently no systematic testing;
        // you may consider dividing the data set into training, testing and validation sets
        for (int p = 0; p < file.rows(); p++) {
            for (int i = 0; i < 32*32; i++)
                input[i] = file.values[p][i];
            // the last value on each row contains the target (0-9);
            // convert it to a double[] target vector
            for (int i = 0; i < 10; i++) {
                if (file.values[p][32*32] == i) target[i] = 1;
                else target[i] = 0;
            }
            // present a sample and calculate errors and adjust weights
            err += nn.train(input, target, eta);
        }
        System.out.println("Epoch " + e + " finished with error " + err/file.rows());
    }

    // save network weights in a file for later use, e.g. in AppDigits
    nn.save("network.m");
}

Page 42

/**
 * classify
 * @param map the bitmap on the screen
 * @return int the most likely digit (0-9) according to the network
 */
public int classify(boolean[][] map) {
    double[] input = new double[32*32];
    // flatten the 2D bitmap into the network's input vector
    for (int c = 0; c < map.length; c++) {
        for (int r = 0; r < map[c].length; r++) {
            if (map[c][r]) // bit set
                input[r*map[r].length + c] = 1;
            else
                input[r*map[r].length + c] = 0;
        }
    }
    // activate the network, produce output vector
    double[] output = nn.feedforward(input);
    // alternative version assumes that the network has been trained on an 8x8 map
    // double[] output = nn.feedforward(to8x8(input));
    double highscore = 0;
    int highscoreIndex = 0;
    // print out each output value (gives an idea of the network's support for each digit)
    System.out.println("--------------");
    for (int k = 0; k < 10; k++) {
        System.out.println(k + ": " + (double)((int)(output[k]*1000)/1000.0));
        if (output[k] > highscore) {
            highscore = output[k];
            highscoreIndex = k;
        }
    }
    System.out.println("--------------");
    return highscoreIndex;
}