Computational Learning Theory: PAC, IID, VC Dimension, SVM
Kunstmatige Intelligentie / RuG, KI2 - 5
Marius Bulacu & prof. dr. Lambert Schomaker
Transcript
Page 1

Computational Learning Theory
• PAC
• IID
• VC Dimension
• SVM

Kunstmatige Intelligentie / RuG

KI2 - 5

Marius Bulacu & prof. dr. Lambert Schomaker

Page 2

Learning

Learning is essential for unknown environments – i.e., when the designer lacks omniscience

Learning is useful as a system construction method – i.e., expose the agent to reality rather than trying to write it down

Learning modifies the agent's decision mechanisms to improve performance

Page 3

Learning Agents

Page 4

Learning Element

Design of a learning element is affected by:
– Which components of the performance element are to be learned
– What feedback is available to learn these components
– What representation is used for the components

Type of feedback:
– Supervised learning: correct answers for each example
– Unsupervised learning: correct answers not given
– Reinforcement learning: occasional rewards

Page 5

Inductive Learning

Simplest form: learn a function from examples

- f is the target function

- an example is a pair (x, f(x))

Problem: find a hypothesis h such that h ≈ f, given a training set of examples

This is a highly simplified model of real learning:

- ignores prior knowledge

- assumes examples are given
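A minimal sketch (not from the slides) of this terminology in code — the target f and candidate h below are invented for illustration:

```python
# Minimal sketch of the terminology (illustrative only):
# an "example" is a pair (x, f(x)); h is consistent if it
# agrees with f on every training example.

def f(x):            # hypothetical target function (unknown in practice)
    return x * x

examples = [(x, f(x)) for x in range(-3, 4)]   # training set of (x, f(x)) pairs

def h(x):            # candidate hypothesis
    return x ** 2

def is_consistent(h, examples):
    return all(h(x) == y for x, y in examples)

print(is_consistent(h, examples))   # True: h agrees with f on all examples
```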

Pages 6–11

Inductive Learning Method

Construct/adjust h to agree with f on the training set (h is consistent if it agrees with f on all examples). E.g., curve fitting:

(figures: a sequence of fits to the same data points, from a straight line through increasingly complex curves)

Occam’s razor: prefer the simplest hypothesis consistent with data
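As a hedged illustration of the curve-fitting slides above (the data points are invented, not taken from the figures): polynomials of increasing degree fit the same points ever more closely, but Occam's razor favors the simplest nearly consistent one.

```python
# Sketch: fit polynomials of increasing degree to the same (invented) points.
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.1, 0.9, 2.2, 2.8, 4.1])   # roughly linear, with noise

for degree in (1, 2, 4):
    coeffs = np.polyfit(x, y, degree)               # least-squares fit
    sse = np.sum((np.polyval(coeffs, x) - y) ** 2)  # training error
    print(f"degree {degree}: sum of squared errors = {sse:.4f}")

# The degree-4 fit passes through all five points (error ~ 0), yet
# Occam's razor prefers the nearly consistent straight line.
```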

Page 12

Occam’s Razor

William of Occam

(c. 1287–1347, England)

“If two theories explain the facts equally well, then the simpler theory is to be preferred.”

Rationale:

There are fewer short hypotheses than long hypotheses.

A short hypothesis that fits the data is unlikely to be a coincidence.

A long hypothesis that fits the data may be a coincidence.

Formal treatment in computational learning theory

Page 13

The Problem

• Why does learning work?

• How do we know that the learned hypothesis h is close to the target function f if we do not know what f is?

→ the answer is provided by computational learning theory

Page 14

The Answer

• Any hypothesis h that is consistent with a sufficiently large number of training examples is unlikely to be seriously wrong.

Therefore it must be: Probably Approximately Correct (PAC)

Page 15

The Stationarity Assumption

• The training and test sets are drawn randomly from the same population of examples using the same probability distribution.

Therefore training and test data are Independently and Identically Distributed (IID):

“the future is like the past”
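A small illustrative sketch of the IID assumption (not from the slides): training and test sets sampled independently from one fixed distribution show the same statistics.

```python
# Sketch: train and test sets drawn IID from the same fixed distribution.
import numpy as np

rng = np.random.default_rng(0)
train = rng.normal(loc=0.0, scale=1.0, size=1000)  # same distribution ...
test = rng.normal(loc=0.0, scale=1.0, size=1000)   # ... sampled independently

print(train.mean(), test.mean())  # close to each other: "the future is like the past"
```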

Page 16

How many examples are needed?

m ≥ (1/ε) (ln(1/δ) + ln |H|)

where:
– m: the number of examples (the sample complexity)
– ε: the probability that h and f disagree on an example
– δ: the probability that some wrong hypothesis is consistent with all examples
– |H|: the size of the hypothesis space
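A numeric sketch of this bound (illustrative values for ε, δ and |H|; not from the slides):

```python
# Sketch: evaluate the PAC bound m >= (1/eps) * (ln(1/delta) + ln|H|).
import math

def pac_sample_size(eps, delta, h_size):
    """Examples needed so that, with probability >= 1 - delta, any
    hypothesis consistent with them all has error <= eps (finite H)."""
    return math.ceil((1.0 / eps) * (math.log(1.0 / delta) + math.log(h_size)))

# Illustrative numbers: |H| = 2**10 hypotheses, 5% error, 99% confidence.
print(pac_sample_size(eps=0.05, delta=0.01, h_size=2**10))   # 231
```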

Page 17

Formal Derivation

H: the set of all possible hypotheses; f is the target function.
Hbad ⊆ H: the set of “seriously wrong” hypotheses, those whose error exceeds ε.

For any hb ∈ Hbad:
P(x : hb(x) ≠ f(x)) > ε, hence P(x : hb(x) = f(x)) ≤ 1 − ε

Probability that hb agrees with m independent (IID) examples:
P(hb consistent with m examples) ≤ (1 − ε)^m

Probability that Hbad contains at least one hypothesis consistent with all m examples:
P(some hb ∈ Hbad is consistent) ≤ |Hbad| (1 − ε)^m ≤ |H| (1 − ε)^m

Requiring this to be at most δ, and using 1 − ε ≤ e^(−ε):
|H| (1 − ε)^m ≤ δ  ⟹  m ≥ (1/ε) (ln(1/δ) + ln |H|)
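A quick numeric check (illustrative, not from the slides) that at this m the failure probability |H|(1 − ε)^m indeed drops below δ:

```python
# Sketch: at the bound's m, |H| * (1 - eps)^m falls below delta.
import math

eps, delta, h_size = 0.05, 0.01, 2**10
m = math.ceil((1.0 / eps) * (math.log(1.0 / delta) + math.log(h_size)))

failure_bound = h_size * (1.0 - eps) ** m
print(m, failure_bound, failure_bound <= delta)   # 231 ~0.0073 True
```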

Page 18

What if hypothesis space is infinite?

Can’t use our result for finite H. Need some other measure of complexity for H:
– the Vapnik-Chervonenkis (VC) dimension
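A brute-force sketch (not from the slides) of what shattering means here: 2-D linear separators can realize every labeling of 3 points in general position, but not every labeling of 4 points (the XOR labelings fail), which is why their VC dimension is 3.

```python
# Sketch: brute-force shattering test for 2-D linear separators
# h(x) = sign(w1*x1 + w2*x2 + b), searched over a coarse weight grid.
import itertools
import numpy as np

def can_shatter(points):
    points = np.asarray(points, dtype=float)
    grid = np.linspace(-2.0, 2.0, 21)
    for labels in itertools.product([-1, 1], repeat=len(points)):
        target = np.array(labels)
        found = any(
            np.array_equal(np.sign(w1 * points[:, 0] + w2 * points[:, 1] + b),
                           target)
            for w1, w2, b in itertools.product(grid, grid, grid)
        )
        if not found:          # some labeling no line can realize
            return False
    return True

print(can_shatter([(0, 0), (1, 0), (0, 1)]))          # True: 3 points shattered
print(can_shatter([(0, 0), (0, 1), (1, 0), (1, 1)]))  # False: XOR labelings fail
```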


Page 22

Shattering two binary dimensions over a number of classes

To understand the principle of shattering sample points into classes, we will look at the simple case of two dimensions of binary value.

Page 23

2-D feature space

(figure: the four points (f1, f2) ∈ {0,1} × {0,1} plotted on axes f1 and f2)

Page 24

2-D feature space, 2 classes

(figure: f1 × f2 grid)

Page 25

the other class…

(figure: f1 × f2 grid)

Page 26

2 left vs 2 right

(figure: f1 × f2 grid)

Page 27

top vs bottom

(figure: f1 × f2 grid)

Page 28

right vs left

(figure: f1 × f2 grid)

Page 29

bottom vs top

(figure: f1 × f2 grid)

Page 30

lower-right outlier

(figure: f1 × f2 grid)

Page 31

lower-left outlier

(figure: f1 × f2 grid)

Page 32

upper-left outlier

(figure: f1 × f2 grid)

Page 33

upper-right outlier

(figure: f1 × f2 grid)

Page 34

etc.

(figure: f1 × f2 grid)

Pages 35–37

2-D feature space

(figures: further two-class labelings of the four points on the f1 × f2 grid)

Page 38

XOR configuration A

(figure: f1 × f2 grid)

Page 39

XOR configuration B

(figure: f1 × f2 grid)

Page 40

2-D feature space, two classes: 16 hypotheses

(figure: the sixteen labelings, numbered 0–15, of the four cells f1 ∈ {0,1} × f2 ∈ {0,1})

“hypothesis” = a possible class partitioning of all data samples

Page 41

2-D feature space, two classes, 16 hypotheses

(figure: the sixteen labelings 0–15 again, with the two XOR configurations marked)

Two XOR class configurations: 2 of the 16 hypotheses require a non-linear separatrix.
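A brute-force check of this count (a sketch, not from the slides): enumerate all 16 labelings and test each for a linear separatrix over a coarse grid of weights.

```python
# Sketch: of the 16 labelings of the four binary points, find those with
# no linear separatrix sign(w1*f1 + w2*f2 + b) over a coarse weight grid.
import itertools
import numpy as np

points = np.array([(0, 0), (0, 1), (1, 0), (1, 1)], dtype=float)
grid = np.linspace(-2.0, 2.0, 21)

def linearly_separable(labels):
    target = np.array(labels)
    return any(
        np.array_equal(np.sign(w1 * points[:, 0] + w2 * points[:, 1] + b), target)
        for w1, w2, b in itertools.product(grid, grid, grid)
    )

non_linear = [ls for ls in itertools.product([-1, 1], repeat=4)
              if not linearly_separable(ls)]
print(len(non_linear), non_linear)   # 2 -- exactly the two XOR configurations
```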

Pages 42–43

XOR, a possible non-linear separation

(figures: non-linear separatrices isolating the XOR configuration on the f1 × f2 grid)

Page 44

2-D feature space, three classes, # hypotheses?

(figure: the first labelings, numbered 0–8, of the four cells over three classes; the remaining labelings are not shown)

Page 45

2-D feature space, three classes, # hypotheses?

(figure: the first labelings, numbered 0–8, of the four cells over three classes; the remaining labelings are not shown)

3^4 = 81 possible hypotheses

Page 46

Maximum, discrete space

Four classes: 4^4 = 256 hypotheses

Assume that there are no more classes than discrete cells

Nhyp,max = nclasses^ncells
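The same counts in a one-line function (an illustrative sketch):

```python
# Sketch: maximum number of hypotheses over a discrete feature space --
# each of the n_cells cells may take any of the n_classes labels.
def max_hypotheses(n_cells, n_classes):
    return n_classes ** n_cells

print(max_hypotheses(4, 2))   # 16
print(max_hypotheses(4, 3))   # 81
print(max_hypotheses(4, 4))   # 256
```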

Page 47

2-D feature space, three classes…

(figure: the four points on the f1 × f2 grid, labeled with three classes)

In this example, one class is linearly separable from the rest, as is a second. But the third class is not linearly separable from the rest of the classes.

Page 48

2-D feature space, four classes…

(figure: the four points on the f1 × f2 grid, each assigned its own class)

Minsky & Papert: simple table lookup or logic will do nicely.

Page 49

2-D feature space, four classes…

(figure: compact, roughly circular class boundaries around the four classes on the f1 × f2 grid)

Spheres or radial-basis functions may offer a compact class encapsulation in case of limited noise and limited overlap (but in the end the data will tell: experimentation required!)

Page 50

SVM (1): Kernels

Complicated separation boundary in the original (f1, f2) space → simple separation boundary (a hyperplane) in a higher-dimensional (f1, f2, f3) space.

Kernels:
– Polynomial
– Radial basis
– Sigmoid

Implicit mapping to a higher-dimensional space where linear separation is possible.
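An illustrative sketch of the kernel idea (not from the slides): the XOR points are inseparable in (f1, f2), but after an explicit map to f3 = f1·f2 — a hand-rolled stand-in for what a polynomial kernel computes implicitly — a plane separates them.

```python
# Sketch: XOR is not linearly separable in (f1, f2), but adding the
# explicit feature f3 = f1*f2 (what a polynomial kernel provides
# implicitly) makes a separating hyperplane possible.
import numpy as np

X = np.array([(0, 0), (0, 1), (1, 0), (1, 1)], dtype=float)
y = np.array([-1, 1, 1, -1])                   # XOR labeling

Z = np.column_stack([X, X[:, 0] * X[:, 1]])    # map to (f1, f2, f3)

w, b = np.array([1.0, 1.0, -2.0]), -0.5        # hyperplane f1 + f2 - 2*f3 = 0.5
print(np.sign(Z @ w + b))                      # [-1.  1.  1. -1.] == y
```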

Page 51

SVM (2): Max Margin

(figure: max-margin separating hyperplane in the f1–f2 plane; the support vectors are the examples lying on the margin)

“Best” separating hyperplane: from all the possible separating hyperplanes, select the one that gives the maximum margin.

Solution found by quadratic optimization – “learning”.

Max margin → good generalization.
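A hedged sketch using scikit-learn (the slides name no library; this choice is an assumption) of fitting a max-margin linear SVM and reading off the support vectors found by the quadratic optimizer:

```python
# Sketch: max-margin linear SVM on toy 2-D data (requires scikit-learn).
import numpy as np
from sklearn.svm import SVC

X = np.array([[1.0, 1.0], [2.0, 1.5], [1.5, 2.0],    # class -1
              [4.0, 4.0], [5.0, 4.5], [4.5, 5.0]])   # class +1
y = np.array([-1, -1, -1, 1, 1, 1])

clf = SVC(kernel="linear", C=1e6)    # large C approximates a hard margin
clf.fit(X, y)                        # solved internally by quadratic optimization

print(clf.support_vectors_)          # the examples that pin down the margin
w, b = clf.coef_[0], clf.intercept_[0]
print("margin width:", 2.0 / np.linalg.norm(w))
```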