

Computational Learning Theory
• PAC
• IID
• VC Dimension
• SVM

Kunstmatige Intelligentie / RuG

KI2 - 5

Marius Bulacu & prof. dr. Lambert Schomaker

2

Learning

Learning is essential for unknown environments – i.e., when the designer lacks omniscience

Learning is useful as a system construction method – i.e., expose the agent to reality rather than trying to write it down

Learning modifies the agent's decision mechanisms to improve performance

3

Learning Agents

4

Learning Element

Design of a learning element is affected by:
– Which components of the performance element are to be learned
– What feedback is available to learn these components
– What representation is used for the components

Type of feedback:
– Supervised learning: correct answers for each example
– Unsupervised learning: correct answers not given
– Reinforcement learning: occasional rewards

5

Inductive Learning

Simplest form: learn a function from examples

- f is the target function

- an example is a pair (x, f(x))

Problem: find a hypothesis h such that h ≈ f, given a training set of examples

This is a highly simplified model of real learning:

- ignores prior knowledge

- assumes examples are given
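As a concrete illustration of this setting, here is a minimal Python sketch (made-up example pairs and a tiny hypothetical hypothesis space, not from the slides): an h is acceptable when it agrees with f on every given pair (x, f(x)).

# Minimal sketch of inductive learning from examples (hypothetical data).
# Examples are pairs (x, f(x)); we keep the hypotheses that are consistent
# with every example.
examples = [(0, 0), (1, 1), (2, 4), (3, 9)]          # pairs (x, f(x))

hypothesis_space = {
    "identity": lambda x: x,
    "double":   lambda x: 2 * x,
    "square":   lambda x: x * x,
}

consistent = [name for name, h in hypothesis_space.items()
              if all(h(x) == fx for x, fx in examples)]
print(consistent)    # ['square'] -- the only h that agrees with f on every example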

6

Inductive Learning Method

Construct/adjust h to agree with f on the training set (h is consistent if it agrees with f on all examples). E.g., curve fitting:

7

Inductive Learning Method

Construct/adjust h to agree with f on the training set (h is consistent if it agrees with f on all examples). E.g., curve fitting:

8

Inductive Learning Method

Construct/adjust h to agree with f on the training set (h is consistent if it agrees with f on all examples). E.g., curve fitting:

9

Inductive Learning Method

Construct/adjust h to agree with f on the training set (h is consistent if it agrees with f on all examples). E.g., curve fitting:

10

Inductive Learning Method

Construct/adjust h to agree with f on the training set (h is consistent if it agrees with f on all examples). E.g., curve fitting:

11

Inductive Learning Method

Construct/adjust h to agree with f on the training set (h is consistent if it agrees with f on all examples). E.g., curve fitting:

Occam’s razor: prefer the simplest hypothesis consistent with the data
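A minimal curve-fitting sketch in Python (made-up data points; numpy assumed): the hypotheses are polynomials of increasing degree, and among those that fit the training set, Occam’s razor prefers the lowest-degree one.

# Curve fitting with polynomial hypotheses of increasing degree (hypothetical data).
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])      # example inputs
y = np.array([0.1, 1.1, 3.9, 9.2, 15.8])     # observed f(x), roughly quadratic

for degree in (1, 2, 4):
    coeffs = np.polyfit(x, y, degree)         # adjust h to agree with f on the training set
    h = np.poly1d(coeffs)
    max_error = np.max(np.abs(h(x) - y))      # how far h is from being consistent
    print(f"degree {degree}: max training error = {max_error:.3f}")

# The degree-4 polynomial interpolates all five points exactly (it is consistent),
# but the nearly-as-good degree-2 hypothesis is the one Occam's razor prefers.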

12

Occam’s Razor

William of Occam

(1285-1349, England)

“If two theories explain the facts equally well, then the simpler theory is to be preferred.”

Rationale:

There are fewer short hypotheses than long hypotheses.

A short hypothesis that fits the data is unlikely to be a coincidence.

A long hypothesis that fits the data may be a coincidence.

Formal treatment in computational learning theory

13

The Problem

• Why does learning work?

• How do we know that the learned hypothesis h is close to the target function f if we do not know what f is?

Answer provided by computational learning theory.

14

The Answer

• Any hypothesis h that is consistent with a sufficiently large number of training examples is unlikely to be seriously wrong.

Therefore it must be:

Probably Approximately Correct

PAC

15

The Stationarity Assumption

• The training and test sets are drawn randomly from the same population of examples using the same probability distribution.

Therefore training and test data are

Independently and Identically Distributed

IID

“the future is like the past”

16

How many examples are needed?

ε: the probability that h and f disagree on an example
δ: the probability that a wrong hypothesis consistent with all examples exists
|H|: the size of the hypothesis space
m: the number of examples (the sample complexity)

m ≥ (1/ε) (ln(1/δ) + ln |H|)
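A worked example of this bound in Python (illustrative numbers, not from the slides): with |H| = 2^(2^5), the space of all boolean functions of 5 attributes, an error bound ε = 0.05 and a failure probability δ = 0.01 need only a few hundred examples.

# Sample-complexity bound m >= (1/eps) * (ln(1/delta) + ln|H|), illustrative numbers.
import math

def sample_complexity(eps, delta, H_size):
    """Smallest integer m satisfying the PAC bound for error eps and confidence 1 - delta."""
    return math.ceil((1.0 / eps) * (math.log(1.0 / delta) + math.log(H_size)))

print(sample_complexity(eps=0.05, delta=0.01, H_size=2 ** (2 ** 5)))   # prints 536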

17

Formal Derivation

H: the set of all possible hypotheses, containing the target function f
HBAD: the set of “wrong” hypotheses, i.e. those whose error exceeds ε

For any hBAD ∈ HBAD: P(x : hBAD(x) ≠ f(x)) > ε, hence P(x : hBAD(x) = f(x)) ≤ 1 − ε

Probability that hBAD is consistent with m independently drawn examples: ≤ (1 − ε)^m

Probability that HBAD contains a hypothesis consistent with all m examples:
P(∃ hBAD consistent) ≤ |HBAD| (1 − ε)^m ≤ |H| (1 − ε)^m

Requiring |H| (1 − ε)^m ≤ δ and using (1 − ε) ≤ e^(−ε) gives
m ≥ (1/ε) (ln(1/δ) + ln |H|)
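A quick numerical sanity check of the derivation (same illustrative numbers as above): with m taken from the bound, the survival probability of the bad hypotheses, |H|(1 − ε)^m, indeed stays below δ.

# Check that |H| * (1 - eps)**m <= delta for m from the sample-complexity bound.
import math

eps, delta, H_size = 0.05, 0.01, 2 ** (2 ** 5)
m = math.ceil((1.0 / eps) * (math.log(1.0 / delta) + math.log(H_size)))
survival_bound = H_size * (1.0 - eps) ** m
print(m, survival_bound, survival_bound <= delta)   # 536, roughly 0.005, True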

18

What if hypothesis space is infinite?

We can’t use our finite-H result when H is infinite; we need some other measure of complexity for H:
– the Vapnik-Chervonenkis (VC) dimension
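The VC dimension of a hypothesis class is the size of the largest set of points it can shatter, i.e. label in every possible way. A brute-force Python sketch with a hypothetical example (1-D threshold functions h_t(x) = 1 iff x ≥ t): any single point can be shattered, but no pair can, so the VC dimension of this class is 1.

# Brute-force shattering check for 1-D threshold hypotheses h_t(x) = 1 iff x >= t.
from itertools import product

def can_shatter(points):
    # Candidate thresholds: below all points, at each point, above all points;
    # these realize every dichotomy a threshold can produce on this point set.
    thresholds = [min(points) - 1.0] + list(points) + [max(points) + 1.0]
    for labels in product((0, 1), repeat=len(points)):          # every possible labeling
        realized = any(
            all((1 if x >= t else 0) == y for x, y in zip(points, labels))
            for t in thresholds
        )
        if not realized:
            return False                                        # some labeling cannot be produced
    return True

print(can_shatter([2.0]))        # True  -> VC dimension at least 1
print(can_shatter([2.0, 5.0]))   # False -> no pair is shattered, so VC dimension is 1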


22

Shattering two binary dimensions over a number of classes

To understand the principle of shattering sample points into classes, we will look at the simple case of two dimensions of binary value.

23

2-D feature space

(figure: the four points (0,0), (0,1), (1,0), (1,1) of the binary f1-f2 plane)

24

2-D feature space, 2 classes


25

the other class…


26

2 left vs 2 right


27

top vs bottom


28

right vs left


29

bottom vs top


30

lower-right outlier


31

lower-left outlier


32

upper-left outlier


33

upper-right outlier


34

etc.


35

2-D feature space


36

2-D feature space


37

2-D feature space


38

XOR configuration A


39

XOR configuration B


40

2-D feature space, two classes: 16 hypotheses

(table: the 16 possible labelings, numbered 0 - 15, of the four cells f1 ∈ {0,1}, f2 ∈ {0,1} over two classes)

“hypothesis” = a possible class partitioning of all data samples

41

2-D feature space, two classes, 16 hypotheses

(table: the same 16 labelings, 0 - 15, with the two XOR class configurations marked)

Two XOR class configurations: 2 of the 16 hypotheses require a non-linear separatrix, as the brute-force check below confirms.
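The brute-force check, as a Python sketch (the small set of candidate lines used here is sufficient for these four corner points): enumerate all 2^4 = 16 labelings of the four points and count those that no straight line separates.

# Count the labelings of the four binary points that are not linearly separable.
from itertools import product

points = [(0, 0), (0, 1), (1, 0), (1, 1)]
weights = [-1, 0, 1]
biases = [-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5]

def linearly_separable(labels):
    # A labeling is separable if some line w1*f1 + w2*f2 + b = 0 puts every
    # class-1 point on the positive side and every class-0 point on the other.
    for w1, w2, b in product(weights, weights, biases):
        if all((w1 * x1 + w2 * x2 + b > 0) == bool(y)
               for (x1, x2), y in zip(points, labels)):
            return True
    return False

non_separable = [labels for labels in product((0, 1), repeat=4)
                 if not linearly_separable(labels)]
print(len(non_separable), non_separable)   # 2 labelings: the two XOR configurations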

42

XOR, a possible non-linear separation


43

XOR, a possible non-linear separation


44

2-D feature space, three classes, # hypotheses?

(table: the labelings of the four cells over three classes, entries 0 - 8 shown, and so on …)

45

2-D feature space, three classes, # hypotheses?

(table: the labelings of the four cells over three classes, entries 0 - 8 shown, and so on …)

3^4 = 81 possible hypotheses

46

Maximum, discrete space

Four classes: 4^4 = 256 hypotheses

Assume that there are no more classes than discrete cells.

Nhyp,max = nclasses^ncells
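A quick numerical check of this counting formula in Python, for the four cells of the binary f1-f2 plane:

# Number of possible hypotheses = n_classes ** n_cells for a discrete feature space.
n_cells = 4
for n_classes in (2, 3, 4):
    print(n_classes, "classes:", n_classes ** n_cells, "hypotheses")
# 2 classes: 16, 3 classes: 81, 4 classes: 256 -- matching the earlier slides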

47

2-D feature space, three classes…

In this example, two of the classes are each linearly separable from the rest, but the third class is not linearly separable from the rest of the classes.

48

2-D feature space, four classes…

Minsky & Papert: a simple table lookup or logic will do nicely.

49

2-D feature space, four classes…

Spheres or radial-basis functions may offer a compact class encapsulation in case of limited noise and limited overlap

(but in the end the data will tell: experimentation required!)

50

SVM (1): Kernels

(figure: a complicated separation boundary in the original f1-f2 space becomes a simple separating hyperplane after mapping to an f1-f2-f3 space)

Kernels: polynomial, radial basis, sigmoid

Implicit mapping to a higher-dimensional space where linear separation is possible.
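A small sketch of the kernel idea (scikit-learn assumed; the labels are the XOR configuration from the earlier slides): a linear kernel cannot classify all four XOR points correctly, while a radial-basis kernel, which maps them implicitly to a higher-dimensional space, can.

# XOR is not linearly separable in the original space, but becomes separable
# under the implicit mapping of an RBF kernel (requires scikit-learn).
from sklearn.svm import SVC

X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 1, 1, 0]                      # XOR labels

linear_svm = SVC(kernel="linear").fit(X, y)
rbf_svm = SVC(kernel="rbf").fit(X, y)

print("linear kernel accuracy:", linear_svm.score(X, y))   # below 1.0: not linearly separable
print("rbf kernel accuracy:", rbf_svm.score(X, y))         # 1.0: separable after the mapping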

51

SVM (2): Max Margin

Support vectors

Max Margin

“Best” Separating Hyperplane

From all the possible separating hyperplanes, select the one that gives Max Margin.

Solution found by Quadratic Optimization – “Learning”.

Good generalization
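A small sketch of max-margin learning (scikit-learn assumed; made-up linearly separable data): the quadratic optimization happens inside SVC, and the support vectors and the margin width 2/||w|| can be read off the fitted model.

# Fit a linear SVM on toy separable data and inspect support vectors and margin.
import numpy as np
from sklearn.svm import SVC

X = np.array([[0.0, 0.0], [0.5, 0.3], [0.2, 0.8],    # class 0
              [2.0, 2.0], [2.5, 1.7], [1.8, 2.4]])   # class 1
y = np.array([0, 0, 0, 1, 1, 1])

svm = SVC(kernel="linear", C=1e6).fit(X, y)           # large C approximates a hard margin

w = svm.coef_[0]
print("support vectors:\n", svm.support_vectors_)
print("margin width 2/||w||:", 2.0 / np.linalg.norm(w))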
