Learning From Data Lecture 6 Bounding The Growth Function Bounding the Growth Function Models are either Good or Bad The VC Bound - replacing |H| with m H (N ) M. Magdon-Ismail CSCI 4100/6100

LearningFromData Lecture6 Bounding …magdon/courses/LFD-Slides/SlidesLect06.pdfc AML Creator: MalikMagdon-Ismail Bounding the Growth Function: 11/31 Combinatorialpuzzleagain−→

Jun 10, 2020


Learning From Data
Lecture 6

Bounding The Growth Function

Bounding the Growth Function
Models are either Good or Bad
The VC Bound - replacing $|\mathcal{H}|$ with $m_{\mathcal{H}}(N)$

M. Magdon-Ismail
CSCI 4100/6100


recap: The Growth Function $m_{\mathcal{H}}(N)$

A new measure for the diversity of a hypothesis set.

$\mathcal{H}(x_1, \ldots, x_N) = \{(h(x_1), \ldots, h(x_N)) : h \in \mathcal{H}\}$

The dichotomies ($N$-tuples) that $\mathcal{H}$ implements on $x_1, \ldots, x_N$; that is, $\mathcal{H}$ viewed through $\mathcal{D}$.

The growth function $m_{\mathcal{H}}(N)$ considers the worst possible $x_1, \ldots, x_N$:

$m_{\mathcal{H}}(N) = \max_{x_1, \ldots, x_N} |\mathcal{H}(x_1, \ldots, x_N)|$.

This lecture: Can we bound $m_{\mathcal{H}}(N)$ by a polynomial in $N$?

Can we replace $|\mathcal{H}|$ by $m_{\mathcal{H}}(N)$ in the generalization bound?
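
To make the recap concrete, here is a minimal sketch that lists the dichotomies 1-D positive rays implement on a given set of points. The function name `positive_ray_dichotomies` and the ±1 encoding are illustrative choices, not part of the lecture. It counts the dichotomies on one point set; the growth function is the maximum of that count over all point sets.

```python
def positive_ray_dichotomies(points):
    """Dichotomies produced on `points` by 1-D positive rays h(x) = +1 if x > a else -1."""
    xs = sorted(points)
    # Candidate thresholds: one below every point, one between each
    # adjacent pair, and one above every point.
    thresholds = [xs[0] - 1.0]
    thresholds += [(lo + hi) / 2.0 for lo, hi in zip(xs, xs[1:])]
    thresholds += [xs[-1] + 1.0]
    return {tuple(+1 if x > a else -1 for x in points) for a in thresholds}


if __name__ == "__main__":
    for n in range(1, 5):
        points = [float(i) for i in range(n)]
        print(n, len(positive_ray_dichotomies(points)))
```

For distinct points the count is always $N + 1$, so for this model every point set is a worst case.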

c© AML Creator: Malik Magdon-Ismail Bounding the Growth Function: 2 /31 Example growth functions −→


Example Growth Functions

                        N
                        1    2    3    4    5       ...
2-D perceptron          2    4    8    14           ...
1-D pos. ray            2    3    4    5            ...
2-D pos. rectangles     2    4    8    16   < 2^5   ...

• $m_{\mathcal{H}}(N)$ drops below $2^N$: there is hope.

• A break point is any $k$ for which $m_{\mathcal{H}}(k) < 2^k$.
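
The break-point definition can be checked mechanically against the table. The sketch below uses a hypothetical helper, `smallest_break_point`, and only the growth-function values that the table lists explicitly.

```python
def smallest_break_point(growth_values):
    """growth_values[i] holds m_H(i + 1); return the first k with m_H(k) < 2**k, else None."""
    for k, m in enumerate(growth_values, start=1):
        if m < 2 ** k:
            return k
    return None


# Values copied from the table above (N = 1, 2, 3, 4).
perceptron_2d = [2, 4, 8, 14]
positive_ray = [2, 3, 4, 5]

if __name__ == "__main__":
    print(smallest_break_point(perceptron_2d))  # 4
    print(smallest_break_point(positive_ray))   # 2
```

A result of `None` only means that no break point shows up in the values supplied, not that none exists.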


Pop Quiz I

I give you a set of $k^*$ points $x_1, \ldots, x_{k^*}$ on which $\mathcal{H}$ implements $< 2^{k^*}$ dichotomies.

(a) $k^*$ is a break point.

(b) $k^*$ is not a break point.

(c) all break points are $> k^*$.

(d) all break points are $\le k^*$.

(e) we don’t know anything about break points.


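Pop Quiz I

I give you a set of $k^*$ points $x_1, \ldots, x_{k^*}$ on which $\mathcal{H}$ implements $< 2^{k^*}$ dichotomies.

(a) $k^*$ is a break point.

(b) $k^*$ is not a break point.

(c) all break points are $> k^*$.

(d) all break points are $\le k^*$.

✓ (e) we don’t know anything about break points.

One unshattered set of $k^*$ points says nothing about the other sets of $k^*$ points; a break point requires that every such set fails to be shattered.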


Pop Quiz II

For every set of $k^*$ points $x_1, \ldots, x_{k^*}$, $\mathcal{H}$ implements $< 2^{k^*}$ dichotomies.

(a) $k^*$ is a break point.

(b) $k^*$ is not a break point.

(c) all $k \ge k^*$ are break points.

(d) all $k < k^*$ are break points.

(e) we don’t know anything about break points.


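Pop Quiz II

For every set of $k^*$ points $x_1, \ldots, x_{k^*}$, $\mathcal{H}$ implements $< 2^{k^*}$ dichotomies.

✓ (a) $k^*$ is a break point.

(b) $k^*$ is not a break point.

✓ (c) all $k \ge k^*$ are break points.

(d) all $k < k^*$ are break points.

(e) we don’t know anything about break points.

For (c): shattering a set of points shatters every subset of it, so if no $k^*$ points can be shattered, no larger set can be shattered either.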


Pop Quiz III

To show that $k$ is not a break point for $\mathcal{H}$:

(a) Show a set of $k$ points $x_1, \ldots, x_k$ which $\mathcal{H}$ can shatter.

(b) Show $\mathcal{H}$ can shatter any set of $k$ points.

(c) Show a set of $k$ points $x_1, \ldots, x_k$ which $\mathcal{H}$ cannot shatter.

(d) Show $\mathcal{H}$ cannot shatter any set of $k$ points.

(e) Show $m_{\mathcal{H}}(k) = 2^k$.


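Pop Quiz III

To show that $k$ is not a break point for $\mathcal{H}$:

✓ (a) Show a set of $k$ points $x_1, \ldots, x_k$ which $\mathcal{H}$ can shatter.

overkill (b) Show $\mathcal{H}$ can shatter any set of $k$ points.

(c) Show a set of $k$ points $x_1, \ldots, x_k$ which $\mathcal{H}$ cannot shatter.

(d) Show $\mathcal{H}$ cannot shatter any set of $k$ points.

✓ (e) Show $m_{\mathcal{H}}(k) = 2^k$.

(a) and (e) say the same thing: because $m_{\mathcal{H}}(k)$ is a maximum over point sets, one shattered set of $k$ points is enough to give $m_{\mathcal{H}}(k) = 2^k$.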


Pop Quiz IV

To show that $k$ is a break point for $\mathcal{H}$:

(a) Show a set of $k$ points $x_1, \ldots, x_k$ which $\mathcal{H}$ can shatter.

(b) Show $\mathcal{H}$ can shatter any set of $k$ points.

(c) Show a set of $k$ points $x_1, \ldots, x_k$ which $\mathcal{H}$ cannot shatter.

(d) Show $\mathcal{H}$ cannot shatter any set of $k$ points.

(e) Show $m_{\mathcal{H}}(k) > 2^k$.


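Pop Quiz IV

To show that $k$ is a break point for $\mathcal{H}$:

(a) Show a set of $k$ points $x_1, \ldots, x_k$ which $\mathcal{H}$ can shatter.

(b) Show $\mathcal{H}$ can shatter any set of $k$ points.

(c) Show a set of $k$ points $x_1, \ldots, x_k$ which $\mathcal{H}$ cannot shatter.

✓ (d) Show $\mathcal{H}$ cannot shatter any set of $k$ points.

(e) Show $m_{\mathcal{H}}(k) > 2^k$.

(c) is not enough, because some other set of $k$ points might still be shattered. (e) can never hold, because $m_{\mathcal{H}}(k) \le 2^k$ always.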


Back to Our Combinatorial Puzzle

How many dichotomies can you list on 4 points so that no 2 points are shattered?

x1 x2 x3 x4
◦  ◦  ◦  ◦
◦  ◦  ◦  •
◦  ◦  •  ◦
◦  •  ◦  ◦
•  ◦  ◦  ◦

Can we add a 6th dichotomy?


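Can’t Add A 6th Dichotomy

x1 x2 x3 x4
◦  ◦  ◦  ◦
◦  ◦  ◦  •
◦  ◦  •  ◦
◦  •  ◦  ◦
•  ◦  ◦  ◦
◦  •  •  ◦

For example, adding ◦ • • ◦ as a 6th row shatters x2 and x3: all four patterns ◦◦, ◦•, •◦, •• now appear on those two points.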
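
The claim that no 6th dichotomy can be added is small enough to check exhaustively. The sketch below encodes ◦ as 0 and • as 1; the helper names `shatters` and `some_subset_shattered` are illustrative.

```python
from itertools import combinations, product


def shatters(dichotomies, idx):
    """True if the dichotomies, restricted to positions idx, show all 2**len(idx) patterns."""
    patterns = {tuple(d[i] for i in idx) for d in dichotomies}
    return len(patterns) == 2 ** len(idx)


def some_subset_shattered(dichotomies, n_points, k):
    """True if any k of the n_points positions are shattered."""
    return any(shatters(dichotomies, idx) for idx in combinations(range(n_points), k))


# The five dichotomies from the puzzle (0 = ◦, 1 = •).
five = [(0, 0, 0, 0), (0, 0, 0, 1), (0, 0, 1, 0), (0, 1, 0, 0), (1, 0, 0, 0)]

# Every remaining dichotomy that could be added without shattering a pair.
extensions = [
    d for d in product((0, 1), repeat=4)
    if d not in five and not some_subset_shattered(five + [d], 4, 2)
]

if __name__ == "__main__":
    print(some_subset_shattered(five, 4, 2))  # False
    print(extensions)                         # []
```

Any dichotomy outside the five has at least two •'s, and those two positions then show the missing •• pattern, which is why the list of valid extensions is empty.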


The Combinatorial Quantity B(N, k)

How many dichotomies can you list on 4 points so that no 2 are shattered? Here N = 4 and k = 2.

B(N, k): Max. number of dichotomies on N points so that no k are shattered.

x1 x2 x3
◦  ◦  ◦
◦  ◦  •
◦  •  ◦
•  ◦  ◦

B(3, 2) = 4

x1 x2 x3 x4
◦  ◦  ◦  ◦
◦  ◦  ◦  •
◦  ◦  •  ◦
◦  •  ◦  ◦
•  ◦  ◦  ◦

B(4, 2) = 5
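
For very small N, B(N, k) can be found by brute force over every possible list of dichotomies. This is only a sketch for checking the two values above: the function name `brute_force_B` is illustrative, and the search enumerates all 2^(2^N) lists, so it is practical only up to N = 4.

```python
from itertools import combinations, product


def brute_force_B(n_points, k):
    """Largest number of dichotomies on n_points points with no k points shattered."""
    all_dichotomies = list(product((0, 1), repeat=n_points))
    index_sets = list(combinations(range(n_points), k))
    best = 0
    # Each bit of `mask` says whether one of the 2**n_points dichotomies is in the list.
    for mask in range(1 << len(all_dichotomies)):
        size = bin(mask).count("1")
        if size <= best:
            continue  # cannot improve on the best list found so far
        chosen = [d for i, d in enumerate(all_dichotomies) if (mask >> i) & 1]
        shattered = any(
            len({tuple(d[i] for i in idx) for d in chosen}) == 2 ** k
            for idx in index_sets
        )
        if not shattered:
            best = size
    return best


if __name__ == "__main__":
    print(brute_force_B(3, 2))  # 4
    print(brute_force_B(4, 2))  # 5
```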


Let’s Try To Bound B(4, 3)

How many dichotomies can you list on 4 points so that no subset of 3 is shattered?

x1 x2 x3 x4
◦  ◦  ◦  ◦
◦  ◦  ◦  •
◦  ◦  •  ◦
◦  •  ◦  ◦
•  ◦  ◦  ◦
◦  ◦  •  •
◦  •  ◦  •
•  ◦  ◦  •
◦  •  •  ◦
•  ◦  •  ◦
•  •  ◦  ◦


Two Kinds of Dichotomies

Look at the prefix of each dichotomy on x1, x2, x3 in the same list of 11 dichotomies: the prefix appears once (followed by only one value on x4) or the prefix appears twice (followed by both ◦ and • on x4).


Reorder the Dichotomies

      x1 x2 x3 x4
α     ◦  •  •  ◦
      •  ◦  •  ◦
      •  •  ◦  ◦
β     ◦  ◦  ◦  ◦
      ◦  ◦  •  ◦
      ◦  •  ◦  ◦
      •  ◦  ◦  ◦
β     ◦  ◦  ◦  •
      ◦  ◦  •  •
      ◦  •  ◦  •
      •  ◦  ◦  •

α: prefix appears once

β: prefix appears twice

B(4, 3) = α + 2β


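First, Bound α + β

Take the α group together with one copy of the β group and keep only the prefixes on x1, x2, x3.

α + β ≤ B(3, 3)

The α + β distinct prefixes form a list on 3 points with no 3 shattered: if the prefixes shattered x1, x2, x3, the full list on 4 points would shatter those 3 points too.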


Second, Bound β

Take only the β prefixes on x1, x2, x3.

β ≤ B(3, 2)

If 2 of these points were shattered by the β prefixes, then using the mirror dichotomies you would shatter 3 points: each β prefix appears with both ◦ and • on x4, so those 2 points together with x4 would be shattered.


Combining to Bound α + 2β

B(4, 3) = α + β + β
        ≤ B(3, 3) + B(3, 2)

The argument generalizes to (N, k):

B(N, k) ≤ B(N − 1, k) + B(N − 1, k − 1)
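
The α/β split and the two conditions behind the bounds can be checked on the 11 dichotomies of the B(4, 3) example. This is a sketch with illustrative names (`shatters_some`, `once`, `twice`), encoding ◦ as 0 and • as 1.

```python
from collections import Counter
from itertools import combinations

# The 11 dichotomies on x1..x4 from the B(4, 3) example (0 = ◦, 1 = •).
rows = ["0000", "0001", "0010", "0100", "1000",
        "0011", "0101", "1001", "0110", "1010", "1100"]
dichotomies = [tuple(int(c) for c in row) for row in rows]


def shatters_some(dichs, n_points, k):
    """True if some k of the n_points positions show all 2**k patterns."""
    return any(
        len({tuple(d[i] for i in idx) for d in dichs}) == 2 ** k
        for idx in combinations(range(n_points), k)
    )


# Group by the prefix on x1, x2, x3.
prefix_count = Counter(d[:-1] for d in dichotomies)
once = [p for p, c in prefix_count.items() if c == 1]   # alpha group
twice = [p for p, c in prefix_count.items() if c == 2]  # beta group
alpha, beta = len(once), len(twice)

if __name__ == "__main__":
    print(alpha, beta, alpha + 2 * beta)      # 3 4 11
    print(shatters_some(once + twice, 3, 3))  # False, so alpha + beta <= B(3, 3)
    print(shatters_some(twice, 3, 2))         # False, so beta <= B(3, 2)
```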


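Boundary Cases: B(N, 1) and B(N, N)

             k
             1    2    3    4    5    6    ...
N      1     1
       2     1    3
       3     1         7
       4     1              15
       5     1                   31
       6     1                        63
      ...   ...                            ...

B(N, 1) = 1 (why? if no single point may take both values, every point is fixed, so only one dichotomy is possible)

B(N, N) = $2^N − 1$ (why? removing any one of the $2^N$ dichotomies leaves the N points unshattered)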


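Recursion Gives B(N, k) Bound

B(N, k) ≤ B(N − 1, k) + B(N − 1, k − 1)

             k
             1    2    3    4    5    6    ...
N      1     1
       2     1    3
       3     1    4    7
       4     1              15
       5     1                   31
       6     1                        63
      ...   ...  ...  ...  ...  ...  ...   ...

The entry 4 for (N, k) = (3, 2) comes from the entry above it and the entry above-left: B(3, 2) ≤ B(2, 2) + B(2, 1) = 3 + 1.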


Recursion Gives B(N, k) Bound

B(N, k) ≤ B(N − 1, k) + B(N − 1, k − 1)

             k
             1    2    3    4    5    6    ...
N      1     1
       2     1    3
       3     1    4    7
       4     1    5    11   15
       5     1    6    16   26   31
       6     1    7    22   42   57   63
      ...   ...  ...  ...  ...  ...  ...   ...
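
The table can be regenerated from the two boundary cases and the recursion. A minimal sketch follows; `bound_table` is an illustrative name, and the entries are upper bounds on B(N, k) obtained by treating the recursion as an equality.

```python
def bound_table(max_n):
    """table[n][k] for 1 <= k <= n, filled from the boundary cases and the recursion."""
    table = {}
    for n in range(1, max_n + 1):
        table[n] = {}
        for k in range(1, n + 1):
            if k == 1:
                table[n][k] = 1             # B(N, 1) = 1
            elif k == n:
                table[n][k] = 2 ** n - 1    # B(N, N) = 2^N - 1
            else:
                # B(N, k) <= B(N - 1, k) + B(N - 1, k - 1)
                table[n][k] = table[n - 1][k] + table[n - 1][k - 1]
    return table


if __name__ == "__main__":
    for n, row in bound_table(6).items():
        print(n, [row[k] for k in sorted(row)])
```

The last printed row is `6 [1, 7, 22, 42, 57, 63]`, matching the N = 6 row of the table.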


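Analytic Bound for B(N, k)

Theorem.
\[ B(N, k) \le \sum_{i=0}^{k-1} \binom{N}{i}. \]

Proof: (Induction on $N$.)

1. Verify for $N = 1$: $B(1, 1) \le \binom{1}{0} = 1$ ✓

2. Suppose $B(N, k) \le \sum_{i=0}^{k-1} \binom{N}{i}$.

Lemma. $\binom{N}{k} + \binom{N}{k-1} = \binom{N+1}{k}$.

\[
\begin{aligned}
B(N+1, k) &\le B(N, k) + B(N, k-1) \\
&\le \sum_{i=0}^{k-1} \binom{N}{i} + \sum_{i=0}^{k-2} \binom{N}{i} \\
&= \sum_{i=0}^{k-1} \binom{N}{i} + \sum_{i=1}^{k-1} \binom{N}{i-1} \\
&= 1 + \sum_{i=1}^{k-1} \left( \binom{N}{i} + \binom{N}{i-1} \right) \\
&= 1 + \sum_{i=1}^{k-1} \binom{N+1}{i} \quad \text{(lemma)} \\
&= \sum_{i=0}^{k-1} \binom{N+1}{i}.
\end{aligned}
\]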
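
As a numerical sanity check on the induction step, the sketch below confirms that the binomial sum satisfies the same recursion (with equality) and the same boundary values as the table of bounds. The name `binomial_bound` is illustrative.

```python
from math import comb


def binomial_bound(n, k):
    """Sum of C(n, i) for i = 0 .. k - 1, the analytic bound on B(n, k)."""
    return sum(comb(n, i) for i in range(k))


def satisfies_recursion(max_n):
    """Check bound(n + 1, k) == bound(n, k) + bound(n, k - 1) for 1 <= k <= n + 1 <= max_n."""
    return all(
        binomial_bound(n + 1, k) == binomial_bound(n, k) + binomial_bound(n, k - 1)
        for n in range(1, max_n)
        for k in range(1, n + 2)
    )


if __name__ == "__main__":
    print(binomial_bound(6, 3))     # 22 = 1 + 6 + 15
    print(satisfies_recursion(12))  # True
```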


$m_{\mathcal{H}}(N)$ is bounded by $B(N, k)$!

Theorem. Suppose that $\mathcal{H}$ has a break point at $k$. Then,

$m_{\mathcal{H}}(N) \le B(N, k)$.

List the dichotomies that $\mathcal{H}$ implements on any $x_1, \ldots, x_N$:

x1  x2  x3  x4  ...  xN
◦   ◦   ◦   ◦   ...  •
◦   ◦   ◦   •   ...  ◦
◦   ◦   •   ◦   ...  ◦
◦   •   ◦   ◦   ...  ◦
•   ◦   ◦   ◦   ...  •
◦   ◦   •   •   ...  •
◦   •   ◦   •   ...  ◦
... ... ... ... ...  ...

Consider any $k$ points.

They cannot be shattered (otherwise $k$ would not be a break point).

$B(N, k)$ is the size of the largest such list.

$m_{\mathcal{H}}(N) \le B(N, k)$


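Once bitten, twice shy . . . Once Broken, Forever Polynomial

Theorem. If $k$ is any break point for $\mathcal{H}$, so $m_{\mathcal{H}}(k) < 2^k$, then
\[ m_{\mathcal{H}}(N) \le \sum_{i=0}^{k-1} \binom{N}{i}. \]

Facts (Problems 2.5 and 2.6):
\[
\sum_{i=0}^{k-1} \binom{N}{i} \le
\begin{cases}
N^{k-1} + 1 \\[4pt]
\left( \dfrac{eN}{k-1} \right)^{k-1}
\end{cases}
\quad \text{(polynomial in } N \text{)}
\]

This is huge: if we can replace $|\mathcal{H}|$ with $m_{\mathcal{H}}(N)$ in the bound, then learning is feasible.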
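
The two polynomial bounds can be compared with the binomial sum numerically. The sketch below is only a spot check over a finite range, not a proof. The function names are illustrative, and the restriction to N ≥ k − 1 with k ≥ 2 for the second bound is an assumption made here, since the slide does not state a range.

```python
from math import comb, e


def binomial_bound(n, k):
    """Sum of C(n, i) for i = 0 .. k - 1."""
    return sum(comb(n, i) for i in range(k))


def polynomial_bounds(n, k):
    """Return (N^(k-1) + 1, (eN / (k-1))^(k-1)); the second is used only for N >= k - 1 >= 1."""
    return n ** (k - 1) + 1, (e * n / (k - 1)) ** (k - 1)


def bounds_hold(max_k, max_n):
    """Spot-check both inequalities for 2 <= k <= max_k and k - 1 <= N <= max_n."""
    for k in range(2, max_k + 1):
        for n in range(k - 1, max_n + 1):
            first, second = polynomial_bounds(n, k)
            s = binomial_bound(n, k)
            if s > first or s > second:
                return False
    return True


if __name__ == "__main__":
    print(binomial_bound(100, 4))  # 166751
    print(polynomial_bounds(100, 4))
    print(bounds_hold(6, 60))
```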


A Hypothesis Set is either Good or Bad

[Figure: $\log m_{\mathcal{H}}(N)$ versus $N$, with curves labeled "the good H", "the bad H", and "the ugly H".]

                        N                                   m_H(N)
                        1    2    3    4    5       ...
2-D perceptron          2    4    8    14           ...     ≤ N^3 + 1
1-D pos. ray            2    3    4    5            ...     ≤ N^1 + 1
2-D pos. rectangles     2    4    8    16   < 2^5   ...     ≤ N^4 + 1


We have One Step in the Puzzle

X Can we get a polynomial bound on mH(N) even for infinite H?

X Can we replace |H| with mH(N) in the generalization bound?


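(i) How to Deal With $E_{\text{out}}$ (Sketch)

The ghost data set: a ‘fictitious’ data set $\mathcal{D}'$.

[Figure: scatter plots with axes Age and Income relating $E_{\text{out}}$ to $E_{\text{in}}$ and $E'_{\text{in}}$, and a plot of the probability distribution of $E_{\text{in}}$, $E'_{\text{in}}$ with $E_{\text{out}}$ marked.]

$E'_{\text{in}}$ is like a test error on $N$ new points.

$E_{\text{in}}$ deviates from $E_{\text{out}}$ implies $E_{\text{in}}$ deviates from $E'_{\text{in}}$.

$E_{\text{in}}$ and $E'_{\text{in}}$ have the same distribution.

\[ P\left[ (E'_{\text{in}}(g), E_{\text{in}}(g)) \text{ “deviate”} \right] \ge \tfrac{1}{2} P\left[ (E_{\text{out}}(g), E_{\text{in}}(g)) \text{ “deviate”} \right] \]

We can analyze deviations between two in-sample errors.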
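
The inequality can be illustrated exactly for a single fixed hypothesis, where both in-sample errors are binomial counts. The sketch rests on assumptions that are not on the slide: "deviate" is taken to mean a gap larger than ε between $E_{\text{in}}$ and $E_{\text{out}}$ and larger than ε/2 between $E_{\text{in}}$ and $E'_{\text{in}}$, and the values N = 100, $E_{\text{out}}$ = 0.25, ε = 0.2 are arbitrary. It does not cover the final hypothesis g, which depends on the data.

```python
from math import comb


def binomial_pmf(n, p):
    """P[exactly j errors among n points] for j = 0..n, each point erring with probability p."""
    return [comb(n, j) * p ** j * (1.0 - p) ** (n - j) for j in range(n + 1)]


def deviation_probabilities(n, expected_errors, eps_errors):
    """Return (P[|Ein - Eout| > eps], P[|Ein - E'in| > eps / 2]) for one fixed hypothesis.

    Everything is measured in counts of misclassified points out of n, so
    Eout = expected_errors / n and eps = eps_errors / n (integer counts avoid
    floating-point ties in the comparisons).
    """
    pmf = binomial_pmf(n, expected_errors / n)
    p_out = sum(p for a, p in enumerate(pmf) if abs(a - expected_errors) > eps_errors)
    p_ghost = sum(
        pa * pb
        for a, pa in enumerate(pmf)
        for b, pb in enumerate(pmf)
        if 2 * abs(a - b) > eps_errors
    )
    return p_out, p_ghost


if __name__ == "__main__":
    p_out, p_ghost = deviation_probabilities(100, 25, 20)
    print(p_out, p_ghost, p_ghost >= 0.5 * p_out)
```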


(ii) Real Plus Ghost Data Set = 2N points

x1  x2  x3  ...  xN  xN+1  xN+2  xN+3  ...  x2N
◦   ◦   •   ...  ◦   •     •     ◦     ...  ◦

Number of dichotomies is at most $m_{\mathcal{H}}(2N)$.

Up to technical details, analyze a “hypothesis set” of size at most $m_{\mathcal{H}}(2N)$.


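The Vapnik-Chervonenkis Bound (VC Bound)

\[ P\left[ |E_{\text{in}}(g) - E_{\text{out}}(g)| > \epsilon \right] \le 4 m_{\mathcal{H}}(2N) e^{-\epsilon^2 N / 8}, \quad \text{for any } \epsilon > 0. \]

\[ P\left[ |E_{\text{in}}(g) - E_{\text{out}}(g)| \le \epsilon \right] \ge 1 - 4 m_{\mathcal{H}}(2N) e^{-\epsilon^2 N / 8}, \quad \text{for any } \epsilon > 0. \]

\[ E_{\text{out}}(g) \le E_{\text{in}}(g) + \sqrt{\frac{8}{N} \log \frac{4 m_{\mathcal{H}}(2N)}{\delta}}, \quad \text{w.p. at least } 1 - \delta. \]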
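
To get a feel for the size of the error bar, the sketch below evaluates the square-root term with $m_{\mathcal{H}}(2N)$ replaced by the polynomial bound $(2N)^{k-1} + 1$ for a break point $k$. The function name `vc_error_bar` is illustrative. The logarithm is taken as the natural log, which is what setting $\delta = 4 m_{\mathcal{H}}(2N) e^{-\epsilon^2 N / 8}$ and solving for ε gives.

```python
from math import log, sqrt


def vc_error_bar(n, k, delta):
    """sqrt((8 / N) * ln(4 * m_H(2N) / delta)), with m_H(2N) replaced by (2N)^(k-1) + 1."""
    growth_bound = (2 * n) ** (k - 1) + 1
    return sqrt(8.0 / n * log(4.0 * growth_bound / delta))


if __name__ == "__main__":
    # Break point k = 4 (the 2-D perceptron in the table), delta = 0.05.
    for n in (1_000, 10_000, 100_000):
        print(n, round(vc_error_bar(n, 4, 0.05), 3))
```

The error bar shrinks roughly like $\sqrt{\log N / N}$, so it tends to zero for any finite break point, but it needs a large $N$ before it becomes small.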
