Page 1:

Active Learning

Maria-Florina Balcan

Lecture 26

Page 2:

Active Learning

[Figure: the active learning loop. A Data Source supplies unlabeled examples to the Learning Algorithm; the algorithm repeatedly sends a request for the label of an example to the Expert / Oracle and receives a label for that example in return; the algorithm then outputs a classifier.]

• The learner can choose specific examples to be labeled. • The learner works harder, in order to use fewer labeled examples.
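A minimal sketch of this interaction loop in Python (the names query_oracle, train, and budget, and the random choice of which example to label, are illustrative placeholders rather than anything prescribed in the lecture):

```python
import random

def pool_based_active_learning(unlabeled_pool, query_oracle, train, budget):
    """Generic pool-based active learning loop matching the diagram above."""
    pool = list(unlabeled_pool)
    labeled = []
    classifier = None
    for _ in range(budget):
        if not pool:
            break
        # The learner chooses which example to send to the Expert / Oracle.
        # A real strategy would use `classifier` to pick an informative point;
        # choosing at random is only a placeholder.
        x = random.choice(pool)
        pool.remove(x)
        y = query_oracle(x)              # "a label for that example"
        labeled.append((x, y))
        classifier = train(labeled)      # retrain on all labels gathered so far
    return classifier                    # the algorithm outputs a classifier
```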

Page 3:

What Makes a Good Algorithm?

• Choose the label requests carefully, to get informative labels.

• Guaranteed to output a relatively good classifier for most learning problems.

• Doesn’t make too many label requests.

Page 4:

Can It Really Do Better Than Passive?

• YES! (sometimes)

• We often need far fewer labels for active learning than for passive.

• This is predicted by theory and has been observed in practice.

Page 5:

Can adaptive querying help? [CAL92, Dasgupta04]

• Threshold functions on the real line: h_w(x) = 1(x ≥ w), C = {h_w : w ∈ R}

[Figure: a threshold w on the real line, with − labels to its left and + labels to its right.]

Active Algorithm

• Sample 1/ε unlabeled examples; do binary search over them.

• Binary search needs just O(log 1/ε) labels.

Passive supervised: Ω(1/ε) labels to find an ε-accurate threshold.

Active: only O(log 1/ε) labels.

Exponential improvement.

Other interesting results as well.
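A small Python sketch of this threshold learner (the helper names and the toy usage are illustrative; noise-free, realizable labels are assumed):

```python
import random

def active_learn_threshold(pool, query_label):
    """Learn h_w(x) = 1(x >= w) by binary search over a sorted unlabeled pool.

    With a pool of roughly 1/epsilon unlabeled points this uses only the
    O(log 1/epsilon) binary-search label queries mentioned above.
    """
    xs = sorted(pool)
    lo, hi = 0, len(xs)                  # invariant: first positive point lies in xs[lo:hi]
    while lo < hi:
        mid = (lo + hi) // 2
        if query_label(xs[mid]) == 1:    # labeled +, threshold is at or left of xs[mid]
            hi = mid
        else:                            # labeled -, threshold is right of xs[mid]
            lo = mid + 1
    w_hat = xs[lo] if lo < len(xs) else float("inf")
    return lambda x: int(x >= w_hat)

# Toy usage: true threshold 0.3, about 1/epsilon = 1000 unlabeled points.
true_w = 0.3
oracle = lambda x: int(x >= true_w)
h = active_learn_threshold([random.random() for _ in range(1000)], oracle)
print(h(0.1), h(0.9))                    # expect: 0 1
```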

Page 6:

Active Learning might not help [Dasgupta04]

• C = {linear separators in R^1}: active learning reduces sample complexity substantially.

• C = {linear separators in R^2}: there are some target hypotheses for which no improvement can be achieved, no matter how benign the input distribution. In this case, learning to accuracy ε requires 1/ε labels…

In general, the number of queries needed depends on C and also on D.

[Figure: linear separators h0, h1, h2, h3 in R^2 illustrating the lower bound.]

Page 7:

Examples where Active Learning helps

• C = {linear separators in R^1}: active learning reduces sample complexity substantially, no matter what the input distribution is.

In general, the number of queries needed depends on C and also on D.

• C - homogeneous linear separators in R^d, D - uniform distribution over the unit sphere: need only d log(1/ε) labels to find a hypothesis with error rate < ε.

• Freund et al., ’97.

• Dasgupta, Kalai, Monteleoni, COLT 2005

• Balcan-Broder-Zhang, COLT 07

Page 8:

Region of uncertainty [CAL92]

• Example: data lies on a circle in R^2 and hypotheses are homogeneous linear separators.

• Current version space: part of C consistent with labels so far.

• “Region of uncertainty” = part of data space about which there is still some uncertainty (i.e. disagreement within the version space).

[Figure: the current version space and the corresponding region of uncertainty in data space.]

Page 9:

Region of uncertainty [CAL92]

Algorithm: Pick a few points at random from the current region of uncertainty and query their labels.

[Figure: the current version space and the corresponding region of uncertainty in data space.]
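A minimal Python sketch of this strategy for a finite hypothesis class, keeping the version space explicitly (the finite class, the label budget, and the function names are simplifying assumptions for illustration; the realizable case is assumed, so the version space never empties):

```python
import random

def region_of_uncertainty_learner(hypotheses, pool, query_label, budget):
    """CAL-style active learner: query only points the version space disagrees on."""
    version_space = list(hypotheses)       # hypotheses consistent with all labels so far
    unlabeled = list(pool)
    for _ in range(budget):
        # Region of uncertainty: points on which the version space still disagrees.
        uncertain = [x for x in unlabeled
                     if len({h(x) for h in version_space}) > 1]
        if not uncertain:
            break                          # every remaining point is already agreed upon
        x = random.choice(uncertain)       # pick a point at random from the region...
        y = query_label(x)                 # ...and query its label
        unlabeled.remove(x)
        version_space = [h for h in version_space if h(x) == y]
    return version_space[0]                # any surviving hypothesis is consistent
```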

Page 10:

Region of uncertainty [CAL92]

• Current version space: part of C consistent with labels so far.

• “Region of uncertainty” = part of data space about which there is still some uncertainty (i.e. disagreement within the version space).

[Figure: the current version space and the corresponding region of uncertainty in data space.]

Page 11:

Region of uncertainty [CAL92]

• Current version space: part of C consistent with labels so far.

• “Region of uncertainty” = part of data space about which there is still some uncertainty (i.e. disagreement within the version space).

[Figure: after the queried labels are incorporated, the new version space and the new region of uncertainty in data space.]

Page 12:

Region of uncertainty [CAL92], Guarantees

Algorithm: Pick a few points at random from the current region of uncertainty and query their labels.

[Balcan, Beygelzimer, Langford, ICML’06]: analyzes a version of this algorithm which is robust to noise.

• C - homogeneous linear separators in R^d, D - uniform distribution over the unit sphere:
  • realizable case: d^{3/2} log(1/ε) labels;
  • low noise: need only d^2 log(1/ε) labels to find a hypothesis with error rate < ε;
  • passive supervised: d/ε labels.

• C - linear separators on the line, low noise: exponential improvement.

Page 13:

Margin Based Active-Learning Algorithm

[Balcan-Broder-Zhang, COLT 07]

Use O(d) examples to find w_1 of error ≤ 1/8.

iterate k = 2, …, log(1/ε)
  • rejection sample m_k samples x from D satisfying |w_{k-1} · x| ≤ γ_k;
  • label them;
  • find w_k ∈ B(w_{k-1}, 1/2^k) consistent with all these examples.
end iterate

[Figure: w_k, w_{k+1}, the target w*, and the margin band of width γ_k around w_k.]
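A rough Python sketch in the spirit of this procedure (the band-width and sample-size schedules, the averaging initialization, and the perceptron-style update standing in for exact constrained consistency are illustrative assumptions, not the algorithm analyzed in the paper):

```python
import numpy as np

def margin_based_active_learner(dim, sample_unlabeled, query_label, epsilon):
    """Margin-based active learning sketch: query labels only inside a shrinking band.

    `sample_unlabeled()` should return a point drawn uniformly from the unit
    sphere in R^dim and `query_label(x)` should return the target's label in
    {-1, +1}; both are supplied by the caller.
    """
    # Initialize w_1 from O(d) labeled examples (simple averaging heuristic).
    w = np.zeros(dim)
    for _ in range(10 * dim):
        x = sample_unlabeled()
        w += query_label(x) * x
    w /= np.linalg.norm(w)

    rounds = int(np.ceil(np.log2(1.0 / epsilon)))
    for k in range(2, rounds + 1):
        gamma_k = 2.0 ** (-k) / np.sqrt(dim)       # assumed band-width schedule
        m_k = 20 * dim                             # assumed labels per round
        band_sample = []
        while len(band_sample) < m_k:              # rejection sample inside the band
            x = sample_unlabeled()
            if abs(w @ x) <= gamma_k:
                band_sample.append((x, query_label(x)))
        # Stand-in for "find w_k in B(w_{k-1}, 1/2^k) consistent with the sample":
        # a few perceptron-style corrections, then renormalize.
        w_new = w.copy()
        for x, y in band_sample:
            if np.sign(w_new @ x) != y:
                w_new += y * x
        w = w_new / np.linalg.norm(w_new)
    return w

# Toy usage with a random target separator in dimension 10.
d, rng = 10, np.random.default_rng(0)
w_star = rng.normal(size=d); w_star /= np.linalg.norm(w_star)
sample = lambda: (lambda v: v / np.linalg.norm(v))(rng.normal(size=d))
oracle = lambda x: 1 if w_star @ x >= 0 else -1
w_hat = margin_based_active_learner(d, sample, oracle, epsilon=0.05)
print("angle to target:", float(np.arccos(np.clip(w_hat @ w_star, -1.0, 1.0))))
```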

Page 14:

Margin Based Active-Learning, Realizable Case

[Figure: unit vectors u and v and the angle θ(u,v) between them.]

Theorem: Assume P_X is uniform over S^d. If γ_k and m_k are chosen appropriately, then after s = log(1/ε) iterations, w_s has error ≤ ε.

Fact 1 and Fact 2: geometric facts about u, v, and the angle θ(u,v) under the uniform distribution, used in the proof.

Page 15:

Margin Based Active-Learning, Realizable Case

[Figure: unit vectors u and v and the angle θ(u,v) between them.]

Theorem: Assume P_X is uniform over S^d. If γ_k and m_k are chosen appropriately, then after s = log(1/ε) iterations, w_s has error ≤ ε.

Fact 1 and Fact 3: further geometric facts about u, v, and the angle θ(u,v), used in the proof.

Page 16:

BBZ’07, Proof Idea

iterate k = 2, …, log(1/ε)
  rejection sample m_k samples x from D satisfying |w_{k-1} · x| ≤ γ_k;
  ask for labels and find w_k ∈ B(w_{k-1}, 1/2^k) consistent with all these examples.
end iterate

Assume w_k has error ≤ ε. We are done if we can show that w_{k+1} has error ≤ ε/2 while only needing O(d log(1/ε)) labels in round k.

[Figure: w_k, w_{k+1}, the target w*, and the margin band of width γ_k around w_k.]


Page 18:

BBZ’07, Proof Idea

(Algorithm and inductive goal as on page 16: assume w_k has error ≤ ε; show w_{k+1} has error ≤ ε/2 using only O(d log(1/ε)) labels in round k.)

Key Point: Under the uniform distribution assumption, for an appropriate choice of γ_k, the probability that w_{k+1} and w* disagree outside the band {x : |w_k · x| ≤ γ_k} is at most ε/4.

[Figure: w_k, w_{k+1}, the target w*, and the margin band of width γ_k around w_k.]

Page 19:

BBZ’07, Proof Idea

Key Point: Under the uniform distribution assumption, for an appropriate choice of γ_k, the probability that w_{k+1} and w* disagree outside the band {x : |w_k · x| ≤ γ_k} is at most ε/4.

So, it’s enough to ensure that the disagreement inside the band also contributes at most ε/4. We can do so by only using O(d log(1/ε)) labels in round k.

[Figure: w_k, w_{k+1}, the target w*, and the margin band of width γ_k around w_k.]
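Written out, the decomposition behind the key point is the following bound on the error of w_{k+1} (a sketch for clarity, not text quoted from the slides):

```latex
\begin{aligned}
\operatorname{err}(w_{k+1})
  &= \Pr\bigl[\operatorname{sign}(w_{k+1}\cdot x) \neq \operatorname{sign}(w^{*}\cdot x)\bigr] \\
  &\le \underbrace{\Pr\bigl[w_{k+1},\, w^{*}\ \text{disagree on } x,\ |w_k \cdot x| > \gamma_k\bigr]}_{\le\, \epsilon/4\ \text{(key point)}}
   + \underbrace{\Pr\bigl[w_{k+1},\, w^{*}\ \text{disagree on } x,\ |w_k \cdot x| \le \gamma_k\bigr]}_{\le\, \epsilon/4}
  \;\le\; \frac{\epsilon}{2}.
\end{aligned}
```

The first term is exactly what the key point controls; making the second term at most ε/4 as well is what the O(d log(1/ε)) labels requested inside the band are for, and together they give the halving of the error from round to round.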