Active Perspectives on Computational Learning and Testing Liu Yang Slide 1


Dec 20, 2015

Transcript
Page 1: Active Perspectives on Computational Learning and Testing Liu Yang Slide 1.

Active Perspectives on Computational Learning and Testing

Liu Yang

Slide 1


Interaction

• An interactive protocol: have the algorithm interact with an oracle/experts
• Care about: query complexity and computational efficiency
• Question: how much better can the learner do with interaction, vs. getting data in one shot?

Slide 2


Notation

• Instance space X = {0,1}^n
• Concept space C: a collection of functions h: X -> {-1,+1}
• Distribution D over X
• Unknown target function h*: the true labeling function (realizable case: h* in C)
• Err(h) = Pr_{x~D}[h(x) ≠ h*(x)]
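As a concrete reading of the last definition, here is a minimal Monte Carlo sketch (not from the slides) of estimating Err(h) by sampling from D; the names estimate_error and sample_D and the toy target are illustrative assumptions:

```python
import random

def estimate_error(h, h_star, sample_D, m=10000):
    """Monte Carlo estimate of Err(h) = Pr_{x~D}[h(x) != h_star(x)] from m samples."""
    disagree = 0
    for _ in range(m):
        x = sample_D()
        if h(x) != h_star(x):
            disagree += 1
    return disagree / m

# Toy instance space X = {0,1}^3 under the uniform distribution; the target
# labels by the first bit, h is the constant +1 function, so Err(h) is about 1/2.
random.seed(0)
sample_D = lambda: tuple(random.randint(0, 1) for _ in range(3))
h = lambda x: +1
h_star = lambda x: +1 if x[0] == 1 else -1
print(estimate_error(h, h_star, sample_D))
```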

Slide 3


“Active” means Label Request

• Label request: have a pool of unlabeled exs, pick any x and receive h*(x), repeat

An algorithm achieves query complexity S(ε, δ, h*) for (C, D) if it outputs h_n after ≤ n label requests, and for every h* in C, every ε > 0, δ > 0, and n ≥ S(ε, δ, h*), P[err(h_n) ≤ ε] ≥ 1 − δ.

• Motivation: labeled data is expensive to get
• Using label requests, we can do:
  - Active Learning: find h with small err(h)
  - Active Testing: decide whether h* is in C or far from C

Slide 4


Thesis Outline

• Bayesian Active Learning Using Arbitrary Binary Valued Queries (Published)

• Active Property Testing (Major results submitted)

• Self-Verifying Bayesian Active Learning (Published)

• A Theory of Transfer Learning (Accepted)

• Learning with General Types of Query (In progress)

• Active Learning with a Drifting Distribution (Submitted)

This Talk

Slide 5


Outline

• Active Property Testing (Submitted)

• Self-Verifying Bayesian Active Learning (Published)

• Transfer Learning (Accepted)

• Learning with General Types of Query (in progress)

Slide 6


Property Testing

• What is Property Testing for?
  - Quickly tell whether the target is in the right function class
  - Estimate the complexity of a function without actually learning it
• Question: can you do it with fewer queries than learning?
• Yes! E.g., for unions of d intervals, testing helps:

  ----++++----+++++++++-----++---+++--------

  - Testing tells how big d needs to be to get close to the target
  - # Labels: Active Testing needs O(1), Passive Testing needs Θ(√d), Active Learning needs Θ(d)

Slide 7


Active Property Testing

• Passive Testing: no control over which examples are labeled
• Membership Query: unrealistic to query the function at arbitrary points
• Active queries: ask for labels only on points that exist in the environment
• Question: does active testing still get a significant benefit in label requests over passive testing?

Slide 8


Property Tester

• A tester for (C, D):
  - Accepts w.p. ≥ 2/3 if h* in C
  - Rejects w.p. ≥ 2/3 if d(h*, C) = min_{g in C} Pr_{x~D}[h*(x) ≠ g(x)] ≥ ε

Slide 9


Testing Unions of d Intervals

• ----++++----+++++++++-----++---+++--------

Theorem. Testing unions of ≤ d intervals in the active testing model uses only O(1) label requests.

• Noise sensitivity := Pr[two close points are labeled differently]
• Proof idea:
  - all unions of d intervals have low noise sensitivity
  - all functions that are far from this class have noticeably larger noise sensitivity
  - we introduce a tester that estimates the noise sensitivity of the input function
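The proof idea can be illustrated with a small simulation. This is only a sketch, not the paper's actual tester: it estimates noise sensitivity by comparing labels at a random point and a small random perturbation of it; the function names, perturbation model, and both example functions are illustrative assumptions.

```python
import random

def noise_sensitivity(f, delta, trials=20000, seed=0):
    """Estimate NS_delta(f) = Pr[f(x) != f(x')] for x ~ U[0,1] and x' a perturbation of x."""
    rng = random.Random(seed)
    flips = 0
    for _ in range(trials):
        x = rng.random()
        xp = min(1.0, max(0.0, x + rng.uniform(-delta, delta)))
        if f(x) != f(xp):
            flips += 1
    return flips / trials

# A union of 2 intervals has few boundary points, so its noise sensitivity is
# small (roughly d * delta); a rapidly alternating function, far from any small
# union of intervals, flips labels much more often.
f_union = lambda x: +1 if 0.1 <= x < 0.3 or 0.6 <= x < 0.9 else -1
f_far = lambda x: +1 if int(x * 200) % 2 == 0 else -1

print(noise_sensitivity(f_union, 0.01))  # small
print(noise_sensitivity(f_far, 0.01))    # noticeably larger
```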

Slide 10


Summary of Results

★★ Active Testing (constant # of queries) is much better than Passive Testing and Active Learning for unions of intervals, the cluster assumption, and the margin assumption
★ Testing is easier than learning for LTFs
✪ For dictators (single-variable functions), Active Testing does not help

Slide 11


Outline

• Active Property Testing (Submitted)

• Self-Verifying Bayesian Active Learning (Published)

• Transfer Learning (Accepted)

• Learning with General Types of Query (in progress)

Slide 12


Self-Verifying Bayesian Active Learning

Self-verifying (a special type of stopping criterion):
- given ε, adaptively decides the # of queries, then halts
- has the property that E[err] < ε when it halts

Question: can you do this with E[# queries] = o(1/ε)? (Passive learning needs 1/ε labels.)

Slide 13


Example: Intervals

Suppose D is uniform on [0,1]

[Figure: an interval on [0,1], labeled + inside and - outside]

Slide 14


Example: Intervals

Suppose h* is the empty interval, and D is uniform on [0,1]

[Figure: [0,1] split into disjoint intervals of width 2ε; a green ball of radius ε and a red circle of radius 2ε around h]

Verification Lower Bound

The algorithm somehow arrives at h with err(h) < ε; how can it verify that h is ε-close to h*?
- h* is ε-close to h; every h' that is ε-far from h (outside the green ball) is not h*
- Everything on the red circle is outside the green ball, so we have to verify those are not h*
- Suppose h* is the empty interval; then err(h) < ε implies P(h = +) < ε
- In particular, intervals of width 2ε are on the red circle
- We need one label in each such interval to verify it is not h*
- So we need Ω(1/ε) labels to verify the target isn't one of these

Slide 15


Learning with a prior

• Suppose we know a distribution the target is sampled from; call it the prior

Slide 16


Interval Example with a prior

- - - - - |+++++++|- - - - - -

• Algorithm: query random points until the first + is found, then binary search to find the end-points. Halt upon reaching a pre-specified prior-based query budget. Output the posterior's Bayes classifier.

• Let the budget N be high enough that E[err] < ε
  - N = o(1/ε) suffices for E[err | w* > 0] < ε: each interval of width w > 0 can be learned with o(1/ε) queries; by dominated convergence, averaging these query complexities over the prior still gives o(1/ε), so N = o(1/ε) makes E[err | w* > 0] < ε
  - N = o(1/ε) suffices for E[err | w* = 0] < ε: if P(w* = 0) > 0, then after some L = O(log(1/ε)) queries, w.p. > 1 − ε most of the posterior mass is on the empty interval, so the posterior's Bayes classifier has 0 error rate
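The query strategy above can be sketched in code. This is a simplification of the slide's algorithm: the budget here only caps the random-sampling phase, and instead of outputting a posterior Bayes classifier it returns the learned endpoints (or the empty interval if no + is ever seen, matching the w* = 0 case); all names are illustrative.

```python
import random

def learn_interval(label, budget, seed=1, tol=1e-4):
    """Query random points until a + is found, then binary-search both endpoints."""
    rng = random.Random(seed)
    x_plus = None
    for _ in range(budget):            # random queries until the first +
        x = rng.random()
        if label(x) == +1:
            x_plus = x
            break
    if x_plus is None:                 # budget spent without seeing a +:
        return (0.0, 0.0)              # output the empty interval

    def boundary(lo, hi, plus_on_right):
        # Binary search for the -/+ boundary between lo and hi.
        while hi - lo > tol:
            mid = (lo + hi) / 2
            if (label(mid) == +1) == plus_on_right:
                hi = mid
            else:
                lo = mid
        return (lo + hi) / 2

    left = boundary(0.0, x_plus, plus_on_right=True)
    right = boundary(x_plus, 1.0, plus_on_right=False)
    return (left, right)

target = lambda x: +1 if 0.3 <= x <= 0.7 else -1
print(learn_interval(target, budget=100))  # close to (0.3, 0.7)
```

Each binary search takes only O(log(1/tol)) label requests, which is the source of the o(1/ε) behavior in the w* > 0 case.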

Slide 17


Can do o(1/ε) for any VC class

Theorem: with the prior, o(1/ε) is achievable.
• There are methods that find a good classifier in o(1/ε) queries (though they aren't self-verifying) [see TrueSampleComplexityAL08]
• We need to set a stopping criterion for those algorithms
• The stopping criterion we use: let the algorithm run until it makes a certain # of queries (set the budget just large enough that E[err] < ε)

Slide 18


Outline

• Active Property Testing (Submitted)

• Self-Verifying Bayesian Active Learning (Published)

• Transfer Learning (Accepted)

• Learning with General Types of Query (in progress)

Slide 19


Model of Transfer Learning

Motivation: learners are often not too altruistic.

[Figure: a prior generates targets h1*, h2*, …, hT*; Task i's data are (xi1, yi1), …, (xik, yik), labeled by hi*; more tasks give a better estimate of the prior]

Layer 1: draw each task's target i.i.d. from the (unknown) prior

Layer 2: within each task, draw data i.i.d. and label it by that task's target

Slide 20


Insights

• Using a good estimate of the prior is almost as good as using the true prior
  - Now we only need a VC-dimension # of additional points from each task to get a good estimate of the prior
  - We've seen that the self-verifying algorithm, given the true prior, has guarantees on both the error and the # of queries
  - As # tasks → ∞, calling the self-verifying algorithm outputs a classifier as good as if it had the true prior

Slide 21


Main Result

• Design an algorithm using this insight
  - Uses at most a VC-dimension # of additional labeled points per task (vs. learning with the known true prior)
  - Estimates the joint distribution of (x1,y1), …, (xd,yd)
  - Inverts it to get a prior estimate
• Running this algorithm is asymptotically just as good as having direct knowledge of the prior (Bayesian)
  - [HKS] showed passive learning saves constant factors in the Θ(1/ε) per-task sample complexity (replacing VC-dim with a prior-dependent complexity measure)
  - We showed access to the prior can improve active sample complexity, sometimes from Θ(1/ε) to o(1/ε)

Slide 22


Estimating the Prior

Insight: identifiability of priors from the d-dimensional joint distribution
• The distribution of the full sequence (x1,y1),(x2,y2),… uniquely identifies the prior
• The set of joint distributions of (x1,y1),…,(xk,yk) for 1 ≤ k < ∞ identifies the distribution of the full sequence (x1,y1),(x2,y2),…
• For any k > d = VC-dim, the distribution of (x1,y1),…,(xk,yk) can be expressed in terms of the distribution of (x1,y1),…,(xd,yd)
• How to do it when d = 1? E.g., thresholds:
  - for two points x1 < x2: Pr(+,−) = 0, Pr(+,+) = Pr(+,·), Pr(−,−) = Pr(·,−), Pr(−,+) = Pr(·,+) − Pr(+,+) = Pr(·,+) − Pr(+,·)
  - for any k > 1 points, can directly reduce from k to 1: P(−⋯−(−,+)+⋯+) = P((−,+)) = P((·,+)) − P((+,·))
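The d = 1 threshold calculation can be checked numerically. The discrete prior over three thresholds below is an illustrative assumption; the reconstruction follows exactly the identities on this slide.

```python
# Thresholds h_t(x) = +1 iff x >= t: for fixed x1 < x2, the two-point joint
# label distribution is determined by the one-point marginals.
thresholds = [0.2, 0.5, 0.8]
prior = [0.5, 0.3, 0.2]       # Pr assigned to each threshold (illustrative)
x1, x2 = 0.4, 0.6             # fixed query points with x1 < x2

def label(t, x):
    return +1 if x >= t else -1

# Joint distribution of (label(x1), label(x2)) computed directly from the prior.
joint = {}
for t, p in zip(thresholds, prior):
    key = (label(t, x1), label(t, x2))
    joint[key] = joint.get(key, 0.0) + p

# One-point marginals Pr(+, .) = Pr(x1 labeled +) and Pr(., +) = Pr(x2 labeled +).
p1_plus = sum(p for t, p in zip(thresholds, prior) if label(t, x1) == +1)
p2_plus = sum(p for t, p in zip(thresholds, prior) if label(t, x2) == +1)

# Reconstruction per the slide: Pr(+,-) = 0, Pr(+,+) = Pr(+,.),
# Pr(-,-) = Pr(.,-), Pr(-,+) = Pr(.,+) - Pr(+,.).
reconstructed = {
    (+1, -1): 0.0,
    (+1, +1): p1_plus,
    (-1, -1): 1.0 - p2_plus,
    (-1, +1): p2_plus - p1_plus,
}
print(all(abs(joint.get(k, 0.0) - v) < 1e-12 for k, v in reconstructed.items()))  # True
```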

Slide 23


Outline

• Active Property Testing (Submitted)

• Self-Verifying Bayesian Active Learning (Published)

• Transfer Learning (Accepted)

• Learning with General Types of Query (in progress)

- Learning DNF formulas
- Learning Voronoi

Slide 24


Learning with General Query

• Construct problem-specific queries to efficiently learn problems that have no known efficient PAC-learning algorithm.

Slide 25


Learning DNF formulas

- DNF formulas: n is the # of variables; a poly-sized DNF has n^O(1) terms, e.g. f = (x1 ∧ x2) ∨ (x1 ∧ x4)
- A natural form of knowledge representation
- [Valiant 1984]; a great challenge for over 20 years
- PAC-learning DNF formulas appears to be very hard
- The fastest known algorithm [KS01] runs in time exp(n^{1/3} log^2 n)
- If the algorithm is forced to output a hypothesis which is itself a DNF, the problem is NP-hard

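The slide's example formula, f = (x1 ∧ x2) ∨ (x1 ∧ x4), can be evaluated directly; encoding an assignment as a boolean list is my own illustrative choice:

```python
# f = (x1 AND x2) OR (x1 AND x4), with x = [x1, x2, x3, x4] as booleans.
def f(x):
    return (x[0] and x[1]) or (x[0] and x[3])

print(f([True, True, False, False]))   # first term satisfied -> True
print(f([True, False, False, True]))   # second term satisfied -> True
print(f([False, True, False, True]))   # x1 false -> False
```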

Slide 26


Thanks !

Slide 31