Regret to the Best vs. Regret to the Average Eyal Even-Dar Michael Kearns Yishay Mansour Jennifer Wortman Upenn + Tel Aviv Univ. Slides: Csaba

Jan 01, 2016

Transcript
Page 1:

Regret to the Best vs. Regret to the Average

Eyal Even-Dar, Michael Kearns, Yishay Mansour, Jennifer Wortman

UPenn + Tel Aviv Univ.
Slides: Csaba

Page 2:

Motivation

Expert algorithms attempt to control regret to the return of the best expert

Regret to the average return? Same bound! Weak???

EW: $w_{i,1} = 1$, $w_{i,t} = w_{i,t-1} e^{\eta g_{i,t}}$, $p_{i,t} = w_{i,t}/W_t$, $W_t = \sum_i w_{i,t}$

E1: 1 0 1 0 1 0 1 0 1 0 …
E2: 0 1 0 1 0 1 0 1 0 1 …

$G_{A,T} = T/2 - cT^{1/2}$

$G^+_T = G^-_T = G^0_T = T/2$

$R^+_T \le cT^{1/2}$, $R^0_T \le cT^{1/2}$
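The alternating example can be checked numerically. The sketch below is a minimal simulation, assuming the EW update written on this slide and the tuning $\eta = 1/\sqrt{T}$ (the slides do not state a tuning; the function name is hypothetical). On this sequence the shortfall relative to $T/2$ works out to $T\tanh(\eta/2)/4$, which is close to $\sqrt{T}/8$ for small $\eta$.

```python
import math

def ew_gain_alternating(T, eta):
    """Total gain of EW on E1 = 1,0,1,0,... and E2 = 0,1,0,1,..."""
    w = [1.0, 1.0]                      # w_{i,1} = 1
    total = 0.0
    for t in range(T):
        g = (1.0, 0.0) if t % 2 == 0 else (0.0, 1.0)
        W = w[0] + w[1]
        total += (w[0] * g[0] + w[1] * g[1]) / W   # sum_i p_{i,t} g_{i,t}
        w = [w[i] * math.exp(eta * g[i]) for i in range(2)]
    return total

T = 10_000
eta = 1.0 / math.sqrt(T)                # assumed tuning, not from the slides
shortfall = T / 2 - ew_gain_alternating(T, eta)
print(f"regret to the average: {shortfall:.3f}, sqrt(T)/8 = {math.sqrt(T) / 8:.3f}")
```

Both experts, and hence the average, collect exactly $T/2$ here, so the shortfall is at once the regret to the best, to the worst, and to the average.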

Page 3:

Notation - gains

$g_{i,t} \in [0,1]$ - gains

$g = (g_{i,t})$ - sequence of gains

$G_{i,T}(g) = \sum_{t=1}^{T} g_{i,t}$ - cumulated gains

$G^0_T(g) = \left(\sum_i G_{i,T}(g)\right)/N$ - average gain

$G^-_T(g) = \min_i G_{i,T}(g)$ - worst gain

$G^+_T(g) = \max_i G_{i,T}(g)$ - best gain

$G^D_T(g) = \sum_i D_i G_{i,T}(g)$ - weighted avg. gain
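As a small illustration of this notation, the following sketch computes the cumulated, average, worst, best and weighted-average gains for a toy gain matrix. The function names and the example numbers are made up for illustration; `g[t][i]` is assumed to hold the gain of expert `i` at round `t`.

```python
def cumulative_gains(g):
    """G_{i,T}: sum over rounds of g[t][i], one entry per expert."""
    n = len(g[0])
    return [sum(row[i] for row in g) for i in range(n)]

def gain_summary(g, D=None):
    """Best, worst, average and D-weighted cumulated gain."""
    G = cumulative_gains(g)
    n = len(G)
    D = D if D is not None else [1.0 / n] * n   # default: uniform distribution
    return {
        "best": max(G),
        "worst": min(G),
        "average": sum(G) / n,
        "weighted": sum(d * x for d, x in zip(D, G)),
    }

# Toy example: 3 experts, 4 rounds.
g = [[1, 0, 0.5], [1, 0, 0.5], [0, 1, 0.5], [1, 0, 0.5]]
summary = gain_summary(g, D=[0.5, 0.25, 0.25])
print(summary)
```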

Page 4:

Notation - algorithms

$w_{i,t}$ – unnormalized weights

$p_{i,t} = w_{i,t}/W_t$, $W_t = \sum_i w_{i,t}$ – normalized weights

$g_{A,t} = \sum_i p_{i,t} g_{i,t}$ – gain of A

$G_{A,T}(g) = \sum_t g_{A,t}$ – cumulated gain of A

Page 5:

Notation - regret

Regret to the ...

$R^+_T(g) = (G^+_T(g) - G_{A,T}(g)) \vee 1$ – best

$R^-_T(g) = (G^-_T(g) - G_{A,T}(g)) \vee 1$ – worst

$R^0_T(g) = (G^0_T(g) - G_{A,T}(g)) \vee 1$ – avg

$R^D_T(g) = (G^D_T(g) - G_{A,T}(g)) \vee 1$ – dist.
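The four regrets can be computed directly from a gain matrix and the weights an algorithm played. This is a minimal sketch with hypothetical names; it reads the trailing "1" in each definition as taking the maximum with 1, so a regret is never reported below 1.

```python
def regrets(p, g, D=None):
    """p[t][i]: weight on expert i at round t; g[t][i]: gain of expert i."""
    n = len(g[0])
    G = [sum(row[i] for row in g) for i in range(n)]          # G_{i,T}
    G_A = sum(sum(pi * gi for pi, gi in zip(pt, gt))          # G_{A,T}
              for pt, gt in zip(p, g))
    D = D if D is not None else [1.0 / n] * n
    G_D = sum(d * x for d, x in zip(D, G))
    at_least_one = lambda G_ref: max(G_ref - G_A, 1.0)        # (x) "or" 1
    return {
        "best": at_least_one(max(G)),
        "worst": at_least_one(min(G)),
        "avg": at_least_one(sum(G) / n),
        "dist": at_least_one(G_D),
    }

# Uniform weights against one expert that always gains and one that never does.
g = [[1.0, 0.0]] * 6
p = [[0.5, 0.5]] * 6
r = regrets(p, g)
print(r)
```

Playing the uniform distribution matches the average exactly, so only the regret to the best exceeds the floor of 1 in this example.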

Page 6:

Goal

Algorithm A is “nice” if

$R^+_{A,T} \le O(T^{1/2})$

$R^0_{A,T} \le 1$

Program:
Examine existing algorithms (“difference algorithms”) – lower bound
Show “nice” algorithms
Show that no substantial further improvement is possible

Page 7:

“Difference” algorithms

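Def: A is a difference algorithm if for $N = 2$, $g_{i,t} \in \{0,1\}$: $p_{1,t} = f(d_t)$, $p_{2,t} = 1 - f(d_t)$, $d_t = G_{1,t} - G_{2,t}$

Examples:

EW: $w_{i,t} = e^{\eta G_{i,t}}$

FPL: Choose $\arg\max_i (G_{i,t} + Z_{i,t})$

Prod: $w_{i,t} = \prod_s (1 + \eta g_{i,s}) = (1+\eta)^{G_{i,t}}$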
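To see why EW and Prod fit the definition, note that for two experts the normalized weight on expert 1 depends on the cumulated gains only through their difference $d$. The sketch below checks this numerically; the closed forms $f(d) = 1/(1+e^{-\eta d})$ for EW and $f(d) = 1/(1+(1+\eta)^{-d})$ for Prod follow from dividing through by the first weight, and the function names are hypothetical.

```python
import math

def ew_p1(G1, G2, eta):
    """Weight on expert 1 from the EW weights w_i = exp(eta * G_i)."""
    w1, w2 = math.exp(eta * G1), math.exp(eta * G2)
    return w1 / (w1 + w2)

def ew_f(d, eta):
    """The same weight written as a function of d = G1 - G2 only."""
    return 1.0 / (1.0 + math.exp(-eta * d))

def prod_p1(G1, G2, eta):
    """Weight on expert 1 from the Prod weights w_i = (1 + eta)^G_i (0/1 gains)."""
    w1, w2 = (1 + eta) ** G1, (1 + eta) ** G2
    return w1 / (w1 + w2)

def prod_f(d, eta):
    return 1.0 / (1.0 + (1 + eta) ** (-d))

eta = 0.1
for G1, G2 in [(3, 1), (10, 8), (2, 0)]:        # all have d = 2
    print(G1, G2, ew_p1(G1, G2, eta), prod_p1(G1, G2, eta))
```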

Page 8:

A lower bound for difference algorithms

Theorem: If A is a difference algorithm, then there exist some series $g$, $g'$ (tuned to A) such that

$R^+_{A,T}(g)\, R^0_{A,T}(g') \ge R^+_{A,T}(g)\, R^-_{A,T}(g') = \Omega(T)$

For $R^+_{A,T} = \max_g R^+_{A,T}(g)$, $R^-_{A,T} = \max_g R^-_{A,T}(g)$, $R^0_{A,T} = \max_g R^0_{A,T}(g)$,

$R^+_{A,T}\, R^0_{A,T} \ge R^+_{A,T}\, R^-_{A,T} = \Omega(T)$

Page 9:

Proof

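Assume $T$ is even, $p_{1,1} \le 1/2$.

g:
E1: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 …
E2: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 …

$\tau$: first time $t$ when $p_{1,t} \ge 2/3$ $\Rightarrow$ $R^+_{A,T}(g) \ge \tau/3$

$\exists\, \tau' \in \{2, 3, \dots, \tau\}$ s.t. $p_{1,\tau'} - p_{1,\tau'-1} \ge 1/(6\tau)$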

Page 10:

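Proof/2

$p_{1,\tau'} - p_{1,\tau'-1} \ge 1/(6\tau)$

g':
E1: 1 1 1 1 1 1 0 1 0 1 0 1 0 0 0 0 0 0 0
E2: 0 0 0 0 0 0 1 0 1 0 1 0 1 1 1 1 1 1 1

First and last $\tau'$ steps: $p_{1,t} = p_{1,T-t}$. Gain per mirrored pair of steps: $p_{1,t} + 1 - p_{1,t} = 1$

Alternating steps in the middle: $p_{1,t} = p_{1,\tau'}$, $p_{1,t+1} = p_{1,\tau'-1}$. Gain per pair of steps: $\le 1 - 1/(6\tau)$

$G^+_T = G^-_T = G^0_T = T/2$

$G_{A,T}(g') \le \tau' + \frac{T - 2\tau'}{2}\left(1 - \frac{1}{6\tau}\right)$

$R^-_{A,T}(g') \ge \frac{T - 2\tau'}{12\tau} \ge \frac{T - 2\tau}{12\tau}$ $\Rightarrow$ $R^+_{A,T}(g)\, R^-_{A,T}(g') \ge \frac{T - 2\tau}{36}$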
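The two sequences of the proof can be built and evaluated for a concrete difference algorithm. The sketch below is one reading of the construction, under stated assumptions: $d_t$ is taken to be the gain difference accumulated before round $t$, the algorithm is EW with the assumed tuning $\eta = 1/\sqrt{T}$, and all function and variable names are hypothetical. It finds the first round at which the weight on expert 1 reaches 2/3, picks the round with the largest one-step jump of that weight, and then compares the product of the two regrets with $(T - 2\tau)/36$.

```python
import math

def run_difference_algorithm(f, gains):
    """Total gain when p_{1,t} = f(d_t), d_t = G_1 - G_2 before round t."""
    d, total = 0, 0.0
    for g1, g2 in gains:
        p1 = f(d)
        total += p1 * g1 + (1.0 - p1) * g2
        d += g1 - g2
    return total

def lower_bound_sequences(f, T):
    # tau: first round with weight >= 2/3 on expert 1 when expert 1 always gains.
    # (If f never reaches 2/3, next() raises: regret to the best is then linear.)
    tau = next(t for t in range(1, T + 1) if f(t - 1) >= 2.0 / 3.0)
    # Round in {2, ..., tau} with the largest one-step increase of the weight.
    jump = max(range(2, tau + 1), key=lambda t: f(t - 1) - f(t - 2))
    m = jump - 1                         # length of the first and last blocks
    pairs = (T - 2 * m) // 2             # alternating pairs in the middle
    g = [(1, 0)] * T
    g_prime = [(1, 0)] * m + [(0, 1), (1, 0)] * pairs + [(0, 1)] * m
    return tau, jump, g, g_prime

T = 10_000                               # must be even
eta = 1.0 / math.sqrt(T)                 # assumed tuning
f = lambda d: 1.0 / (1.0 + math.exp(-eta * d))   # EW as a difference algorithm

tau, jump, g, g_prime = lower_bound_sequences(f, T)
regret_best = max(T - run_difference_algorithm(f, g), 1.0)             # on g
regret_worst = max(T / 2 - run_difference_algorithm(f, g_prime), 1.0)  # on g'
print(tau, jump, regret_best * regret_worst, (T - 2 * tau) / 36)
```

On g' both experts end with gain $T/2$, so the regret to the worst coincides with the regret to the average and to the best on that sequence.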

Page 11:

Tightness

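We know that for difference algorithms

$R^+_{A,T}\, R^0_{A,T} \ge R^+_{A,T}\, R^-_{A,T} = \Omega(T)$

Can a (difference) algorithm achieve this?

Theorem: EW = EW($\eta$), with appropriately tuned $\eta = \eta(\alpha)$, $0 \le \alpha \le 1/2$, has

$R^+_{EW,T} \le T^{1/2+\alpha}(1 + \ln N)$

$R^0_{EW,T} \le T^{1/2-\alpha}$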
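The tradeoff can be illustrated on two hand-picked sequences. This sketch assumes the tuning $\eta = T^{-(1/2+\alpha)}$, which the slide does not spell out, and uses hypothetical function names. It measures the regret to the best on a sequence where expert 1 always gains, and the regret to the average on the alternating sequence; these are particular sequences rather than worst cases, so they only show that the two bounds are respected there.

```python
import math

def ew_total_gain(gains, eta):
    """Total gain of EW with weights proportional to exp(eta * G_i)."""
    n = len(gains[0])
    G = [0.0] * n
    total = 0.0
    for g in gains:
        m = max(G)                                  # shift for numerical stability
        w = [math.exp(eta * (x - m)) for x in G]
        W = sum(w)
        total += sum(wi * gi for wi, gi in zip(w, g)) / W
        G = [x + gi for x, gi in zip(G, g)]
    return total

def tradeoff(T, alpha):
    eta = T ** -(0.5 + alpha)                       # assumed form of eta(alpha)
    constant = [(1.0, 0.0)] * T
    alternating = [(1.0, 0.0), (0.0, 1.0)] * (T // 2)
    regret_best = max(T - ew_total_gain(constant, eta), 1.0)
    regret_avg = max(T / 2 - ew_total_gain(alternating, eta), 1.0)
    return regret_best, regret_avg

T = 10_000
for alpha in (0.0, 0.25, 0.5):
    rb, ra = tradeoff(T, alpha)
    print(f"alpha={alpha}: regret to best {rb:.1f}, regret to average {ra:.2f}")
```

Larger $\alpha$ means a smaller learning rate: the regret to the average shrinks while the regret to the best grows.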

Page 12:

Breaking the frontier

What’s wrong with the difference algorithms?

They are designed to find the best expert with low regret (fast) ...

... they don’t pay attention to the average gain and how it compares with the best gain

Page 13:

BestWorst(A)

$G^+_T - G^-_T$: the spread of cumulated gain

Idea: Stay with the average until the spread becomes large. Then switch to learning (using algorithm A).

When the spread is large enough, $G^0_T = G_{BW(A),T} \gg G^-_T$ $\Rightarrow$ “nothing” to lose

Spread threshold: $NR$, where $R = R_{T,N}$ is a bound on the regret of A.

Page 14:

BestWorst(A)

Theorem: $R^+_{BW(A),T} = O(NR)$, $G_{BW(A),T} \ge G^-_T$

Proof: At the time of the switch, $G_{BW(A)} \ge (G^+ + (N-1)G^-)/N$. Since $G^+ \ge G^- + NR$, $G_{BW(A)} \ge G^- + R$.
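A minimal sketch of the BestWorst idea follows, using exponential weights as a stand-in for the generic algorithm A. The class and function names, the choice to reset A at the moment of the switch, and the tunings $R = \sqrt{T \ln N}$ and $\eta = \sqrt{8 \ln N / T}$ are assumptions made for illustration, not details taken from the slides.

```python
import math

class EW:
    """Exponential weights; stands in for the generic algorithm A."""
    def __init__(self, n, eta):
        self.n, self.eta = n, eta
        self.reset()

    def reset(self):
        self.G = [0.0] * self.n

    def weights(self):
        m = max(self.G)                              # shift to avoid overflow
        w = [math.exp(self.eta * (x - m)) for x in self.G]
        W = sum(w)
        return [x / W for x in w]

    def update(self, g):
        self.G = [x + gi for x, gi in zip(self.G, g)]

def best_worst(A, gains, R):
    """Play the uniform average until the spread reaches N*R, then run A."""
    n = len(gains[0])
    G = [0.0] * n
    total, learning = 0.0, False
    for g in gains:
        p = A.weights() if learning else [1.0 / n] * n
        total += sum(pi * gi for pi, gi in zip(p, g))
        G = [x + gi for x, gi in zip(G, g)]
        if learning:
            A.update(g)
        elif max(G) - min(G) >= n * R:               # spread threshold N*R
            learning = True
            A.reset()
    return total, G

T, n = 400, 2
R = math.sqrt(T * math.log(n))                       # assumed regret bound for A
A = EW(n, math.sqrt(8 * math.log(n) / T))            # assumed tuning
total, G = best_worst(A, [(1.0, 0.0)] * T, R)
print(total, max(G) - total, min(G))
```

On the alternating sequence the spread never exceeds 1, so the sketch stays with the average for all T rounds and collects exactly T/2.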

Page 15:

PhasedAggression(A,R,D)

for k = 1 : log2(R) do
    η := 2^(k-1)/R
    A.reset(); s := 0    // local time, new phase
    while (G^+_s - G^D_s < 2R) do
        q_s := A.getNormedWeights(g_{s-1})
        p_s := η q_s + (1-η) D
    end
end
A.reset()
run A until time T
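The pseudocode can be turned into a short runnable sketch. Here exponential weights stands in for A, gains inside a phase are measured from the start of that phase (local time), and the number of phases is taken as $\lceil \log_2 R \rceil$. That rounding, the EW tuning, the choice $R = \sqrt{T \ln N}$, and all names are assumptions for illustration.

```python
import math

class EW:
    """Exponential weights; stands in for the generic algorithm A."""
    def __init__(self, n, eta):
        self.n, self.eta = n, eta
        self.reset()

    def reset(self):
        self.G = [0.0] * self.n

    def weights(self):
        m = max(self.G)
        w = [math.exp(self.eta * (x - m)) for x in self.G]
        W = sum(w)
        return [x / W for x in w]

    def update(self, g):
        self.G = [x + gi for x, gi in zip(self.G, g)]

def dot(p, g):
    return sum(pi * gi for pi, gi in zip(p, g))

def phased_aggression(A, gains, R, D):
    n, T = len(D), len(gains)
    total, t = 0.0, 0
    phases = math.ceil(math.log2(R))          # assumption: round log2(R) up
    for k in range(1, phases + 1):
        eta = 2 ** (k - 1) / R                # share of the weight given to A
        A.reset()
        G = [0.0] * n                         # gains since the phase started
        while t < T and max(G) - dot(D, G) < 2 * R:
            q = A.weights()
            p = [eta * qi + (1 - eta) * di for qi, di in zip(q, D)]
            total += dot(p, gains[t])
            A.update(gains[t])
            G = [x + gi for x, gi in zip(G, gains[t])]
            t += 1
    A.reset()
    while t < T:                              # final stage: run A alone
        total += dot(A.weights(), gains[t])
        A.update(gains[t])
        t += 1
    return total

T, n = 1000, 2
R = math.sqrt(T * math.log(n))                # assumed regret bound for A
D = [0.5, 0.5]
make_A = lambda: EW(n, math.sqrt(8 * math.log(n) / T))
gain_alt = phased_aggression(make_A(), [(1.0, 0.0), (0.0, 1.0)] * (T // 2), R, D)
gain_one = phased_aggression(make_A(), [(1.0, 0.0)] * T, R, D)
print(T / 2 - gain_alt, T - gain_one)
```

On the alternating sequence the sketch never leaves the first phase, so only a small fraction of the weight follows A; on the one-sided sequence it moves through all phases and ends by running A alone.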

Page 16:

PA(A,R,D) – Theorem

Theorem: Let A be any algorithm with regret $R = R_{T,N}$ to the best expert, D any distribution. Then for PA = PA(A,R,D),

$R^+_{PA,T} \le 2R(\log R + 1)$

$R^D_{PA,T} \le 1$

Page 17:

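Proof

Consider local time $s$ during phase $k$. D and A share the gains & the regret:

$G^+_s - G_{PA,s} < \frac{2^{k-1}}{R} \times R + \left(1 - \frac{2^{k-1}}{R}\right) \times 2R < 2R$

$G^D_s - G_{PA,s} \le \frac{2^{k-1}}{R} \times R = 2^{k-1}$

What happens at the end of the phase?

$G_{PA,s} - G^D_s \ge \frac{2^{k-1}}{R} \times (G^+_s - R - G^D_s) \ge \frac{2^{k-1}}{R} \times (2R - R) = 2^{k-1}$, since the phase ends once $G^+_s - G^D_s \ge 2R$.

What if PA ends in phase $k$ at time $T$?

$G^+_T - G_{PA,T} \le 2Rk \le 2R(\log R + 1)$

$G^D_T - G_{PA,T} \le 2^{k-1} - \sum_{j=1}^{k-1} 2^{j-1} = 2^{k-1} - (2^{k-1} - 1) = 1$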

Page 18:

General lower bounds

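Theorem:

$R^+_{A,T} = O(T^{1/2})$ $\Rightarrow$ $R^0_{A,T} = \Omega(T^{1/2})$

$R^+_{A,T} \le (T \log T)^{1/2}/10$ $\Rightarrow$ $R^0_{A,T} = \Omega(T^{\epsilon})$, where $\epsilon \ge 0.02$

Compare this with

$R^+_{PA,T} \le 2R(\log R + 1)$, $R^D_{PA,T} \le 1$,

where $R = (T \log N)^{1/2}$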
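To get a feel for the sizes involved, the following sketch evaluates the PhasedAggression bound for given $T$ and $N$. It assumes the natural logarithm inside $R$ and base 2 for $\log R$ (the slides do not state the bases); the function name is hypothetical.

```python
import math

def pa_bounds(T, N):
    """R = sqrt(T log N) and the bound 2R(log R + 1) on the regret to the best."""
    R = math.sqrt(T * math.log(N))        # assumption: natural log
    best = 2 * R * (math.log2(R) + 1)     # assumption: log base 2
    return R, best

T, N = 1_000_000, 10
R, best = pa_bounds(T, N)
print(f"sqrt(T) = {math.sqrt(T):.0f}, R = {R:.1f}, bound on regret to best = {best:.0f}")
```

The bound on the regret to the best exceeds $R$ by a factor of order $\log R$, which is the price paid here for constant regret to the distribution D.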

Page 19:

Conclusions

Achieving constant regret to the average is a reasonable goal.

“Classical” algorithms do not have this property, but satisfy $R^+_{A,T}\, R^0_{A,T} \ge \Omega(T)$.

Modification: Learn only when it makes sense, i.e., when the best is much better than the average.

PhasedAggression: Optimal tradeoff

Can we remove dependence on T?