Page 1: Title

Simple regret for infinitely many armed bandit

Alexandra Carpentier* and Michal Valko**

* StatsLab, University of Cambridge
** SequeL team, INRIA Lille - Nord Europe

ICML 2015, July 7th 2015

Pages 2-16: The bandit problem considered

Simple regret for infinitely many armed bandit:

- Mean reservoir distribution F, with means bounded by µ∗
- Limited sampling resources n

At time t ≤ n one can either

- sample a new arm ν_{K_t} from the reservoir distribution, whose mean satisfies µ_{K_t} ∼ F, and set I_t = K_t,
- or choose an arm I_t among the K_{t−1} arms {ν_k}_{k ≤ K_{t−1}} observed so far,

and then collect a reward X_t ∼ ν_{I_t}.

Objective: after n rounds, return an arm k whose mean µ_k is as large as possible, i.e., minimize the simple regret

r_n = µ∗ − µ_k,

where µ∗ is the right end point of the support of F.

(These slides animate the process for t = 0, 1, 2, ...: new arms, Arm 1 up to Arm 6, are progressively drawn from the reservoir and pulled, and at t = n one arm is returned.)
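To fix ideas, here is a minimal sketch of this interaction protocol, together with a naive baseline that draws √n arms and splits the budget evenly among them. The Bernoulli rewards, the Beta(1, 2) reservoir, and the simplified budget bookkeeping are illustrative assumptions, not part of the slides.

```python
import random


class InfiniteArmedBandit:
    """Minimal sketch of the protocol on this slide: Bernoulli arms whose means
    are drawn from a Beta(1, 2) reservoir (an illustrative choice; mu* = 1)."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.means = []                          # means mu_k of the arms observed so far

    def new_arm(self):
        """Sample a new arm nu_{K_t} from the reservoir (its mean mu_{K_t} ~ F)."""
        self.means.append(self.rng.betavariate(1, 2))
        return len(self.means) - 1               # index of the newly observed arm

    def pull(self, k):
        """Pull an already observed arm I_t = k and collect X_t ~ nu_k."""
        return 1.0 if self.rng.random() < self.means[k] else 0.0

    def simple_regret(self, k):
        """r_n = mu* - mu_k for the returned arm k (here mu* = 1)."""
        return 1.0 - self.means[k]


if __name__ == "__main__":
    # Naive baseline: draw sqrt(n) arms, split the budget n evenly among them,
    # and return the arm with the best empirical mean.
    bandit, n, K = InfiniteArmedBandit(), 10_000, 100
    arms = [bandit.new_arm() for _ in range(K)]
    per_arm = n // K
    means_hat = [sum(bandit.pull(k) for _ in range(per_arm)) / per_arm for k in arms]
    returned = max(arms, key=lambda k: means_hat[k])
    print("simple regret of the returned arm:", bandit.simple_regret(returned))
```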


Page 17: The bandit problem considered (continued)

Double exploration dilemma: the budget must be allocated both to (i) learning the characteristics of the arm reservoir distribution (meta-exploration) and (ii) learning the characteristics of the individual arms (exploration).

Main questions

How many arms should be sampled from the arm reservoir distribution? How aggressively should these arms be explored?

Page 18: Applications

Simple-regret bandit problems with a large number of arms or with a small budget:

- Selection of a good biomarker
- Special case of feature selection where one wants to select a single feature [Hauskrecht et al., 2006]
- ...

In this regime, a continuous set of arms and a finite but large set of arms are essentially equivalent.

Pages 19-22: Literature review

- Simple-regret bandits: [Even-Dar et al., 2006], [Audibert et al., 2010], [Kalyanakrishnan et al., 2012], [Kaufmann et al., 2013], [Karnin et al., 2013], [Gabillon et al., 2012], [Jamieson et al., 2014]
- Infinitely many armed bandits with cumulative regret: [Berry et al., 1997], [Wang et al., 2008], [Bonald and Proutiere, 2013]
- Infinitely many armed settings with arm structure: [Dani et al., 2008], [Kleinberg et al., 2008], [Munos, 2014], [Azar et al., 2014]

Simple-regret bandits

Results:
- Strategies that return an optimal (or ε-optimal) arm with high probability.
- Stopping-rule-based strategies that sample until they can return an ε-optimal arm.

But:
- Fixed number of arms, smaller than the budget n (importance of trying each arm).

Infinitely many armed bandits with cumulative regret

Results:
- Optimal strategies under a shape constraint on F and boundedness of the arm distributions.

But:
- Cumulative regret.

Note: we will discuss this in detail soon.

Infinitely many armed settings with arm structure

Results:
- Optimal strategies for specific structured bandits.

But:
- Structure or contextual information is needed.

Infinitely many armed bandit: no control over where one samples from the reservoir distribution.
Optimization setting: one selects where to sample based on proximity to good points.

Page 23: Back to the infinitely many armed bandit literature

IMAB with cumulative regret: [Berry et al., 1997], [Wang et al., 2008], [Bonald and Proutiere, 2013].

Cumulative regret:

R^C_n = n µ∗ − Σ_{t ≤ n} X_t.

Crucial assumption:

P_{µ∼F}(µ∗ − µ ≤ ε) ≈ ε^β,

i.e., 1 − F is β-regularly varying at µ∗.

(Figure: two sketches of the reservoir density near µ∗, one with large β and one with small β; a large β means near-optimal arms are drawn rarely, a small β means they are drawn often.)
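As a concrete instance (the simulations later use reservoirs of exactly this form), a Beta(1, β) mean reservoir on [0, 1] satisfies the assumption with µ∗ = 1, and with equality rather than just ≈:

```latex
% Beta(1, beta) reservoir: density f(x) = beta (1 - x)^{beta - 1} on [0, 1], right end point mu* = 1.
\mathbb{P}_{\mu \sim F}\left(\mu^{*} - \mu \le \varepsilon\right)
  = \int_{1-\varepsilon}^{1} \beta \, (1 - x)^{\beta - 1} \, dx
  = \varepsilon^{\beta},
  \qquad 0 \le \varepsilon \le 1.
```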

Page 24: Back to the infinitely many armed bandit literature (continued)

Requirements: bounded arm distributions and knowledge of β for choosing the number of arms.

Theorem (Regret bound)
The minimax bound on E(R^C_n) is of order max(n^{β/(β+1)}, √n), up to log(n) factors.

Special case: if the arm distributions are themselves bounded by µ∗, the rate is different.

Theorem (Special regret)
The minimax bound on E(R^C_n) is then of order n^{β/(β+1)}, up to log(n) factors.

Page 25: The simple regret setting and assumptions

Objective: minimize the simple regret in the infinitely many armed setting,

r_n = µ∗ − µ_k.

Same assumptions as for IMAB with cumulative regret:

- Regularly varying mean reservoir distribution: P_{µ∼F}(µ∗ − µ ≤ ε) ≈ ε^β
- The arm distributions are bounded/sub-Gaussian.

Page 26: Lower bound

The following lower bound holds.

Theorem (CV15)
The expected simple regret E(r_n) is lower bounded, up to constant factors, by max(n^{−1/β}, n^{−1/2}).

Remark: the bottleneck differs from the cumulative-regret case, where E[R^C_n] = O(max(n^{β/(β+1)}, √n)).

Is there a strategy that attains this bound?
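To make the "different bottleneck" remark concrete, here is where each of the two terms dominates; this is a direct consequence of the displayed rates, not an additional result from the paper:

```latex
% Simple regret: the reservoir term n^{-1/beta} dominates only once beta >= 2 ...
\max\left(n^{-1/\beta},\, n^{-1/2}\right)
  = \begin{cases} n^{-1/2} & \text{if } \beta \le 2,\\ n^{-1/\beta} & \text{if } \beta \ge 2, \end{cases}
\qquad
% ... whereas for the cumulative regret the reservoir term already dominates once beta >= 1.
\max\left(n^{\beta/(\beta+1)},\, \sqrt{n}\right)
  = \begin{cases} \sqrt{n} & \text{if } \beta \le 1,\\ n^{\beta/(\beta+1)} & \text{if } \beta \ge 1. \end{cases}
```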

Pages 27-29: The SiRI strategy

Parameters: β, C, δ.
Pick T_β ≈ n^{min(β,2)/2} arms from the reservoir.
Pull each of the T_β arms once and set t ← T_β.
while t ≤ n do
    For every k ≤ T_β, set
        B_{k,t} ← µ̂_{k,t} + 2 √( (C / T_{k,t}) log(n / (δ T_{k,t})) ) + (2C / T_{k,t}) log(n / (δ T_{k,t}))
    Pull the arm k_t that maximizes B_{k,t} another T_{k_t,t} times (doubling its pull count).
    Set t ← t + T_{k_t,t}.
end while
Output: return the most pulled arm k.

Here µ̂_{k,t} denotes the empirical mean of arm k and T_{k,t} its number of pulls up to time t.


Remark: SiRI is the combination of a choice of the number of arms and a UCB algorithm for cumulative regret run over those arms.
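A minimal Python sketch of this loop, following the pseudocode above; it is not the authors' implementation, and the Bernoulli rewards, the constants C = 1 and δ = 0.1, and the clipping of the last doubling to the remaining budget are illustrative assumptions:

```python
import math
import random


def siri(reservoir_sample, pull_arm, n, beta, C=1.0, delta=0.1):
    """Illustrative SiRI sketch: a UCB-style index over T_beta arms drawn from the
    reservoir, doubling the pulls of the index-maximizing arm, and returning the
    most pulled arm when the budget n is spent."""
    # Number of arms drawn from the reservoir: T_beta ~ n^{min(beta, 2)/2}.
    T_beta = max(1, min(n, int(round(n ** (min(beta, 2.0) / 2.0)))))
    arms = [reservoir_sample() for _ in range(T_beta)]

    pulls = [1] * T_beta                    # T_{k,t}: number of pulls of arm k
    sums = [pull_arm(a) for a in arms]      # cumulative reward of arm k
    t = T_beta

    def index(k):
        # B_{k,t} = mu_hat + 2*sqrt((C/T)*log(n/(delta*T))) + (2C/T)*log(n/(delta*T))
        T = pulls[k]
        log_term = max(0.0, math.log(n / (delta * T)))
        return sums[k] / T + 2.0 * math.sqrt(C / T * log_term) + 2.0 * C / T * log_term

    while t < n:
        k = max(range(T_beta), key=index)
        m = min(pulls[k], n - t)            # double arm k's pulls, clipped to the remaining budget
        for _ in range(m):
            sums[k] += pull_arm(arms[k])
        pulls[k] += m
        t += m

    return arms[max(range(T_beta), key=lambda j: pulls[j])]   # most pulled arm


if __name__ == "__main__":
    # Toy run: uniform (Beta(1, 1)) mean reservoir, i.e. beta = 1 and mu* = 1, Bernoulli arms.
    rng = random.Random(0)
    best_mean = siri(
        reservoir_sample=rng.random,
        pull_arm=lambda mu: 1.0 if rng.random() < mu else 0.0,
        n=10_000,
        beta=1.0,
    )
    print("simple regret:", 1.0 - best_mean)
```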

Page 30: Upper bound

The following upper bound holds.

Theorem (CV15)
The expected simple regret E(r_n) of SiRI is upper bounded, up to log(n) factors, by max(n^{−1/2}, n^{−1/β}).

The lower and upper bounds match up to log(n) factors (which are not present in all cases).

Page 31: Extensions

In the paper we present three main extensions:

- Anytime SiRI.
- Distributions bounded by µ∗: a Bernstein modification of SiRI has minimax-optimal simple regret max(n^{−1}, n^{−1/β}).
- Unknown β: it is possible to estimate β using arguments from extreme value theory; the simple regret rate is then the same up to log(n) factors. The same idea could apply to cumulative regret. (An illustrative sketch of such an estimator follows this list.)
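For the unknown-β extension, here is an illustrative tail-index estimator in the spirit of extreme value theory. It is not the estimator analyzed in the paper: it assumes that the endpoint µ∗ is known and that the arm means themselves are observed, whereas the paper has to handle an unknown µ∗ and noisy empirical means.

```python
import math
import random


def estimate_beta(means, mu_star, k=None):
    """Hill-type estimate of beta from a sample of arm means.

    Under P(mu* - mu <= eps) ~ eps^beta, the reciprocals of the gaps mu* - mu
    have a Pareto-type tail with index beta, so a Hill estimator applied to the
    k smallest gaps recovers beta."""
    gaps = sorted(mu_star - x for x in means if x < mu_star)    # ascending gaps to mu*
    if k is None:
        k = max(1, int(math.sqrt(len(gaps))))                   # number of top arms used
    return k / sum(math.log(gaps[k] / gaps[i]) for i in range(k))


if __name__ == "__main__":
    rng = random.Random(0)
    # Means drawn from a Beta(1, 3) reservoir on [0, 1]: mu* = 1 and true beta = 3.
    sample = [rng.betavariate(1, 3) for _ in range(5000)]
    print("estimated beta:", round(estimate_beta(sample, mu_star=1.0), 2))
```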

Page 32: Recap on the rates (up to log(n))

Minimax optimal rates:

Cumulative regret:                        max(n^{β/(β+1)}, √n)
Cumulative regret, arms bounded by µ∗:    n^{β/(β+1)}
Simple regret:                            max(n^{−1/β}, n^{−1/2})
Simple regret, arms bounded by µ∗:        max(n^{−1/β}, n^{−1})

Remark: the bottleneck differs between the simple-regret and cumulative-regret settings.

Page 33: Simulations

Comparison of SiRI on synthetic data with:

- lil'UCB [Jamieson et al., 2014], to which the optimal oracle number of arms is given (an algorithm for simple regret with finitely many arms)
- UCB-F [Wang et al., 2008] (an algorithm for cumulative regret with infinitely many arms)

Page 34: Simulation results

Figure: simple regret as a function of time t (100 simulations each) for SiRI, UCB-F, and lil'UCB on Beta(1, 1) (upper left), Beta(1, 2) (upper right), and Beta(1, 3) (lower left) reservoirs, and for SiRI versus BetaSiRI (the unknown-β variant) on Beta(1, 1) (lower right).

Page 35: Conclusion

A minimax-optimal solution, up to log(n) factors, for the simple regret problem with infinitely many arms. Extensions:

- Unknown β
- Bernstein SiRI, with minimax-optimal performance when the arm distributions are bounded by µ∗

Open problems:

- Closing the log gaps (some of them are already closed)?
- Heavy-tailed mean reservoir distribution?

THANK YOU!

Acknowledgements: This work was supported by the French Ministry of Higher Education and Research and the French National Research Agency (ANR) under project ExTra-Learn n.ANR-14-CE24-0010-01.