Multi-armed Bandits: Applications to Online Advertising

Feb 09, 2022

Page 1: Multi-armed Bandits: Applications to Online Advertising

Multi-armed Bandits: Applications to Online Advertising

Assaf Zeevi*

Graduate School of Business

Columbia University

*Based on joint work with Denis Saure

1 / 26

Page 2: Multi-armed Bandits: Applications to Online Advertising

Online Advertisement: Industry Overview

source: Interactive Advertising Bureau (IAB) Internet Advertising Revenue Report (by PricewaterhouseCoopers)

2 / 26

Page 5: Multi-armed Bandits: Applications to Online Advertising

Customization in Online Advertisement

Ad-mix

Profit maximization

ad pool

user information

ad/user performance

pricing model. . .

1. cost per mille (CPM)

2. cost per click (CPC)

3 / 26

Page 16: Multi-armed Bandits: Applications to Online Advertising

Online Advertisement: Pricing Models

source: Interactive Advertising Bureau (IAB) Internet Advertising Revenue Report (by PricewaterhouseCoopers)

4 / 26

Page 17: Multi-armed Bandits: Applications to Online Advertising

Customization in Online Advertisement

anonymity on the Internet

[image credit: The New Yorker, July 1993]

common practice

  - internet cookies

  - list of categories of interest: dog food, frisbees, squirrels

  - adaptive to “behavior”

user information

  - behavioral, geographical, demographic data...

5 / 26

Page 22: Multi-armed Bandits: Applications to Online Advertising

Online Advertisement: Salient Features

Advent of the Internet has transformed consumer experience of advertisements and media...

dynamic/customized advertisement display

one-to-one interaction with users. . .

contracts (CPC) are increasingly performance-based

customization to individual users exploiting side information

dynamic decision making to balance learning and profits

6 / 26

Page 23: Multi-armed Bandits: Applications to Online Advertising

Road Map

Focus: Publisher’s display decision in dynamic environment

I. Customization in online advertisement

  - publisher’s problem definition

  - need for dynamic learning of ad performance

II. Stylized model for display-based online advertisement

  - limit of achievable performance

  - policy construction and guarantees

III. Insights and takeaway messages

7 / 26

Page 24: Multi-armed Bandits: Applications to Online Advertising

Towards a Problem Formulation. . .

Publisher’s decision: ad/user performance

1. direct revenue: cost per click (cpc)

2. click probability:

user profile + ad mix→ click probability

use historical data on profiles, display and click

natural approach... fit a choice model

$$\mathbb{P}\{\text{user clicks on ad}\} = f(\text{ad}, \text{user profile}, \text{ad mix}, \beta)$$

where $\beta$ denotes the model parameters

additional considerations: display capacity. . .

8 / 26

Page 28: Multi-armed Bandits: Applications to Online Advertising

Towards a Problem Formulation. . .

Publisher’s objective: ideally. . .

maximize expected revenue from interaction with users

$$\max_{\text{ad mix}} \; \sum_{\text{ad} \in \text{mix}} \mathrm{cpc}(\text{ad}) \cdot f(\text{ad}, \text{user profile}, \text{ad mix}, \beta)$$

. . . dynamic environment

new contracts: limited or no history of past interaction. . .

contract expiration . . .

estimation accuracy vs profit maximization

9 / 26

Page 33: Multi-armed Bandits: Applications to Online Advertising

Related Literature

Learning approach to interactive marketing

Gooley and Lattin (2000)

  - message customization

Bertsimas and Mersereau (2007)

  - solve for each segment separately

Multi-Armed Bandit (MAB) Literature

Slivkins (2009), Lu et al. (2009)

  - side information: MAB in metric spaces

10 / 26

Page 35: Multi-armed Bandits: Applications to Online Advertising

Roadmap

I. Customization in online advertisement

II. Stylized model for display-based online advertisement

III. Insights and takeaway messages

11 / 26

Page 36: Multi-armed Bandits: Applications to Online Advertising

Problem Formulation

Stylized model for display-based online advertisement

finite users ($T$) arrive sequentially

finite pool of ads ($N$) with given profit margins ($w_i$: CPC, with $i$ the ad index)

ad-mix ($s \in S$), with display capacity $|s| \le C$

  - ad-slots are interchangeable, no budget constraints

objective: maximize revenue by suitable ad display policy

user clicks on at most one ad...

users are utility maximizers

U(user profile, ad) + ad-mix → click decision

12 / 26

Page 41: Multi-armed Bandits: Applications to Online Advertising

Problem Formulation: User Utilities

Logit model with user-specific mean utility

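$$U_i = \beta_i \cdot x + \varepsilon_i$$

  - $U_i$: utility of ad $i$

  - $\beta_i$: ad factors (unobserved...)

  - $x$: user profile

  - $\varepsilon_i$: noise (Gumbel)

user profile $x$ is a $d$-dimensional vector [observed]

ad factors $\beta_i$ is a $d$-dimensional vector [to be estimated]

example (ad: running shoes), with entries ordered as sport affinity, prob. male, exp. age, dummy:

$$x = (0.4,\ 0.9,\ 35,\ 1)^\top, \qquad \beta_i = (0.3,\ 1.9,\ -0.3,\ 2.3)^\top$$

our approach: Logistic regression (profiles)

$$f_i(s, x, \beta) = \frac{\exp\{\beta_i \cdot x\}}{1 + \sum_{j \in s} \exp\{\beta_j \cdot x\}}$$

where $s$ is the ad mix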

13 / 26
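The logit formula above can be checked numerically. The following is a minimal Python sketch, not the authors' code: the function names are illustrative, the second ad and its factors are hypothetical, and the simulation assumes that "no click" has mean utility 0 plus Gumbel noise, which is what the 1 in the denominator of $f_i$ corresponds to.

```python
import math
import random


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))


def click_probs(s, x, beta):
    """Logit click probabilities f_i(s, x, beta) for every ad i in mix s."""
    weights = {i: math.exp(dot(beta[i], x)) for i in s}
    denom = 1.0 + sum(weights.values())  # the 1 accounts for "no click"
    return {i: w / denom for i, w in weights.items()}


def gumbel(rng):
    """Standard Gumbel draw via inverse-CDF sampling."""
    u = rng.random()
    while u == 0.0:  # avoid log(0)
        u = rng.random()
    return -math.log(-math.log(u))


def simulate_click(s, x, beta, rng):
    """Utility-maximizing user: returns the clicked ad, or None.

    Assumption: not clicking has mean utility 0 plus Gumbel noise.
    """
    best, best_u = None, gumbel(rng)
    for i in s:
        u = dot(beta[i], x) + gumbel(rng)
        if u > best_u:
            best, best_u = i, u
    return best


# Profile and "running shoes" factors as read from the slide; the second ad
# and its factors are hypothetical, added only to form a two-ad mix.
x = [0.4, 0.9, 35, 1]
beta = {
    "running shoes": [0.3, 1.9, -0.3, 2.3],
    "hypothetical ad": [0.1, 0.2, -0.2, 1.0],
}
probs = click_probs(["running shoes", "hypothetical ad"], x, beta)
print(probs)
```

Over many simulated users, the click frequencies from `simulate_click` should approach the values returned by `click_probs`.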

Page 47: Multi-armed Bandits: Applications to Online Advertising

Problem Formulation

expected revenue from displaying ad mix s to user profile x:

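$$r(s, x, \beta) = \sum_{i \in s} w_i \cdot \left( \frac{\exp\{\beta_i \cdot x\}}{1 + \sum_{j \in s} \exp\{\beta_j \cdot x\}} \right)$$

where $w_i$ is the ad profit margin and the term in parentheses is the logit click prob.

profile $X_t$ drawn from a finite set $\mathcal{X}$ according to distribution $G$

  - finite number of user segments...

  - $G$ reflects histogram of population

ad $i$ factors $\beta_i$ initially unknown for all ads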

14 / 26
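As an illustration of the revenue function $r(s, x, \beta)$, here is a small Python sketch under stated assumptions: profiles are represented as tuples, and all numbers (factors, margins, the distribution G, and the profile-to-mix assignment) are hypothetical values chosen for the example, not data from the talk.

```python
import math


def expected_revenue(s, x, beta, w):
    """r(s, x, beta): sum over ads in s of margin times logit click prob."""
    weights = {
        i: math.exp(sum(b * xi for b, xi in zip(beta[i], x))) for i in s
    }
    denom = 1.0 + sum(weights.values())
    return sum(w[i] * weights[i] for i in s) / denom


def average_revenue(mix_for, G, beta, w):
    """Expected per-user revenue when a user with profile x sees mix_for[x]."""
    return sum(
        g * expected_revenue(mix_for[x], x, beta, w) for x, g in G.items()
    )


# Hypothetical two-ad, two-segment example.
beta = {"a": (0.5, -1.0), "b": (-0.2, 0.3)}
w = {"a": 1.0, "b": 2.0}
G = {(1.0, 0.0): 0.6, (1.0, 1.0): 0.4}          # segment frequencies
mix_for = {(1.0, 0.0): ("a", "b"), (1.0, 1.0): ("b",)}
print(average_revenue(mix_for, G, beta, w))
```

Averaging over G in this way treats the display rule as fixed per segment, which is the situation of a publisher who already knows which mix to show to each profile.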

Page 50: Multi-armed Bandits: Applications to Online Advertising

Oracle Benchmark

Suppose publisher knows β a priori

formulate and solve an optimization problem

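$$J^*(T \mid \beta) := \sup_{s(\cdot)} \mathbb{E}\left[ \sum_{t=1}^{T} r(s(t), X_t, \beta) \right]$$

where $\beta$ are the known parameters, the supremum is over feasible ad policies $s(\cdot)$, and $r$ is the expected revenue

Oracle policy: offer $s^*(X_t, \beta)$ to user $t$

$$s^*(x, \beta) \in \operatorname{argmax} \{ r(s, x, \beta) : s \in S \}$$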

15 / 26
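For small pools, the oracle mix $s^*(x, \beta)$ can be found by enumerating every feasible mix. The sketch below is a brute-force illustration only, not the method used in the talk; it assumes the empty mix is feasible (earning zero), and the ad names, margins, and factors are made up.

```python
import math
from itertools import combinations


def expected_revenue(s, x, beta, w):
    """r(s, x, beta) under the logit click model."""
    weights = {
        i: math.exp(sum(b * xi for b, xi in zip(beta[i], x))) for i in s
    }
    denom = 1.0 + sum(weights.values())
    return sum(w[i] * weights[i] for i in s) / denom


def oracle_mix(ads, x, beta, w, capacity):
    """Brute-force s*(x, beta): best mix among all subsets with |s| <= capacity."""
    best_s, best_r = (), 0.0  # assumption: the empty mix is allowed, revenue 0
    for k in range(1, capacity + 1):
        for s in combinations(ads, k):
            r = expected_revenue(s, x, beta, w)
            if r > best_r:
                best_s, best_r = s, r
    return best_s, best_r


# Hypothetical pool: equal mean utilities, different margins.
ads = ["a", "b", "c"]
beta = {"a": (0.0,), "b": (0.0,), "c": (0.0,)}
w = {"a": 10.0, "b": 8.0, "c": 1.0}
print(oracle_mix(ads, (1.0,), beta, w, capacity=3))
```

In this example the best mix is `('a', 'b')` with revenue 6: adding the low-margin ad `c` draws clicks away from the two high-margin ads and lowers revenue to 4.75. Enumeration scales combinatorially in the pool size, so this is only practical for toy instances.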

Page 52: Multi-armed Bandits: Applications to Online Advertising

Measuring Policy Performance

ad mix decision for feasible policies based on history of past interaction and current user profile

performance of ad mix policy π: revenue loss relative to oracle policy

$$R(\pi, T) := J^*(T \mid \beta) - \mathbb{E}\left[ \sum_{t=1}^{T} r(s_\pi(t), X_t, \beta) \right]$$

where the second term is the expected revenue of policy π

Main Q: how small can we make this revenue loss?

structure of an optimal policy?

16 / 26
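A short worked step may help in reading the revenue loss. Since the oracle maximizes $r(\cdot, x, \beta)$ separately for each profile and each $X_t$ has distribution $G$ on the finite set $\mathcal{X}$, the benchmark and the loss can be written per user (this rewriting is a sketch derived from the definitions above, not a formula taken from the slides):

$$J^*(T \mid \beta) = T \sum_{x \in \mathcal{X}} G(x)\, r(s^*(x, \beta), x, \beta), \qquad R(\pi, T) = \sum_{t=1}^{T} \mathbb{E}\left[ r(s^*(X_t, \beta), X_t, \beta) - r(s_\pi(t), X_t, \beta) \right] \ge 0$$

Each term of the sum is non-negative because $s^*(X_t, \beta)$ maximizes $r(s, X_t, \beta)$ over $s \in S$, so the loss accumulates only on users who are shown a suboptimal mix.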

Page 54: Multi-armed Bandits: Applications to Online Advertising

Limit of Achievable Performance

Theorem [ Saure and Z (2012) ]

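Any good policy π must incur revenue loss

$$R(\pi, T) \ge \sum_{i \in N} K_i \log T$$

• Fix user profile $x \in \mathcal{X}$; the ad pool $N$ splits into:

  - “Interesting” ads $N(x)$: $w_i \ge r(s^*(x, \beta), x, \beta)$

  - “Uninteresting” ads: $w_i < r(s^*(x, \beta), x, \beta)$

  - Optimal ad-mix: $s^*(x, \beta)$

• Fix ad $i \in N$; within the profile set $\mathcal{X}$:

  - $\mathcal{X}_i$: profiles for which $i$ is “Interesting”

  - $O_i$: profiles for which $i$ is optimal, with span $\mathrm{span}(O_i)$

  - $E_i$: additional exploration

$$K_i \sim \mathrm{rank}(\mathcal{X}_i) - \mathrm{rank}(O_i)$$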

17 / 26
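To make the sets in the theorem concrete, the following Python sketch computes the profiles for which an ad is "interesting", the profiles for which it is optimal, and the rank gap that the slide relates $K_i$ to. It is an illustration under assumptions: the oracle revenues `r_star` and oracle mixes `s_star` are taken as given inputs, and the numbers in the example are invented rather than derived from a fitted model.

```python
def matrix_rank(vectors, tol=1e-9):
    """Rank of a list of profile vectors via Gaussian elimination."""
    rows = [list(map(float, v)) for v in vectors]
    ncols = len(rows[0]) if rows else 0
    rank = 0
    for col in range(ncols):
        pivot = max(
            range(rank, len(rows)),
            key=lambda r: abs(rows[r][col]),
            default=None,
        )
        if pivot is None or abs(rows[pivot][col]) < tol:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(rank + 1, len(rows)):
            f = rows[r][col] / rows[rank][col]
            rows[r] = [a - f * b for a, b in zip(rows[r], rows[rank])]
        rank += 1
    return rank


def exploration_rank_gap(i, w, r_star, s_star):
    """rank(X_i) - rank(O_i) for ad i.

    r_star[x]: oracle revenue r(s*(x, beta), x, beta) for profile x.
    s_star[x]: oracle mix for profile x.
    """
    X_i = [x for x in r_star if w[i] >= r_star[x]]  # i is "interesting"
    O_i = [x for x in s_star if i in s_star[x]]     # i is optimal
    return matrix_rank(X_i) - matrix_rank(O_i)


# Hypothetical inputs for three profiles in R^2.
w = {"a": 5.0}
r_star = {(1, 0): 3.0, (0, 1): 4.0, (1, 1): 6.0}
s_star = {(1, 0): ("a",), (0, 1): ("b",), (1, 1): ("b", "c")}
print(exploration_rank_gap("a", w, r_star, s_star))
```

Here ad `a` is interesting on two linearly independent profiles but optimal on only one, so the gap is 1: one direction of $\beta_a$ cannot be learned for free from profiles where showing `a` is already optimal.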

Page 68: Multi-armed Bandits: Applications to Online Advertising

Qualitative Insights and Policy Design

Ad/profile exploration as source of revenue loss

for a given ad, there is no need to estimate mean utilities for every profile

  - need to assess performance only on some profiles ($\mathcal{X}_i$)

  - use information on set spanning such profiles

use information that does not contribute to revenue loss

  - use profiles for which an ad is optimal

information contributing to revenue loss must be capped

  - performed on order log T users...

18 / 26
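The idea of capping costly exploration at order log T users can be sketched with a simple counting rule. The rule below is an assumption made for illustration (the slides do not give the exact schedule), and the constant `kappa` is hypothetical: an ad/profile pair is explored at user t only while its exploration count is below `kappa * log(t)`.

```python
import math


def should_explore(n_explored, t, kappa=1.0):
    """True while the pair has fewer than kappa * log(t) forced trials."""
    return n_explored < kappa * math.log(t)


def count_forced_explorations(T, kappa=1.0):
    """Total forced trials for one ad/profile pair over users 1..T."""
    n = 0
    for t in range(1, T + 1):
        if should_explore(n, t, kappa):
            n += 1
    return n


for T in (100, 10_000, 1_000_000):
    print(T, count_forced_explorations(T))
```

The printed counts grow by a roughly constant amount each time T is multiplied by 100, which is the logarithmic growth the lower bound says cannot be avoided.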

Page 77: Multi-armed Bandits: Applications to Online Advertising

Policy design: Key ingredients

Intuition: force right frequency of ad i experimentation

on suitable estimation-set (Ei ∈ X )

order log T users

spanning Xi,

Construction:

estimate model parameter for ad i using only information on profilesin Ei

β̂i ∈{ρ ∈ Rd :

# clicks on ad i for profile x

# no clicks and ad i offered for profile x= exp(ρ · x) , x ∈ Ei

}

adapt Ei to span a proxy for Xi . . .

use most explored profiles

for user t force order-(log t) exploration on Ei

19 / 26

Page 78: Multi-armed Bandits: Applications to Online Advertising

Policy design: Key ingredients

Intuition: force right frequency of ad i experimentation

on suitable estimation-set (Ei ∈ X )

order log T users

spanning Xi,

Construction:

estimate model parameter for ad i using only information on profilesin Ei

β̂i ∈{ρ ∈ Rd :

# clicks on ad i for profile x

# no clicks and ad i offered for profile x= exp(ρ · x) , x ∈ Ei

}

adapt Ei to span a proxy for Xi . . .

use most explored profiles

for user t force order-(log t) exploration on Ei

19 / 26

Page 79: Multi-armed Bandits: Applications to Online Advertising

Policy design: Key ingredients

Intuition: force right frequency of ad i experimentation

on suitable estimation-set (Ei ∈ X )

order log T users

spanning Xi,

Construction:

estimate model parameter for ad i using only information on profilesin Ei

β̂i ∈{ρ ∈ Rd :

# clicks on ad i for profile x

# no clicks and ad i offered for profile x= exp(ρ · x) , x ∈ Ei

}

adapt Ei to span a proxy for Xi . . .

use most explored profiles

for user t force order-(log t) exploration on Ei

19 / 26

Page 80: Multi-armed Bandits: Applications to Online Advertising

Policy design: Key ingredients

Intuition: force right frequency of ad i experimentation

on suitable estimation-set (Ei ∈ X )

order log T users

spanning Xi, including Oi

Construction:

estimate model parameter for ad i using only information on profilesin Ei

β̂i ∈{ρ ∈ Rd :

# clicks on ad i for profile x

# no clicks and ad i offered for profile x= exp(ρ · x) , x ∈ Ei

}

adapt Ei to span a proxy for Xi . . .

use most explored profiles

for user t force order-(log t) exploration on Ei

19 / 26

Page 81: Multi-armed Bandits: Applications to Online Advertising

Policy design: Key ingredients

Intuition: force right frequency of ad i experimentation

on suitable estimation-set (Ei ∈ X )

order log T users

spanning Xi, including Oi

Construction:

estimate model parameter for ad i using only information on profilesin Ei

β̂i ∈{ρ ∈ Rd :

# clicks on ad i for profile x

# no clicks and ad i offered for profile x= exp(ρ · x) , x ∈ Ei

}

adapt Ei to span a proxy for Xi . . . use most explored profiles

for user t force order-(log t) exploration on Ei

19 / 26

Page 82: Multi-armed Bandits: Applications to Online Advertising

Policy design: Key ingredients

Intuition: force right frequency of ad i experimentation

on suitable estimation-set (Ei ∈ X )

order log T users

spanning Xi, including Oi

Construction:

estimate model parameter for ad i using only information on profilesin Ei

β̂i ∈{ρ ∈ Rd :

# clicks on ad i for profile x

# no clicks and ad i offered for profile x= exp(ρ · x) , x ∈ Ei

}

adapt Ei to span a proxy for Xi . . . use most explored profiles

for user t force order-(log t) exploration on Ei

19 / 26

Page 83: Multi-armed Bandits: Applications to Online Advertising

Policy design: Key ingredients

Intuition: force right frequency of ad i experimentation

on suitable estimation-set (Ei ∈ X )

order log T users

spanning Xi, including Oi

Construction:

estimate model parameter for ad i using only information on profilesin Ei

β̂i ∈{ρ ∈ Rd :

# clicks on ad i for profile x

# no clicks and ad i offered for profile x= exp(ρ · x) , x ∈ Ei

}

adapt Ei to span a proxy for Xi . . . use most explored profiles

for user t force order-(log t) exploration on Ei

19 / 26
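The estimation step above can be sketched in a few lines of Python. This is a minimal illustration, not the construction from the talk: it assumes that $E_i$ contains exactly $d$ linearly independent profiles and that every count is positive, so that taking logs of the click / no-click ratio gives a square linear system $\rho \cdot x = \log(\text{ratio})$. The names `solve_linear` and `estimate_beta` are hypothetical.

```python
import math

def solve_linear(A, b):
    """Solve the square system A z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    # Augmented matrix [A | b] as floats.
    M = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("profiles in the estimation set do not span R^d")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def estimate_beta(profiles, clicks, no_clicks):
    """Find rho with clicks / no_clicks = exp(rho . x) for every profile x in E_i.

    Assumption: len(profiles) == d and all counts are strictly positive.
    """
    log_ratio = [math.log(c / n) for c, n in zip(clicks, no_clicks)]
    return solve_linear(profiles, log_ratio)
```

For example, with the two profiles (0.1, 0.9) and (0.9, 0.1) and counts whose ratios equal $\exp(\beta \cdot x)$ exactly, `estimate_beta` returns $\beta$ up to rounding error. With noisy counts the same system returns a perturbed estimate, which is why the number of observations per profile matters.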

Structure of the Proposed Policy

Algorithm structure: $\pi^* = \pi(\kappa)$, where $\kappa$ is a tuning parameter

Initialize exploration sets $E_i$ for every ad $i$

for every user $t$:

- for ad $i$: get an estimate of $\beta_i$ using exploration set $E_i$
- for ad $i$: use $\hat{\beta}$ to update $E_i$ to span $\hat{\mathcal{X}}_i$ (most explored)
- EXPLORE on ads for which the profile of user $t$:
  1. is useful for estimation ($X_t \in E_i$)
  2. is under-tested (displayed to $\le \kappa \log t$ such users)
- otherwise, EXPLOIT the approximate oracle solution $s^*(X_t, \hat{\beta})$

20 / 26
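The explore/exploit test in the loop above can be sketched as follows. This is an illustrative reading of the slide, not the authors' implementation. It assumes that profiles are hashable labels, that `display_counts[(ad, profile)]` stores how many users with that profile have been shown the ad, and that `oracle_mix` is a hypothetical callable standing in for $s^*(X_t, \hat{\beta})$. Parameter estimation and the update of $E_i$ are omitted.

```python
import math

def ads_to_explore(t, profile, estimation_sets, display_counts, kappa):
    """Ads on which user t (t >= 1) triggers forced exploration."""
    budget = kappa * math.log(t)  # order-(log t) exploration budget
    explore = []
    for ad, e_set in estimation_sets.items():
        useful = profile in e_set  # condition 1: X_t in E_i
        under_tested = display_counts.get((ad, profile), 0) <= budget  # condition 2
        if useful and under_tested:
            explore.append(ad)
    return sorted(explore)

def choose_mix(t, profile, estimation_sets, display_counts, kappa, oracle_mix):
    """Return ("explore", ads) if any ad needs testing, else ("exploit", oracle mix)."""
    explore = ads_to_explore(t, profile, estimation_sets, display_counts, kappa)
    if explore:
        return "explore", explore
    return "exploit", oracle_mix(profile)
```

A full policy would also have to fit the explored ads into a feasible mix (for instance a cap on the number of ads shown) and increment `display_counts` after each display; both are left out of this sketch.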

Performance of the Proposed Policy

Theorem [ Saure and Z (2012) ]

For a suitably chosen tuning parameter $\kappa$,

$$R(\pi^*, T) \le K_1 \sum_{i \in \mathcal{N}} \bigl(\operatorname{rank}(\mathcal{X}_i) - \operatorname{rank}(\mathcal{O}_i)\bigr) \log T + K_2,$$

where $K_1, K_2 > 0$ are finite constants

policy is essentially optimal

Key results: for each profile

- uninteresting ads are displayed to a finite (independent of $T$) number of users
- ads in the optimal mix are displayed outside that mix finitely many times

21 / 26

Proof Sketch

- discrete nature of optimization problem:

  min optimality gap across profiles + continuity of expected revenue w.r.t. $\beta$ $\Rightarrow$ threshold on estimation error

- parameter estimation with $O(\log t)$ tests ($\varepsilon$: threshold error, $\kappa$: tuning parameter):

  $$P\left\{ \|\beta_i - \hat{\beta}_i\|_\infty > \varepsilon \right\} \le \exp(-c \kappa \log t) = \frac{1}{t^{c\kappa}}$$

- balance exploration and exploitation error ($\kappa > c^{-1}$):

  $$R(\pi^*, T) \le O\left( \kappa \log T + \sum_{t=1}^{T} \frac{1}{t^{c\kappa}} \right)$$

22 / 26
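The condition $\kappa > c^{-1}$ matters because it makes the exponent $c\kappa$ exceed 1, so the exploitation-error sum stays bounded: $\sum_{t=1}^{T} t^{-c\kappa} \le 1 + \int_1^\infty t^{-c\kappa}\,dt = c\kappa / (c\kappa - 1)$, leaving $\kappa \log T$ as the dominant term. The short check below illustrates this numerically; the function names and the values of $c\kappa$ are chosen for illustration only.

```python
def exploitation_error_sum(T, c_kappa):
    """Partial sum of 1 / t^(c*kappa) for t = 1..T."""
    return sum(t ** (-c_kappa) for t in range(1, T + 1))

def integral_bound(c_kappa):
    """Upper bound 1 + integral_1^inf t^(-c*kappa) dt, valid only for c*kappa > 1."""
    if c_kappa <= 1:
        raise ValueError("the sum is unbounded in T when c*kappa <= 1")
    return c_kappa / (c_kappa - 1.0)

# With c*kappa = 2 the sum stays below the bound for every horizon T.
for horizon in (10, 1_000, 100_000):
    assert exploitation_error_sum(horizon, 2.0) <= integral_bound(2.0)
```

At the boundary $c\kappa = 1$ the sum is the harmonic series, which itself grows like $\log T$, so the error term is no longer a constant.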

Numerical Illustration

4 products, 3 two-dimensional profiles

feasible set $\mathcal{S} := \{ s \subset \mathcal{N} : |s| \le 2 \}$, $\kappa = 40$

$$\beta = \begin{pmatrix} -1.30 & 2.00 & 2.75 & 3.00 \\ 3.00 & 2.00 & 2.75 & -1.30 \end{pmatrix}, \qquad X = \begin{pmatrix} 0.1 & 0.5 & 0.9 \\ 0.9 & 0.5 & 0.1 \end{pmatrix}$$

Oracle solution

profile          x1       x2       x3
opt. mix         {1, 2}   {2, 3}   {2, 4}
opt. revenue     0.587    0.546    0.578
uninteresting    {3}      -        {3}
anon. mix        {1, 2}   {1, 2}   {1, 2}
anon. revenue    0.587    0.543    0.525

[Figure; axis labels: T, R(π∗, T)/log T, R(π∗, T)/T, P{optimal | x1}, P{optimal | x2}, P{optimal | x3}]

23 / 26
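As a reading aid for the numbers above, the following sketch computes the linear index $\beta_i \cdot x_j$ for every ad and profile, and the per-profile revenue gap between the oracle mix and the anonymous mix. It assumes that column $i$ of $\beta$ belongs to ad $i$ and column $j$ of $X$ is profile $x_j$; that layout is inferred from the dimensions, not stated on the slide. The variable names are hypothetical.

```python
# Assumed layout: column i of beta is beta_i (ad i), column j of X is profile x_j.
beta = [[-1.30, 2.00, 2.75, 3.00],
        [3.00, 2.00, 2.75, -1.30]]
X = [[0.1, 0.5, 0.9],
     [0.9, 0.5, 0.1]]

def linear_index(ad, profile):
    """beta_i . x_j for zero-based ad and profile indices."""
    return sum(beta[k][ad] * X[k][profile] for k in range(2))

# index[i][j] = beta_{i+1} . x_{j+1}
index = [[linear_index(i, j) for j in range(3)] for i in range(4)]

# Revenues copied from the oracle-solution table.
opt_revenue = [0.587, 0.546, 0.578]
anon_revenue = [0.587, 0.543, 0.525]

# Per-profile revenue gained by customizing the mix instead of showing {1, 2} to everyone.
customization_gain = [round(o - a, 3) for o, a in zip(opt_revenue, anon_revenue)]

for i, row in enumerate(index, start=1):
    print(f"ad {i}:", [round(v, 2) for v in row])
print("gain per profile:", customization_gain)
```

Under this layout ad 3 has the largest index (2.75) on every profile and yet is listed as uninteresting for x1 and x3, which suggests that the oracle mix also depends on quantities not shown on this slide. The gain is zero for x1, where the anonymous mix coincides with the optimal one, and largest for x3.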


Roadmap

I. Customization in online advertisement

II. Stylized model for display-based online advertisement

III. Insights and takeaway messages

24 / 26

Insights and Takeaway Messages

value of customization
- speed of learning
- misspecification risk

cost of information
- “suboptimal” exploration
- dependence on structure

[Figure: R(π∗, T) and R(anonymous, T) plotted against T, annotated with "cost of information" and "value of customization"]

25 / 26

Final Thoughts

concepts
- semi-myopic type policies [ avoid incomplete learning ]
- minimal exploration needed
- significant gains from customizing policies to application

analysis tools / machinery
- information theoretic inequalities [ lower bounds ]
- martingale methods, large deviation bounds [ analysis of policies ]
- sequential hypothesis testing

related recent applications of MAB
- dynamic content referral [ Besbes, Gur and Z (2012a) ]
- temperature tracking and restless bandits [ Besbes and Z (2012b) ]
- personalization (Pandora, various recommendation systems, etc.)
- dynamic design of experiments / screening
- cognitive radio [ Lai et al (2011) ]
- mechanism design formulation [ Kakade et al (2012) ]

26 / 26