
Boosting Ad Revenue Using Reinforcement Learning (Robin Schuil Technology Stream)

Apr 15, 2017


Lviv IT Arena
Transcript
Page 1: Boosting Ad Revenue Using Reinforcement Learning (Robin Schuil Technology Stream)

@schuilr 1

Page 2: Boosting Ad Revenue Using Reinforcement Learning (Robin Schuil Technology Stream)

Case Study: Marktplaats.nl


Page 3: Boosting Ad Revenue Using Reinforcement Learning (Robin Schuil Technology Stream)

Marktplaats.nl

•  Largest classifieds site in the Netherlands

•  One of the most visited websites in NL

•  Founded in 1999, acquired by eBay in 2004

•  Now headquarters to eBay Classifieds Group: 12 brands in 17 countries


Page 4: Boosting Ad Revenue Using Reinforcement Learning (Robin Schuil Technology Stream)

Facts & Figures

•  1.3 million visitors / day
   –  desktop: 34%, mobile: 49%, tablet: 18%

•  9 million live listings
   –  350,000 new items / day

•  6 million unique search requests / day
   –  70 searches per second (average)

Page 5: Boosting Ad Revenue Using Reinforcement Learning (Robin Schuil Technology Stream)

Data & Trends @ Marktplaats

Page 6: Boosting Ad Revenue Using Reinforcement Learning (Robin Schuil Technology Stream)

Seasonal trends

Winter sports!

[Chart: demand (“Vraag”), 0% to 8%, by week of the year (1 to 51), for the keywords “skibroek” (ski pants), “ski”, “skipak” (ski suit), and “snowboard”.]

Page 7: Boosting Ad Revenue Using Reinforcement Learning (Robin Schuil Technology Stream)

Seasonal trends

Camping!

[Chart: demand (“Vraag”), 0% to 4.5%, by week of the year (1 to 51), for the keywords “caravans”, “campers”, and “vouwwagen” (folding trailer).]

Page 8: Boosting Ad Revenue Using Reinforcement Learning (Robin Schuil Technology Stream)

Seasonal trends

Saint Nicholas & Christmas!

[Chart: demand (“Vraag”), 0% to 14%, by week of the year (1 to 51), for the keywords “sinterklaas” (Saint Nicholas) and “kerst” (Christmas).]

Page 9: Boosting Ad Revenue Using Reinforcement Learning (Robin Schuil Technology Stream)

Weather, temperature, etc.

Fly curtains!

[Chart: by week of the year (1 to 51), demand (“Vraag”), 0% to 7%, for the keyword “vliegengordijn” (fly curtain), plotted against temperature (“Temperatuur”), 0 to 25.]

Page 10: Boosting Ad Revenue Using Reinforcement Learning (Robin Schuil Technology Stream)

Weather, temperature, etc.

Heaters!

[Chart: by week of the year (1 to 51), demand (“Vraag”), 0% to 4%, for the keyword “kachel” (heater), plotted against temperature (“Temperatuur”), 0 to 25. Annotation: “Reversed”.]

Page 11: Boosting Ad Revenue Using Reinforcement Learning (Robin Schuil Technology Stream)

Special events

Orange (“oranje”)!

[Chart: demand (“Vraag”), 0% to 6%, by week of the year (1 to 51), for the keyword “oranje” (orange). Annotated peaks: King’s Day and World Cup Football.]

Page 12: Boosting Ad Revenue Using Reinforcement Learning (Robin Schuil Technology Stream)

During a football game

[Chart: x-axis shows time from 20:45 to 23:15 in 3-minute steps; two series, “Last Friday” and “This Friday”. Annotations along the timeline: Kick-off, 1 - 0, 1 - 1, 1 - 2, 1 - 3, 1 - 4, 1 - 5, End, and Break.]

Page 13: Boosting Ad Revenue Using Reinforcement Learning (Robin Schuil Technology Stream)

“Juichpakken” (cheering suits)

[Chart: demand (“Vraag”), 0% to 25%, by week of the year (1 to 51), for the keywords “roy donders” and “juichpak”.]

Page 14: Boosting Ad Revenue Using Reinforcement Learning (Robin Schuil Technology Stream)

Exploiting trends


Page 15: Boosting Ad Revenue Using Reinforcement Learning (Robin Schuil Technology Stream)

“Nieuw & populair” (“New & popular”)

•  “Nieuw & populair” = trending products

•  Pay-per-click advertising model

•  Advertisers bid for clicks, similar to Google AdWords

•  Metric to optimize: Revenue Per Mille (RPM) = CTR * bid * 1,000
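As a small worked example of the RPM formula, the sketch below computes revenue per 1,000 impressions. The click-through rate and bid are made-up numbers for illustration, not figures from the talk.

```python
def rpm(ctr: float, bid: float) -> float:
    """Revenue Per Mille: expected revenue per 1,000 impressions."""
    return ctr * bid * 1000


# Hypothetical inputs: 2% of impressions are clicked, each click pays 0.10.
example_rpm = rpm(ctr=0.02, bid=0.10)
print(f"RPM = {example_rpm:.2f}")  # prints "RPM = 2.00"
```

A keyword with a lower CTR can still have the higher RPM if advertisers bid more for it, which is why the metric combines both factors.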

Page 16: Boosting Ad Revenue Using Reinforcement Learning (Robin Schuil Technology Stream)

First (minimal) version

•  Find top 100 “trending” keywords using Spark

•  Randomly pick one of those keywords

•  Display top 4 results for the selected keyword
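The random baseline described above could look roughly like the sketch below. The Spark job is out of scope here: `trending` and `fake_search` are hypothetical stand-ins for its output and for the search backend, and `pick_widget_items` is an illustrative name, not something from the talk.

```python
import random


def pick_widget_items(trending_keywords, search, n_results=4, rng=random):
    """Pick one trending keyword at random and return its top results."""
    keyword = rng.choice(trending_keywords)
    return keyword, search(keyword)[:n_results]


# Hypothetical stand-in for the list of trending keywords.
trending = ["ski", "caravans", "kachel"]


def fake_search(keyword):
    """Hypothetical search backend returning ten ranked listings."""
    return [f"{keyword} listing {i}" for i in range(1, 11)]


keyword, items = pick_widget_items(trending, fake_search, rng=random.Random(42))
print(keyword, items)
```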

Page 17: Boosting Ad Revenue Using Reinforcement Learning (Robin Schuil Technology Stream)

Can we do better?

•  CTR and bid vary per keyword, so random selection gives only average performance.

•  Doesn’t consider the user’s personal preferences

Page 18: Boosting Ad Revenue Using Reinforcement Learning (Robin Schuil Technology Stream)

PART I: GLOBAL OPTIMIZATION

Page 19: Boosting Ad Revenue Using Reinforcement Learning (Robin Schuil Technology Stream)

One-armed bandit = slot machine

Problem: how to pick between slot machines so that you maximize profit?

Page 20: Boosting Ad Revenue Using Reinforcement Learning (Robin Schuil Technology Stream)

Exploration – Exploitation

•  Explore (learn): try out different candidates to learn how they perform over time

•  Exploit (earn): take advantage of what you’ve learned to maximize payoff (your current best guess)

Page 21: Boosting Ad Revenue Using Reinforcement Learning (Robin Schuil Technology Stream)

Many different approaches

•  Epsilon First

•  Epsilon Greedy

•  Upper Confidence Bound

•  Thompson Sampling

•  LinUCB

Page 22: Boosting Ad Revenue Using Reinforcement Learning (Robin Schuil Technology Stream)

Epsilon First

[Diagram: a timeline in two phases. First phase, “Random”: learn by collecting data for each candidate (split testing, A/B testing). Second phase, “Best”: earn by showing the best performer.]

Page 23: Boosting Ad Revenue Using Reinforcement Learning (Robin Schuil Technology Stream)

Epsilon First

•  Simple and intuitive

•  Lots of tools available (VWO, Optimizely, …)

•  Average reward until exploration is finished

•  What if the best candidate is no longer the best?
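The explore-then-commit schedule of Epsilon First can be sketched as follows. The function name, the `pull` callback, and the simulated click rates are assumptions made for illustration; the talk does not show an implementation.

```python
import random


def epsilon_first(arms, pull, horizon, epsilon=0.1, rng=random):
    """Explore randomly for the first epsilon * horizon steps, then commit."""
    explore_steps = int(epsilon * horizon)
    counts = {arm: 0 for arm in arms}
    totals = {arm: 0.0 for arm in arms}
    best, earned = None, 0.0
    for step in range(horizon):
        if step < explore_steps:
            arm = rng.choice(arms)  # learn: collect data for each candidate
            reward = pull(arm)
            counts[arm] += 1
            totals[arm] += reward
        else:
            if best is None:  # commit once, when exploration ends
                best = max(
                    arms,
                    key=lambda a: totals[a] / counts[a] if counts[a] else 0.0,
                )
            reward = pull(best)  # earn: always show the best performer
        earned += reward
    return best, earned


# Hypothetical simulation: candidate "A" is clicked 30% of the time, "B" 10%.
sim = random.Random(0)
rates = {"A": 0.3, "B": 0.1}
best, earned = epsilon_first(
    list(rates),
    pull=lambda arm: 1.0 if sim.random() < rates[arm] else 0.0,
    horizon=5000,
    epsilon=0.1,
    rng=sim,
)
print(best, earned)
```

Because the choice is frozen after the exploration phase, the sketch also shows the weakness noted on the slide: nothing is re-evaluated if the best candidate later changes.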

Page 24: Boosting Ad Revenue Using Reinforcement Learning (Robin Schuil Technology Stream)

Epsilon Greedy

[Diagram: a timeline with continuous exploration. The best candidate is shown 90% of the time; a random candidate is shown 10% of the time.]

Page 25: Boosting Ad Revenue Using Reinforcement Learning (Robin Schuil Technology Stream)

Epsilon Greedy

•  Very simple to implement and surprisingly effective

•  Can deal with nonstationary problems

•  How to determine the optimal value for ε?
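A minimal sketch of Epsilon Greedy with ε = 0.1, matching the 90% / 10% split on the previous slide. The class name, method names, and simulated click rates are illustrative assumptions.

```python
import random


class EpsilonGreedy:
    """Show the current best candidate, except for a random one with probability epsilon."""

    def __init__(self, arms, epsilon=0.1, rng=None):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.rng = rng or random.Random()
        self.counts = {arm: 0 for arm in self.arms}
        self.means = {arm: 0.0 for arm in self.arms}

    def select(self):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)  # explore
        return max(self.arms, key=lambda arm: self.means[arm])  # exploit

    def update(self, arm, reward):
        self.counts[arm] += 1
        # Incremental running mean of the observed rewards.
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]


# Hypothetical simulation: "A" is clicked 50% of the time, "B" 10%.
sim = random.Random(0)
rates = {"B": 0.1, "A": 0.5}
bandit = EpsilonGreedy(rates, epsilon=0.1, rng=sim)
for _ in range(5000):
    arm = bandit.select()
    bandit.update(arm, 1.0 if sim.random() < rates[arm] else 0.0)
print(bandit.counts)
```

To track a nonstationary problem more quickly, the running mean could be replaced by a constant step size so that old observations fade; that variation is not shown here.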

Page 26: Boosting Ad Revenue Using Reinforcement Learning (Robin Schuil Technology Stream)

Upper Confidence Bound

Basic idea:

•  Calculate mean and a measure of uncertainty (variance) for each candidate

•  Pick current best performer based on mean + uncertainty bonus

Page 27: Boosting Ad Revenue Using Reinforcement Learning (Robin Schuil Technology Stream)

Measuring uncertainty

Observed mean: 0.50

95% certain that true mean ≤ 0.76

Uncertainty bonus: 0.26


Page 28: Boosting Ad Revenue Using Reinforcement Learning (Robin Schuil Technology Stream)

More data = less uncertainty

95% certain that true mean ≤ 0.63

Uncertainty bonus: 0.13
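The two bonuses can be reproduced with a simple normal approximation. The slides do not state the sample sizes or the method: the values n = 10 and n = 40 and the one-sided 95% z-value below are assumptions, chosen because they give the same numbers as the slides.

```python
import math

Z_95_ONE_SIDED = 1.645  # standard normal quantile for a one-sided 95% bound


def uncertainty_bonus(mean, n, z=Z_95_ONE_SIDED):
    """Width of the one-sided confidence bound for a rate observed over n trials."""
    return z * math.sqrt(mean * (1.0 - mean) / n)


# Assumed sample sizes; four times the data halves the bonus.
for n in (10, 40):
    bonus = uncertainty_bonus(0.5, n)
    print(n, round(bonus, 2), round(0.5 + bonus, 2))
# prints:
# 10 0.26 0.76
# 40 0.13 0.63
```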


Page 29: Boosting Ad Revenue Using Reinforcement Learning (Robin Schuil Technology Stream)

Upper Confidence Bound

Mean + uncertainty bonus

[Diagram: estimated reward for candidates A, B, and C, each drawn as a mean plus an uncertainty bonus. Pick “A”!]

Page 30: Boosting Ad Revenue Using Reinforcement Learning (Robin Schuil Technology Stream)

Upper Confidence Bound

•  Selecting “A” reduces uncertainty

•  Candidate “C” now has the highest score

[Diagram: estimated reward for candidates A, B, and C after the update. Pick “C”!]

Page 31: Boosting Ad Revenue Using Reinforcement Learning (Robin Schuil Technology Stream)

Upper Confidence Bound

•  Uses variance measure to automatically balance exploration with exploitation

•  Deterministic; requires online learning (not suited for small-batch mode)
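The talk does not say which UCB variant was used. As an illustration, the sketch below assumes the standard UCB1 rule, whose bonus sqrt(2 ln t / n) depends only on how often a candidate has been shown rather than on an empirical variance. Class and variable names are hypothetical.

```python
import math
import random


class UCB1:
    """Pick the candidate with the highest mean + uncertainty bonus."""

    def __init__(self, arms):
        self.arms = list(arms)
        self.counts = {arm: 0 for arm in self.arms}
        self.means = {arm: 0.0 for arm in self.arms}

    def select(self):
        # Show every candidate once before trusting any estimate.
        for arm in self.arms:
            if self.counts[arm] == 0:
                return arm
        total = sum(self.counts.values())
        return max(
            self.arms,
            key=lambda arm: self.means[arm]
            + math.sqrt(2.0 * math.log(total) / self.counts[arm]),
        )

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]


# Hypothetical simulation: "B" is clicked 60% of the time, "A" 20%.
sim = random.Random(0)
rates = {"A": 0.2, "B": 0.6}
ucb = UCB1(rates)
for _ in range(3000):
    arm = ucb.select()
    ucb.update(arm, 1.0 if sim.random() < rates[arm] else 0.0)
print(ucb.counts)
```

Given the same history, `select` always returns the same candidate, which is the deterministic behaviour noted on the slide: if feedback arrives only in batches, every request in a batch gets the same choice.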

Page 32: Boosting Ad Revenue Using Reinforcement Learning (Robin Schuil Technology Stream)

Thompson Sampling

Basic idea:

•  The number of pulls for a given lever should match its actual probability of being the optimal lever

•  Sample from the posterior for the mean of each lever:

p(λ|X) = Gamma(conv + prior_conv, impr + prior_impr)
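A minimal sketch of the sampling step, reading the posterior above as a Gamma distribution with shape `conv + prior_conv` and rate `impr + prior_impr`. That shape/rate reading, the prior values, and all class and method names are assumptions. Python's `random.gammavariate` takes a scale parameter, so the rate is inverted.

```python
import random


class GammaThompson:
    """Thompson Sampling with a Gamma posterior over each conversion rate."""

    def __init__(self, arms, prior_conv=1.0, prior_impr=1.0, rng=None):
        self.arms = list(arms)
        self.prior_conv = prior_conv  # assumed prior pseudo-conversions
        self.prior_impr = prior_impr  # assumed prior pseudo-impressions
        self.rng = rng or random.Random()
        self.conv = {arm: 0 for arm in self.arms}
        self.impr = {arm: 0 for arm in self.arms}

    def select(self):
        def draw(arm):
            shape = self.conv[arm] + self.prior_conv
            rate = self.impr[arm] + self.prior_impr
            return self.rng.gammavariate(shape, 1.0 / rate)  # scale = 1 / rate

        # One posterior draw per candidate; show the candidate with the best draw.
        return max(self.arms, key=draw)

    def update(self, arm, impressions, conversions):
        # Accepts whole batches, so feedback does not have to arrive per request.
        self.impr[arm] += impressions
        self.conv[arm] += conversions


# Hypothetical simulation: "A" converts 50% of the time, "B" 10%.
sim = random.Random(0)
rates = {"B": 0.1, "A": 0.5}
ts = GammaThompson(rates, rng=sim)
for _ in range(3000):
    arm = ts.select()
    ts.update(arm, impressions=1, conversions=1 if sim.random() < rates[arm] else 0)
print(ts.impr)
```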

Page 33: Boosting Ad Revenue Using Reinforcement Learning (Robin Schuil Technology Stream)

Few conversions

Candidate   Conversions   Impressions   Chance of being winner
A (3.9%)    11            282           42%
B (3.3%)    2             61            39%
C (2.8%)    4             143           19%

Page 34: Boosting Ad Revenue Using Reinforcement Learning (Robin Schuil Technology Stream)

More conversions

Candidate   Conversions   Impressions   Chance of being winner
A (3.9%)    93            2,382         82%
B (3.3%)    66            2,011         13%
C (2.8%)    31            1,093         5%

Page 35: Boosting Ad Revenue Using Reinforcement Learning (Robin Schuil Technology Stream)

Many conversions

Candidate   Conversions   Impressions   Chance of being winner
A (3.9%)    892           22,882        97%
B (3.3%)    174           5,261         2%
C (2.8%)    66            2,343         1%

Page 36: Boosting Ad Revenue Using Reinforcement Learning (Robin Schuil Technology Stream)

Lots of conversions

Candidate   Conversions   Impressions   Chance of being winner
A (3.9%)    5,621         144,132       > 99%
B (3.3%)    256           7,761         < 1%
C (2.8%)    101           3,593         < 1%
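A “chance of being winner” column like the one in these tables can be estimated by Monte Carlo: draw a rate for every candidate from its posterior many times and count how often each candidate has the highest draw. The sketch assumes the Gamma posterior from the Thompson Sampling slide with a hypothetical prior of one conversion in one impression, so its output will not exactly match the slide percentages, especially for small counts.

```python
import random


def chance_of_winning(stats, prior_conv=1.0, prior_impr=1.0, draws=20000, seed=0):
    """Estimate each candidate's probability of having the highest true rate.

    stats maps candidate name -> (conversions, impressions).
    """
    rng = random.Random(seed)
    wins = {name: 0 for name in stats}
    for _ in range(draws):
        sampled = {
            name: rng.gammavariate(conv + prior_conv, 1.0 / (impr + prior_impr))
            for name, (conv, impr) in stats.items()
        }
        wins[max(sampled, key=sampled.get)] += 1
    return {name: count / draws for name, count in wins.items()}


# Counts taken from the "Lots of conversions" table.
lots = {"A": (5621, 144132), "B": (256, 7761), "C": (101, 3593)}
winner_odds = chance_of_winning(lots)
print(winner_odds)
```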

Page 37: Boosting Ad Revenue Using Reinforcement Learning (Robin Schuil Technology Stream)

Thompson Sampling

•  Weighted random sampling

•  Works well in small-batch mode

•  Doesn’t consider context (e.g. user’s personal preferences)

Page 38: Boosting Ad Revenue Using Reinforcement Learning (Robin Schuil Technology Stream)

PART II: PERSONALIZATION

Page 39: Boosting Ad Revenue Using Reinforcement Learning (Robin Schuil Technology Stream)

LinUCB

Basic idea:

•  Define a “context” of information about the user

•  Fit a per-candidate logistic regression model

•  Apply the concept of Upper Confidence Bound (UCB)
   –  mean + uncertainty bonus

Page 40: Boosting Ad Revenue Using Reinforcement Learning (Robin Schuil Technology Stream)

Context

•  Gender

•  Recently viewed categories

•  Current date

•  Weather forecast

•  …

Principal Component Analysis (PCA) is used to reduce sparseness and computational complexity.

Page 41: Boosting Ad Revenue Using Reinforcement Learning (Robin Schuil Technology Stream)

LinUCB

Mean + uncertainty bonus:

μα(t) + σα(t)
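To make the “mean + uncertainty bonus” score concrete for a contextual bandit, here is a small sketch of the standard LinUCB formulation, which fits a linear ridge regression per candidate. The slides mention logistic regression, so this linear form is an assumption rather than the speaker's exact model. The `alpha` weight, class name, and two-feature toy context are also hypothetical.

```python
import math


class LinUCBArm:
    """Per-candidate ridge regression scored as mean + uncertainty bonus."""

    def __init__(self, dim, alpha=1.0):
        self.alpha = alpha  # assumed weight of the uncertainty bonus
        # Inverse of A = I + sum(x x^T), maintained incrementally.
        self.a_inv = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
        self.b = [0.0] * dim  # running sum of reward * x

    def _a_inv_times(self, vec):
        return [sum(row[j] * vec[j] for j in range(len(vec))) for row in self.a_inv]

    def score(self, x):
        theta = self._a_inv_times(self.b)  # ridge regression weights
        mean = sum(t * xi for t, xi in zip(theta, x))
        spread = sum(xi * v for xi, v in zip(x, self._a_inv_times(x)))
        return mean + self.alpha * math.sqrt(max(0.0, spread))

    def update(self, x, reward):
        # Sherman-Morrison rank-one update of the inverse after adding x x^T.
        a_inv_x = self._a_inv_times(x)
        denom = 1.0 + sum(xi * v for xi, v in zip(x, a_inv_x))
        for i in range(len(x)):
            for j in range(len(x)):
                self.a_inv[i][j] -= a_inv_x[i] * a_inv_x[j] / denom
            self.b[i] += reward * x[i]


def select(arms, x):
    """Return the name of the candidate with the highest score for context x."""
    return max(arms, key=lambda name: arms[name].score(x))


# Hypothetical two-feature context: [likes winter sports, likes camping].
arms = {"ski": LinUCBArm(dim=2), "camping": LinUCBArm(dim=2)}
payoff = {"ski": [1.0, 0.0], "camping": [0.0, 1.0]}  # made-up true rewards
for step in range(200):
    x = [1.0, 0.0] if step % 2 == 0 else [0.0, 1.0]
    chosen = select(arms, x)
    reward = sum(p * xi for p, xi in zip(payoff[chosen], x))
    arms[chosen].update(x, reward)
print(select(arms, [1.0, 0.0]), select(arms, [0.0, 1.0]))
```

In this toy run each candidate ends up preferred for the context in which it pays off, which is the personalization effect that a context-free bandit cannot provide.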

Page 42: Boosting Ad Revenue Using Reinforcement Learning (Robin Schuil Technology Stream)

Pruning

•  Periodically remove weakest performers

•  Replace with new, unexplored “trending keywords”

•  Rinse and repeat
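One pruning cycle might be sketched as below. The talk gives no details, so ranking by observed CTR, the `min_impressions` guard that protects barely explored keywords, and all names are assumptions.

```python
def prune_and_refill(stats, fresh_keywords, n_drop=1, min_impressions=100):
    """Drop the weakest explored keywords and add unexplored ones.

    stats maps keyword -> (clicks, impressions). Returns (new_stats, dropped).
    """
    threshold = max(1, min_impressions)  # also avoids division by zero
    explored = [kw for kw, (_, impr) in stats.items() if impr >= threshold]
    explored.sort(key=lambda kw: stats[kw][0] / stats[kw][1])  # weakest CTR first
    dropped = explored[:n_drop]
    kept = {kw: value for kw, value in stats.items() if kw not in dropped}
    # Refill with the same number of keywords that have never been shown.
    newcomers = [kw for kw in fresh_keywords if kw not in stats][: len(dropped)]
    for kw in newcomers:
        kept[kw] = (0, 0)
    return kept, dropped


# Hypothetical statistics; "oranje" is too new to be judged yet.
stats = {"ski": (50, 1000), "kachel": (5, 1000), "oranje": (0, 10)}
stats, dropped = prune_and_refill(stats, fresh_keywords=["ski", "caravans"])
print(sorted(stats), dropped)  # prints "['caravans', 'oranje', 'ski'] ['kachel']"
```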

Page 43: Boosting Ad Revenue Using Reinforcement Learning (Robin Schuil Technology Stream)

Results

[Chart: “Random” versus “Optimized”, annotated “× 2.8!”]

Page 44: Boosting Ad Revenue Using Reinforcement Learning (Robin Schuil Technology Stream)

Endless possibilities

•  News homepage

•  Online advertising

•  Deciding which thumbnail to show on the SERP

•  Etc.

Page 45: Boosting Ad Revenue Using Reinforcement Learning (Robin Schuil Technology Stream)

Reading List

“Bandit Algorithms for Website Optimization”
http://bit.ly/bandits-book

“Reinforcement Learning”
http://bit.ly/rl-book

Page 46: Boosting Ad Revenue Using Reinforcement Learning (Robin Schuil Technology Stream)

@SCHUILR

LINKEDIN.COM/IN/ROBINSCHUIL

Thank you (Дякую)

Page 47: Boosting Ad Revenue Using Reinforcement Learning (Robin Schuil Technology Stream)

References

•  https://en.wikipedia.org/wiki/Multi-armed_bandit

•  http://shop.oreilly.com/product/0636920027393.do

•  https://webdocs.cs.ualberta.ca/~sutton/book/the-book.html

•  http://www.slideshare.net/chucheng/efficient-approximate-thompson-sampling-for-search-query-recommendation

•  http://www.slideshare.net/iliasfl/multiarmed-bandits-intro-examples-and-tricks

•  http://www.slideshare.net/mgershoff/conductrics-bandit-basicsemetrics1016

•  http://www.slideshare.net/MarkusOjala1/multi-armed-bandits-and-optimized-online-marketing-54679491