Click Prediction with adPredictor at Microsoft Advertising Joaquin Quiñonero Candela Thore Graepel Ralf Herbrich Thomas Borchert Microsoft Research & Microsoft adCenter

AdPredictor - Large Scale Bayesian Click-Through Rate Prediction in Microsoft's Bing Search Engine

Nov 26, 2015

Rayssa Küllian

Transcript
  • Click Prediction with adPredictor

    at Microsoft Advertising

    Joaquin Quiñonero Candela Thore Graepel

    Ralf Herbrich

    Thomas Borchert

    Microsoft Research & Microsoft adCenter

  • Microsoft + Yahoo! = 1/3 of the US search market

    adPredictor predicts the probability of a click on ads for the Microsoft Bing and Yahoo! search engines

  • flowers

    $1.00 * 10% = $0.10    $0.80

    $2.00 * 4% = $0.08    $1.25

    $0.10 * 50% = $0.05    $0.05

    Efficient use of ad space

    Increased user satisfaction by better targeting

    Increased revenue by showing ads with high click-thru rate

    Importance of accurate probability estimates

    Over-simplified ranking function: this is not what is used in practice
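    The bid * pClick arithmetic above can be sketched as follows. This is only an illustration of the over-simplified ranking rule on the slide, not the production ranking function; the ad names and the `rank` helper are hypothetical, and the 6% figure is an invented mis-estimate used to show why accurate probabilities matter.

```python
def expected_value(bid, p_click):
    """Expected revenue per impression under the simplified rule bid * pClick."""
    return bid * p_click


def rank(ads):
    """Sort (name, bid, p_click) tuples by expected value, highest first."""
    return sorted(ads, key=lambda ad: expected_value(ad[1], ad[2]), reverse=True)


# Bids and click probabilities taken from the slide; the names are hypothetical.
ads = [("ad_a", 1.00, 0.10), ("ad_b", 2.00, 0.04), ("ad_c", 0.10, 0.50)]
ranked = rank(ads)

# Hypothetical mis-estimate: ad_b predicted at 6% instead of 4%.
misestimated = [("ad_a", 1.00, 0.10), ("ad_b", 2.00, 0.06), ("ad_c", 0.10, 0.50)]
reranked = rank(misestimated)

for name, bid, p_click in ranked:
    print(f"{name}: ${bid:.2f} * {p_click:.0%} = ${expected_value(bid, p_click):.2f}")
print("order with the mis-estimate:", [name for name, _, _ in reranked])
```

    A two-point error in one predicted probability is enough to change which ad takes the top slot, which is the sense in which the ranking depends on accurate probability estimates.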

  • Impression Level Predictions

    Sparse binary input features (many tens of them)

    Some high cardinality (~100M), some low (

  • Sparse Linear Probit Regression

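    [Figure: one weight per feature value, for Ad ID (1341201, 1570165, 2213187, 9215433), Match Type (Exact Match, Broad Match) and Position (ML-1, SB-1, SB-2); the weights of the active values are added (+) to give pClick.]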

  • Uncertainty: A Bayesian Treatment

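    [Figure: the same feature values, for Ad ID (1341201, 1570165, 2213187, 9215433), Match Type (Exact Match, Broad Match) and Position (ML-1, SB-1, SB-2); adding (+) the active weights now gives a distribution p(pClick).]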

  • A Linear Probit Model

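    Notation: $y = 1$ if click, $y = -1$ if non-click; $\mathbf{w}$ is the vector of all weights; $\mathbf{x}$ is a sparse binary input vector.

    Generalised linear model with weights vector $\mathbf{w}$:

    $p(y \mid \mathbf{x}, \mathbf{w}) := \Phi\left(\frac{y \cdot \mathbf{w}^\top \mathbf{x}}{\beta}\right)$

    Inverse link function is the probit function:

    $\Phi(t) := \int_{-\infty}^{t} \mathcal{N}(s; 0, 1)\, ds$

    $\beta$ controls the steepness: it corresponds to the standard deviation of additive zero-mean noise.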
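    A minimal sketch of the probit likelihood for a single impression, assuming labels y in {+1, -1} and a sparse binary input, so that the inner product reduces to the sum of the active weights. The function names and the weight values are illustrative, not taken from the talk.

```python
import math


def probit(t):
    """Standard normal CDF Phi(t), written with erfc for accuracy at negative t."""
    return 0.5 * math.erfc(-t / math.sqrt(2.0))


def click_likelihood(y, active_weights, beta):
    """p(y | x, w) = Phi(y * w^T x / beta) for y in {+1, -1}.

    x is sparse and binary, so w^T x is the sum of the weights it selects.
    """
    s = sum(active_weights)
    return probit(y * s / beta)


weights = [0.4, -0.1, 0.2]  # illustrative values only
p_click = click_likelihood(+1, weights, beta=1.0)
p_no_click = click_likelihood(-1, weights, beta=1.0)
print(p_click, p_no_click)
```

    The two outcomes sum to one because Phi(t) + Phi(-t) = 1, and a larger beta flattens the curve towards 50%.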


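  • Observation Noise (Assume Known Noiseless Weights)

    $p(y \mid \mathbf{x}, \mathbf{w}) := \Phi\left(\frac{y \cdot \mathbf{w}^\top \mathbf{x}}{\beta}\right)$

    [Figure: axis with a threshold at 0 and regions labelled click and no click.]

    Think of $\mathbf{x}$ as indicator variables that select weights: we will soon remove $\mathbf{x}$ from the notation.

    Example: $\mathbf{x} = [1; 0; 0; 0; 1; 0; \ldots; 0; 1]$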

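  • Uncertainty About the Weights: A Bayesian Treatment

    Factorizing Gaussian prior over the weights:

    $p(\mathbf{w}) = \prod_{i=1}^{N} \mathcal{N}(w_i; \mu_i, \sigma_i^2)$

    Given $p(y \mid \mathbf{x}, \mathbf{w})$, the posterior is given by:

    $p(\mathbf{w} \mid \mathbf{x}, y) = \frac{p(y \mid \mathbf{x}, \mathbf{w})\, p(\mathbf{w})}{\int p(y \mid \mathbf{x}, \mathbf{w})\, p(\mathbf{w})\, d\mathbf{w}}$

    Problem: This posterior can neither be represented compactly nor calculated in closed form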

  • Desiderata and Approximations

    We want

    The posterior to remain a factorized Gaussian

    Incremental online learning rather than batch

    This is how it is done

    Approximate inference with latent variables

    Single pass approximate (online) schedule


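  • Predicting Average Probability of Click

    Now that our posterior over the weights is a factorizing Gaussian:

    $p(y \mid \mathbf{x}) = \Phi\left(\frac{y \sum_{i=1}^{N} \mu_i}{\sqrt{\beta^2 + \sum_{i=1}^{N} \sigma_i^2}}\right)$

    where the sums run over the active weights.

    [Figure: plot with horizontal axis "Sum of posterior weights" (marked at 0) and a vertical axis reaching 100%.]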
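    As a sketch of this prediction step: with a factorized Gaussian posterior, the predicted click probability depends only on the summed means and summed variances of the active weights. The function names and the numbers are assumptions made for illustration.

```python
import math


def probit(t):
    """Standard normal CDF Phi(t)."""
    return 0.5 * math.erfc(-t / math.sqrt(2.0))


def predict_click(means, variances, beta):
    """Phi(sum of means / sqrt(beta^2 + sum of variances)) over the active weights."""
    total_mean = sum(means)
    total_variance = beta ** 2 + sum(variances)
    return probit(total_mean / math.sqrt(total_variance))


# Same posterior mean, different amounts of uncertainty (illustrative values).
confident = predict_click([0.5], [0.01], beta=1.0)
uncertain = predict_click([0.5], [4.0], beta=1.0)
print(confident, uncertain)
```

    With the mean held fixed, more posterior variance pulls the prediction towards 50%, so weights that have seen little data produce less extreme predictions.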

  • Principled Exploration

    [Figure: density p(pClick), vertical axis from 0 to 10, plotted against pClick from 0% to 100%.]

    average: 25% (3 clicks out of 12 impressions)

    average: 30% (30 clicks out of 100 impressions)
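    The slide does not say how the uncertainty is turned into exploration. One common approach, shown here purely as an assumption, is to rank by a sample from the posterior rather than by its mean. The sketch uses a Beta posterior with a uniform prior as a simple stand-in for adPredictor's Gaussian posterior over weights; the helper names are hypothetical.

```python
import math
import random


def beta_posterior(clicks, impressions):
    """Beta(a, b) posterior over pClick, starting from a uniform Beta(1, 1) prior."""
    return 1 + clicks, 1 + impressions - clicks


def beta_std(a, b):
    """Standard deviation of a Beta(a, b) distribution."""
    return math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))


few = beta_posterior(3, 12)     # 25% average, little data
many = beta_posterior(30, 100)  # 30% average, more data

# Sample a plausible pClick for each ad and count how often the
# less-observed ad comes out on top.
rng = random.Random(0)
draws = 10_000
few_wins = sum(rng.betavariate(*few) > rng.betavariate(*many) for _ in range(draws))

print("std with 12 impressions:", beta_std(*few))
print("std with 100 impressions:", beta_std(*many))
print("share of draws won by the less-observed ad:", few_wins / draws)
```

    The ad with 12 impressions has the lower average but the wider distribution, so under sampling it still wins a share of the draws and keeps collecting data.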

  • Approximate Inference with Latent Variables

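    Prior: $p(\mathbf{w}) = \prod_{i=1}^{N} \mathcal{N}(w_i; \mu_i, \sigma_i^2)$

    Sum of active weights: $s = \sum_{i=1}^{N} w_i x_i$

    Noisy version thereof: $p(t \mid s) = \mathcal{N}(t; s, \beta^2)$

    The sign of $t$ determines click: $y = \operatorname{sign}(t)$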
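    The latent-variable model can be read as a recipe for simulating one impression: draw the active weights from the prior, sum them, add noise, and take the sign. The sketch below does that and compares the simulated click rate with the closed-form probit expression; the means, variances and sample size are illustrative assumptions.

```python
import math
import random


def probit(t):
    """Standard normal CDF Phi(t)."""
    return 0.5 * math.erfc(-t / math.sqrt(2.0))


def sample_click(means, variances, beta, rng):
    """Draw y in {+1, -1} from the generative model w -> s -> t -> y."""
    weights = [rng.gauss(m, math.sqrt(v)) for m, v in zip(means, variances)]
    s = sum(weights)        # sum of active weights
    t = rng.gauss(s, beta)  # noisy version of s
    return 1 if t > 0 else -1


means = [0.3, -0.8, -0.4]     # illustrative prior means of the active weights
variances = [0.5, 0.2, 0.1]   # illustrative prior variances
beta = 1.0

rng = random.Random(0)
n = 20_000
simulated = sum(sample_click(means, variances, beta, rng) == 1 for _ in range(n)) / n
closed_form = probit(sum(means) / math.sqrt(beta ** 2 + sum(variances)))
print(simulated, closed_form)
```

    The two numbers agree up to sampling noise because t is itself Gaussian, with mean equal to the sum of the means and variance equal to beta squared plus the sum of the variances.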

  • Approximating () and ()

    [Figure: two rows of three density plots, each on an axis from -5 to 5, arranged as panel * panel = panel.]

  • Updating the Posterior

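    [Figure: graph in which weights w1 and w2 are added (+) into s, which leads to y, annotated with Prediction and Training/Update directions; panels for the No Click and Click outcomes.]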

  • Posterior Updates for the Click Event
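    The transcript keeps only the title of this slide. As a hedged sketch, the update below follows the closed-form equations published in the ICML 2010 paper named in the Discussion slide, as best recalled: each active weight's mean moves in the direction of the observed label and its variance shrinks. The function names are hypothetical, and details such as prior correction or forgetting are left out.

```python
import math


def normal_pdf(t):
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def normal_cdf(t):
    return 0.5 * math.erfc(-t / math.sqrt(2.0))


def update(means, variances, y, beta):
    """One online update of the active weights for label y in {+1, -1}.

    Returns new (means, variances). Assumes normal_cdf(t) has not underflowed
    to zero, which holds unless the prediction is extremely wrong.
    """
    total_variance = beta ** 2 + sum(variances)
    total_std = math.sqrt(total_variance)
    t = y * sum(means) / total_std
    v = normal_pdf(t) / normal_cdf(t)  # size of the mean correction
    w = v * (v + t)                    # size of the variance reduction, in (0, 1)
    new_means = [m + y * (s2 / total_std) * v for m, s2 in zip(means, variances)]
    new_variances = [s2 * (1.0 - (s2 / total_variance) * w) for s2 in variances]
    return new_means, new_variances


prior_means, prior_variances = [0.0, 0.0], [1.0, 1.0]
click_means, click_variances = update(prior_means, prior_variances, y=+1, beta=1.0)
no_click_means, _ = update(prior_means, prior_variances, y=-1, beta=1.0)
print(click_means, click_variances)
```

    The step applied to each mean is proportional to that weight's own variance, so well-observed weights move less. That is one way to read the "automatic learning rate" listed in the wrap-up.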

  • The importance of joint updates

    [Figure: two calibration plots of Actual CTR against Predicted CTR, both axes logarithmic from 0.01% to 100.00%; panels: adPredictor and Naive Bayes.]

  • Calibration by Isotonic Regression

    [Figure: two calibration plots of Actual CTR against Predicted CTR, both axes logarithmic from 0.01% to 100.00%; panels: Calibrated adPredictor and Calibrated Naive Bayes.]
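    A minimal sketch of isotonic regression using the pool-adjacent-violators algorithm, which is the standard way to compute it; the talk does not say which implementation was used. The input is assumed to be the actual CTR observed in bins already sorted by predicted CTR, and the bin values are invented for illustration.

```python
def isotonic_fit(values, weights=None):
    """Least-squares non-decreasing fit to `values` (pool adjacent violators).

    `values` must already be ordered by predicted CTR; `weights` (for example
    impressions per bin) must be positive and default to 1.
    """
    if weights is None:
        weights = [1.0] * len(values)
    blocks = []  # each block holds [weighted mean, total weight, item count]
    for value, weight in zip(values, weights):
        blocks.append([value, weight, 1])
        # Merge backwards while the last two blocks are out of order.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            mean2, weight2, count2 = blocks.pop()
            mean1, weight1, count1 = blocks.pop()
            total = weight1 + weight2
            merged_mean = (mean1 * weight1 + mean2 * weight2) / total
            blocks.append([merged_mean, total, count1 + count2])
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted


# Hypothetical actual CTR per bin, bins sorted by increasing predicted CTR.
actual_ctr = [0.01, 0.03, 0.02, 0.06, 0.05, 0.10]
calibrated = isotonic_fit(actual_ctr)
print(calibrated)
```

    Out-of-order neighbours are pooled into their weighted average, so the calibrated values never decrease as the predicted CTR increases.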

  • Calibration Can't Improve the ROC

    [Figure: ROC curves, True Positives against False Positives (both axes 0% to 100%), for Naive Bayes and adPredictor.]
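    The claim in the slide title can be checked on a toy example: the ROC curve depends only on the ordering of the scores, so a strictly increasing recalibration leaves the area under it unchanged. The scores, labels and the square-root map below are illustrative assumptions.

```python
def auc(scores, labels):
    """Area under the ROC curve as the share of (positive, negative) pairs
    ranked correctly, counting ties as half."""
    positives = [s for s, label in zip(scores, labels) if label == 1]
    negatives = [s for s, label in zip(scores, labels) if label == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


scores = [0.02, 0.10, 0.05, 0.30, 0.01, 0.20]  # hypothetical predicted CTRs
labels = [0, 1, 0, 1, 0, 0]                    # 1 = click, 0 = no click

# Any strictly increasing map preserves the ranking; the square root is arbitrary.
recalibrated = [s ** 0.5 for s in scores]

print(auc(scores, labels), auc(recalibrated, labels))
```

    Isotonic regression is only non-decreasing, so it can merge neighbouring scores into ties, but it never reverses the order of two scores. Calibration changes the predicted values without adding any ability to separate clicks from non-clicks.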

  • adPredictor Wrap Up

    Automatic learning rate

    Calibrated: 2% prediction means 2% clicks

    Use of very many features, even if correlated

    Modelling the uncertainty explicitly

    Natural exploration mode

  • Discussion (For Later)

    Sample selection bias and exploration

    Dynamics: forgetting with time

    Pruning uninformative weights

    Approximate parallel inference

    Hierarchical priors

    Input features: the secret sauce

    Some of this is detailed in the ICML 2010 paper: Web-Scale Bayesian Click-Through Rate Prediction for Sponsored Search Advertising in Microsoft's Bing Search Engine

    We are hiring! Please contact me if you are interested.

  • Thank you!

