Florian Hartl [email protected] Large Scale CTR Prediction Lessons Learned

Apr 15, 2017


PyData
Transcript
Page 1: Large scale-ctr-prediction lessons-learned-florian-hartl

Florian Hartl [email protected]

Large Scale CTR Prediction: Lessons Learned

Page 2: Large scale-ctr-prediction lessons-learned-florian-hartl

Yelp’s Mission

Connecting people with great local businesses.

Page 3: Large scale-ctr-prediction lessons-learned-florian-hartl

92M, 32, 72%, 108M

Yelp Stats (as of Q2 2016)

Page 4: Large scale-ctr-prediction lessons-learned-florian-hartl

CTR Prediction

CTR: Click-Through Rate
pCTR: predicted CTR

Question: How likely is the user to click on the ad?

Why: Proxy for relevance

[Figure: example ads labeled 5.5%, 0.8%, 9.2%, and ?]

Page 5: Large scale-ctr-prediction lessons-learned-florian-hartl

Current pCTR Model: Kuvasz

Logistic Regression with thousands of features, trained and tested on millions of samples.
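
To make the model description concrete, here is a minimal sketch of logistic regression trained with stochastic gradient descent in plain Python. The toy data, the feature name `category_match`, and the hyperparameters are illustrative assumptions; the slides do not say how Kuvasz is actually trained.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_ctr(weights, bias, features):
    """pCTR for one sample; `features` maps feature name -> value."""
    z = bias + sum(weights.get(name, 0.0) * value for name, value in features.items())
    return sigmoid(z)


def train(samples, epochs=200, lr=0.1, seed=0):
    """Fit weights with plain SGD on log loss; `samples` is [(features, clicked)]."""
    rng = random.Random(seed)
    weights, bias = {}, 0.0
    samples = list(samples)
    for _ in range(epochs):
        rng.shuffle(samples)
        for features, clicked in samples:
            # Gradient of log loss w.r.t. the logit is (prediction - label).
            error = predict_ctr(weights, bias, features) - clicked
            bias -= lr * error
            for name, value in features.items():
                weights[name] = weights.get(name, 0.0) - lr * error * value
    return weights, bias


# Toy data (made up): ads matching the search category are clicked more often.
data = (
    [({"category_match": 1.0}, 1)] * 3
    + [({"category_match": 1.0}, 0)] * 1
    + [({"category_match": 0.0}, 1)] * 1
    + [({"category_match": 0.0}, 0)] * 3
)
weights, bias = train(data)
p_match = predict_ctr(weights, bias, {"category_match": 1.0})
p_no_match = predict_ctr(weights, bias, {"category_match": 0.0})
```

A production model with thousands of features would more likely use a library implementation with regularization; the sketch only shows the shape of the computation.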

Page 6: Large scale-ctr-prediction lessons-learned-florian-hartl

pCTR Model History

French Brittany

Icelandic Sheepdog

Jindo

Kuvasz

(CC) from Flickr: "Wednesday Freedom 11" by Parker Knight

(CC) from Flickr: "Icelandig sheepdog" by Thomas Quine

(CC) from Flickr: by Craige Moore

Page 7: Large scale-ctr-prediction lessons-learned-florian-hartl

Lessons Learned

(CC) from Flickr: "WEL" by luckyno3

Page 8: Large scale-ctr-prediction lessons-learned-florian-hartl

[Diagram: user, feedback, service; online, offline; data, model; logs]

Page 9: Large scale-ctr-prediction lessons-learned-florian-hartl

(CC) from Flickr: "The huge crossing" by Miroslav Petrasko

Infrastructure

(CC) from Flickr: "KOGI and WEL" by luckyno3

Page 10: Large scale-ctr-prediction lessons-learned-florian-hartl

[Diagram: user, feedback, service; online, offline; data, model; logs]

Page 11: Large scale-ctr-prediction lessons-learned-florian-hartl

Logging

Log at source of online prediction
→ Prevents downstream modifications of data

[Diagram: user, feedback, service; logs]
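
A minimal sketch of what logging at the source could look like: the serving function writes the exact features and the pCTR it computed, in the same code path that returns the prediction. The function name `serve_prediction`, the record fields, and the toy model are hypothetical and not taken from the slides.

```python
import io
import json
import time


def serve_prediction(model, features, log_file, request_id):
    """Score one ad and log exactly what the model saw, in the same code path."""
    pctr = model(features)
    record = {
        "request_id": request_id,
        "timestamp": time.time(),
        "features": features,
        "pctr": pctr,
    }
    # One JSON object per line; later jobs read these lines unchanged.
    log_file.write(json.dumps(record, sort_keys=True) + "\n")
    return pctr


def toy_model(features):
    """Stand-in for the real model (assumption for illustration)."""
    return 0.05 if features.get("category_match") else 0.01


log_file = io.StringIO()
pctr = serve_prediction(toy_model, {"category_match": 1}, log_file, request_id="req-1")
logged = json.loads(log_file.getvalue())
```

Because the record is written where the prediction is made, training data built from these logs contains the feature values the online model actually used rather than values recomputed later.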

Page 12: Large scale-ctr-prediction lessons-learned-florian-hartl

[Diagram: user, feedback, service; online, offline; data, model; logs]

Page 13: Large scale-ctr-prediction lessons-learned-florian-hartl

Verification

Assert validity of logged data

[Diagram: data, model; logs; prediction verification]
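
One way to read "prediction verification" is to recompute each logged prediction offline from the logged features and flag any disagreement. The sketch below assumes that reading; the record layout and `verify_predictions` are hypothetical names, not the talk's actual tooling.

```python
import math


def verify_predictions(log_records, offline_model, tolerance=1e-9):
    """Return the records whose logged pCTR the offline model cannot reproduce."""
    mismatches = []
    for record in log_records:
        recomputed = offline_model(record["features"])
        if not math.isclose(recomputed, record["pctr"], abs_tol=tolerance):
            mismatches.append((record["request_id"], record["pctr"], recomputed))
    return mismatches


def offline_model(features):
    """Stand-in for the offline copy of the model (assumption for illustration)."""
    return 0.05 if features.get("category_match") else 0.01


records = [
    {"request_id": "req-1", "features": {"category_match": 1}, "pctr": 0.05},
    # Deliberately corrupted record: the logged pCTR does not match the features.
    {"request_id": "req-2", "features": {"category_match": 0}, "pctr": 0.5},
]
mismatches = verify_predictions(records, offline_model)
```

A non-empty result points to a difference between the online and offline code paths, or to logs that were modified after the prediction was made.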

Page 14: Large scale-ctr-prediction lessons-learned-florian-hartl

[Diagram: user, feedback, service; online, offline; data, model; logs; prediction verification]

Page 15: Large scale-ctr-prediction lessons-learned-florian-hartl

Iterations

Make offline training iterations fast & scalable

Automation is key
→ end-to-end pipeline
→ automated visualizations

Tools: mrjob, Spark

[Diagram: data, model; logs; prediction verification; fast, scalable]

Page 16: Large scale-ctr-prediction lessons-learned-florian-hartl

Offline Training at Yelp

merge logs → sampling → feature extraction → model training → evaluation

Tools: mrjob, AWS EMR, Spark

daily scheduled pipeline / kicked off manually

new features

(CC) from Flickr: "Cloud" by Jason Pratt

Page 17: Large scale-ctr-prediction lessons-learned-florian-hartl

Lessons Learned

Infrastructure
Log at source of online prediction
Verify predictions
Make offline iterations fast & scalable

Page 18: Large scale-ctr-prediction lessons-learned-florian-hartl

Model Comprehension

(CC) from Flickr: "Bella" by Maureen Lee

Page 19: Large scale-ctr-prediction lessons-learned-florian-hartl

[Diagram: user, feedback, service; online, offline; data, model; logs; prediction verification; fast, scalable]

Page 20: Large scale-ctr-prediction lessons-learned-florian-hartl

Evaluation

Focus on a single metric (but don't trust it blindly)

[Diagram: data, model; prediction verification; evaluation; fast, scalable]

Page 21: Large scale-ctr-prediction lessons-learned-florian-hartl

Our Metric

Page 22: Large scale-ctr-prediction lessons-learned-florian-hartl

Evaluation

Focus on a single metric (but don't trust it blindly)

Create helpful visualizations

Tools: Zeppelin

[Diagram: data, model; prediction verification; evaluation; fast, scalable]

Page 23: Large scale-ctr-prediction lessons-learned-florian-hartl

Visualizations

Feature contributions: sd(feature) * coef
[Plot: feature contribution for feature 1, feature 2, feature 3, ...]

Feature value vs. CTR count
[Plot: feature value vs. CTR]
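
The feature-contribution formula from the slide, sd(feature) * coef, can be sketched in a few lines. The feature names, values, and coefficients below are made up, and the population standard deviation is an assumption; the slide does not say which variant is used.

```python
import statistics


def feature_contributions(rows, coefficients):
    """Return {feature: sd(feature) * coef}, largest absolute contribution first."""
    contributions = {}
    for name, coef in coefficients.items():
        values = [row.get(name, 0.0) for row in rows]
        contributions[name] = statistics.pstdev(values) * coef
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return dict(ranked)


# Hypothetical feature rows and model coefficients.
rows = [
    {"distance_km": 1.0, "rating": 4.0},
    {"distance_km": 3.0, "rating": 4.0},
    {"distance_km": 5.0, "rating": 5.0},
    {"distance_km": 7.0, "rating": 5.0},
]
coefficients = {"distance_km": -0.2, "rating": 1.0}
contributions = feature_contributions(rows, coefficients)
```

Scaling each coefficient by the spread of its feature makes features on different scales comparable: here `rating` ranks above `distance_km` even though the raw values of `distance_km` vary more.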

Page 24: Large scale-ctr-prediction lessons-learned-florian-hartl

[Diagram: user, feedback, service; online, offline; data, model; logs; prediction verification; evaluation; fast, scalable]

Page 25: Large scale-ctr-prediction lessons-learned-florian-hartl

Thresholds

Beware of biased training data
→ offline != online
→ pCTR threshold

[Diagram: user, feedback, service; logs]

Page 26: Large scale-ctr-prediction lessons-learned-florian-hartl

pCTR Threshold

[Plots: CTR and pCTR for three models]

Model 1: Good
Model 2: Bad
Model 3: Good

Page 27: Large scale-ctr-prediction lessons-learned-florian-hartl

pCTR Threshold

[Timeline: training data over time for Model 1, Model 2, Model 3, Model 4; CTR and pCTR]

Idea: Frequent retraining

Better: Deliberate sampling of bad ads
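
The slides do not spell out how bad ads are sampled. One possible reading, sketched here as an assumption, is to occasionally show an ad whose pCTR falls below the threshold so that the logs also contain feedback for low-scoring ads. `select_ad`, `explore_rate`, and the ad IDs are hypothetical; the pCTR values reuse the example figures from the CTR Prediction slide.

```python
import random


def select_ad(candidates, pctr_threshold, explore_rate, rng):
    """Pick an ad to show; occasionally pick one below the pCTR threshold.

    `candidates` is a list of (ad_id, pctr). Returns (ad_id, explored) or None.
    """
    above = [c for c in candidates if c[1] >= pctr_threshold]
    below = [c for c in candidates if c[1] < pctr_threshold]
    if below and rng.random() < explore_rate:
        ad_id, _ = rng.choice(below)
        return ad_id, True
    if above:
        ad_id, _ = max(above, key=lambda c: c[1])
        return ad_id, False
    return None


candidates = [("ad_a", 0.055), ("ad_b", 0.008), ("ad_c", 0.092)]
rng = random.Random(0)
shown = [select_ad(candidates, 0.02, 0.05, rng) for _ in range(10000)]
explored_fraction = sum(1 for _, explored in shown if explored) / len(shown)
```

Returning the `explored` flag lets the serving code log which impressions were deliberate samples, so offline training and evaluation can treat them separately if needed.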

Page 28: Large scale-ctr-prediction lessons-learned-florian-hartl

Online Evaluation

[Plots: CTR and pCTR for three models]

Model 1: Good
Model 2: Bad
Model 3: Good

Page 29: Large scale-ctr-prediction lessons-learned-florian-hartl

Online Evaluation

[Plots: CTR and pCTR for three models]

Model 1: Good
Model 2: Bad
Model 3: Good

Page 30: Large scale-ctr-prediction lessons-learned-florian-hartl

Combined Rescoring

[Diagram: new model, current model; online, offline]

Page 31: Large scale-ctr-prediction lessons-learned-florian-hartl

Combined Rescoring

[Diagram: new model, current model; online, offline; evaluation]

Page 32: Large scale-ctr-prediction lessons-learned-florian-hartl

Lessons Learned

Infrastructure
Log at source of online prediction
Verify predictions
Make offline iterations fast & scalable

Model Comprehension
Evaluate, evaluate, evaluate
Be aware of threshold effects

Page 33: Large scale-ctr-prediction lessons-learned-florian-hartl

[Diagram: user, feedback, service; online, offline; data, model; logs; prediction verification; evaluation; fast, scalable]

Page 34: Large scale-ctr-prediction lessons-learned-florian-hartl

[Diagram: user, feedback, service; online, offline; data, model; logs; prediction verification; evaluation; fast, scalable; simplicity]

Page 35: Large scale-ctr-prediction lessons-learned-florian-hartl

simplicity

rule-based approach

simple models

Occam's razor

appropriate metric

documentation

"Simple Made Easy"

Page 36: Large scale-ctr-prediction lessons-learned-florian-hartl

[Diagram: user, feedback, service; online, offline; data, model; logs; prediction verification; evaluation; fast, scalable, well documented (shown twice); simplicity]

Page 37: Large scale-ctr-prediction lessons-learned-florian-hartl

[Diagram: user, feedback, service; online, offline; data, model; logs; prediction verification; evaluation; fast, scalable, well documented (shown twice); simplicity]

Page 38: Large scale-ctr-prediction lessons-learned-florian-hartl

Lessons Learned

Above all, keep it simple.

Infrastructure
Log at source of online prediction
Verify predictions
Make offline iterations fast & scalable

Model Comprehension
Evaluate, evaluate, evaluate
Be aware of threshold effects

Page 39: Large scale-ctr-prediction lessons-learned-florian-hartl

@YelpEngineering

engineeringblog.yelp.com

github.com/yelp

yelp.com/careers