Copyright © 2015 Criteo
Large-Scale Real-Time Product Recommendation at Criteo
Simon Dollé
RecSys FR, December 1st, 2015
Jan 17, 2016
We buy
Ad spaces
We sell
Clicks that convert a lot
We take the risk
10 000 displays
leads to
50 clicks
leads to
1 sale
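The funnel above can be turned into rates with a few lines of arithmetic. This is only a worked illustration of the slide's figures; the variable names are ours.

```python
# Funnel figures taken from the slide.
displays = 10_000
clicks = 50
sales = 1

click_through_rate = clicks / displays   # clicks per display
conversion_rate = sales / clicks         # sales per click
sales_per_display = sales / displays     # sales per display

print(f"Click-through rate: {click_through_rate:.2%}")   # 0.50%
print(f"Conversion rate:    {conversion_rate:.2%}")      # 2.00%
print(f"Sales per display:  {sales_per_display:.4%}")    # 0.0100%
```

Buying displays and selling clicks means the gap between these two rates is the risk being taken.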
3 billion ads/day
3 billion products
10 ms to pick relevant products
7 data centers
15 000 servers
1200-node Hadoop cluster
Ad display data: 20B events/day
Browsing history: 2B events/day
Catalog data: 3B+ products
How do we do it?
Recommend products for a user
• What we want: reco(user) = products
• 1B users x 3B products!
• But we need to scale and keep it fresh
• What we can do:
  • Pre-select products offline
  • Refine scoring online to get final candidates
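Scoring every user against every product would mean about 3 x 10^18 pairs, which is why the work is split in two. The sketch below illustrates that split under our own assumptions: the `PRESELECTIONS` table, the product names, and the toy scores are hypothetical and stand in for the offline output and the online model.

```python
from typing import Callable, Dict, List

# Hypothetical offline output: for each product, a short list of candidates.
PRESELECTIONS: Dict[str, List[str]] = {
    "orange_shoes": ["red_shoes", "orange_sandals", "shoe_polish"],
}

def recommend(history: List[str],
              score: Callable[[str], float],
              k: int = 2) -> List[str]:
    """Online step: gather preselected candidates, score them, keep the top k."""
    candidates: List[str] = []
    for seen in history:
        for candidate in PRESELECTIONS.get(seen, []):
            if candidate not in candidates:
                candidates.append(candidate)
    return sorted(candidates, key=score, reverse=True)[:k]

# Toy scores standing in for the online model (assumption).
toy_scores = {"red_shoes": 0.12, "orange_sandals": 0.18, "shoe_polish": 0.03}
result = recommend(["orange_shoes"], lambda product: toy_scores[product])
print(result)  # ['orange_sandals', 'red_shoes']
```

The online step only ever touches the handful of candidates attached to what the user has seen, which is what makes a tight latency budget plausible.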
Bob saw orange shoes
Some candidate products:
• Historical
• Similar
• Complementary
• Most viewed
[Architecture diagram: browsing history (50B) on Hadoop; preselection computation with Map-Reduce jobs (12h); preselections (500M); Recommendation Service (20K qps)]
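The slides say only that preselections are computed from browsing history by Map-Reduce jobs. As one possible illustration, the sketch below counts co-viewed products in a map step and a reduce step, then keeps the most co-viewed products as "similar" candidates. Co-view counting, the function names, and the sample data are all assumptions, not the production algorithm.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical browsing history: user -> products viewed.
history = {
    "bob": ["orange_shoes", "red_shoes", "shoe_polish"],
    "alice": ["orange_shoes", "red_shoes"],
    "carol": ["orange_shoes", "blue_hat"],
}

def map_user(products):
    """Map step: emit ((a, b), 1) for each ordered pair co-viewed by one user."""
    for a, b in combinations(sorted(set(products)), 2):
        yield (a, b), 1
        yield (b, a), 1

def reduce_counts(records):
    """Reduce step: sum the counts for each product pair."""
    counts = Counter()
    for pair, n in records:
        counts[pair] += n
    return counts

def preselect_similar(history, top_n=2):
    """Keep, for each product, the top_n most co-viewed products."""
    records = (rec for products in history.values() for rec in map_user(products))
    by_product = defaultdict(list)
    for (a, b), n in reduce_counts(records).items():
        by_product[a].append((n, b))
    # Highest count first; ties are broken alphabetically.
    return {a: [b for n, b in sorted(pairs, key=lambda t: (-t[0], t[1]))[:top_n]]
            for a, pairs in by_product.items()}

similar = preselect_similar(history)
print(similar["orange_shoes"])  # ['red_shoes', 'blue_hat']
```

Truncating each list to `top_n` is what keeps the served table small compared with the raw history.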
Online: sources
Similarities Most viewed Most bought
Online: merge of products
Similarities Most viewed Most bought
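A minimal sketch of the merge step, assuming each source returns a list of product ids and that a product proposed by several sources should appear only once. The source contents and the `merge_sources` helper are hypothetical.

```python
from typing import Dict, List

def merge_sources(sources: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Merge per-source candidate lists into: product -> sources that proposed it."""
    merged: Dict[str, List[str]] = {}
    for source_name, products in sources.items():
        for product in products:
            merged.setdefault(product, []).append(source_name)
    return merged

# Hypothetical candidates returned by the three sources named on the slide.
sources = {
    "similarities": ["red_shoes", "orange_sandals"],
    "most_viewed": ["white_sneakers", "red_shoes"],
    "most_bought": ["white_sneakers", "shoe_polish"],
}
merged = merge_sources(sources)
print(list(merged))         # ['red_shoes', 'orange_sandals', 'white_sneakers', 'shoe_polish']
print(merged["red_shoes"])  # ['similarities', 'most_viewed']
```

Keeping track of which sources proposed a product costs little and could later serve as a feature for scoring.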
ML model
• Logistic regression models, because:
  • They scale
  • They are fast
  • They can handle lots of features
• Feature types: product-specific, user-specific, user-product interactions, display-specific
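To make the model concrete, here is a minimal logistic regression scorer: a weighted sum of features passed through a sigmoid. The feature names, weights, and bias are invented for illustration, with one example per feature type listed above; real weights would be learned from logs.

```python
import math
from typing import Dict

def predict(weights: Dict[str, float],
            features: Dict[str, float],
            bias: float = 0.0) -> float:
    """Logistic regression: sigmoid of the weighted sum of the features."""
    z = bias + sum(weights.get(name, 0.0) * value
                   for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical weights, one per feature.
weights = {
    "product:category=shoes": 0.4,       # product-specific
    "user:views_last_day": 0.1,          # user-specific
    "user_x_product:seen_before": 1.2,   # user-product interaction
    "display:banner_position=top": 0.3,  # display-specific
}
# Hypothetical feature values for one (user, product, display) triple.
features = {
    "product:category=shoes": 1.0,
    "user:views_last_day": 3.0,
    "user_x_product:seen_before": 1.0,
    "display:banner_position=top": 1.0,
}
score = predict(weights, features, bias=-5.0)
print(round(score, 4))
```

Prediction is a sparse dot product, so its cost grows with the number of active features rather than with the size of the model, which fits the "fast" and "lots of features" points.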
[Architecture diagram: browsing history (50B) on Hadoop; display, click and sale logs; preselection computation with Map-Reduce jobs (12h); preselections (500M); prediction models (6h); Recommendation Service (20K qps)]
Online: scoring
Similarities Most viewed Most bought
Scores before sorting: 0.02 0.12 0.06 0.18 0.03 0.05 0.01 0.005 0.011 0.013 0.004 0.007
Scores after sorting: 0.18 0.12 0.06 0.05 0.03 0.02 0.013 0.011 0.01 0.007 0.005 0.004
Online: candidates
0.18 0.12 0.06 0.05 0.03 0.02 0.013 0.011 0.01 0.007 0.005 0.004
[Banner mock-up: four SHOP buttons and a -50% badge]
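The scoring and candidate steps amount to sorting the merged products by predicted score and keeping the best ones. The sketch below uses the scores shown on the slides; the product ids are hypothetical, and the four banner slots are an assumption read off the mock-up.

```python
# Scores from the "Online: scoring" slide, in the order shown before sorting.
scores = [0.02, 0.12, 0.06, 0.18, 0.03, 0.05,
          0.01, 0.005, 0.011, 0.013, 0.004, 0.007]

# Hypothetical product ids, one per score.
products = [f"product_{i}" for i in range(len(scores))]

# Rank products by descending score.
ranked = sorted(zip(products, scores), key=lambda pair: pair[1], reverse=True)

BANNER_SLOTS = 4  # assumption: the mock-up shows four SHOP buttons
banner = ranked[:BANNER_SLOTS]

print([score for _, score in ranked])
print([product for product, _ in banner])
# ['product_3', 'product_1', 'product_2', 'product_5']
```

The sorted scores match the order shown on the second scoring slide.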
What’s next?
What’s next for us: Upcoming challenges
• Long(er)-term user profiles
• More and better product information (images, semantic, NLP)
• Instant update of similarities
• Joint product scoring (score the full banner rather than each product independently)
What’s next for you: Fancy a try?
On your own:
• We published datasets for click prediction
  • 4GB display-click data: Kaggle challenge in 2014 http://bit.ly/1vgw2XC
  • 1TB display-click data (industry’s largest dataset): http://bit.ly/1PyH4Vq
    • 4 billion observations
    • 156 billion feature values
    • available on Microsoft Azure
    • used by edX (UC Berkeley)
With us!
http://labs.criteo.com/jobs/
Questions?
Thank you [email protected]
@simondolle @recsysfr
Credits: Creative Stall, Gilbert Bages