Rocket Fuel Big Data and Artificial Intelligence for Digital Advertising Abhijit Pol Marilson Campos Designing Data Pipelines July, 2013
Jun 27, 2015
What We Do
[Diagram: how a page request becomes an ad served through the Rocket Fuel Platform. Labels: Web Browser; Page Request; Publishers; Ad Request; AdExchange; Some Exchange Partners; Bid Request; Real-time Bidder; Automated Decisions; Response Prediction; Model; Optimize; Bid & Ad; Rocket Fuel Winning Ad; Ad Served to User; User Engages with Ad; User Engagement Recorded; Refresh learning; Campaign & User Data; Warehouse; Data Partners*; Qualify Audience; Ads & Budget.]
How Big Is This Problem Each Day?
Trades on NASDAQ
Facebook Page Views
Searches on Google
Bid Requests Considered by Rocket Fuel
~5 billion
10 million
30 billion
~20 billion
BIG DATA + AI
Advertising That Learns
Outline
• Architecture Evolution
• Hurdles and Challenges Faced
• Data Pipelines Best Practices
Architecture for Growth
• 20 GB/month to 2 PB/month in 3 years
• New and complex requirements
• More consumers
• Rapid growth
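To put the growth figure in perspective, here is a small worked calculation. It assumes decimal units (1 PB = 1,000,000 GB) and smooth exponential growth over 36 months; the slide gives only the two endpoints, so the monthly rate is an illustration, not a reported number.

```python
import math

start_gb = 20.0            # 20 GB/month at the start
end_gb = 2.0 * 1000**2     # 2 PB/month, assuming 1 PB = 1,000,000 GB
months = 36                # 3 years

growth_factor = end_gb / start_gb
# Assumption: growth compounds evenly every month.
monthly_rate = growth_factor ** (1 / months) - 1
doubling_months = math.log(2) / math.log(1 + monthly_rate)

print(f"overall growth: {growth_factor:,.0f}x")
print(f"implied monthly growth: {monthly_rate:.1%}")
print(f"implied doubling time: {doubling_months:.1f} months")
```

Under these assumptions the volume grows 100,000x overall, which works out to roughly 38% per month, or a doubling about every two months.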
How We Started
Architecture 2.0
Current Architecture
Hurdles and Challenges Faced
• Exponential data growth and user queries
• Network issues
• Bots
• Bad user queries
Data Pipeline Design Best Practices
• Job Design / Consistency
• Job Features / Avoid Re-work
• Golden Input / Shadow Cluster
• Data Collection
• Dashboard
Job Design / Consistency
• Idempotent
• Execution by different users
• Account for Execution Time
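A minimal sketch of what these three points could look like in a job, under my own assumptions: the output location is derived from the data date passed in (not from the clock at run time), the final file is replaced atomically so a rerun overwrites instead of appending, and no path depends on who runs the job. The function name, record layout, and `dt=` partition naming are hypothetical, not taken from the talk.

```python
import json
import os
import tempfile
from datetime import date
from pathlib import Path


def run_daily_job(records, data_date: date, output_root: Path) -> Path:
    """Aggregate clicks per campaign for one data date, idempotently."""
    # The partition comes from the data date argument, never from
    # date.today(), so a late or repeated run targets the same output.
    partition = output_root / f"dt={data_date.isoformat()}"
    partition.mkdir(parents=True, exist_ok=True)

    totals = {}
    for rec in records:
        if rec["date"] == data_date.isoformat():
            totals[rec["campaign"]] = totals.get(rec["campaign"], 0) + rec["clicks"]

    # Write to a temporary file, then atomically replace the final file,
    # so a rerun overwrites the previous result rather than adding to it.
    final_path = partition / "clicks.json"
    fd, tmp_name = tempfile.mkstemp(dir=partition, suffix=".tmp")
    with os.fdopen(fd, "w") as tmp:
        json.dump(totals, tmp, sort_keys=True)
    os.replace(tmp_name, final_path)
    return final_path
```

Running the job twice for the same data date leaves exactly one output file with the same content, whichever user or scheduler triggers it.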
Job Execution Timeline
Job Features / Re-Work
• Smaller Jobs
• Record completion of steps
Recording completion times
[Flowchart: Start -> step of workflow, job or script -> Is mark already there? Yes -> End. No -> execute work for the step -> create the mark -> End. Optional: collect other data.]
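The completion-mark check can be sketched as follows. This is one possible reading, assuming the mark is a small file in a shared directory and that the optional extra data is the step's start and finish time; `run_step_once`, the `.done` suffix, and the JSON payload are hypothetical choices.

```python
import json
import time
from pathlib import Path


def run_step_once(name, work, marker_dir: Path) -> bool:
    """Run `work` unless a completion mark for `name` already exists.

    Returns True if the work ran, False if the step was skipped.
    """
    marker = marker_dir / f"{name}.done"
    if marker.exists():           # Is the mark already there? Yes: end.
        return False

    started = time.time()
    work()                        # Execute work for the step.
    finished = time.time()

    # Optional extra data: here, timings are stored inside the mark itself.
    marker_dir.mkdir(parents=True, exist_ok=True)
    marker.write_text(json.dumps({"started": started, "finished": finished}))
    return True
```

Because the mark is written only after `work()` returns, a step that fails with an exception leaves no mark and is retried on the next run, while completed steps are skipped.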
Golden Input / Shadow Cluster
• Integration tests on realistic data sets.
• Safe environment to innovate.
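As an illustration of the golden-input idea, the sketch below runs a transformation over a fixed input sample and compares the result with a stored expected output. The `parse_event` transformation, the record layout, and the sample rows are hypothetical; a real golden set would be a realistic slice of production data.

```python
def parse_event(raw_line: str) -> dict:
    """Hypothetical transformation under test: parse 'user,event,count'."""
    user, event, count = raw_line.strip().split(",")
    return {"user": user, "event": event, "count": int(count)}


# Golden input: a small, fixed sample, with its expected output kept alongside.
GOLDEN_INPUT = ["u1,click,3\n", "u2,view,10\n"]
GOLDEN_EXPECTED = [
    {"user": "u1", "event": "click", "count": 3},
    {"user": "u2", "event": "view", "count": 10},
]


def check_against_golden(transform, golden_input, golden_expected):
    """Return a list of (index, expected, actual) mismatches; empty means pass."""
    if len(golden_input) != len(golden_expected):
        raise ValueError("golden input and expected output differ in length")
    actual = [transform(line) for line in golden_input]
    return [
        (i, exp, act)
        for i, (exp, act) in enumerate(zip(golden_expected, actual))
        if exp != act
    ]
```

A changed job can be run through `check_against_golden` on a shadow cluster before it touches production data; any non-empty result points at the first records whose output changed.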
Data Collection - Delivery time view
[Diagram: a data product produced by workflows; each workflow is made up of jobs (J), which run as Hive/Pig, SSH, or Script.]
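One way to derive a delivery time view from per-job timings is to roll job finish times up to the workflow and then to the data product. The sketch assumes each job run records its workflow, start, and finish; the `JobRun` record and the rule that a data product is delivered when its last job finishes are my assumptions.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class JobRun:
    workflow: str
    job: str
    started: datetime
    finished: datetime


def delivery_times(runs):
    """Return ({workflow: finish time of its last job}, data product delivery time)."""
    per_workflow = {}
    for run in runs:
        latest = per_workflow.get(run.workflow)
        if latest is None or run.finished > latest:
            per_workflow[run.workflow] = run.finished
    # The data product counts as delivered when the last workflow finishes;
    # None means no runs were recorded.
    delivered = max(per_workflow.values(), default=None)
    return per_workflow, delivered
```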
Data Collection - Data profiles view
[Diagram: a data product built from data sets through transformations. Legend: Data Set, Transformation.]
• Record Size & Type
• Job Counts
• Join success ratios
• Data Set Consistency
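Two of the profile metrics above can be sketched in a few lines. The definitions are assumptions: the join success ratio is taken to be the fraction of left-side rows that find a match on the right, and the record profile is a count plus the value types seen per field. Rows are modelled as plain dictionaries for illustration.

```python
def join_success_ratio(left_rows, right_rows, key):
    """Fraction of left rows whose key has at least one match on the right."""
    if not left_rows:
        return None  # undefined for an empty input
    right_keys = {row[key] for row in right_rows}
    matched = sum(1 for row in left_rows if row[key] in right_keys)
    return matched / len(left_rows)


def profile(rows):
    """Basic profile: record count and the set of value types seen per field."""
    types = {}
    for row in rows:
        for field, value in row.items():
            types.setdefault(field, set()).add(type(value).__name__)
    return {"record_count": len(rows), "field_types": types}
```

Tracked per run, a sudden drop in the ratio or a new type appearing in a field is an early sign that an upstream data set has changed.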
Data Collection Hierarchy
wk_external_events
wk_build_profile
user_profile
extract_fields
consolidate_metrics
load_into_data_centers
extract_features
compact_user_profile
Legend: Workflow/Job/Script Step; Data Product
Dashboard
• Delivery Time
• Data Profile Ratios
• Counters
• Alarms
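The alarm part of such a dashboard could be as simple as comparing each collected metric with a configured range. The sketch below assumes metrics arrive as a name-to-number mapping and limits as `(low, high)` pairs; the metric names and thresholds in the usage example are hypothetical.

```python
def evaluate_alarms(metrics, limits):
    """Return messages for metrics outside their (low, high) limits.

    A limit of None on either side means that side is unbounded.
    """
    alarms = []
    for name, (low, high) in limits.items():
        if name not in metrics:
            # A metric that stops reporting is itself worth an alarm.
            alarms.append(f"{name}: no value reported")
            continue
        value = metrics[name]
        if low is not None and value < low:
            alarms.append(f"{name}: {value} is below {low}")
        elif high is not None and value > high:
            alarms.append(f"{name}: {value} is above {high}")
    return alarms


metrics = {"delivery_minutes": 95, "join_success_ratio": 0.97}
limits = {
    "delivery_minutes": (None, 60),
    "join_success_ratio": (0.95, None),
    "record_count": (1, None),
}
for message in evaluate_alarms(metrics, limits):
    print(message)
```

With these example values the late delivery and the missing record count raise alarms, while the join ratio stays within its range.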
Thank you
www.rocketfuel.com