Data Science at Flurry
Post on 13-May-2015
Data Science at Flurry
Soups Ranjan, PhD (soups@flurry.com)
We all know that when we’re talking about mobile, we’re talking about apps
Source: Nielsen- State of the App Nation 2012 Report and June 2013 Cross Platform Report
[Chart: time spent on mobile devices]
Flurry has the deepest insight into consumer behavior on mobile
[Chart: Monthly Device Reach (Millions): Flurry 1,200; Facebook 875; Google 700; Millennial Media 400; Twitter 320; JumpTap 100]
Source: data gathered from public statements/filings by companies; Facebook denotes property and network; Google reach denotes sites and network
Flurry Product Overview
• Flurry Analytics: track users, sessions, events and crashes
• Flurry AppCircle: advertise with Flurry to acquire new users for your app
• Flurry AppSpot: monetize your app traffic via ads
AppCircle: Advertise to Acquire Users
• AppCircle: advertiser configuration to set up an ad:
  – Ad type: CPI, CPC, CP Video
  – Corresponding bid
  – Ad format: banner or interstitial
  – Targeting (age, gender, device, location, persona)
• AppCircle Bidder: optimally acquire ad-space inventory where ads can be shown
AppCircle Bidder Strategy
[Diagram: bidder data flow]
• Cost model (bid-price estimation): given a bid request (user, pub, exchange) and the set of eligible ads, uses the history of bids and win prices to produce (Ad1, Bid1, P(win)1) … (Adn, Bidn, P(win)n)
• Revenue model: uses the history of ad impressions and conversions to produce (Ad1, AdvBid1, P(conv)1) … (Adn, AdvBidn, P(conv)n)
• Budget pacing: tracks each ad's advertiser bid, daily budget and spend
• Ad selector: combines the above with advertiser goals (α, β) to pick an ad and its bid price, then bids that ad on the exchange
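The budget-pacing component in the strategy above is only named in the slides, not specified. A minimal sketch, assuming a simple uniform-over-the-day pacing rule (the factor names and the rule itself are illustrative, not Flurry's actual policy):

```python
from datetime import datetime

def should_bid(daily_budget, spend_so_far, now=None):
    """Uniform budget pacing sketch: keep bidding only while spend is at or
    below the fraction of the daily budget proportional to the hours elapsed
    today. Illustrative assumption; the deck does not give the real rule."""
    now = now or datetime.now()
    hours_elapsed = now.hour + now.minute / 60.0
    # Allow at least one hour's worth of budget so we never stall at midnight.
    allowed = daily_budget * max(hours_elapsed, 1.0) / 24.0
    return spend_so_far <= allowed
```

A pacer like this prevents an ad from exhausting its daily budget in the first few minutes of high-volume exchange traffic.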
Bidder Ad Selection Model - I
• Ad selection model:
  SelectAd(adv, pub, exchange, user) =
    argmax ( P(win)^α * (Revenue(adv, pub, exchange, user) - β * Cost(adv, pub, exchange, user)) )
• Maximize-margin model (α = β = 1):
  SelectAd(adv, pub, exchange, user) =
    argmax ( P(win) * (Revenue(adv, pub, exchange, user) - Cost(adv, pub, exchange, user)) )
  – May lead to a lower advertiser fill rate, since we then only bid to show an advertiser's ad when we expect to win at a price lower than the advertiser's bid

  Ad     Rev (eCPM)   Cost   P(win)   Rank
  Adv1   1.50         1.30   0.30     0.3 * (1.5 - 1.3) = 0.06
  Adv2   0.60         0.50   0.70     0.7 * (0.6 - 0.5) = 0.07
Bidder Ad Selection Model - II
• Maximize fill rate for advertiser (α = 1, β = 0):
  SelectAd(adv, pub, exchange, user) =
    argmax ( P(win) * Revenue(adv, pub, exchange, user) )
  – We select the ad that maximizes our revenue goal; however, we only bid if revenue > cost

  Ad     Rev (eCPM)   Cost   P(win)   Rank
  Adv1   1.50         1.30   0.30     0.3 * (1.5) = 0.45
  Adv2   0.60         0.50   0.70     0.7 * (0.6) = 0.42
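The two parameterizations of the selection rule can be sketched in a few lines; the numbers below reproduce the Adv1/Adv2 worked examples from the slides (the dict-based ad representation is my own, for illustration):

```python
def select_ad(ads, alpha=1.0, beta=1.0):
    """Rank eligible ads by P(win)^alpha * (revenue - beta * cost)
    and return the highest-ranked one, per the selection model above."""
    def rank(ad):
        return ad["p_win"] ** alpha * (ad["rev"] - beta * ad["cost"])
    return max(ads, key=rank)

ads = [
    {"name": "Adv1", "rev": 1.50, "cost": 1.30, "p_win": 0.30},
    {"name": "Adv2", "rev": 0.60, "cost": 0.50, "p_win": 0.70},
]

# Maximize margin (alpha = beta = 1): Adv2 wins, 0.07 > 0.06.
margin_winner = select_ad(ads, alpha=1.0, beta=1.0)
# Maximize fill rate (alpha = 1, beta = 0): Adv1 wins, 0.45 > 0.42.
fill_winner = select_ad(ads, alpha=1.0, beta=0.0)
```

Note how the same two ads rank differently under the two advertiser goals, which is exactly the trade-off the (α, β) knobs expose.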
AppCircle: Ad Revenue Optimization
• Ad revenue optimization problem:
  – Maximize: P(conv) * bid
  – Conversion prediction model: maximize P(conv)
• Historical estimation: past conversion rate as a predictor of future conversion rates
• ML conversion prediction model: features include publisher, ad, user, time, location
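The historical-estimation baseline above can be sketched as follows; the smoothing toward a prior is my own addition (a common guard for low-traffic publisher/ad pairs), not something the slides specify:

```python
def historical_conv_rate(conversions, impressions, prior=0.01, strength=10):
    """Past conversion rate as a predictor of the future one, smoothed
    toward a prior click-to-install rate so that pairs with few
    impressions don't get extreme estimates. Prior values are
    illustrative assumptions."""
    return (conversions + prior * strength) / (impressions + strength)

def expected_revenue(conversions, impressions, adv_bid):
    """Expected revenue per impression = P(conv) * advertiser bid."""
    return historical_conv_rate(conversions, impressions) * adv_bid
```

With zero history the estimate falls back to the prior; as impressions accumulate, the observed rate dominates.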
[Chart: conversion probability by user id, comparing the conv-prob of users who saw Ad1 in Pub1's app against the average conv-prob]
Bidder Cost Model
• Cost model: we don't know about the other players in the auction; the best we can do is predict based on our own wins and losses:
  1) If historically we win auctions for users in Kansas City,
  2) then most likely other bidders are not interested in Kansas City users,
  3) so next time we lower our bid for Kansas City users.
  4) If we still win those Kansas City users, continue (1-3);
  5) if not, we revise our bid back up.
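The win/lose feedback loop above can be sketched as a one-step bid update; the down/up factors and the floor are hypothetical parameters chosen for illustration:

```python
def adjust_bid(bid, won, floor=0.01, down=0.95, up=1.10):
    """One iteration of the feedback loop over auction outcomes for a
    segment (e.g. Kansas City users): while we keep winning, shave the
    bid down; on a loss, revise it back up. Factors are illustrative,
    not from the talk."""
    if won:
        return max(bid * down, floor)
    return bid * up
```

Run repeatedly per segment, this walks the bid down toward the (unobserved) second-highest price and bounces back up when it undershoots.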
Machine Learning based Bidder Cost Model
• A machine-learnt model gives us both: cost and P(win)
• Multi-class classification model (logistic regression) to predict the win price of an ad impression
[Diagram: example win-price classes predicted by the model (win-price = 27c, 28c, 52c, and "no win"), with associated P(win) ranging from ~1.0 down to ~0.0]
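Given the class probabilities such a multi-class model emits over win-price buckets (including a "no win" class), both quantities the bidder needs can be read off directly. A small sketch, assuming the classifier output is a probability per bucket (the dict shape and example numbers are mine):

```python
def cost_and_p_win(class_probs):
    """class_probs maps a win-price bucket (in cents; None = 'no win')
    to its predicted probability, e.g. from a multi-class logistic
    regression. Returns (expected win price given a win, P(win))."""
    p_win = sum(p for price, p in class_probs.items() if price is not None)
    if p_win == 0:
        return None, 0.0
    exp_cost = sum(price * p for price, p in class_probs.items()
                   if price is not None) / p_win
    return exp_cost, p_win

# Hypothetical model output for one ad impression
probs = {27: 0.5, 28: 0.3, 52: 0.1, None: 0.1}
cost, p_win = cost_and_p_win(probs)
```

So a single classifier feeds both inputs of the ad-selection rule: the cost estimate and the win probability.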
AppCircle Conversion Rates: Local Hour of Day
[Chart: regression coefficients for localHourOfDay, local hour of day 0-23]
[Chart: conversion probability by localHourOfDay, local hour of day 0-23, with peaks around 12 noon, 4 pm, 6 pm and 7 pm]
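A per-hour regression weight implies the localHourOfDay feature is expanded into indicator variables rather than fed in as a single number. A minimal sketch of that encoding (my assumption about the feature pipeline, consistent with one coefficient per hour):

```python
def one_hot_hour(local_hour):
    """Encode localHourOfDay (0-23) as 24 indicator features, so a
    linear model such as logistic regression learns one weight per
    hour instead of forcing a monotonic effect of the raw hour."""
    if not 0 <= local_hour <= 23:
        raise ValueError("hour must be in 0..23")
    return [1.0 if h == local_hour else 0.0 for h in range(24)]
```

This is what lets the model capture the non-monotonic daily pattern (lunch-time and evening peaks) seen in the conversion chart.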
Machine Learning Workflow
• How much data is enough?
• Parallelize feature generation vs. model generation
• Interpretable vs. black-box models
• Batch vs. online learning
• Time to score a model
• Unbalanced data
• Over-fitting & regularization
Recommender System
• Recommender system as an ad-ranking method:
  – Given users and the apps they have installed in the past, what other apps are they likely to install?
  – Given users and their app usage (time spent), what new apps are they likely to engage with highly?
[Diagram: bipartite user-app graph, edges weighted by time spent per app (values ranging from 0.1 hr to 3 hr)]
Recommender System
• Item-item based collaborative filtering:
  – Missing-value prediction
[Diagram: user x app engagement matrix (App1-App4) with missing entries to be predicted]
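Item-item collaborative filtering for missing-value prediction is commonly done with cosine similarity between app columns; the slides don't give the exact formulation, so the following is a standard sketch on a toy engagement matrix (all data invented for illustration):

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two app columns ({user: hours})."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[u] * b[u] for u in common)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def predict(user, app, ratings):
    """Predict a missing (user, app) engagement value as the
    similarity-weighted average of the user's known values for
    other apps. ratings: {app: {user: hours}}."""
    num = den = 0.0
    for other, col in ratings.items():
        if other == app or user not in col:
            continue
        sim = cosine(ratings[app], col)
        num += sim * col[user]
        den += abs(sim)
    return num / den if den else 0.0

# Toy engagement matrix (hours spent), apps as columns
ratings = {
    "App1": {"u1": 1.0, "u2": 2.0},
    "App2": {"u1": 1.6, "u2": 2.1, "u3": 0.3},
    "App3": {"u2": 3.0, "u3": 0.1},
}
pred = predict("u3", "App1", ratings)  # u3 has no App1 entry yet
```

The predicted value can then rank candidate apps (ads) for the user, which is the "recommender as ad-ranking" idea above.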
Engagement Model - Android All
• Category of SocialApp: Social
• Number of users of SocialApp: 2,227
• Number of predicted users of SocialApp: 1,131
Other Flurry Data Science Problems
• Age and gender estimation
• Click fraud detection
• Optimize the AppSpot waterfall