Powering The Social Economy
Nov 01, 2014
How do we Make Good Forecasts?
The Architecture vs The Practice (aka: Form vs Function)
Platforms for Big Data storage, processing & analytics.
VS
Actual applications of Data-at-Scale
Themes for This Morning
How DataSift Manages, Processes & Delivers
Data Visualization via Tableau
Causal Inference & Statistical Modeling
Movies & Coffee
Who am I?
Tim Shea
@SheaNineSeven
Data Scientist & Sales Engineer at DataSift
Focus on Alliances & Channels:
Tableau, Alteryx, Microstrategy, Informatica, SAP
Data Science as a Practice:
Disambiguation, Classification, Causality
What is DataSift?
Social Data Platform
Full “Firehose” Access
2 Billion Posts per Day
½ Trillion Posts Historical Archive
Really Intense Architecture Diagram
We Make it Simple for You
Focus on Filtering
Big Data < Relevant Data
Enrichments:
- Demographics
- Links
- Emotion & Intent
- Learned Classification
Demo
DataSift: Beyond “Social Listening”
Ex. “Does Social have anything to do with my Business?”
Line Charts and Graphs
Vs
Operationalized Decision Making
“The Enterprise”
DataSift Enterprise customers are building:
1. Demand Forecasting
2. Critical Event Processing
3. Market Segmentation/Statistical Classification
4. Establishing Correlative Relationships(**)
Causality
Necessary…Connection?
Does Event A cause Event B?
Fighting Crime…Fights Crime(?)
Does The Past have anything at all to do with The Future?
Defending Your Hypotheses
How can I create & defend my Hypotheses?
How do I communicate my findings to Laypeople (non-Data Scientists) like my Boss?
Risk Management in Hollywood
Movies
Through the Lens of:
DataSift - What we do as a Social Data Platform
Tableau - How to Make Sense of a Mountain of Data
Good Data & Good Tools
Risk Management is Hard
Q: What is a “Sure Bet”?
Q: Should I spend $100MM making this movie?
Q: How can I make this process less risky?
Enter DataSift & Tableau
Example
Return Every:
Tweet
Facebook Post
Instagram Photo
Bitly Click
For What? Every single Movie released in 2013
Compare it With
Tableau
What Data do we Have?
1. Intuition
2. Social => Box Office Correlation?
3. Prove It
4. Defend the Model
The Model
Y = a + bX
Y = Box Office (the predicted)
X = Social Volume (the predictor)
b = Coefficient
a = Some offset
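To make the model concrete, here is a minimal sketch of fitting Y = a + bX by ordinary least squares in plain Python. The function name `fit_line` and the click and box-office figures are made up for illustration; they are not the data from the talk.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x. Returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x around its mean, and co-movement of x with y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx              # the coefficient (slope)
    a = mean_y - b * mean_x    # the offset (intercept)
    return a, b


# Hypothetical movies: social volume (X) and box office in dollars (Y).
clicks = [10_000, 50_000, 120_000, 300_000]
box_office = [2_400_000, 12_000_000, 28_800_000, 72_000_000]

a, b = fit_line(clicks, box_office)
print(f"Box Office = {a:.0f} + {b:.0f} * clicks")
```

The made-up points sit exactly on a line, so the fit recovers the slope used to generate them. Real data will be noisier, which is why the model then has to be defended.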
Defend the Model v1
P-value: If the Null Hypothesis were true, there would be an X% chance of seeing a correlation at least this strong.
Null Hypothesis: The linear coefficient is equal to zero.
Defend the Model v2
P-value (again): We can be (100 – X)% confident that the correlation we’re seeing can be explained by our model.
R-Squared: Our model explains about Y% of the variability in the data (how far points fall from the regression line), given a “Least Squares” fit.
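As a rough sketch of where these two numbers come from, the code below computes R-Squared and the t-statistic for the slope by hand. The helper name `r_squared_and_t` and the data points are invented for illustration. Turning the t-statistic into a P-value needs the t-distribution with n - 2 degrees of freedom, which the standard library does not provide; a statistics package such as SciPy (`scipy.stats.linregress`) reports it directly.

```python
import math


def r_squared_and_t(xs, ys):
    """Fit y = a + b*x by least squares; return (r_squared, t_statistic_for_b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x

    # Variability left over after the fit vs. total variability in y.
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1 - ss_res / ss_tot

    # Standard error of the slope, then t = slope / standard error.
    # The Null Hypothesis is b = 0, so a large |t| is evidence against it.
    se_b = math.sqrt(ss_res / (n - 2) / sxx)
    return r_squared, b / se_b


# Made-up, slightly noisy points in arbitrary units.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

r2, t_stat = r_squared_and_t(xs, ys)
print(f"R-Squared = {r2:.3f}, t = {t_stat:.1f}")
```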
Defend the Model v3
Every Bitly click predicts about $240 in Box Office Sales
I’m extremely confident (99%) that this is not due to chance.
With ~96% confidence we can rely on this model in the future.
The Model (cont)
Y “is predicted by” a + bX
Box Office = 0 + $240 * (# bitly clicks)
Box Office = 0 + $130 * (# tweets)
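A small sketch of using the two fitted lines for prediction. The coefficients are the ones on the slide; the function name `predict_box_office` and the count of 100,000 clicks are hypothetical. Each line is a separate single-predictor model, so the two predictions should not be added together.

```python
# Dollars of box office per social interaction, taken from the slide.
COEFFICIENTS = {"bitly_click": 240.0, "tweet": 130.0}


def predict_box_office(channel, count, offset=0.0):
    """Evaluate Y = a + bX for one channel; the offset a is 0 on the slide."""
    return offset + COEFFICIENTS[channel] * count


# Hypothetical movie with 100,000 Bitly clicks.
prediction = predict_box_office("bitly_click", 100_000)
print(f"Predicted box office: ${prediction:,.0f}")
```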
Benchmarking
If my Bitly #’s drop below $240
If my Twitter #’s drop below $130
If my Instagram #’s drop below $2809
If my Facebook #’s drop below $3871
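One way to read these benchmarks, as an assumption rather than the method used in the talk: divide a movie's box office by its count of interactions on each channel and flag any channel where the dollars per interaction fall below the fitted coefficient. The function name `below_benchmark` and the sample figures are hypothetical.

```python
# Dollars of box office per interaction, from the slide.
BENCHMARKS = {
    "bitly": 240.0,
    "twitter": 130.0,
    "instagram": 2809.0,
    "facebook": 3871.0,
}


def below_benchmark(channel, box_office, interactions):
    """True if this movie earns less per interaction than the benchmark."""
    return box_office / interactions < BENCHMARKS[channel]


# Hypothetical movie: $1,000,000 box office against 10,000 tweets
# works out to $100 per tweet, under the $130 benchmark.
twitter_flag = below_benchmark("twitter", 1_000_000, 10_000)
# $3,000,000 against 10,000 Bitly clicks is $300 per click, above $240.
bitly_flag = below_benchmark("bitly", 3_000_000, 10_000)
print(twitter_flag, bitly_flag)
```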
Other Considerations
Residuals
Other Regression (Logarithmic, Exponential, Polynomial)
“Overfitting”
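A minimal sketch of why residuals matter, using made-up data that actually follows a curve. When a straight line is fit to it, the residuals come out positive at both ends and negative in the middle, a pattern that suggests trying one of the other regression forms. The names `fit_line` and `residuals` are illustrative only.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x. Returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def residuals(xs, ys):
    """Actual minus predicted y for each point under the fitted line."""
    a, b = fit_line(xs, ys)
    return [y - (a + b * x) for x, y in zip(xs, ys)]


# Made-up data that is really quadratic (y = x squared), not linear.
xs = [1, 2, 3, 4, 5]
ys = [1, 4, 9, 16, 25]

res = residuals(xs, ys)
print(res)
```

Residuals scattered randomly around zero would support the linear model; a systematic shape like this one argues against it. Adding more terms until every point is matched is the opposite risk, which is what “Overfitting” refers to.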
Additional Dimensions
DataSift Social Data:
Gender
Income
Geography
“Influence”
Industry vs Consumers
Thanks for Listening!