Powering The Social Economy

Big data camp la futures so bright tim-shea

Nov 01, 2014


Data & Analytics

Big Data Camp LA 2014: The Future's So Bright (You Can Barely Make Any Predictions About It), by Timothy Shea of DataSift
Transcript
Page 1: Big data camp la   futures so bright tim-shea

Powering The Social Economy

Page 2: Big data camp la   futures so bright tim-shea
Page 3: Big data camp la   futures so bright tim-shea

How do we Make Good Forecasts?

Page 4: Big data camp la   futures so bright tim-shea

The Architecture vs The Practice (aka: Form vs Function)

Platforms for Big Data storage, processing & analytics.

VS

Actual applications of Data-at-Scale

Page 5: Big data camp la   futures so bright tim-shea

Themes for This Morning

How DataSift Manages, Processes & Delivers

Data Visualization via Tableau

Causal Inference & Statistical Modeling

Movies & Coffee

Page 6: Big data camp la   futures so bright tim-shea

Who am I?

Tim Shea

@SheaNineSeven

Data Scientist & Sales Engineer at DataSift

Page 7: Big data camp la   futures so bright tim-shea

Focus on Alliances & Channels:

Tableau, Alteryx, Microstrategy, Informatica, SAP

Data Science as a Practice:

Disambiguation, Classification, Causality

Page 8: Big data camp la   futures so bright tim-shea

What is DataSift?

Social Data Platform

Full “Firehose” Access

2 Billion Posts per Day

½ Trillion Posts Historical Archive

Page 9: Big data camp la   futures so bright tim-shea

Really Intense Architecture Diagram

Page 10: Big data camp la   futures so bright tim-shea

We Make it Simple for You

Focus on Filtering

Big Data < Relevant Data

Enrichments:
- Demographics
- Links
- Emotion & Intent
- Learned Classification

Page 11: Big data camp la   futures so bright tim-shea

Demo

Page 12: Big data camp la   futures so bright tim-shea

DataSift: Beyond “Social Listening”

Ex. “Does Social have anything to do with my Business?”

Line Charts and Graphs

Vs

Operationalized Decision Making

Page 13: Big data camp la   futures so bright tim-shea
Page 14: Big data camp la   futures so bright tim-shea

“The Enterprise”

DataSift Enterprise customers are building:

1. Demand Forecasting
2. Critical Event Processing
3. Market Segmentation/Statistical Classification
4. Establishing Correlative Relationships (**)

Page 15: Big data camp la   futures so bright tim-shea

Causality

Page 16: Big data camp la   futures so bright tim-shea

Necessary…Connection?

Does Event A cause Event B?

Page 17: Big data camp la   futures so bright tim-shea

Fighting Crime…Fights Crime(?)

Page 18: Big data camp la   futures so bright tim-shea

Does The Past have anything at all to do with The Future?

Page 19: Big data camp la   futures so bright tim-shea

Defending Your Hypotheses

How can I create & defend my Hypotheses?

How do I communicate my findings to Laypeople (non-Data Scientists) like my Boss?

Page 20: Big data camp la   futures so bright tim-shea

Risk Management in Hollywood

Page 21: Big data camp la   futures so bright tim-shea

Movies

Through the Lens of:

DataSift - What we do as a Social Data Platform

Tableau - How to Make Sense of a Mountain of Data

Good Data & Good Tools

Page 22: Big data camp la   futures so bright tim-shea
Page 23: Big data camp la   futures so bright tim-shea
Page 24: Big data camp la   futures so bright tim-shea
Page 25: Big data camp la   futures so bright tim-shea

Risk Management is Hard

Q: What is a “Sure Bet”?

Q: Should I spend $100MM making this movie?

Q: How can I make this process less risky?

Page 26: Big data camp la   futures so bright tim-shea

Enter DataSift & Tableau

Page 27: Big data camp la   futures so bright tim-shea

Example

Return Every:

Tweet

Facebook Post

Instagram Photo

Bitly Click

For What?

Every single Movie released in 2013

Page 28: Big data camp la   futures so bright tim-shea

Compare it With

Page 29: Big data camp la   futures so bright tim-shea

Tableau

Page 30: Big data camp la   futures so bright tim-shea

What Data do we Have?

Page 31: Big data camp la   futures so bright tim-shea

1. Intuition

Page 32: Big data camp la   futures so bright tim-shea
Page 33: Big data camp la   futures so bright tim-shea

2. Social => Box Correlation?
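A quick way to put a number on this question is the Pearson correlation coefficient. The sketch below is only an illustration: the function name `pearson_r` and the sample figures are assumptions, not data from the talk.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical per-movie numbers, for illustration only.
social_volume = [12_000, 45_000, 80_000, 150_000, 310_000]
box_office = [2.1e6, 9.5e6, 14.0e6, 41.0e6, 72.0e6]

print(f"r = {pearson_r(social_volume, box_office):.3f}")
```

A value near +1 or -1 suggests a strong linear relationship; it says nothing yet about whether one thing causes the other.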

Page 34: Big data camp la   futures so bright tim-shea
Page 35: Big data camp la   futures so bright tim-shea

3. Prove It

Page 36: Big data camp la   futures so bright tim-shea
Page 37: Big data camp la   futures so bright tim-shea
Page 38: Big data camp la   futures so bright tim-shea

4. Defend the Model

Page 39: Big data camp la   futures so bright tim-shea

The Model

Y = a + bX

Y = Box Office (the predicted)

X = Social Volume (the predictor)

b = Coefficient

a = Some offset
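As a rough sketch of how the offset and coefficient in this model can be estimated, the snippet below fits a straight line by ordinary least squares. The helper name `fit_line` and the sample numbers are assumptions made for illustration; the talk itself used Tableau for this step.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for Y = a + bX; returns (a, b)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx            # coefficient (slope)
    a = y_bar - b * x_bar    # offset (intercept)
    return a, b

# Hypothetical per-movie numbers, for illustration only.
social_volume = [10_000, 50_000, 120_000, 300_000]
box_office = [3.0e6, 11.0e6, 30.0e6, 70.0e6]

a, b = fit_line(social_volume, box_office)
print(f"Box Office ~ {a:,.0f} + {b:,.2f} * Social Volume")
```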

Page 40: Big data camp la   futures so bright tim-shea

Defend the Model v1

P-value: If the Null Hypothesis were true, there is an X% chance of seeing a relationship at least this strong.

Null Hypothesis: The linear coefficient is equal to zero.
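One way to make this null hypothesis concrete is a permutation test: shuffle the Box Office values so that any real link to Social Volume is destroyed, refit the line, and count how often the shuffled slope is at least as large as the observed one. This is a swapped-in technique for illustration, not necessarily how the p-value in the talk was computed, and the names and sample data are assumptions.

```python
import random

def slope(xs, ys):
    """Least-squares slope b in Y = a + bX."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / sxx

def permutation_p_value(xs, ys, trials=10_000, seed=0):
    """Share of shuffles whose |slope| is at least the observed |slope|."""
    rng = random.Random(seed)
    observed = abs(slope(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)  # break any real X-Y relationship
        if abs(slope(xs, shuffled)) >= observed:
            hits += 1
    return hits / trials

# Hypothetical per-movie numbers, for illustration only.
clicks = [5_000, 20_000, 40_000, 75_000, 90_000, 160_000, 210_000, 400_000]
gross = [1.5e6, 4.0e6, 11.0e6, 16.0e6, 25.0e6, 37.0e6, 52.0e6, 95.0e6]

print(f"p-value ~ {permutation_p_value(clicks, gross):.4f}")
```

A small result means a slope this steep would rarely show up if the true coefficient were zero.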

Page 41: Big data camp la   futures so bright tim-shea

Defend the Model v2

P-value (again): We can be (100 – X)% confident that the correlation we're seeing can be explained by our model.

R-Squared: Our model explains about Y% of the variability in the data; the rest is the scatter of points away from the regression line, which was fit by “Least Squares”.
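For readers who want to see where the R-Squared figure comes from, here is a minimal sketch: it compares the squared error left over after fitting the line with the total squared variation around the mean. The function name `r_squared` and the sample data are assumptions for illustration.

```python
def r_squared(xs, ys):
    """R^2 of a least-squares line Y = a + bX fitted to (xs, ys)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    # Variation the line fails to explain vs. total variation.
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Hypothetical per-movie numbers, for illustration only.
social_volume = [10_000, 50_000, 120_000, 300_000]
box_office = [3.0e6, 11.0e6, 30.0e6, 70.0e6]

print(f"R^2 = {r_squared(social_volume, box_office):.3f}")
```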

Page 42: Big data camp la   futures so bright tim-shea

Defend the Model v3

Every Bitly click predicts about $240 in Box Office Sales

I’m extremely confident (99%) that this is not due to chance.

With ~96% confidence we can rely on this model in the future.

Page 43: Big data camp la   futures so bright tim-shea

The Model (cont)

Y “is predicted by” a + bX

Box Office = 0 + $240 * (# bitly clicks)

Box Office = 0 + $130 * (# tweets)
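Applying these fitted equations is plain arithmetic. The sketch below uses the two coefficients quoted on the slide; the function name `predict_box_office` and the example click count are assumptions for illustration.

```python
# Coefficients quoted on the slide: dollars of box office per interaction.
DOLLARS_PER_BITLY_CLICK = 240
DOLLARS_PER_TWEET = 130

def predict_box_office(count, dollars_per_interaction, offset=0):
    """Evaluate Y = a + bX for a given number of social interactions."""
    return offset + dollars_per_interaction * count

# A hypothetical movie with 500,000 Bitly clicks.
print(predict_box_office(500_000, DOLLARS_PER_BITLY_CLICK))  # 120000000
```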

Page 44: Big data camp la   futures so bright tim-shea

Benchmarking

If my Bitly #’s drop below $240

If my Twitter #’s drop below $130

If my Instagram #’s drop below $2809

If my Facebook #’s drop below $3871
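One possible reading of these benchmarks is that each figure is the expected box office dollars per interaction on that channel, so a movie earning less per interaction than the benchmark is underperforming. The sketch below encodes that reading; the interpretation, the function name `below_benchmark`, and the example inputs are assumptions.

```python
# Dollar figures from the slide, read as box office dollars per interaction.
BENCHMARKS = {
    "bitly": 240,
    "twitter": 130,
    "instagram": 2809,
    "facebook": 3871,
}

def below_benchmark(channel, box_office, interactions):
    """True if dollars per interaction falls under the channel benchmark."""
    dollars_per_interaction = box_office / interactions
    return dollars_per_interaction < BENCHMARKS[channel]

# Hypothetical movie: $1,000,000 gross and 10,000 Bitly clicks ($100/click).
print(below_benchmark("bitly", 1_000_000, 10_000))
```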

Page 45: Big data camp la   futures so bright tim-shea

Other Considerations

Page 46: Big data camp la   futures so bright tim-shea

Other Considerations

Residuals

Other Regression (Logarithmic, Exponential, Polynomial)

“Overfitting”
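Two of these checks can be sketched in a few lines: residuals are the gaps between actual and predicted values, and a simple guard against overfitting is to fit on one set of movies and measure the error on movies the model has not seen. Everything below (helper names, the train/holdout split, the numbers) is an assumption for illustration.

```python
from math import sqrt

def fit_line(xs, ys):
    """Ordinary least squares for Y = a + bX; returns (a, b)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_bar - b * x_bar, b

def residuals(xs, ys, a, b):
    """Actual minus predicted value for each point."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]

def rmse(errors):
    """Root mean squared error of a list of residuals."""
    return sqrt(sum(e * e for e in errors) / len(errors))

# Hypothetical movies: fit on one group, check on a held-out group.
train_x = [10_000, 50_000, 120_000, 300_000, 420_000]
train_y = [3.0e6, 11.0e6, 30.0e6, 70.0e6, 99.0e6]
holdout_x = [80_000, 200_000]
holdout_y = [17.0e6, 52.0e6]

a, b = fit_line(train_x, train_y)
print(f"train RMSE:   {rmse(residuals(train_x, train_y, a, b)):,.0f}")
print(f"holdout RMSE: {rmse(residuals(holdout_x, holdout_y, a, b)):,.0f}")
```

If the holdout error is far larger than the training error, the model is probably fitting noise in the training movies. Plotting the residuals against X is also a quick way to see whether a logarithmic, exponential, or polynomial form would fit better.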

Page 47: Big data camp la   futures so bright tim-shea

Additional Dimensions

DataSift Social Data:

Gender

Income

Geography

“Influence”

Industry vs Consumers

Page 48: Big data camp la   futures so bright tim-shea

Getting Started

[email protected]

@sheanineseven

http://bit.ly/DataSiftBigDataCamp

Page 49: Big data camp la   futures so bright tim-shea

Thanks for Listening!