Numbersense: Clearing the Fog of Big Data Kaiser Fung INFORMS NYC Luncheon 9/18/2013 Monday, September 23, 2013
Page 1: Numbersense: Clearing the Fog of Big Data

Numbersense: Clearing the Fog of Big Data

Kaiser Fung

INFORMS NYC Luncheon

9/18/2013

Monday, September 23, 2013

Page 2: Numbersense: Clearing the Fog of Big Data

Big Data studies

- Observational data
- Co-opted
- Seemingly exhaustive N
- Fused data
- No controls

Page 3: Numbersense: Clearing the Fog of Big Data

Flight Delay: the data

- U.S. domestic commercial flights
- 1987 to 2008
- 123 million records
- 29 variables

Wicklin (2009)

Page 4: Numbersense: Clearing the Fog of Big Data

Which airline had a lower delay rate?

Page 5: Numbersense: Clearing the Fog of Big Data

Which airline had a lower delay rate?

Page 6: Numbersense: Clearing the Fog of Big Data

Ask the “right” question

Page 7: Numbersense: Clearing the Fog of Big Data

Alaska’s On-time Performance

10.9%

Alaska is the industry leader in on-time flights

Page 8: Numbersense: Clearing the Fog of Big Data

Flight Delay: the data

- U.S. domestic commercial flights
- 1987 to 2008
- 123 million records
- 29 variables

Wicklin (2009)

In which variable(s) does Simpson’s paradox lurk?

A priori?
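A minimal sketch of how Simpson's paradox can hide in a grouping variable such as airport. The airline names, airport names, and every count below are invented for illustration and are not taken from the flight data; only the reversal pattern matters.

```python
# Hypothetical counts, invented for illustration only (not from the flight data).
# Each entry maps an airport to (flights flown, flights delayed).
flights = {
    "Airline A": {"Airport X": (100, 10), "Airport Y": (900, 270)},
    "Airline B": {"Airport X": (900, 135), "Airport Y": (100, 35)},
}


def delay_rate(flown, delayed):
    """Share of flights that were delayed."""
    return delayed / flown


def overall_rate(by_airport):
    """Delay rate after pooling all airports together."""
    total_flown = sum(flown for flown, _ in by_airport.values())
    total_delayed = sum(delayed for _, delayed in by_airport.values())
    return delay_rate(total_flown, total_delayed)


# Delay rate within each airport, per airline.
per_airport = {
    airline: {airport: delay_rate(*counts) for airport, counts in by_airport.items()}
    for airline, by_airport in flights.items()
}

# Delay rate per airline when the airport variable is ignored.
overall = {airline: overall_rate(by_airport) for airline, by_airport in flights.items()}

for airline in flights:
    rates = ", ".join(f"{ap}: {r:.0%}" for ap, r in per_airport[airline].items())
    print(f"{airline}: {rates}, overall: {overall[airline]:.0%}")
```

In this made-up data, Airline A has the lower delay rate at each airport, yet Airline B has the lower rate overall, because most of A's flights go through the airport where delays are common. Which variable plays the role of "airport" in a 29-variable data set is exactly what has to be guessed a priori.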

Page 9: Numbersense: Clearing the Fog of Big Data

When more people are performing more analyses more quickly, there are more theories, more points of view, more complexity, more conflicts and more confusion. There is less clarity, less consensus and less confidence.

Page 10: Numbersense: Clearing the Fog of Big Data

Big Data: Producers

user interactions

web logs: distributed servers

data warehouse

data marts

displays

Excel

Dashboards

Data Cubes

Models

Forecasts

Page 11: Numbersense: Clearing the Fog of Big Data

user interactions

displays

Excel

Dashboards

Data Cubes

Models

Forecasts

web logs ... data marts

Strategies

Tactics

Plans

Big Data: Consumers

Page 12: Numbersense: Clearing the Fog of Big Data

Moneyball

Page 13: Numbersense: Clearing the Fog of Big Data

Page 14: Numbersense: Clearing the Fog of Big Data

Statistics != Math

David S. Moore, The Basic Practice of Statistics, ~2007

Page 15: Numbersense: Clearing the Fog of Big Data

The Obesity Epidemic

Page 16: Numbersense: Clearing the Fog of Big Data

The Obesity Epidemic

Page 17: Numbersense: Clearing the Fog of Big Data

Quetelet’s Index (1830)
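Quetelet's index is what is now called the body mass index (BMI): weight divided by the square of height. A short sketch, assuming metric units (kilograms and metres); the function name and the sample person are illustrative and do not come from the slides.

```python
def quetelet_index(weight_kg, height_m):
    """Body mass index: weight in kilograms divided by height in metres squared."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


# Illustrative person (an assumption, not data from the talk): 70 kg, 1.75 m.
example_bmi = quetelet_index(70.0, 1.75)
print(f"BMI = {example_bmi:.1f}")
```

The index uses only weight and height, so it cannot tell fat from muscle, which is the starting point for the criticisms on the following slides.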

Page 18: Numbersense: Clearing the Fog of Big Data

BMI Critics (2000-)

Page 19: Numbersense: Clearing the Fog of Big Data

BMI Critics (2000-)

Page 20: Numbersense: Clearing the Fog of Big Data

BMI Critics (2000-)

Page 21: Numbersense: Clearing the Fog of Big Data

Taking eyes off the ball

“Although DXA is a direct measurement of fat and a better measure of adiposity than BMI, it is not a disease correlate.”

Page 22: Numbersense: Clearing the Fog of Big Data

Taking eyes off the ball

“Although DXA is a direct measurement of fat and a better measure of adiposity than BMI, it is not a disease correlate.”

Page 23: Numbersense: Clearing the Fog of Big Data

The more things change

All Patients

                  DXA: Not Obese   DXA: Obese   BMI Totals
BMI: Not Obese         35%            39%          74%
BMI: Obese              1%            25%          26%
DXA Totals             36%            64%         100%

Female Patients

                  DXA: Not Obese   DXA: Obese   BMI Totals
BMI: Not Obese         26%            48%          74%
BMI: Obese              0%            26%          26%
DXA Totals             26%            74%         100%
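The figures on this slide can be read as two 2x2 cross-tabulations of BMI against DXA. A sketch of how often the two measures agree, under two assumptions that are mine rather than the slide's: that the cells are arranged with BMI in rows and DXA in columns (the arrangement that makes the printed totals add up), and that DXA is treated as the reference measurement.

```python
# Shares of patients in percent, keyed by (BMI label, DXA label).
# Cell placement is an assumption chosen so that the slide's totals add up.
all_patients = {
    ("not obese", "not obese"): 35,
    ("not obese", "obese"): 39,
    ("obese", "not obese"): 1,
    ("obese", "obese"): 25,
}
female_patients = {
    ("not obese", "not obese"): 26,
    ("not obese", "obese"): 48,
    ("obese", "not obese"): 0,
    ("obese", "obese"): 26,
}


def bmi_against_dxa(cells):
    """Compare BMI labels with DXA labels, treating DXA as the reference."""
    dxa_obese = cells[("obese", "obese")] + cells[("not obese", "obese")]
    dxa_not_obese = cells[("obese", "not obese")] + cells[("not obese", "not obese")]
    agree = cells[("obese", "obese")] + cells[("not obese", "not obese")]
    return {
        # Of those DXA calls obese, the share BMI also calls obese.
        "sensitivity": cells[("obese", "obese")] / dxa_obese,
        # Of those DXA calls not obese, the share BMI also calls not obese.
        "specificity": cells[("not obese", "not obese")] / dxa_not_obese,
        # Share of all patients on whom the two measures agree.
        "agreement": agree / sum(cells.values()),
    }


for label, cells in (("All", all_patients), ("Female", female_patients)):
    metrics = bmi_against_dxa(cells)
    print(label, {name: f"{value:.0%}" for name, value in metrics.items()})
```

Under these assumptions BMI rarely labels someone obese whom DXA does not, but it misses well over half of the patients DXA labels obese.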

Page 24: Numbersense: Clearing the Fog of Big Data

n-U-isance

[Chart: Mortality Ratio (vertical axis, 0 to 350) against Body Mass Index (horizontal axis, 15 to 40), with regions labeled overweight, obese, and extremely obese]

Page 25: Numbersense: Clearing the Fog of Big Data

Reinstall Windows

Page 26: Numbersense: Clearing the Fog of Big Data

Trust, not Truth

Page 27: Numbersense: Clearing the Fog of Big Data

Embarrassment of Riches

A team of psychologists performed personality tests on 100 professionals, of whom 30 were engineers and 70 were lawyers. Here is a brief description of one of the subjects:

Jack is a 45-year-old man. He is married and has four children. He is generally conservative, careful, and ambitious. He shows no interest in political or social issues and spends most of his free time on his many hobbies, which include home carpentry, sailing, and mathematics.

Kahneman and Tversky (1974)

Page 28: Numbersense: Clearing the Fog of Big Data

What is the probability that Jack is one of the 30 engineers?

A. 10 - 40 %

B. 41 - 60 %

C. 61 - 80 %

D. 81 - 100 %
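The question turns on how much weight the description should get relative to the 30% base rate. A sketch using Bayes' rule in odds form; the likelihood ratios tried below (how much more typical the description is of an engineer than of a lawyer) are hypothetical values chosen for illustration, not numbers from the study.

```python
def posterior_engineer(prior, likelihood_ratio):
    """Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio."""
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)


# Base rate from the problem statement: 30 engineers among 100 professionals.
base_rate = 30 / 100

# Hypothetical likelihood ratios (assumptions for illustration only).
for likelihood_ratio in (1.0, 2.0, 4.0):
    probability = posterior_engineer(base_rate, likelihood_ratio)
    print(f"likelihood ratio {likelihood_ratio}: P(engineer) = {probability:.1%}")
```

With an uninformative description (ratio 1) the answer stays at the base rate of 30%. Even a description assumed to be four times as typical of engineers only lifts the probability to roughly 63%, so answers in the top range require ignoring the base rate.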

Kahneman and Tversky (1974); Vanity Fair

Page 29: Numbersense: Clearing the Fog of Big Data

The Law of Small Numbers is even more relevant in the era of Big Data
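One way to see why small samples mislead, sketched with an exact binomial calculation. The 50% true rate and the 70% threshold are arbitrary choices for illustration and are not figures from the talk.

```python
from math import comb


def prob_at_least(n, k_min, p=0.5):
    """Exact probability of k_min or more successes in n independent trials."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


# Chance that a sample shows a rate of 70% or more when the true rate is 50%.
for n in (10, 100, 1000):
    k_min = 7 * n // 10  # smallest count that reaches 70% of n
    print(f"n = {n}: P(observed rate >= 70%) = {prob_at_least(n, k_min):.3g}")
```

A sample of 10 overshoots this badly about one time in six, while a sample of 100 almost never does. A large data set sliced into many small segments behaves like many samples of 10, which is one reading of why the law of small numbers still applies.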

Page 30: Numbersense: Clearing the Fog of Big Data

Target knows your daughter is pregnant

... before you do

Page 31: Numbersense: Clearing the Fog of Big Data

Customer Acquisition

“Right around the birth of a child... parents are exhausted and overwhelmed and their shopping patterns and brand loyalties are up for grabs.”

Page 32: Numbersense: Clearing the Fog of Big Data

Customer Acquisition

“We knew that if we could identify them in their second trimester, there’s a good chance we could capture them for years.”

Page 33: Numbersense: Clearing the Fog of Big Data

Brochure Design

“We started mixing in all these ads for things we knew pregnant women would never buy, so the baby ads looked random.”

Page 34: Numbersense: Clearing the Fog of Big Data

Brochure Design

“We’d put an ad for wineglasses next to infant clothes. That way, it looked like all the products were chosen by chance.”

Page 35: Numbersense: Clearing the Fog of Big Data

Brochure Design

“As long as a pregnant woman thinks she hasn’t been spied on, she’ll use the coupons.”

Page 36: Numbersense: Clearing the Fog of Big Data

The Model

Buy baby products soon

[Diagram labels: 3, 4, 21]

Ward made 4 of 25 related purchases:

- Coco Butter Lotion
- XXL Purse
- Zinc/Magnesium Supplement
- Bright Blue Rug

Pregnancy Score = 87%

Page 37: Numbersense: Clearing the Fog of Big Data

Mad Dad

                        Reality: Pregnant   Reality: Not   Total
Model Says: Pregnant           6%               14%          20
Model Says: Not                4%               76%          80
Total                          10               90          100

False positive rate: 14/90 = 16%

False negative rate: 4/10 = 40%

Positive predictive value: 6/20 = 30%

Incidence: 10/100 = 10%

Positive predictive value vs. incidence: 3x
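The four rates on this slide follow from the cells of the model-versus-reality table. A sketch that recomputes them, reading the slide's numbers as counts per 100 customers; the variable names are mine, and the "lift" ratio at the end is my reading of the slide's "3x" label.

```python
# Counts per 100 customers, as read from the slide.
true_pos = 6    # model says pregnant, customer is pregnant
false_pos = 14  # model says pregnant, customer is not
false_neg = 4   # model says not pregnant, customer is pregnant
true_neg = 76   # model says not pregnant, customer is not

total = true_pos + false_pos + false_neg + true_neg  # 100
actually_pregnant = true_pos + false_neg             # 10
actually_not = false_pos + true_neg                  # 90
flagged = true_pos + false_pos                       # 20

false_positive_rate = false_pos / actually_not       # 14/90
false_negative_rate = false_neg / actually_pregnant  # 4/10
positive_predictive_value = true_pos / flagged       # 6/20
incidence = actually_pregnant / total                # 10/100

# Assumed meaning of "3x": how much better than chance a flagged customer is.
lift = positive_predictive_value / incidence

print(f"False positive rate:       {false_positive_rate:.0%}")
print(f"False negative rate:       {false_negative_rate:.0%}")
print(f"Positive predictive value: {positive_predictive_value:.0%}")
print(f"Incidence:                 {incidence:.0%}")
print(f"Lift over incidence:       {lift:.0f}x")
```

Read this way, the model triples the odds of finding a pregnant customer compared with picking at random, yet 14 of every 20 customers it flags are not pregnant.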

Page 38: Numbersense: Clearing the Fog of Big Data

Sending Mixed Messages

                        Reality: Pregnant   Reality: Not   Total
Model Says: Pregnant           6%               14%          20
Model Says: Not                4%               76%          80
Total                          10               90          100

False positive rate: 14/90 = 16%

False negative rate: 4/10 = 40%

Positive predictive value: 6/20 = 30%

Incidence: 10/100 = 10%

Positive predictive value vs. incidence: 3x

Page 39: Numbersense: Clearing the Fog of Big Data

Intuition + Data

Data + Theory

Trust, not Truth

Humans + Data

Law of small numbers

Page 40: Numbersense: Clearing the Fog of Big Data

Thank you!

Twitter: @junkcharts

Gmail: JunkCharts
