Mind the Gap How holes in your data can lead to stories Thomas Hargrove, Scripps News Washington Bureau Jennifer LaFleur, Center for Investigative Reporting NICAR Baltimore: 2 p.m. March 1, 2014 Salon DEF

Mind the Gap NICAR14 (holes in data)

Apr 22, 2015

Transcript
Page 1: Mind the Gap NICAR14 (holes in data)

Mind the Gap
How holes in your data can lead to stories

Thomas Hargrove, Scripps News Washington Bureau
Jennifer LaFleur, Center for Investigative Reporting

NICAR Baltimore: 2 p.m. March 1, 2014 Salon DEF

Page 2: Mind the Gap NICAR14 (holes in data)

• Never assume data are whole – check!!!
• Simple techniques like sorting
• Many of these we already use for integrity checks
• Graphing over time
• Matching to other data sets
• Statistical tools
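One simple way to apply the sorting and counting idea is to tally records by agency and year and list the years in which an agency has no records at all. The sketch below assumes each record has been reduced to a hypothetical (agency, year) pair; the function name and sample data are made up for illustration and are not from the presentation.

```python
from collections import defaultdict

def missing_years(records, first_year, last_year):
    """Return {agency: [years with zero records]} for the given year range."""
    seen = defaultdict(set)
    for agency, year in records:
        seen[agency].add(year)
    expected = range(first_year, last_year + 1)
    gaps = {}
    for agency, years in seen.items():
        absent = [y for y in expected if y not in years]
        if absent:
            gaps[agency] = absent
    return gaps

# Hypothetical sample: one (agency, year) pair per record.
sample = [
    ("Agency A", 2006), ("Agency A", 2007), ("Agency A", 2008),
    ("Agency B", 2006), ("Agency B", 2008),
]
gaps = missing_years(sample, 2006, 2008)
print(gaps)
```

An agency that never reported in any year will not appear in the records at all, so a full check would also compare the agencies found against an independent list of agencies that should be reporting.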


Page 3: Mind the Gap NICAR14 (holes in data)

• Look for research already done on the topic
• Find experts
• Talk to reporters who have done similar stories
• If possible, talk to records personnel who assembled the data
• Follow data to their source – usually people


Page 4: Mind the Gap NICAR14 (holes in data)

• Finding stories in the holes
  – Agencies’ failure to report
  – Varying reporting rules across geography or agency
  – Government computer system failures
  – Find patterns among missing records
  – Find the reasons behind missing records


Page 5: Mind the Gap NICAR14 (holes in data)
Page 6: Mind the Gap NICAR14 (holes in data)

Dr. David Icove
Researcher, University of Tennessee

Retired member of FBI Behavioral Science Unit

How This Project Started

Page 7: Mind the Gap NICAR14 (holes in data)

For many years, NFIRS reported that only 5% of building fires in the U.S. are intentionally set.

Page 8: Mind the Gap NICAR14 (holes in data)
Page 9: Mind the Gap NICAR14 (holes in data)

The Impossible Variance of America’s Rate of Arson: 2006 to 2011

Department          State   Fires    Arson Rate
Indianapolis        IN       1,207    0%
San Diego           CA       1,022    0%
New York City       NY      18,988    1%
Gwinnett County     GA       1,678    2%
Houston             TX       7,740    2%
Arlington           TX       1,511    3%
Chicago             IL       5,075    4%
Los Angeles City    CA       7,975   10%
Phoenix             AZ       5,359   12%
Memphis             TN       5,331   16%
Tulsa               OK       3,076   22%
Gary                IN         424   28%
Cleveland           OH       5,742   28%
Toledo              OH       2,544   28%
Saginaw             MI       1,377   32%
Dayton              OH       1,930   33%
Buffalo             NY       1,606   33%
Youngstown          OH       2,125   36%
Highland Park       MI         748   45%
North Las Vegas     NV         435   49%

Page 10: Mind the Gap NICAR14 (holes in data)

How Rare is Arson?

Page 11: Mind the Gap NICAR14 (holes in data)

But They Should Have Reported:

Page 12: Mind the Gap NICAR14 (holes in data)

“Arson is grossly underreported. The true rate, I believe, is 40% to 50% -- in that range.”

--Bill Degnan, President, National Association of State Fire Marshals

Page 13: Mind the Gap NICAR14 (holes in data)

“There isn’t a day that goes by that I don’t think: ‘Man, I was a monster.’ I’m just thankful no one was hurt,”

--Kenneth Allen, Muncie, Indiana

Page 14: Mind the Gap NICAR14 (holes in data)

The Allen Conspiracy:
46 people set 73 home and vehicle fires to collect $3.8 million from insurance

Page 15: Mind the Gap NICAR14 (holes in data)

Lessons Learned from 1 million fires:

• 54,860 fires at ‘unlucky’ buildings that, like Allen’s home, experienced multiple fires, none of which were reported as arson.

• 42,434 fires at buildings that experienced foreclosure, according to the national mortgage monitoring firm RealtyTrac.

• 3,561 fires that had multiple points of ignition, suggesting someone set several fires at once.

• 77,596 fires in unoccupied or vacant buildings.
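The ‘unlucky’ building test above can be sketched in a few lines: group fires by address, then keep the groups with two or more fires where none was reported as arson. This is an illustration only, not the project's actual syntax; the field names `address` and `reported_arson` and the sample rows are hypothetical.

```python
from collections import defaultdict

def unlucky_building_fires(fires):
    """Return fires at addresses with 2+ fires, none of them reported as arson."""
    by_address = defaultdict(list)
    for fire in fires:
        by_address[fire["address"]].append(fire)
    flagged = []
    for group in by_address.values():
        repeat_site = len(group) >= 2
        any_arson = any(f["reported_arson"] for f in group)
        if repeat_site and not any_arson:
            flagged.extend(group)
    return flagged

# Hypothetical sample rows.
sample = [
    {"address": "12 Elm St", "reported_arson": False},
    {"address": "12 Elm St", "reported_arson": False},
    {"address": "40 Oak Ave", "reported_arson": False},
    {"address": "7 Pine Rd", "reported_arson": True},
    {"address": "7 Pine Rd", "reported_arson": False},
]
flagged = unlucky_building_fires(sample)
print(len(flagged))
```

In real incident data the address field would need cleaning before grouping, since the same building can be written several ways.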

Page 16: Mind the Gap NICAR14 (holes in data)

What’s Next?

• Collecting data on 4.8 million fires

• Calculate geographic rates by merging aggregated fire counts to Census Bureau tract data

• Correlate rates of suspicious fires to tracts with unusually high occurrences of fire

• Contact local fire/police authorities to determine if serial arson is suspected or should be investigated
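A minimal sketch of the rate calculation, assuming fire counts have already been aggregated by tract and tract populations are available as a lookup. The tract IDs, the per-1,000 scale, and the "twice the median" cutoff are illustrative assumptions, not the project's method.

```python
from statistics import median

def tract_fire_rates(fire_counts, populations, per=1000):
    """Join fire counts to tract populations; return fires per `per` residents."""
    rates = {}
    for tract, pop in populations.items():
        if pop > 0:  # skip unpopulated tracts to avoid dividing by zero
            rates[tract] = fire_counts.get(tract, 0) / pop * per
    return rates

def unusually_high(rates, multiple=2.0):
    """Return tracts whose rate exceeds `multiple` times the median rate."""
    cutoff = multiple * median(rates.values())
    return sorted(t for t, r in rates.items() if r > cutoff)

# Hypothetical tract IDs and numbers.
fire_counts = {"T1": 10, "T2": 12, "T3": 60}
populations = {"T1": 4000, "T2": 5000, "T3": 3000, "T4": 2000}

rates = tract_fire_rates(fire_counts, populations)
high = unusually_high(rates)
print(high)
```

Tracts that have fires but no population row are silently dropped by this join, which is itself a gap worth listing and checking.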

Page 17: Mind the Gap NICAR14 (holes in data)
Page 18: Mind the Gap NICAR14 (holes in data)

Local gap-mining stories

Page 19: Mind the Gap NICAR14 (holes in data)
Page 20: Mind the Gap NICAR14 (holes in data)
Page 21: Mind the Gap NICAR14 (holes in data)
Page 22: Mind the Gap NICAR14 (holes in data)
Page 23: Mind the Gap NICAR14 (holes in data)
Page 24: Mind the Gap NICAR14 (holes in data)

Here’s FBI data you were never supposed to see

Page 25: Mind the Gap NICAR14 (holes in data)
Page 26: Mind the Gap NICAR14 (holes in data)
Page 27: Mind the Gap NICAR14 (holes in data)

Truck accidents by year and agency

Page 28: Mind the Gap NICAR14 (holes in data)

Sometimes you find piles

Page 29: Mind the Gap NICAR14 (holes in data)

Sometimes you find piles

Page 30: Mind the Gap NICAR14 (holes in data)
Page 31: Mind the Gap NICAR14 (holes in data)

Statistical tools

• Time series correlation – are your ups and downs real?

• Project/predict data and compare to actual results. What causes differences?

• Population counts are pretty accurate. Use them to determine reporting rates

• Regression with dummy variables
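The "project and compare" idea can be illustrated with a straight-line trend: fit a line to the earlier years, project it forward, and look at how far the actual counts fall from the projection. This is a sketch under simple assumptions (ordinary least squares on a linear trend, made-up counts); the presentation does not say which model the authors used.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def actual_minus_projected(years, counts, train_through):
    """Fit on years <= train_through; return actual minus projected for later years."""
    train = [(y, c) for y, c in zip(years, counts) if y <= train_through]
    slope, intercept = fit_line([y for y, _ in train], [c for _, c in train])
    return {
        y: c - (slope * y + intercept)
        for y, c in zip(years, counts)
        if y > train_through
    }

# Hypothetical yearly record counts for one agency.
years = [2006, 2007, 2008, 2009, 2010, 2011]
counts = [100, 110, 120, 130, 60, 150]

diffs = actual_minus_projected(years, counts, train_through=2009)
print(diffs)
```

A large negative difference in a single year, as in 2010 here, is a prompt to ask what happened that year: a real drop, or records that never made it into the file.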

Page 32: Mind the Gap NICAR14 (holes in data)
Page 33: Mind the Gap NICAR14 (holes in data)

Make sure the holes are real

EE000132 might actually be the same as EE-000-132
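One way to guard against false holes like this is to normalize identifiers before matching. The sketch below assumes that case and punctuation carry no meaning in the ID, which should be confirmed with whoever maintains the data; the function names are hypothetical.

```python
import re

def normalize_id(raw):
    """Uppercase the ID and strip everything except letters and digits."""
    return re.sub(r"[^A-Z0-9]", "", raw.upper())

def unmatched(ids_a, ids_b):
    """Return IDs from ids_a with no counterpart in ids_b after normalizing."""
    keys_b = {normalize_id(i) for i in ids_b}
    return [i for i in ids_a if normalize_id(i) not in keys_b]

same = normalize_id("EE000132") == normalize_id("EE-000-132")
still_missing = unmatched(["EE000132", "EE000133"], ["ee-000-132"])
print(same, still_missing)
```

Only the IDs that remain unmatched after normalization are candidates for real holes.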

Page 34: Mind the Gap NICAR14 (holes in data)

A word of caution

• Do spot checks to make sure what you found is real

• Run your findings by experts

• If possible, engage government sources of data early. They may not be the enemy.

• Challenge your assumptions. Data are only a clue, never an end result.

Page 35: Mind the Gap NICAR14 (holes in data)

Jennifer LaFleur [email protected] @j_la28
Thomas Hargrove [email protected] 202-408-2703

Arson Project syntax files: https://www.dropbox.com/l/LPB7l3kpz7wxvGsHSdTOy9

A copy of this presentation will be at www.jenster.com/2014

Questions?