Mind the Gap: How holes in your data can lead to stories
Thomas Hargrove, Scripps News Washington Bureau
Jennifer LaFleur, Center for Investigative Reporting
NICAR Baltimore: 2 p.m., March 1, 2014, Salon DEF
• Never assume data are whole. Check!
• Simple techniques like sorting
• Many of these are techniques we already use for integrity checks
• Graphing over time
• Matching to other data sets
• Statistical tools
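The sorting and over-time checks listed above can be sketched in a few lines of Python. This is only an illustration, not the presenters' code: the agency names, years and counts are invented, and `find_missing_years` and `find_zero_years` are hypothetical helpers.

```python
from collections import defaultdict

# Hypothetical records: (agency, year, incident_count). All values are invented.
records = [
    ("Agency A", 2006, 410), ("Agency A", 2007, 395), ("Agency A", 2008, 402),
    ("Agency B", 2006, 120), ("Agency B", 2008, 131),  # no 2007 row at all
    ("Agency C", 2006, 88), ("Agency C", 2007, 0), ("Agency C", 2008, 91),
]

def find_missing_years(rows, first_year, last_year):
    """Return {agency: [years with no row]} for the expected year range."""
    seen = defaultdict(set)
    for agency, year, _count in rows:
        seen[agency].add(year)
    expected = set(range(first_year, last_year + 1))
    return {
        agency: sorted(expected - years)
        for agency, years in seen.items()
        if expected - years
    }

def find_zero_years(rows):
    """Return (agency, year) pairs where the reported count is zero."""
    return sorted((agency, year) for agency, year, count in rows if count == 0)

missing = find_missing_years(records, 2006, 2008)
zeros = find_zero_years(records)
print("Years with no report:", missing)
print("Years reported as zero:", zeros)
```

A missing row and a reported zero are different kinds of holes, so the sketch keeps them apart. Either one is a reason to call the agency.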
• Look for research already done on the topic
• Find experts
• Talk to reporters who have done similar stories
• If possible, talk to the records personnel who assembled the data
• Follow data to their source – usually people
• Finding stories in the holes
  – Agencies' failure to report
  – Varying reporting rules across geography or agency
  – Government computer system failures
  – Find patterns among missing records
  – Find the reasons behind missing records
Dr. David Icove
Researcher, University of Tennessee
Retired member of FBI Behavioral Science Unit
How This Project Started
For many years, NFIRS reported that only 5% of building fires are intentionally set in the U.S.
The Impossible Variance of America’s Rate of Arson: 2006 to 2011

Department          State   Fires    Arson Rate
Indianapolis        IN       1,207    0%
San Diego           CA       1,022    0%
New York City       NY      18,988    1%
Gwinnett County     GA       1,678    2%
Houston             TX       7,740    2%
Arlington           TX       1,511    3%
Chicago             IL       5,075    4%
Los Angeles City    CA       7,975   10%
Phoenix             AZ       5,359   12%
Memphis             TN       5,331   16%
Tulsa               OK       3,076   22%
Gary                IN         424   28%
Cleveland           OH       5,742   28%
Toledo              OH       2,544   28%
Saginaw             MI       1,377   32%
Dayton              OH       1,930   33%
Buffalo             NY       1,606   33%
Youngstown          OH       2,125   36%
Highland Park       MI         748   45%
North Las Vegas     NV         435   49%
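One way to turn the table above into a list of departments to call is sketched here in Python. The figures are copied from the slide. The two cutoffs (`MIN_FIRES` and `MAX_RATE`) are assumptions chosen for illustration, not thresholds from the project.

```python
# (department, state, fires 2006-2011, reported arson rate in percent),
# copied from the slide's table.
departments = [
    ("Indianapolis", "IN", 1207, 0), ("San Diego", "CA", 1022, 0),
    ("New York City", "NY", 18988, 1), ("Gwinnett County", "GA", 1678, 2),
    ("Houston", "TX", 7740, 2), ("Arlington", "TX", 1511, 3),
    ("Chicago", "IL", 5075, 4), ("Los Angeles City", "CA", 7975, 10),
    ("Phoenix", "AZ", 5359, 12), ("Memphis", "TN", 5331, 16),
    ("Tulsa", "OK", 3076, 22), ("Gary", "IN", 424, 28),
    ("Cleveland", "OH", 5742, 28), ("Toledo", "OH", 2544, 28),
    ("Saginaw", "MI", 1377, 32), ("Dayton", "OH", 1930, 33),
    ("Buffalo", "NY", 1606, 33), ("Youngstown", "OH", 2125, 36),
    ("Highland Park", "MI", 748, 45), ("North Las Vegas", "NV", 435, 49),
]

MIN_FIRES = 1000  # assumed: only look at departments with many fires
MAX_RATE = 1      # assumed: a rate this low on that many fires looks suspect

# Spread between the highest and lowest reported rates, in percentage points.
rates = [rate for _name, _state, _fires, rate in departments]
spread = max(rates) - min(rates)

# Large departments that report almost no arson.
suspect = [
    name for name, _state, fires, rate in departments
    if fires >= MIN_FIRES and rate <= MAX_RATE
]

print("Spread in reported arson rate:", spread, "points")
print("Departments to ask about reporting:", suspect)
```

A low rate is not proof of a reporting gap. It only marks where to start asking questions.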
How Rare is Arson?
But They Should Have Reported:
“Arson is grossly underreported. The true rate, I believe, is 40% to 50% -- in that range.”
-- Bill Degnan, President, National Association of State Fire Marshals
“There isn’t a day that goes by that I don’t think: ‘Man, I was a monster.’ I’m just thankful no one was hurt.”
-- Kenneth Allen, Muncie, Indiana
The Allen Conspiracy: 46 people set 73 home and vehicle fires to collect $3.8 million from insurance
Lessons Learned from 1 million fires:
• 54,860 fires at ‘unlucky’ buildings that, like Allen’s home, experienced multiple fires, none of which were reported as arson.
• 42,434 fires at buildings that experienced foreclosure, according to the national mortgage monitoring firm RealtyTrac.
• 3,561 fires that had multiple points of ignition, suggesting someone set several fires at once.
• 77,596 fires in unoccupied or vacant buildings.
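The first pattern above, buildings with repeated fires and no arson finding, could be searched for roughly as follows. This is a sketch under assumptions: the addresses, dates and cause labels are invented, the record layout is hypothetical rather than the NFIRS layout, and `unlucky_buildings` is an illustrative helper.

```python
from collections import defaultdict

# Hypothetical fire records: (address, date, reported cause). All values invented.
fires = [
    ("101 Elm St", "2008-03-02", "undetermined"),
    ("101 Elm St", "2009-11-17", "accidental"),
    ("22 Oak Ave", "2007-06-30", "arson"),
    ("22 Oak Ave", "2010-01-12", "accidental"),
    ("9 Pine Rd", "2011-08-05", "accidental"),
]

def unlucky_buildings(rows, min_fires=2):
    """Addresses with at least min_fires fires, none of them reported as arson."""
    causes_by_address = defaultdict(list)
    for address, _date, cause in rows:
        causes_by_address[address].append(cause)
    return sorted(
        address
        for address, causes in causes_by_address.items()
        if len(causes) >= min_fires and "arson" not in causes
    )

unlucky = unlucky_buildings(fires)
print(unlucky)
```

In real data the addresses would need cleaning first, since the same building is often spelled several ways.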
What’s Next?
• Collect data on 4.8 million fires
• Calculate geographic rates by merging aggregated fire counts to Census Bureau tract data
• Correlate rates of suspicious fires to tracts with unusually high occurrences of fire
• Contact local fire/police authorities to determine if serial arson is suspected or should be investigated
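The merge-and-rate step above might look like this in Python. Everything here is assumed for illustration: the tract identifiers are placeholders rather than real Census GEOIDs, the counts and populations are invented, and the 1.5x cutoff for an "unusually high" tract is an arbitrary choice.

```python
# Hypothetical inputs: fires aggregated by tract, and tract populations.
fire_counts = {"T001": 12, "T002": 3, "T003": 40, "T999": 5}
tract_population = {"T001": 4000, "T002": 3000, "T003": 5000, "T004": 2500}

def fire_rates(counts, population, per=1000):
    """Fires per `per` residents for each tract found in both tables.

    Also returns the tracts that failed to match, because an unmatched
    tract is itself a hole worth checking.
    """
    rates = {}
    unmatched = []
    for tract, n_fires in counts.items():
        pop = population.get(tract)
        if not pop:  # missing tract or zero population
            unmatched.append(tract)
            continue
        rates[tract] = n_fires * per / pop
    return rates, sorted(unmatched)

rates, unmatched = fire_rates(fire_counts, tract_population)

# Overall rate across matched tracts, used as the comparison baseline.
matched_fires = sum(fire_counts[t] for t in rates)
matched_pop = sum(tract_population[t] for t in rates)
overall = matched_fires * 1000 / matched_pop

HIGH_MULTIPLE = 1.5  # assumed cutoff for "unusually high"
high = sorted(t for t, r in rates.items() if r > HIGH_MULTIPLE * overall)

print("Rates per 1,000 residents:", rates)
print("Unmatched tracts:", unmatched)
print("Unusually high tracts:", high)
```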
Local gap-mining stories
Here’s FBI data you were never supposed to see
Truck accidents by year and agency
Sometimes you find piles
Statistical tools
• Time series correlation – are your ups and downs real?
• Project/predict data and compare to actual results. What causes differences?
• Population counts are pretty accurate. Use them to determine reporting rates
• Regression with dummy variables
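The "project and compare" idea in the list above can be sketched with an ordinary least-squares line fitted to earlier years. The yearly counts are invented, and a straight-line trend is an assumption that will not suit every data set.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical yearly report counts for one agency.
years = [2006, 2007, 2008, 2009]
counts = [100, 110, 120, 130]

slope, intercept = fit_line(years, counts)
projected_2010 = slope * 2010 + intercept

actual_2010 = 95  # hypothetical reported count
shortfall = projected_2010 - actual_2010

print("Projected:", round(projected_2010), "Actual:", actual_2010)
print("Shortfall to explain:", round(shortfall))
```

A large shortfall does not say why the number dropped. It could be a real decline, a rule change or a computer failure, which is the reporting question.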
Make sure the holes are real
EE000132 might actually be the same as EE-000-132
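A minimal sketch of that check in Python: normalize identifiers before deciding a record is missing. Only the two IDs on the slide come from the source. The other IDs and the `normalize_id` helper are hypothetical, and real data may need more rules, such as handling leading zeros.

```python
import re

def normalize_id(raw):
    """Uppercase the ID and drop everything except letters and digits."""
    return re.sub(r"[^A-Z0-9]", "", raw.upper())

# IDs from two sources that should describe the same records.
list_a = ["EE000132", "EE000200"]
list_b = ["EE-000-132", "ee 000 201"]

# A naive comparison reports every ID in list_a as missing from list_b.
naive_missing = [x for x in list_a if x not in list_b]

# After normalizing, only the real hole remains.
normalized_b = {normalize_id(x) for x in list_b}
truly_missing = [x for x in list_a if normalize_id(x) not in normalized_b]

print("Naive:", naive_missing)
print("After normalizing:", truly_missing)
```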
A word of caution
• Do spot checks to make sure what you found is real
• Run your findings by experts
• If possible, engage government sources of data early. They may not be the enemy.
• Challenge your assumptions. Data are only a clue, never an end result.
Jennifer LaFleur, [email protected], @j_la28
Thomas Hargrove, [email protected], 202-408-2703
Arson Project syntax files: https://www.dropbox.com/l/LPB7l3kpz7wxvGsHSdTOy9
A copy of this presentation will be at www.jenster.com/2014
Questions?