Mind the Gap NICAR14 (holes in data)

Post on 22-Apr-2015

Mind the Gap: How holes in your data can lead to stories

Thomas Hargrove, Scripps News Washington Bureau
Jennifer LaFleur, Center for Investigative Reporting

NICAR, Baltimore: 2 p.m., March 1, 2014, Salon DEF

• Never assume data are whole – check!
• Simple techniques like sorting
• Many of these are the same techniques we use to integrity-check data
• Graphing over time
• Matching to other data sets
• Statistical tools
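
Before graphing a series over time, it can help to list the calendar months that contain no records at all. The sketch below is a minimal illustration, assuming incident dates are already parsed into Python `date` objects; the function name `missing_months` and the sample dates are hypothetical.

```python
from datetime import date

def missing_months(dates):
    """Return (year, month) pairs between the earliest and latest record
    that contain no records at all."""
    present = {(d.year, d.month) for d in dates}
    if not present:
        return []
    year, month = min(present)
    last = max(present)
    gaps = []
    while (year, month) <= last:
        if (year, month) not in present:
            gaps.append((year, month))
        # Step forward one calendar month.
        month += 1
        if month == 13:
            year, month = year + 1, 1
    return gaps

# Hypothetical incident dates; a real check would read them from the agency file.
incidents = [date(2012, 11, 3), date(2012, 12, 19), date(2013, 3, 2), date(2013, 4, 30)]
print(missing_months(incidents))  # [(2013, 1), (2013, 2)]
```

An empty month is not proof of a hole (some places really have quiet months), but it tells you which questions to ask the agency.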

• Look for research already done on the topic
• Find experts
• Talk to reporters who have done similar stories
• If possible, talk to the records personnel who assembled the data
• Follow data to their source – usually people

• Finding stories in the holes:
  – Agencies' failure to report
  – Varying reporting rules across geography or agency
  – Government computer system failures
  – Find patterns among missing records
  – Find the reasons behind missing records
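
A quick first pass at finding patterns among missing records is to measure how often a field is blank in each agency's records and sort agencies by that share. This sketch assumes records are dictionaries; the field names `agency` and `cause`, the function name and the sample rows are all hypothetical.

```python
from collections import defaultdict

def blank_share_by_group(records, group_key, field):
    """Share of records in each group whose `field` is missing or blank,
    sorted from most to least incomplete."""
    totals = defaultdict(int)
    blanks = defaultdict(int)
    for record in records:
        group = record.get(group_key)
        totals[group] += 1
        value = record.get(field)
        # Treat None and whitespace-only strings as blank.
        if value is None or str(value).strip() == "":
            blanks[group] += 1
    shares = {group: blanks[group] / totals[group] for group in totals}
    return sorted(shares.items(), key=lambda item: item[1], reverse=True)

# Hypothetical records.
records = [
    {"agency": "A", "cause": ""},
    {"agency": "A", "cause": None},
    {"agency": "A", "cause": "electrical"},
    {"agency": "A", "cause": "cooking"},
    {"agency": "B", "cause": "cooking"},
    {"agency": "B", "cause": "heating"},
]
print(blank_share_by_group(records, "agency", "cause"))  # [('A', 0.5), ('B', 0.0)]
```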

Dr. David Icove, Researcher, University of Tennessee

Retired member of the FBI Behavioral Science Unit

How This Project Started

For many years, NFIRS (the National Fire Incident Reporting System) reported that only 5% of building fires in the U.S. are intentionally set.

The Impossible Variance of America’s Rate of Arson: 2006 to 2011

Department        State   Fires  Arson rate (%)
Indianapolis      IN      1,207  0
San Diego         CA      1,022  0
New York City     NY     18,988  1
Gwinnett County   GA      1,678  2
Houston           TX      7,740  2
Arlington         TX      1,511  3
Chicago           IL      5,075  4
Los Angeles City  CA      7,975  10
Phoenix           AZ      5,359  12
Memphis           TN      5,331  16
Tulsa             OK      3,076  22
Gary              IN        424  28
Cleveland         OH      5,742  28
Toledo            OH      2,544  28
Saginaw           MI      1,377  32
Dayton            OH      1,930  33
Buffalo           NY      1,606  33
Youngstown        OH      2,125  36
Highland Park     MI        748  45
North Las Vegas   NV        435  49
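
One way to see why this variance looks impossible: suppose, purely for illustration, that the 5% NFIRS figure were the true arson share everywhere and that fires were independent. A binomial model then gives the chance that a department would report as few arsons as it did. Both assumptions are simplifications (fires cluster, and the experts quoted in this talk think the true rate is far higher), and the table's rates are rounded to whole percents, so the implied counts are approximate. The rows are a subset of the slide's table; the function names are hypothetical.

```python
import math

# Assumed baseline: the 5% share NFIRS long reported as intentionally set.
BASELINE = 0.05

# (department, fires, reported arson rate in %) taken from the slide's table.
ROWS = [
    ("Indianapolis", 1207, 0),
    ("San Diego", 1022, 0),
    ("New York City", 18988, 1),
    ("Chicago", 5075, 4),
    ("Los Angeles City", 7975, 10),
    ("North Las Vegas", 435, 49),
]

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p); each term is computed in log space
    so that large n does not overflow."""
    total = 0.0
    for i in range(k + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * math.log(p) + (n - i) * math.log1p(-p))
        total += math.exp(log_pmf)
    return min(total, 1.0)

def arson_shortfall(rows, baseline=BASELINE):
    results = []
    for name, fires, rate_pct in rows:
        # Rates are rounded to whole percents, so this count is approximate.
        implied = round(fires * rate_pct / 100)
        expected = fires * baseline
        results.append((name, implied, expected, binom_cdf(implied, fires, baseline)))
    return results

for name, implied, expected, p_low in arson_shortfall(ROWS):
    print(f"{name:<18} implied {implied:>5}  expected {expected:>7.1f}  "
          f"P(X <= implied) {p_low:.2e}")
```

Under these assumptions, Indianapolis reporting zero arsons among 1,207 fires has a probability on the order of 10^-27, which is the statistical version of "impossible."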

How Rare is Arson?

But They Should Have Reported:

“Arson is grossly under reported. The true rate, I believe, is 40% to 50% -- in that range.”

--Bill Degnan, President, National Association of State Fire Marshals

“There isn’t a day that goes by that I don’t think: ‘Man, I was a monster.’ I’m just thankful no one was hurt.”

--Kenneth Allen, Muncie, Indiana

The Allen Conspiracy: 46 people set 73 home and vehicle fires to collect $3.8 million from insurance

Lessons Learned from 1 million fires:

• 54,860 fires at ‘unlucky’ buildings that, like Allen’s home, experienced multiple fires but none of which were reported as arson.

• 42,434 fires at buildings that experienced foreclosure, according to the national mortgage monitoring firm RealtyTrac.

• 3,561 fires that had multiple points of ignition, suggesting someone set several fires at once.

• 77,596 fires in unoccupied or vacant buildings.
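
The first flag in that list, buildings that burned more than once without any fire coded as arson, can be sketched as a group-by on address. This assumes each fire record carries an address and a cause; the cause value "intentional" stands in for whatever arson code the data actually use, and the address normalization is deliberately crude. All names and sample rows are hypothetical.

```python
from collections import defaultdict

def normalize_address(address):
    """Crude normalization: uppercase and collapse whitespace. Real address
    matching usually needs a proper address standardizer or geocoder."""
    return " ".join(address.upper().split())

def unlucky_buildings(fires, arson_cause="intentional", min_fires=2):
    """Buildings with at least `min_fires` fires, none coded as arson.
    Returns {normalized address: number of fires}."""
    causes_by_building = defaultdict(list)
    for fire in fires:
        causes_by_building[normalize_address(fire["address"])].append(fire["cause"])
    return {
        address: len(causes)
        for address, causes in causes_by_building.items()
        if len(causes) >= min_fires and arson_cause not in causes
    }

# Hypothetical fire records.
fires = [
    {"address": "12 Elm St", "cause": "undetermined"},
    {"address": "12  ELM ST", "cause": "electrical"},
    {"address": "40 Oak Ave", "cause": "intentional"},
    {"address": "40 Oak Ave", "cause": "undetermined"},
    {"address": "7 Pine Rd", "cause": "cooking"},
]
flagged = unlucky_buildings(fires)
print(flagged)                # {'12 ELM ST': 2}
print(sum(flagged.values()))  # 2 fires at flagged buildings
```

Summing the counts gives a figure in the same form as the 54,860 above (fires, not buildings), but every flagged address is a lead to check, not a finding.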

What’s Next?
• Collect data on 4.8 million fires
• Calculate geographic rates by merging aggregated fire counts with Census Bureau tract data

• Correlate rates of suspicious fires to tracts with unusually high occurrences of fire

• Contact local fire/police authorities to determine if serial arson is suspected or should be investigated
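
The merge-and-rate step might look like the sketch below, assuming fire counts are already aggregated by Census tract and that a housing-unit count per tract serves as the denominator. The denominator, the rate per 1,000 units and the rule that flags tracts at three times the pooled rate are illustrative assumptions, not the project's method. Tract IDs here are hypothetical placeholders; real ones are 11-digit GEOIDs (2-digit state, 3-digit county, 6-digit tract).

```python
def tract_rates(fire_counts, housing_units, per=1000):
    """Fires per `per` housing units for tracts present in both inputs, plus the
    tract IDs that failed to match (those gaps are worth checking too)."""
    rates = {
        tract: count * per / housing_units[tract]
        for tract, count in fire_counts.items()
        if housing_units.get(tract)
    }
    unmatched = sorted(t for t in fire_counts if not housing_units.get(t))
    return rates, unmatched

def flag_high_tracts(fire_counts, housing_units, multiple=3.0, per=1000):
    """Tracts whose rate is at least `multiple` times the pooled rate."""
    rates, unmatched = tract_rates(fire_counts, housing_units, per)
    if not rates:
        return [], None, unmatched
    total_fires = sum(fire_counts[t] for t in rates)
    total_units = sum(housing_units[t] for t in rates)
    pooled = total_fires * per / total_units
    flagged = sorted(t for t, rate in rates.items() if rate >= multiple * pooled)
    return flagged, pooled, unmatched

# Hypothetical tract IDs and counts.
fire_counts = {"T1": 2, "T2": 3, "T3": 1, "T4": 30, "T9": 4}
housing_units = {"T1": 1000, "T2": 1000, "T3": 1000, "T4": 1000}
print(flag_high_tracts(fire_counts, housing_units))  # (['T4'], 9.0, ['T9'])
```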

Local gap-mining stories

Here’s FBI data you were never supposed to see

Truck accidents by year and agency
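
A table of record counts by agency and year makes drop-outs visible: an agency that files in some years and reports nothing in others. A minimal sketch, assuming each record has `agency` and `year` fields; the function name, agency names and records are hypothetical.

```python
from collections import Counter

def reporting_gaps(records, years):
    """For each agency in the data, the years in `years` with no records at all."""
    counts = Counter((r["agency"], r["year"]) for r in records)
    agencies = sorted({agency for agency, _ in counts})
    gaps = {}
    for agency in agencies:
        # Counter returns 0 for absent keys, so unreported agency-years count as zero.
        missing = [year for year in years if counts[(agency, year)] == 0]
        if missing:
            gaps[agency] = missing
    return gaps

# Hypothetical truck-accident records, one dict per crash.
records = (
    [{"agency": "State Police", "year": y} for y in (2009, 2010, 2010, 2011, 2012)]
    + [{"agency": "County Sheriff", "year": y} for y in (2009, 2009, 2012)]
)
print(reporting_gaps(records, range(2009, 2013)))  # {'County Sheriff': [2010, 2011]}
```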

Sometimes you find piles

Statistical tools

• Time series correlation – are your ups and downs real?

• Project/predict data and compare to actual results. What causes differences?

• Population counts are fairly accurate. Use them to estimate reporting rates

• Regression with dummy variables
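
Two of these tools reduce to a few lines of arithmetic. The sketch below correlates year-over-year changes instead of raw levels (two series that both trend upward correlate strongly even when their yearly moves disagree), and compares actual counts with counts expected from population alone. The function names, the benchmark rate, the place names and the numbers are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

def first_differences(series):
    """Period-over-period changes: series[t] - series[t - 1]."""
    return [b - a for a, b in zip(series, series[1:])]

def reporting_ratios(actual, populations, rate_per_100k):
    """Actual count divided by the count expected from population alone,
    using an assumed benchmark rate per 100,000 people."""
    return {
        place: actual.get(place, 0) / (pop * rate_per_100k / 100_000)
        for place, pop in populations.items()
    }

# Both series trend upward, but their yearly moves go in opposite directions.
a = [1, 3, 4, 6, 7]
b = [2, 3, 5, 6, 8]
print(round(pearson(a, b), 3))                                        # 0.974
print(round(pearson(first_differences(a), first_differences(b)), 3))  # -1.0

# Hypothetical places: Y reports a fifth of what its population would predict.
print(reporting_ratios({"X": 180, "Y": 20}, {"X": 100_000, "Y": 50_000}, 200))  # {'X': 0.9, 'Y': 0.2}
```

A ratio far below 1 does not prove underreporting on its own; it marks a place to ask how the data were collected.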

Make sure the holes are real

EE000132 might actually be the same as EE-000-132
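
Before treating an unmatched ID as a hole, normalize the keys on both sides of the match. This sketch uppercases IDs and strips everything except letters and digits; that rule is an assumption that fits the EE-000-132 example but could wrongly merge IDs whose punctuation matters, so spot-check the result. The function names are hypothetical.

```python
import re

def normalize_id(raw):
    """Uppercase and drop everything except ASCII letters and digits, so
    'EE-000-132', ' ee 000 132 ' and 'EE000132' compare equal."""
    return re.sub(r"[^A-Z0-9]", "", raw.upper())

def unmatched_ids(left, right):
    """IDs in `left` with no counterpart in `right` after normalization."""
    right_keys = {normalize_id(item) for item in right}
    return [item for item in left if normalize_id(item) not in right_keys]

print(normalize_id("EE-000-132"))                                  # EE000132
print(unmatched_ids(["EE-000-132", "EE-000-133"], ["EE000132"]))   # ['EE-000-133']
```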

A word of caution

• Do spot checks to make sure what you found is real

• Run your findings by experts
• If possible, engage government sources of data early. They may not be the enemy.
• Challenge your assumptions. Data are only a clue, never an end result
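
A spot check is easier to defend when the records were drawn at random and the draw can be repeated. A minimal sketch, assuming the records fit in a Python list; the function name, the sample size of 25 and the seed are arbitrary, hypothetical choices.

```python
import random

def spot_check_sample(records, k=25, seed=2014):
    """Reproducible random sample of up to k records (without replacement)
    to verify by hand against source documents."""
    rng = random.Random(seed)
    return rng.sample(records, min(k, len(records)))

records = list(range(1000))  # stand-in for real rows
to_verify = spot_check_sample(records)
```

Keeping the seed in your notes lets an editor, or the agency, redraw exactly the same records.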

Jennifer LaFleur, jlafleur@cironline.org, @j_la28
Thomas Hargrove, hargrovet@scripps.com, 202-408-2703

Arson Project syntax files: https://www.dropbox.com/l/LPB7l3kpz7wxvGsHSdTOy9

A copy of this presentation will be at www.jenster.com/2014

Questions?
