Getting it the rightest you can. Thomas Hargrove, Scripps News; John Perry, Atlanta Journal-Constitution; Janet Roberts, Reuters; Jennifer LaFleur, Reveal | The Center for Investigative Reporting. IRE 2015 CAR Conference, Atlanta
Jul 14, 2015

Page 1: Getting it the rightest

Getting it the rightest you can

Thomas Hargrove, Scripps News

John Perry, Atlanta Journal-Constitution

Janet Roberts, Reuters

Jennifer LaFleur, Reveal | The Center for Investigative Reporting

IRE 2015 CAR Conference, Atlanta

Page 2: Getting it the rightest

Beware duplicates

Every time Saint Paul, Minn., housing inspectors made follow-up visits to check on violations, all of the data entries from the previous visit were logged again. So every violation was listed in the database multiple times.
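A minimal sketch of a duplicate check, assuming each violation can be identified by property, code and the date it was first cited. The field layout and sample rows are hypothetical, not the Saint Paul file's actual structure.

```python
from collections import Counter

# Hypothetical rows: (property_id, violation_code, first_cited, visit_date)
rows = [
    ("P-100", "V12", "2014-03-01", "2014-03-01"),
    ("P-100", "V12", "2014-03-01", "2014-04-15"),  # re-logged on the follow-up visit
    ("P-100", "V30", "2014-04-15", "2014-04-15"),
    ("P-200", "V12", "2014-05-02", "2014-05-02"),
]

def find_duplicates(rows, key_fields=(0, 1, 2)):
    """Return each key that appears more than once, with its row count."""
    counts = Counter(tuple(row[i] for i in key_fields) for row in rows)
    return {key: n for key, n in counts.items() if n > 1}

duplicates = find_duplicates(rows)

# Count distinct violations rather than raw rows.
unique_violations = len({row[:3] for row in rows})
```

If the raw row count and the distinct count differ, ask the data's keepers why before counting anything.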

Do integrity checks from your desk

Page 3: Getting it the rightest

Beware dates

Did 592,000 people in Ohio really vote before they registered?
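One way to surface this kind of problem is to compare the two dates directly. The sketch below assumes hypothetical field names ("registered", "last_voted") and made-up records; the Ohio file's real layout is not shown in the slides.

```python
from datetime import date

# Hypothetical voter records.
voters = [
    {"voter_id": "A1", "registered": date(1992, 10, 1), "last_voted": date(2004, 11, 2)},
    {"voter_id": "B2", "registered": date(2006, 1, 15), "last_voted": date(2004, 11, 2)},
    {"voter_id": "C3", "registered": None, "last_voted": date(2008, 11, 4)},
]

def impossible_sequences(records):
    """Return IDs of records whose vote date precedes the registration date."""
    flagged = []
    for r in records:
        if r["registered"] is None or r["last_voted"] is None:
            continue  # missing dates need their own check
        if r["last_voted"] < r["registered"]:
            flagged.append(r["voter_id"])
    return flagged

suspect = impossible_sequences(voters)
```

A large number of flagged records is a question about what the date fields mean, not a finding.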

Do integrity checks from your desk

Page 4: Getting it the rightest

Does it make sense?

“We select things for publication just to make available a wide scope of data to the public ... There is some burden on the public to decide whether or not to use the material.”

--Kathleen McGuire, Sourcebook of Criminal Justice Statistics

(a/k/a: The case of the disappearing lifers)

Do integrity checks from your desk

Page 5: Getting it the rightest

Do the data conform to the real world?

Are half of the records male, half female?

In a national data set, are about 13 percent of the records from California?

Are racial minorities adequately represented?
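These questions come down to comparing a column's shares with a benchmark you already trust. A rough sketch, with a made-up column and an arbitrary tolerance of 5 percentage points (both are assumptions for illustration):

```python
from collections import Counter

def shares(values):
    """Return each distinct value's share of the column."""
    counts = Counter(values)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()}

def out_of_line(observed, expected, tolerance=0.05):
    """Return categories whose observed share misses the benchmark by more than tolerance."""
    return {
        key: (observed.get(key, 0.0), benchmark)
        for key, benchmark in expected.items()
        if abs(observed.get(key, 0.0) - benchmark) > tolerance
    }

# Hypothetical sex column: 70 percent male.
sex_column = ["M"] * 70 + ["F"] * 30
observed = shares(sex_column)
flags = out_of_line(observed, {"M": 0.5, "F": 0.5})
```

A flag is not an error by itself. Some data sets are legitimately lopsided, but you should be able to explain why.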

Do integrity checks from your desk

Page 6: Getting it the rightest

Check for patterns in missing data.

Do patterns render estimates inaccurate?
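A quick way to look for patterns is to compute the share of missing values within each group. The sketch below uses hypothetical records and field names; missing values are assumed to be stored as None.

```python
from collections import defaultdict

# Hypothetical records: income is missing mostly in county B.
records = [
    {"county": "A", "income": 41000},
    {"county": "A", "income": 39000},
    {"county": "B", "income": None},
    {"county": "B", "income": None},
    {"county": "B", "income": 52000},
    {"county": "C", "income": 47000},
]

def missing_rate_by_group(records, group_field, value_field):
    """Return the share of records in each group where value_field is None."""
    totals = defaultdict(int)
    missing = defaultdict(int)
    for r in records:
        totals[r[group_field]] += 1
        if r[value_field] is None:
            missing[r[group_field]] += 1
    return {group: missing[group] / totals[group] for group in totals}

rates = missing_rate_by_group(records, "county", "income")
```

If one group is missing far more often than the others, any average across groups leans toward the groups that reported.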

Do integrity checks from your desk

Page 7: Getting it the rightest

Think like a statistician

Do integrity checks from your desk

a/k/a: How George Will became the darling of statistics teachers

"In 1992-93, none of the five states with the highest teachers' salaries were among the 15 states with the highest SAT scores. And the 10 states with the lowest per pupil spending included four . . . among the 10 states with the highest SAT scores."

--George Will, 1993

Page 8: Getting it the rightest

Statistical checks: From the simple to the sophisticated

Do integrity checks from your desk

R-squared = .82

ss2 = 43 + 0.95(ss1)

Descriptive statistics:

Frequency

Average

Mode
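Both ends of that range fit in a few lines. The sketch below computes the descriptive statistics listed above and fits a line of the same form as the slide's equation. The slides do not say what ss1 and ss2 measure, so the numbers here are invented and chosen to lie exactly on a line.

```python
from collections import Counter
from statistics import mean, mode

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x. Returns (a, b, r_squared)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    r_squared = sxy ** 2 / (sxx * syy)
    return a, b, r_squared

# Simple checks: frequency, average, mode on a hypothetical code column.
codes = ["A", "B", "A", "C", "A"]
frequency = Counter(codes)
most_common_code = mode(codes)

# Hypothetical paired values; ss2 is exactly 5 + 2 * ss1 here.
ss1 = [10, 20, 30, 40, 50]
ss2 = [25, 45, 65, 85, 105]
average_ss1 = mean(ss1)
intercept, slope, r_squared = fit_line(ss1, ss2)
```

With real data the fit will not be perfect. Records that sit far from the line are the ones worth pulling and reading.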

Page 9: Getting it the rightest

Beware the documentation

Do integrity checks with other sources

Yes, that’s Harold Spaeth’s view and mostly I think he’s right, though I’d substitute the word more “efficient” for more “accurate.”

--Lee Epstein

(Find a power user, and compare notes.)

Page 10: Getting it the rightest

What’s missing?

An estimated 30 percent of felony convictions are missing from the Minnesota public convictions file.

(ask the keepers of the data)

Do integrity checks with other sources

Page 11: Getting it the rightest

Check those codes

Do integrity checks with other sources

(a/k/a: The codes are not what they seem)

Data spanned six years. Sometime in those six years, the violation codes changed. No one in the Housing Violations Bureau knew when the switch was made, and no one had definitions for the previous codes.
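When nobody knows when a switch happened, the data can sometimes suggest it. A rough sketch, assuming ISO-formatted date strings and hypothetical codes: list the first and last date each code appears and look for one set of codes that stops where another begins.

```python
def code_date_ranges(rows):
    """Map each code to the (first, last) ISO date on which it appears."""
    ranges = {}
    for code, day in rows:
        first, last = ranges.get(code, (day, day))
        ranges[code] = (min(first, day), max(last, day))
    return ranges

# Hypothetical (code, date) pairs. ISO date strings sort chronologically.
violations = [
    ("12", "2009-02-11"), ("12", "2010-06-30"), ("07", "2009-08-19"),
    ("H-12", "2011-01-04"), ("H-07", "2013-10-22"), ("H-12", "2014-03-09"),
]

ranges = code_date_ranges(violations)
```

This only narrows down the date of the change. What the old codes meant still has to come from people or paper.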

(a/k/a: Why to pull some paper records)

Page 12: Getting it the rightest

Beware elements of change

Do integrity checks with other sources

The “feename” field (the name of the property owner) in the Saint Paul Housing Bureau’s code violations database is pulled in from property tax rolls. It shows the current owner. That person may not have owned the property at the time of the violation.

(a/k/a: Why to pull some paper records)

Page 13: Getting it the rightest

Summarize cases by institutions, then spot check results.

Do integrity checks with other sources

Is it true only 6 percent of hospital emergency cases are transferred from other hospitals?
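A sketch of the summarize-then-spot-check step, with invented hospitals and cases. Institutions reporting no transfers at all, or nothing but transfers, are natural candidates for a phone call.

```python
from collections import defaultdict

def transfer_share_by_hospital(cases):
    """Return {hospital: (transfers, total_cases, share_transferred)}."""
    totals = defaultdict(int)
    transfers = defaultdict(int)
    for hospital, was_transfer in cases:
        totals[hospital] += 1
        if was_transfer:
            transfers[hospital] += 1
    return {h: (transfers[h], totals[h], transfers[h] / totals[h]) for h in totals}

# Hypothetical emergency cases: (hospital, transferred from another hospital?)
cases = [
    ("General", True), ("General", False), ("General", False), ("General", False),
    ("St. Mary", False), ("St. Mary", False),
    ("County", True), ("County", True),
]

summary = transfer_share_by_hospital(cases)

# Hospitals at either extreme are the first ones to spot check.
extremes = sorted(h for h, (_, _, share) in summary.items() if share in (0.0, 1.0))
```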

Page 14: Getting it the rightest

Beware nulls!

Technology bites

Null scariness from the FDA’s MAUDE database

http://www.accessdata.fda.gov/scripts/cdrh/cfdocs/cfmaude/search.cfm

Page 15: Getting it the rightest

Beware nulls!

Technology bites

We want to explore reports involving Promus heart stents, but NOT the Promus Element devices.

First, let’s see what’s in there for Promus.

Page 16: Getting it the rightest

Beware nulls!

Technology bites

There are 50 records that mention Promus. We can see by scrolling that four are the Promus Element that we wish to exclude.

Page 17: Getting it the rightest

Beware nulls!

Technology bites

Let’s get rid of those Elements.

Page 18: Getting it the rightest

Beware nulls!

Technology bites

50 – 4 = …..????

Page 19: Getting it the rightest

Beware nulls!

Technology bites

You’re supposed to have 46 records, but you got 30. What are the missing 16 records?

Page 20: Getting it the rightest

Beware nulls!

Technology bites

Right:

Wrong:
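The screenshots for this slide are not reproduced in the transcript. The sketch below recreates the effect on a small scale using SQLite from Python. The table, column names and rows are hypothetical, not MAUDE's actual schema. The point is that a NOT LIKE test on a field that is null is neither true nor false, so those rows silently drop out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reports (id INTEGER, brand TEXT, model TEXT)")
conn.executemany(
    "INSERT INTO reports VALUES (?, ?, ?)",
    [
        (1, "PROMUS", "Premier"),
        (2, "PROMUS", "Element"),
        (3, "PROMUS", None),       # model left blank by the reporter
        (4, "PROMUS", None),
        (5, "PROMUS", "Element Plus"),
        (6, "XIENCE", None),
    ],
)

def count(where):
    """Count rows in the hypothetical reports table matching a WHERE clause."""
    return conn.execute("SELECT COUNT(*) FROM reports WHERE " + where).fetchone()[0]

all_promus = count("brand LIKE '%Promus%'")
elements = count("brand LIKE '%Promus%' AND model LIKE '%Element%'")

# Wrong: rows with a null model fail the NOT LIKE test and vanish.
wrong = count("brand LIKE '%Promus%' AND model NOT LIKE '%Element%'")

# Right: keep the null rows explicitly.
right = count(
    "brand LIKE '%Promus%' AND (model NOT LIKE '%Element%' OR model IS NULL)"
)
```

The arithmetic check is the same one the slides use: the total minus the excluded records should equal what the exclusion query returns.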

Page 21: Getting it the rightest

Beware false joins in "encrypted" data.

Technology bites

Medicare 5 percent sample: Doctors' IDs were encrypted in some files, but not in others.
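One hedged way to catch a bad join is to measure how many keys match and whether a second field agrees on the rows that do. The IDs, the state field and both tables below are invented for illustration. The slides do not describe how the Medicare files were actually checked.

```python
def join_report(left, right):
    """left and right map doctor ID -> state.

    Returns (match_rate, agreement_rate): the share of left IDs found in
    right, and the share of matched IDs whose state is the same in both.
    """
    matched = left.keys() & right.keys()
    match_rate = len(matched) / len(left) if left else 0.0
    agree = sum(1 for key in matched if left[key] == right[key])
    agreement_rate = agree / len(matched) if matched else 0.0
    return match_rate, agreement_rate

# Hypothetical claims file and provider file.
claims = {"D001": "GA", "D002": "GA", "D003": "OH", "D004": "MN"}
providers = {"D001": "GA", "D002": "TX", "D003": "OH", "X9F3": "MN"}

match_rate, agreement_rate = join_report(claims, providers)
```

A low match rate suggests the keys are not the same kind of ID. Matches that disagree on a second field suggest the join is pairing the wrong records.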

Page 22: Getting it the rightest

Don’t alter original data.

As you report and just before you publish

Make a copy of the original data file. Put it somewhere and don’t touch it again.

Don’t edit an original column or field. Make a copy and edit that.
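Both habits can be scripted. The sketch below is one possible approach, with hypothetical file and field names: copy the original into a working folder, mark the original read-only, and write cleaned values into a new field instead of over the old one.

```python
import os
import shutil
import stat
import tempfile

def make_working_copy(original_path, work_dir):
    """Copy the original into work_dir, then mark the original read-only."""
    os.makedirs(work_dir, exist_ok=True)
    working_path = os.path.join(work_dir, os.path.basename(original_path))
    shutil.copy2(original_path, working_path)
    os.chmod(original_path, stat.S_IREAD)  # remove write permission from the original
    return working_path

def add_clean_column(rows, field):
    """Leave rows[field] untouched; put the cleaned value in a new field."""
    for row in rows:
        row[field + "_clean"] = row[field].strip().upper()
    return rows

# Demonstration in a throwaway folder with a hypothetical file.
with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, "violations.csv")
    with open(original, "w") as f:
        f.write("owner\n smith, j \n")
    working = make_working_copy(original, os.path.join(tmp, "work"))
    original_is_read_only = not (os.stat(original).st_mode & stat.S_IWUSR)
    working_is_writable = bool(os.stat(working).st_mode & stat.S_IWUSR)

rows = add_clean_column([{"owner": " smith, j "}], "owner")
```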

Page 23: Getting it the rightest

Document as you go

As you report and just before you publish

Keep track of all of your queries so you can retrace your steps or find where you went wrong.

As you integrity-check your data, annotate the queries to remember what you learned.

Page 24: Getting it the rightest

Cross check

As you report and just before you publish

If you summed data in SQL, can you reproduce the results in a pivot table?

If you’re summing, do a list. Make sure there’s nothing wacky in that list that would cause your count to be wrong; e.g., duplicates.

If you have various data sources that should yield the same conclusions, do they?
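The same idea works in code: total the data once in SQL, once by an independent route, and compare. In this sketch a plain Python loop stands in for the pivot table, and the table and figures are made up.

```python
import sqlite3
from collections import defaultdict

# Hypothetical rows: (county, fine amount)
rows = [
    ("Ramsey", 120), ("Ramsey", 80),
    ("Hennepin", 200), ("Hennepin", 50),
    ("Dakota", 75),
]

# Route 1: sum in SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fines (county TEXT, amount INTEGER)")
conn.executemany("INSERT INTO fines VALUES (?, ?)", rows)
sql_totals = dict(
    conn.execute("SELECT county, SUM(amount) FROM fines GROUP BY county")
)

# Route 2: sum independently, the way a pivot table would.
pivot_totals = defaultdict(int)
for county, amount in rows:
    pivot_totals[county] += amount

# Any county whose two totals differ needs explaining.
mismatches = {
    county
    for county in set(sql_totals) | set(pivot_totals)
    if sql_totals.get(county) != pivot_totals.get(county)
}
```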

Page 25: Getting it the rightest

Beware the single case

As you report and just before you publish

Never report on one data record without pulling the paper report or talking to the person in question.

What if it was a data entry error?

What if there are circumstances you don’t understand?

Page 26: Getting it the rightest

Recreate the wheel

As you report and just before you publish

For every fact, number, finding in your story, write an original query or formula to support it.

Go back to your original data.

Try to arrive at the same conclusion in different ways.

Page 27: Getting it the rightest

Fear is your friend