
The Art of Acquiring Data

Hunting for Data

Data sources

● Public agencies (local, county, state, federal)
● Data.gov sites
● Social networking sites (often APIs)
● Nonprofits/industry experts
● Academic institutions
● Manually gathered

Unknown Unknowns

Not everything is on the web. I swear.

A universe of data never sees the light of day on the Web. How do you find it?
● Seek ye the nerds
● Interview gov employees
● Academics, experts can shine light or provide custom data they compiled

If agency officials won’t help

Follow the bread crumbs:
● Gov forms
● Public contracts (esp. for vendor software)
● Software manuals
● Don’t forget about those academics/experts!

Friendly FOIAs

● Negotiate data with officials
● Craft targeted request
● Send FOIA, if at all, as a formality

Not-so-friendly FOIAs

● Negotiate first (see Friendly FOIAs)
● Know your rights
  ○ response deadlines
  ○ legit exemptions
● Seek expert advice (CalAware, CFAC)
● Follow through on requests

We have data! Let’s start writing!

Dimensions of Data

Identity

● What do the fields mean? (ask for a data dictionary)
● What are the data types in each column?
● Missing data? Dupes? Absurd values? Other mistakes?
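A first pass at these identity questions can be scripted. The sketch below is only one way to do it, using Python's standard library; the sample rows, the column names (id, name, age, amount) and the 0 to 120 age range are all made up for illustration and are not from any real data set.

```python
import csv
import io
from collections import Counter

# Hypothetical sample data; column names and values are invented.
SAMPLE = """id,name,age,amount
1,Ann,34,100.50
2,Bob,,200
2,Bob,,200
3,Cy,412,abc
"""

def guess_type(value):
    """Classify one cell as 'missing', 'int', 'float', or 'text'."""
    value = (value or "").strip()
    if value == "":
        return "missing"
    for label, cast in (("int", int), ("float", float)):
        try:
            cast(value)
            return label
        except ValueError:
            pass
    return "text"

def profile(rows):
    """Return per-column type counts and the number of exact duplicate rows."""
    types = {}
    seen = Counter()
    for row in rows:
        seen[tuple(row.items())] += 1
        for field, value in row.items():
            types.setdefault(field, Counter())[guess_type(value)] += 1
    dupes = sum(n - 1 for n in seen.values() if n > 1)
    return types, dupes

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
types, dupes = profile(rows)

# Assumed sanity rule: a plausible age falls between 0 and 120.
absurd_ages = [
    r["id"] for r in rows
    if guess_type(r["age"]) == "int" and not 0 <= int(r["age"]) <= 120
]

for field, counts in types.items():
    print(field, dict(counts))
print("duplicate rows:", dupes)
print("ids with absurd ages:", absurd_ages)
```

A column that mixes types (like amount here) or carries a lot of missing cells is a good question to bring back to whoever gave you the data.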

Provenance

What is the origin story and chain of custody for your data?
● Hand-keyed from gov forms?
● “self reported” using web form?
● Generated by automated system?
● What data validations exist?
● Data dump or output from reporting system?

Context

● What rules and regs surround the data?
● How comprehensive is the data?
● Other overlapping data sets?
● Other complementary data sets?
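One rough way to gauge comprehensiveness is to compare record identifiers against an overlapping data set. This is a minimal sketch under stated assumptions: the two ID lists and the compare_keys helper are hypothetical, and it assumes both sources share a common identifier, which often is not the case.

```python
def compare_keys(ours, theirs):
    """Split identifiers into shared ones and ones unique to each source."""
    ours, theirs = set(ours), set(theirs)
    return {
        "in_both": sorted(ours & theirs),
        "only_ours": sorted(ours - theirs),
        "only_theirs": sorted(theirs - ours),
    }

# Hypothetical identifiers from the data you received and from a
# second, overlapping source.
agency_ids = ["A-100", "A-101", "A-103"]
reference_ids = ["A-100", "A-101", "A-102", "A-103", "A-104"]

report = compare_keys(agency_ids, reference_ids)

# Share of the reference records that also appear in your data.
coverage = len(report["in_both"]) / len(set(reference_ids))

print(report)
print(f"coverage: {coverage:.0%}")
```

Records that show up in only one source are not necessarily errors, but they are worth asking about.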

En Fin

● Data is lurking online and off.
● More (data) bees with honey.
● Don’t just get the data. Know the data.

Ping me.

Serdar Tumgoren
@zstumgoren
zstumgoren@gmail.com
http://www.slideshare.net/serdartumgoren
