Cold, Dark, and Lonely: An Archive Moves Online
Bryan Beecher, IT Director, ICPSR
Nov 17, 2014
What’s ICPSR?
• Inter-university Consortium for Political and Social Research
• Clients
– Higher education
– US Government
– Our “hot, flat, and crowded” world
• In business since 1962
What do we do?
• Acquire, curate, and deliver social science data to researchers, students, policy-makers, etc.
– JSTOR of data
• Cover many different fields
– Political Science, Economics, Sociology, Demography, Criminal Justice, and many more
What content do we curate?
• Primarily survey data
– Also aggregate government data (such as Census data)
• Tabular
– Rows = respondents
– Columns = variables
• SAS, SPSS, Stata, even Excel
ICPSR and OAIS
• Clients deposit data (ingest)
• ICPSR normalizes content into plain text data (ASCII, Unicode) and “setups” for stat pkgs + adds metadata (ingest + data mgmt)
• Preserves content (archival storage)
• Makes it available to others (access)
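The normalization step above can be pictured with a small sketch. It assumes a table held as a list of dictionaries (rows = respondents, columns = variables) and writes fixed-width plain-text lines plus a column layout of the kind a statistical-package setup would need. The function name `normalize`, the variable names, and the sample values are made up for illustration; this is not ICPSR's actual tooling, and real setups are package-specific syntax rather than a Python list.

```python
def normalize(rows, variables):
    """Turn tabular rows into fixed-width text lines plus a column layout.

    rows      -- non-empty list of dicts, one per respondent
    variables -- ordered list of column (variable) names
    Returns (lines, layout), where layout holds (name, start, end)
    with 1-based, inclusive column positions.
    """
    # Each variable is as wide as its longest value in the data.
    widths = [max(len(str(row[v])) for row in rows) for v in variables]

    layout = []
    start = 1
    for name, width in zip(variables, widths):
        layout.append((name, start, start + width - 1))
        start += width

    # Right-justify every value so columns line up across respondents.
    lines = [
        "".join(str(row[v]).rjust(w) for v, w in zip(variables, widths))
        for row in rows
    ]
    return lines, layout


# Illustrative (invented) survey records.
sample = [
    {"CASEID": 1, "AGE": 34, "VOTED": 1},
    {"CASEID": 2, "AGE": 7, "VOTED": 0},
]
lines, layout = normalize(sample, ["CASEID", "AGE", "VOTED"])
```

The layout is what lets a plain-text file outlive any one statistical package: the data stay readable as text, and the column positions can be re-expressed in whatever setup syntax is current.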
Access
• Mechanisms have evolved over time
• Tapes + USPS
• FTP
• Gopher
• Web
Archival Storage
• Historically kept two copies on tape
– Off-line, local (Ann Arbor, MI)
• Worked, but
– Expensive
– Cannot browse
– Are the bits OK?
• “The Warehouse”…
But then in 2006…
• Created Chief Preservation Officer role
– Nancy McGovern
• Assigned Archival Storage engineering and operations to the IT shop
– Bryan Beecher
2006 - 2008
• Digital Preservation Management program begins
• Warehouse cleared, closed
• Tapes read, checked, destroyed
• 6TB of content over 600k unique files
• Lots of files
– Not so “cold” and “dark” any more…
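Once the content is on-line, the earlier question "Are the bits OK?" becomes something a script can answer. The sketch below is one common approach (a checksum manifest), not a description of what ICPSR actually ran. The function names and the choice of SHA-256 are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large data files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify(root, manifest):
    """Return relative paths that are missing or whose digest has changed."""
    root = Path(root)
    damaged = []
    for rel, expected in manifest.items():
        target = root / rel
        if not target.is_file() or sha256_of(target) != expected:
            damaged.append(rel)
    return damaged


def unique_digests(manifest):
    """Distinct digests, i.e. the number of unique files by content."""
    return set(manifest.values())
```

Running `build_manifest` once at ingest and `verify` on a schedule gives a repeatable fixity check; `unique_digests` shows how a count of unique files could be derived from content rather than from file names.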
Fedora, Part 1
• Lots of files, not so much metadata
• Always know the aggregate object (“study” number)
• Use simple Fedora Content Model (the data “keepsake”) to store the content
• Small step from “files” to “objects”
Fedora, Part 2
• Would really like “smarter” objects
– Strongly typed
– Well defined relationships
– Rich services
• Definitely possible, particularly for more modern content (post 2002)
• If only we had the time and money…
NSF EAGER grant
• EArly-concept Grants for Exploratory Research
– Eighteen months for 1.5 people
• Deliverables
– CMA for social science data and docs
– Packaging tools to create FOXML
– Nifty SDeps and SDefs
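To give a feel for what a FOXML packaging tool might produce, here is a minimal sketch that wraps the files of one study in a FOXML 1.1 style document using only the Python standard library. The function name `build_foxml`, the PID, the datastream IDs, and the URLs are all hypothetical, and the output has not been validated against the FOXML schema or a Fedora repository; it is not the tool built under the grant.

```python
import xml.etree.ElementTree as ET

FOXML_NS = "info:fedora/fedora-system:def/foxml#"
MODEL_NS = "info:fedora/fedora-system:def/model#"
ET.register_namespace("foxml", FOXML_NS)


def q(tag):
    """Qualify a tag name with the FOXML namespace."""
    return f"{{{FOXML_NS}}}{tag}"


def build_foxml(pid, label, files):
    """Build a FOXML-style object with one managed datastream per file.

    files -- iterable of (datastream_id, mime_type, url) tuples
    """
    root = ET.Element(q("digitalObject"), {"VERSION": "1.1", "PID": pid})

    props = ET.SubElement(root, q("objectProperties"))
    ET.SubElement(props, q("property"),
                  {"NAME": MODEL_NS + "state", "VALUE": "Active"})
    ET.SubElement(props, q("property"),
                  {"NAME": MODEL_NS + "label", "VALUE": label})

    for ds_id, mime_type, url in files:
        datastream = ET.SubElement(
            root, q("datastream"),
            {"ID": ds_id, "STATE": "A", "CONTROL_GROUP": "M"})
        version = ET.SubElement(
            datastream, q("datastreamVersion"),
            {"ID": f"{ds_id}.0", "LABEL": ds_id, "MIMETYPE": mime_type})
        ET.SubElement(version, q("contentLocation"),
                      {"TYPE": "URL", "REF": url})

    return ET.tostring(root, encoding="unicode")


# Hypothetical study object with a data file and a codebook.
xml_text = build_foxml(
    "example:study-0001",
    "Example study",
    [
        ("DATA", "text/plain", "http://example.org/study-0001/data.txt"),
        ("CODEBOOK", "application/pdf", "http://example.org/study-0001/cb.pdf"),
    ],
)
```

Keying the object on the study makes the aggregate the unit of preservation, with each file carried as a datastream inside it.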
Thank you!
Bryan Beecher
techaticpsr.blogspot.com