ICPSR AT 50: Facilitating Research and Data Sharing Part III: Data Management IASSIST Vancouver, BC May 31, 2011
Data Management begins at 11:45
Data Management Agenda
• Data Management Plans
• Computing & Data Sharing in Secure Environments
• Managing Restricted Contracts
The Statement Heard Round the Research World:
• The National Science Foundation has released a new requirement for proposal submissions regarding the management of data generated using NSF support. Starting in January, 2011, all proposals must include a data management plan (DMP).
• The plan should be short, no more than two pages, and will be submitted as a supplementary document. The plan will need to address two main topics:
– What data are generated by your research?
– What is your plan for managing the data?
Data Management in Demand
ICPSR conducts webinars on data management plans:
• November 8, 2010: 134 attend
• January 12, 2011: 535 attend
• February 17, 2011: 71 attend
Guidelines for Download
ICPSR’s DMP Blog - FAQs
http://datamanagementplans.blogspot.com/
ICPSR’s DMP Statistics
• January 2011: 3,984 views
• January – April 2011: 7,802 views
• Where are they coming from?
– 5,527 Direct (bookmarked, etc.)
– 3,370 from Google search
– 878 from NSF
Improving Data Management
• Potential increase in demand for data management services as a result of grant/contract requirements
• Increase in demand for processing, analysis, and distribution of sensitive data
• Resulted in improvements focused on secure computing and data sharing environments at ICPSR
Three Angles of Security
• Secure Ingest
• Secure Computing in the Cloud
• Secure Online Application & Tracking
ICPSR Secure Data Services
We'd tell you more, but then we'd have to kill you.
Two services; one platform
Secure Data Environment
• Serves ICPSR staff
• Protects against accidental data leakage
• Uses firewalls, virtualized workstations to access content
• Keeps the bad guys out
Virtual Data Enclave
• Serves ICPSR users
• Protects against accidental data leakage
• Uses firewalls, virtualized workstations to access content
• Keeps the bad guys out
One technology platform to rule them all
Technology components
• Needed to stand up the services quickly and with little working capital for investment
• Selected a strategy of investing in storage, and "renting" access and security services
• EMC NS 120 Network Attached Storage device
• University of Michigan "desktop virtualization" product, the Virtual Desktop Infrastructure (VDI) service
• University of Michigan "firewall virtualization" product, the Virtual Firewall service
EMC NAS
• Leverages existing infrastructure at ICPSR and experience with EMC products
• Two NAS units (NS 120 model)
o Private NAS - home to all secure data
o Semi-Private NAS - home to all other content, such as web site content, downloadable files, etc.
• Each unit is attached to a different virtual network (VLAN); more on this later
Staff install EMC fiber-channel-attached storage
Virtual Desktop Infrastructure Service
• University of Michigan service
o Information Technology Services is the provider
o Virtualization as a Service (VaaS)
• ICPSR was a pilot user
• Enables access to content on the Private NAS via virtualized environment
o Easier to update
o Easier to secure
o Enables more secure remote access
• Uses the UMich Active Directory system for authentication, authorization, and accounting
• Priced comparably to Amazon's cloud (EC2)
Staff access secure data through the SDE
Network topology
• Former network topology was flat; every device had a routable IP address
• New topology is highly segmented; seven VLANs
• Physical systems - three VLANs
o Public
o Semi-Public
o Private
• Virtual systems - four VLANs
o SDE
o VDE
o Summer Program virtual lab
o Web site testing
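The segmented layout on this slide can be written down as a small data structure. The following is a minimal sketch only: the VLAN names come from the slide, while the `Vlan` class, the `kind` field, and the `vlans_of_kind` helper are illustrative assumptions, not part of ICPSR's actual configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vlan:
    name: str
    kind: str  # "physical" or "virtual" (labels assumed for this sketch)


# VLAN names are taken from the slide; everything else is hypothetical.
VLANS = [
    Vlan("Public", "physical"),
    Vlan("Semi-Public", "physical"),
    Vlan("Private", "physical"),
    Vlan("SDE", "virtual"),
    Vlan("VDE", "virtual"),
    Vlan("Summer Program virtual lab", "virtual"),
    Vlan("Web site testing", "virtual"),
]


def vlans_of_kind(kind):
    """Return the names of all VLANs of the given kind, in slide order."""
    return [v.name for v in VLANS if v.kind == kind]


if __name__ == "__main__":
    for kind in ("physical", "virtual"):
        print(kind, vlans_of_kind(kind))
```

Keeping the segments in one list makes it easy to check that the inventory still matches the stated design of three physical and four virtual VLANs.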
Secure Data Environment
• Content enters via our Deposit System
• Content exits via one of two mechanisms
o turnover for content entering Archival Storage and/or Dissemination systems
o data airlock for other stuff
• Both exit points can be monitored, controlled, reviewed, audited, etc.
• Technology and strategic direction may be moving faster than culture
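The single entry point and two exit points described on this slide lend themselves to a simple audit trail. The sketch below is an assumption-laden illustration: the class `SecureDataEnvironment`, its method names, and the log record fields are hypothetical, and it does not describe how ICPSR's Deposit System, turnover, or data airlock are actually implemented.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# The two exit mechanisms named on the slide; the identifiers are assumed.
EXIT_ROUTES = {"turnover", "data_airlock"}


@dataclass
class SecureDataEnvironment:
    contents: set = field(default_factory=set)
    audit_log: list = field(default_factory=list)

    def _record(self, action, item, actor):
        # Every movement is logged so it can be reviewed or audited later.
        self.audit_log.append({
            "when": datetime.now(timezone.utc).isoformat(),
            "action": action,
            "item": item,
            "actor": actor,
        })

    def deposit(self, item, actor):
        """Content enters only through the deposit path."""
        self.contents.add(item)
        self._record("deposit", item, actor)

    def release(self, item, route, actor):
        """Content leaves only through a known exit route, and the exit is logged."""
        if route not in EXIT_ROUTES:
            raise ValueError(f"unknown exit route: {route}")
        if item not in self.contents:
            raise KeyError(item)
        self._record(f"exit:{route}", item, actor)


if __name__ == "__main__":
    sde = SecureDataEnvironment()
    sde.deposit("study-0001", "processor")  # hypothetical item and actor
    sde.release("study-0001", "turnover", "processor")
    for entry in sde.audit_log:
        print(entry["action"], entry["item"])
```

Because every exit passes through `release`, any route other than the two named ones is rejected rather than silently allowed.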
Staff react to new restrictions
Virtual Data Enclave
• Not suitable for "enclave-only" data
• Highly suitable for data ordinarily shared via a restricted-use agreement
o Alternative to shipping out sensitive data on removable media and hoping that nothing goes wrong
• Does shift cost burden (virtual workstation, storage) and risk burden (data security) from data analyst to data provider
o Who pays?
o How?
I have used the ICPSR VDE, and it is fantastic.
Oz Noori - Detroit 1-8-7
This is a paid celebrity endorsement
Restricted Use Contracting System (RCS)
Purpose
• Enables data processors (internal) to set up contracts for restricted data, with terms of use and contract behavior preferences
• Enables end-users to apply for restricted data online & track progress
• Enables ICPSR user support to manage contracts and track end-users
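The three audiences listed above can be read as a role-to-capability map. This is a rough sketch under stated assumptions: the role and action identifiers below are hypothetical labels for the three bullets, and nothing here reflects the real RCS data model.

```python
# Role and action names are hypothetical labels for the three bullets above.
RCS_CAPABILITIES = {
    "data_processor": {"define_contract_terms", "set_contract_behavior"},
    "end_user": {"apply_online", "track_application"},
    "user_support": {"manage_contracts", "track_end_users"},
}


def can(role, action):
    """Return True if the given role is allowed to perform the action."""
    return action in RCS_CAPABILITIES.get(role, set())


if __name__ == "__main__":
    print(can("end_user", "apply_online"))
    print(can("end_user", "manage_contracts"))
```

Unknown roles fall back to an empty set, so they are denied every action by default.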
Overview of ICPSR’s RCS
Application Steps
50 Years of Research Data
• Data Exploration
• Data Sharing
• Data Management
Presenter Contact Information
• Peter Granda – [email protected]
• Linda Detterman – [email protected]
• Sanda Ionescu – [email protected]
• Elizabeth Moss – [email protected]
• Steve Burling – [email protected]
Enjoy Vancouver & IASSIST 2011!