Motivation: “… one of the major challenges of this scientific generation: how to develop the new methods, management structures and technologies to manage the diversity, size, and complexity of current and future data sets and data streams.”
Response: DataNet creates “a set of exemplar national and global data research infrastructure organizations” to address this challenge.
Sustainable Environment Actionable Data (SEAD) - DataNet
• SEAD Strategy
― Serve scientists and researchers in the “long tail” of science
― Leverage social media for discovery of data, interest, and expertise
― Move data curation upstream in the data life cycle of science
― Take advantage of existing domain and institutional infrastructures (Institutional Repositories, ICPSR) for long-term preservation
SEAD Partners - http://sead-data.net
SEAD TEAMS
Michigan: Margaret Hedstrom-PI, Ann Zimmerman-Co-PI, Karen Woollams, George Alter (ICPSR), Bryan Beecher (ICPSR), Jude Yew
Indiana: Beth Plale-Co-PI, Katy Börner, Robert H. McDonald, Robert Light, Kavitha Chandrasekar, Stacy Kowalczyk, Robert Ping
Rensselaer: James Myers-Co-PI, Ram Prasanna Govind Krishnan, Lindsay Todd
Illinois: Praveen Kumar-Co-PI, Md Aktaruzzaman, Terry McLaren (NCSA), Rob Kooper (NCSA), Luigi Marini (NCSA)
SEAD 18 month Pilot Phase
• Domain Engagement:
– National Center for Earth Systems Dynamics (NCESD), Illinois River Basin Observatory
– Requirements, Use Cases, Prioritization of Data Types and Services
• Active and Social Curation
– Pilot Active Content Repository, VIVO deployments
– Exemplar services for Data Ingest, Discovery, Re-use, Curation (Tupelo/Medici)
• CI for Long-term Access (Virtual Archive)
– Data model, protocol design/development
– Pilot Federated Repository infrastructure
• Education, Outreach, and Training
– Post-doc mentoring
– Web site, training materials, meetings, workshops, …
• Project Oversight
– Management, reporting, committees
– Business model development
Sustainability Science
Science
Technology
Economics
Poverty & Justice
Policy
Cooperation
Data challenges
• Heterogeneity of all kinds
• Multiple scales
• Multidisciplinary
• Many small datasets
The long tail of scientific research
• Small and derived data sets
• Heterogeneous data
• Multiple sources of data
• Short-lived data with long-term value
• Value of data grows when combined & integrated
SEAD notions of defined Data Phases
• Phases of data lifecycle acknowledge and accommodate the difference between public data and data still in work by a researcher.
• Research Data Phase: data set is a research data collection, owned by an individual and under their control.
– Data need not be licensed at this time because it is not ready for broader release
– Data need not have permanent IDs because it is still a work in progress
– Corresponds to first existence in Active Curation Repository
• Published Phase: Owner of research data collection determines that dataset is ready for publication
– License terms set
– Persistent ID
– Made available as part of public profile in VIVO
– Activated by user-controlled publish event
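The two phases above can be read as a small state machine. The following is a minimal sketch of that reading, not SEAD's actual implementation: the class name `DataCollection`, the `publish` method, and the `urn:uuid:` placeholder used for the persistent ID are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional
import uuid


class Phase(Enum):
    RESEARCH = "research"    # still in work, under the owner's control
    PUBLISHED = "published"  # released by the owner


@dataclass
class DataCollection:
    """Hypothetical model of a data collection moving through the two phases."""
    owner: str
    title: str
    phase: Phase = Phase.RESEARCH
    license_terms: Optional[str] = None   # not required in the research phase
    persistent_id: Optional[str] = None   # not required in the research phase

    def publish(self, requested_by: str, license_terms: str) -> None:
        """User-controlled publish event: only the owner may trigger it."""
        if requested_by != self.owner:
            raise PermissionError("only the owner can publish this collection")
        if self.phase is Phase.PUBLISHED:
            raise ValueError("collection is already published")
        if not license_terms:
            raise ValueError("license terms must be set at publication")
        self.license_terms = license_terms
        # Placeholder ID; a real deployment would mint an ID from a resolver service.
        self.persistent_id = f"urn:uuid:{uuid.uuid4()}"
        self.phase = Phase.PUBLISHED

    def in_public_profile(self) -> bool:
        """Only published collections would appear in a public profile."""
        return self.phase is Phase.PUBLISHED


collection = DataCollection(owner="alice", title="Example collection")
collection.publish(requested_by="alice", license_terms="CC-BY-3.0")
```

Keeping the license and ID optional until `publish` is called mirrors the slide's point that research-phase data needs neither.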
An Active Content Repository based on standard global IDs and semantic web technologies - to collect and integrate data, metadata, and provenance information from multiple sources.
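As a rough illustration of how global IDs let statements from different sources accumulate on one record, here is a toy subject-predicate-object store. It is a sketch only and does not reflect the Tupelo/Medici API: the `TripleStore` class and the `urn:example:` identifiers are invented for this example, while the `dcterms:` and `prov:` predicates are borrowed from the Dublin Core and W3C PROV vocabularies.

```python
from collections import defaultdict


class TripleStore:
    """Toy in-memory store of (subject, predicate, object) statements."""

    def __init__(self):
        self._by_subject = defaultdict(set)

    def add(self, subject, predicate, obj):
        # Using a set makes repeated assertions of the same statement harmless.
        self._by_subject[subject].add((predicate, obj))

    def describe(self, subject):
        """Return every statement about a subject, sorted for stable output."""
        return sorted(self._by_subject.get(subject, set()))


store = TripleStore()
dataset = "urn:example:dataset/1"  # made-up global ID

# Statements that could arrive from different tools at different times.
store.add(dataset, "dcterms:title", "Example dataset")              # data
store.add(dataset, "dcterms:creator", "urn:example:person/42")      # metadata
store.add(dataset, "prov:wasDerivedFrom", "urn:example:dataset/0")  # provenance

for predicate, obj in store.describe(dataset):
    print(predicate, obj)
```

Because every statement is keyed by the same global ID, integrating a new source reduces to adding more statements about that ID.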
Active and Social Curation Services supporting automated and interactive use of SEAD- leveraging standard web application/web service toolkits and virtual machine infrastructure
[Diagram: Active and Social Curation; labels: Active Content Repository, VIVO/Linked Data, Dissemination Packages, Wide-Area File System, User, Contributor]
SEAD CI Technical Approach
Appraisal and Selection
SEAD Trusted Digital Repository Federation (OAIS compliant)
License terms
• Please cite as: McDonald, R.H. et al. Building a Data Discovery Network for Sustainability Science. 3rd International VIVO Conference, Miami, FL, 24 August 2012. Available from: [http://slidesha.re/Q9q8VW]
• Thanks to Margaret Hedstrom, who’s guided the team through the (really) lengthy review process and to Jim Myers, Beth Plale, Praveen Kumar, Terry McLaren, Luigi Marini, Kavitha Chandrasekar and others who provided content for this presentation.
• The concepts and software being leveraged in SEAD represent the work of a broad range of people over multiple years – their contributions have been critical to launching SEAD.
• This document is released under the Creative Commons Attribution 3.0 Unported license (http://creativecommons.org/licenses/by/3.0/). This license includes the following terms: You are free to share – to copy, distribute and transmit the work and to remix – to adapt the work under the following conditions: attribution – you must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work). For any reuse or distribution, you must make clear to others the license terms of this work.