Moving Beyond Planning to Implementation: Open-Source Tools… Josh Young Ocean Sciences Meeting February 24, 2016
2016 Ocean Sciences Meeting tutorial

Apr 11, 2017

Josh Young
Transcript
Page 1: 2016 Ocean Sciences Meeting tutorial

Moving Beyond Planning to Implementation: Open-Source Tools…

Josh Young

Ocean Sciences Meeting

February 24, 2016

Page 2: 2016 Ocean Sciences Meeting tutorial

Who is Unidata?

Page 3: 2016 Ocean Sciences Meeting tutorial

Why at Ocean Sciences?

Page 4: 2016 Ocean Sciences Meeting tutorial

Scope

Imagine a project:

• that includes a well-thought-out and documented data management plan,

• and robust implementation of that plan throughout the project and beyond.

• This talk is not for that project; it is for the rest of us.

Page 5: 2016 Ocean Sciences Meeting tutorial

So why do we care about data management?

• Internal reasons: do good research, write papers, get tenure, win more grants.

• External reasons: public access & reproducibility

Risk of becoming dark data (Heidorn, 2008)

Page 6: 2016 Ocean Sciences Meeting tutorial

Why care about external access?

• Intangibles for an Investigator

  • Maybe someday I’ll benefit from someone else’s data

  • Maybe I’ll learn something through informal dialogue

  • Most science funding is from public resources and should/could be considered a public trust resource

  • Peer pressure

• Tangibles for an Investigator

  • Increased efficiency

  • My funders require it.

Page 7: 2016 Ocean Sciences Meeting tutorial

So why do we care about data management?

• Internal reasons: do good research, write papers, get tenure, win more grants.

• External reasons: greater impact

Page 8: 2016 Ocean Sciences Meeting tutorial

Internal Workflows

Page 9: 2016 Ocean Sciences Meeting tutorial

Public-Access Workflows

Page 10: 2016 Ocean Sciences Meeting tutorial

What is the DMRC & do we really need another Data Plan Project?

• Probably not

• The DMRC is not a Data Plan tool

• Unidata community requested help with implementation

• Therefore, the DMRC is primarily a curated list of tools for implementation

Page 11: 2016 Ocean Sciences Meeting tutorial

The DMRC

Page 12: 2016 Ocean Sciences Meeting tutorial

What the DMRC Offers

• Highlights requirements from funding agencies;

• Points to Best Practices developed by others in the Data Management space;

• Sorts available tools by best practice;

• Details available tools.

Page 13: 2016 Ocean Sciences Meeting tutorial

Requirements

• Highlight data management funding requirements from NASA, NOAA, NSF

• These are the agencies that fund our community, so we try to stay up to date, but remember that the agency-posted information is always the authority

Page 14: 2016 Ocean Sciences Meeting tutorial

Activity Best Practices & Possible Tools

Activity column based on DataONE Best Practices

Page 17: 2016 Ocean Sciences Meeting tutorial

The DMRC Points to Tools

Page 18: 2016 Ocean Sciences Meeting tutorial

The DMRC Points to Tools

Page 19: 2016 Ocean Sciences Meeting tutorial

The DMRC Points to Tools

Page 20: 2016 Ocean Sciences Meeting tutorial

The DMRC Explains the LDM

Page 21: 2016 Ocean Sciences Meeting tutorial

The DMRC Explains the TDS

Page 22: 2016 Ocean Sciences Meeting tutorial

The DMRC Explains RAMADDA

Page 23: 2016 Ocean Sciences Meeting tutorial

What We Are Exploring

• Dataverse by Harvard

• Designed for sharing, archiving, and citing data

• Allows you to create a DOI

• Allows you to store and make data accessible in perpetuity

Page 24: 2016 Ocean Sciences Meeting tutorial

What We Are Exploring

Known Dataverse Characteristics:

• Largest single file limited to 10GB

• No limit to number of files

• Users create their own Dataverse

• Designate private or public

• Open to data from all science disciplines

• Does not corrupt at least some software files (e.g. IDV bundles)

• FREE
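
The 10GB per-file limit listed above is the kind of constraint that is easy to check before a deposit. The following is a minimal sketch, not part of the original talk: the function name `find_oversized_files` and the constant `MAX_FILE_BYTES` are hypothetical, and the limit is assumed to mean 10 × 10^9 bytes, since the slide does not say whether it is decimal or binary.

```python
from pathlib import Path

# Assumption: "10GB" is read as 10 * 10**9 bytes. The slide does not state
# whether the limit is decimal or binary, so adjust this value as needed.
MAX_FILE_BYTES = 10 * 10**9


def find_oversized_files(root, max_bytes=MAX_FILE_BYTES):
    """Return (path, size_in_bytes) pairs for files under root above max_bytes."""
    oversized = []
    # Walk the directory tree in a stable (sorted) order.
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            size = path.stat().st_size
            if size > max_bytes:
                oversized.append((path, size))
    return oversized
```

Calling `find_oversized_files("my_project_data")` on a hypothetical project directory would return the files that need to be split or compressed before upload; an empty list means every file is within the assumed limit.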

Page 25: 2016 Ocean Sciences Meeting tutorial

What We Are Exploring

Possible Dataverse Contributions:

• Description (providing DOIs)

• Sharing (access in perpetuity)

• Preservation (static copy in perpetuity)

• Cost (free), very suitable for projects that might otherwise become long-tail data

Page 26: 2016 Ocean Sciences Meeting tutorial

Activity Best Practices & Possible Tools

Activity column based on DataONE Best Practices

Page 27: 2016 Ocean Sciences Meeting tutorial

Open Source Access to Code

Page 28: 2016 Ocean Sciences Meeting tutorial

We Welcome Your Resource Suggestions!

• Please visit: http://goo.gl/forms/Ngp4Xu9nGr

Page 29: 2016 Ocean Sciences Meeting tutorial

Example Workflow Implementation

• Radar and Lidar data from the University of Wyoming King Air

• Millersville University Plains Elevated Convection at Night (PECAN) data

• North Carolina State University WRF North Atlantic Model Outputs

?

Page 30: 2016 Ocean Sciences Meeting tutorial

Part of a larger effort: Agile Data Curation

• Means taking implementable steps to improve data management for external access.

• Philosophically, it attempts to apply lessons from agile software development to data management.

Page 31: 2016 Ocean Sciences Meeting tutorial

Agile Curation Principles, 2nd Generation

(J. Young, K. Benedict, & C. Lenhardt, AGU 2015 Fall Meeting)

1) Delivery, access, use and citation of research data are the primary measures of success.

2) Maximize the impact of research data through the continuous integration of curation activities.

3) Support unanticipated needs for and uses of research data (and documentation) and develop flexible systems to capture new uses.

Page 32: 2016 Ocean Sciences Meeting tutorial

Agile Curation Principles, 2nd Generation

4) Make data open and accessible as early in the process as possible.

5) Encourage crowd-sourced / community feedback to improve and enhance the data. Provide basic metadata for data available early in the process even if the data are not finalized.

6) Identify key individuals in a research project that have the requisite motivation, knowledge, or ability to learn and get out of their way.

Page 33: 2016 Ocean Sciences Meeting tutorial

Agile Curation Principles, 2nd Generation continued

7) Data creators and data curators should work closely throughout the data life story to ensure the most efficient and streamlined process.

8) Identify the most effective method(s) for maintaining close communication between the data creators and curators involved and use them.

9) Target the steady delivery of incremental improvements to research data discovery, access and use that is consistent with a sustainable level of effort and available funding.

Page 34: 2016 Ocean Sciences Meeting tutorial

Agile Curation Principles, 2nd Generation continued

10) Start with the basics and only make systems more complex as needed, while maintaining a low bar to entry.

11) Continuous attention to technical excellence and good design enhances agility.

12) Continuously develop a community of data providers, curators and users that participate in the evolution of the research data systems.

Page 35: 2016 Ocean Sciences Meeting tutorial

We Welcome Your Stories

• Please email: [email protected]

Page 36: 2016 Ocean Sciences Meeting tutorial

NSF’s EarthCube

Balancing infrastructure development & scientific advancement to create sustainable, multidisciplinary solutions

M. Chan

GOALS

• Advance science

• Meet grand challenges

• Leverage shared cyberinfrastructure technology

[Diagram labels: CyberInfrastructure, Science, RCNs, Building Blocks, Interactive Activities, End User Workshops, EC Committees]

Page 37: 2016 Ocean Sciences Meeting tutorial

Get Involved!

[Diagram labels: Leadership Council, Office, Science Committee, Technology & Architecture Committee, Liaison Team, Council of Data Facilities, Engagement Team]

• Talk to EarthCube Participants!

• Attend EarthCube Workshops!

• Join the mailing list at earthcube.org

• Apply for funding (EC Travel Grants, Distinguished Lecturers)

• Follow on Twitter @earthcube

Page 38: 2016 Ocean Sciences Meeting tutorial

Unidata is one of the University Corporation for Atmospheric Research (UCAR)'s Community Programs (UCP), and is funded primarily by the National Science Foundation (Grant NSF-1344155).