DATA MANAGEMENT PLANNING FEBRUARY 21, 2013 Lizzy Rolando, Research Data Librarian
Data Management Planning - 02/21/13

Oct 18, 2014

These are the slides from the Data Management Planning Class, taught 2/21/13 at the Georgia Tech Library.
Transcript
Page 1: Data Management Planning - 02/21/13

DATA MANAGEMENT

PLANNING FEBRUARY 21, 2013

Lizzy Rolando, Research Data Librarian

Page 2: Data Management Planning - 02/21/13

Objectives

Understand the current climate around data management and data sharing

Learn about the basic elements of a data management plan

Explore some of the best practices for data documentation, long-term preservation, and data sharing

Work with the DMPTool to create a data management plan

Page 3: Data Management Planning - 02/21/13

What is Data Management?

Page 4: Data Management Planning - 02/21/13
Page 5: Data Management Planning - 02/21/13

Why Data Management?

Good for You

Photo taken by the U.S. Army Research Development and Engineering Command

Page 6: Data Management Planning - 02/21/13

Why Data Management?

Good for Science

Image from http://xnat.org/

Page 7: Data Management Planning - 02/21/13

Van Noorden, R. (2011). Science publishing: The trouble with retractions. Nature 478, 26-28. doi:10.1038/478026a

Page 8: Data Management Planning - 02/21/13

Why Data Management?

Required by Funding Agencies

Page 9: Data Management Planning - 02/21/13

Funding Agency Requirements

Funding Agency: Requirement

NSF*
• Must include a 2-page DMP in proposal
• Materials collected during research should be shared

NIH
• Papers must be submitted to PubMed
• Projects with over $500,000 funding must share data and include Data Sharing Plan in proposal

USDA
• National Institute of Food and Agriculture requires all data to be submitted to public domain without restriction

NOAA
• Soon requiring that all grants include a data sharing plan, which must also be shared
• All data should be made visible, accessible and independently understandable to users, within 2 years of end of grant

NASA
• Data should be made freely and widely available.
• A data sharing plan and evidence of any past sharing activities should be included as part of the technical proposal

CDC
• All data should be released and/or shared as soon as feasible

Page 10: Data Management Planning - 02/21/13

Exciting News!

Beginning January 14, 2013, the Biographical Sketch(es) for an NSF grant proposal will include a section on “Products” instead of “Publications.” This way, applicants can include not just publications, but also datasets, software, patents and copyrights.

Page 11: Data Management Planning - 02/21/13

Basic DMP Components

The NSF requires a 2-page data management plan with every grant proposal.

Data Description

Data and metadata standards

Data access and sharing policies

Data re-use and re-distribution

Data preservation and archiving

Depending on the funding source and the directorate/division/program, data management plan requirements may differ.

Page 12: Data Management Planning - 02/21/13

Data Description

What kinds of data will you produce?

Numerical data, simulations, text sequences, etc.

Experimental, observational, simulation

Raw, derived

How will you acquire the data?

How will you process the data?

How much data will you collect?

Are you using any existing data?

What QA/QC procedures will you use?

Page 13: Data Management Planning - 02/21/13

Recommendations

A short description of your project helps to give context to why you are collecting the data.

Survey existing data sources.

The data description can be a narrative paragraph, table, or list.

Keep all raw data separate from analyzed data, and maintain versions of data during analysis.

Implement QA/QC procedures.

Ex. Two people independently record data

Ex. Tools to audit spreadsheets
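
As a rough illustration of the "two people independently record data" check, the sketch below compares two independently keyed copies of the same table and lists every cell where they disagree. The column names (`sample_id`, `temperature_c`) and the sample values are hypothetical, and this is only one possible way to run such a check.

```python
import csv
import io


def compare_double_entry(csv_a: str, csv_b: str, key: str) -> list:
    """Return (record id, column, value in A, value in B) for every disagreement."""
    rows_a = {row[key]: row for row in csv.DictReader(io.StringIO(csv_a))}
    rows_b = {row[key]: row for row in csv.DictReader(io.StringIO(csv_b))}
    mismatches = []
    for record_id in sorted(rows_a.keys() | rows_b.keys()):
        a = rows_a.get(record_id)
        b = rows_b.get(record_id)
        if a is None or b is None:
            # One recorder has a row that the other recorder lacks.
            mismatches.append((record_id, "<row>",
                               "present" if a else "missing",
                               "present" if b else "missing"))
            continue
        for column in a:
            if a[column] != b.get(column):
                mismatches.append((record_id, column, a[column], str(b.get(column))))
    return mismatches


# Hypothetical data keyed in by two different people.
ENTRY_A = "sample_id,temperature_c\nS1,20.5\nS2,21.0\nS3,19.8\n"
ENTRY_B = "sample_id,temperature_c\nS1,20.5\nS2,12.0\nS3,19.8\n"

differences = compare_double_entry(ENTRY_A, ENTRY_B, key="sample_id")
for difference in differences:
    print(difference)
```

Any row that shows up in the output is a candidate transcription error to check against the original lab notebook or instrument log.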

Page 14: Data Management Planning - 02/21/13

Example (taken from Oceanography DMP)

The project will collect and analyze the following data:

Conductivity and temperature from moorings and shipboard CTD surveys

Horizontal currents from Lowered ADCP and moorings.

Horizontal currents from shipboard sonar

Fine and micro-scale velocity from the WHOI High Resolution Profiler

Fine and micro-scale temperature from fast-response thermistors (pods)

Page 15: Data Management Planning - 02/21/13

Data and Metadata Formats

What metadata will you create/include with data?

i.e. What does someone else need to know about your

data in order to reuse them?

Where will this be recorded? How? What format?

Will you use a community metadata standard?

Will you conform to community terminology?

Page 16: Data Management Planning - 02/21/13

Recommendations

Use metadata standards common in your discipline.

Include a “readme.txt” file that describes the who, what, where, when and why of the data, at a bare minimum.

Make sure you have recorded the information that you would need if you were trying to use someone else’s data.

Check with the data repository where you hope to store your data – sometimes they require a particular metadata standard.

Use file names that are understandable to humans.

Make sure you record units and have headers for rows and columns in your tables.

Notes about the data should be recorded alongside the data by the data collectors.

Thesauri
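
Two of the recommendations above (a "readme.txt" covering who, what, where, when and why, and table headers that record units) can be sketched in a few lines of Python. All field values, column names and units below are hypothetical placeholders, and the layout is an assumption rather than a required format; a repository or discipline standard may prescribe something different.

```python
import csv
import io


def build_readme(who: str, what: str, where: str, when: str, why: str) -> str:
    """Return the text of a minimal readme.txt covering the five W's."""
    fields = [("Who", who), ("What", what), ("Where", where),
              ("When", when), ("Why", why)]
    return "\n".join(f"{label}: {value}" for label, value in fields) + "\n"


def table_to_csv(columns: list, rows: list) -> str:
    """Serialize rows to CSV with a header row that carries each column's unit."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([f"{name} ({unit})" if unit else name for name, unit in columns])
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical values for illustration only.
readme_text = build_readme(
    who="Example Lab, example@example.edu",
    what="Water temperature and conductivity readings",
    where="Example field site",
    when="2013-02-21",
    why="Collected for a hypothetical funded project",
)
csv_text = table_to_csv(
    columns=[("station_id", ""), ("temperature", "degC"), ("conductivity", "S/m")],
    rows=[["A1", 12.4, 3.51], ["A2", 12.9, 3.48]],
)
print(readme_text)
print(csv_text)
```

Writing `readme_text` to "readme.txt" next to the data file keeps the notes alongside the data, as recommended above.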

Page 17: Data Management Planning - 02/21/13

Example

From NEES (Network for Earthquake Engineering Simulation)

Page 18: Data Management Planning - 02/21/13

Example

From NCAR (National Center for Atmospheric Research)

Page 19: Data Management Planning - 02/21/13

Example (from NASA SEAC4RS DMP)

Appendix A SEAC4RS data file naming convention:

dataID_locationID_YYYYMMDD_R#.extension

The only allowed characters are: a-z A-Z 0-9_.- (that is, upper case and lower case alphanumeric, underscore, period, and hyphen). Fields are described as follows:

dataID: an identifier of measured parameter/species, instrument, or model (e.g., O3; NxOy; and PTRMS). For DC3 and SEAC4RS data files, the PIs are required to use “DC3-” or “SEAC4RS-” as prefixes for their DataIDs, i.e., DC3-O3 and SEAC4RS-NxOy.

locationID: an identifier of airborne platform or ground station, e.g., GV, DC8. Specific locationIDs for each deployment will be provided on the data website.

R#: data revision number. For field data, revision number will start from letter “A”, e.g., RA, RB, … etc. Numerical values will be used for the preliminary and final data, e.g., R1, R2, R3 … etc.

Extension: “ict” for ICARTT files, “h4” for HDF 4 files and “h5” for HDF 5 files.

For example, the filename for the DC-8 Diode Laser Spectrometer H2O measurement made on the June 1, 2012 flight may be: DC3-DLH-H2O_DC8_20120601_RA.ict (for field data) or

DC3-DLH-H2O_DC8_20120601_R1.ict (for final data)
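
A naming convention like the one in Appendix A is easy to check by script. The sketch below builds and parses names of the form dataID_locationID_YYYYMMDD_R#.extension. It assumes that dataID and locationID never contain an underscore (the underscore being the field separator) and that field-data revisions are uppercase letters; neither assumption is stated in the plan itself, and the function names are illustrative.

```python
import re
from datetime import date, datetime
from typing import Optional

# dataID_locationID_YYYYMMDD_R#.extension, limited to the allowed character set.
FILENAME_PATTERN = re.compile(
    r"(?P<data_id>[A-Za-z0-9.-]+)"
    r"_(?P<location_id>[A-Za-z0-9.-]+)"
    r"_(?P<date>\d{8})"
    r"_R(?P<revision>[A-Z]+|\d+)"      # letters for field data, digits otherwise
    r"\.(?P<extension>ict|h4|h5)"
)


def parse_filename(name: str) -> Optional[dict]:
    """Return the fields of a conforming file name, or None if it does not conform."""
    match = FILENAME_PATTERN.fullmatch(name)
    if match is None:
        return None
    fields = match.groupdict()
    try:
        datetime.strptime(fields["date"], "%Y%m%d")  # reject impossible dates
    except ValueError:
        return None
    return fields


def build_filename(data_id: str, location_id: str, flight_date: date,
                   revision: str, extension: str) -> str:
    """Assemble a file name and verify that it follows the convention."""
    name = f"{data_id}_{location_id}_{flight_date:%Y%m%d}_R{revision}.{extension}"
    if parse_filename(name) is None:
        raise ValueError(f"file name does not follow the convention: {name}")
    return name


field_name = build_filename("DC3-DLH-H2O", "DC8", date(2012, 6, 1), "A", "ict")
final_fields = parse_filename("DC3-DLH-H2O_DC8_20120601_R1.ict")
print(field_name)
print(final_fields)
```

Running a check like `parse_filename` over a directory before submission catches names with spaces, missing revision fields, or unsupported extensions.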

Appendix B Summary of ICARTT format metadata requirements (also required for HDF 5 files):

Platform and associated location data: Geographic location and altitude will be embedded as part of the data file or provided via a link to the archival location of the aircraft navigational data.

Data Source Contact Information: phone number, mailing information, and e-mail address shall be given for the measurement Co-I and one alternate contact.

Data Information: Clear definition of measured quantities will be given in plain English, avoiding the use of undefined acronyms, along with reporting units and limitation of data use if applicable.

Measurement Description: A simple description of the measurement technique with reference to readme file and relevant journal publication.

Measurement Uncertainty: Overall uncertainty will need to be given as a minimum. Ideally, precision and accuracy will be provided explicitly. The confidence level associated with the reported uncertainties will also need to be specified for the reported uncertainties if it is applicable. The measurement uncertainty can be reported as constants for entire flights or as separate variables. Measurement uncertainty is required by the ICARTT data file format.

Data Quality Flags: definition of flag codes for missing data (not reported due to instrument malfunction or calibration) and detection limits.

Data Revision Comments: Provide sufficient discussion about the rationale for data revision. The discussions should focus on highlighting issues, solutions, assumptions, and impact.

Page 20: Data Management Planning - 02/21/13

Policies for Access and Sharing

Are your data sensitive, so access by others needs to be restricted?

What license or publishing model will you use for your data?

How will you make your data accessible to others?

What data will you make available and at what stage of your research?

Do you have protocols, such as IRB, that you need to comply with? If so, how will you do so?

Page 21: Data Management Planning - 02/21/13

Recommendations

Apply an open license to data that you will share.

Explain why you cannot share data, if that is the case.

For example, the data used in your research are proprietary.

Anonymize any sensitive data.

Use a repository that can mediate data sharing if data cannot be sufficiently anonymized

Comply with IRB restrictions.

That should be obvious, but we’ll say it anyways

Be aware of Georgia Tech Policy…
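
As one small, hedged illustration of the "anonymize any sensitive data" step, the sketch below drops direct identifiers from a record and replaces the participant ID with a keyed hash. The column names, the key, and the sample record are hypothetical. Removing direct identifiers alone may not be sufficient, since indirect identifiers can still reveal individuals, so treat this as a starting point under those assumptions and not as a complete procedure.

```python
import hashlib
import hmac

# Hypothetical names of columns that identify a person directly.
DIRECT_IDENTIFIERS = {"name", "email"}


def pseudonymize(value: str, secret_key: bytes) -> str:
    """Replace an identifier with a keyed hash so it cannot be looked up without the key."""
    digest = hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def anonymize_record(record: dict, id_field: str, secret_key: bytes) -> dict:
    """Drop direct identifiers and pseudonymize the ID field of one record."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    cleaned[id_field] = pseudonymize(str(record[id_field]), secret_key)
    return cleaned


# Hypothetical key and record; keep the real key private and out of the shared data.
SECRET_KEY = b"replace-with-a-private-key"
record = {"participant_id": "P001", "name": "Jane Doe",
          "email": "jane@example.edu", "score": 42}
cleaned = anonymize_record(record, "participant_id", SECRET_KEY)
print(cleaned)
```

Because the same key always maps the same ID to the same pseudonym, records for one participant can still be linked across files without exposing the original ID.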

Page 22: Data Management Planning - 02/21/13

Example (from ICPSR)

“ICPSR will make the research data from this project available to the broader social science research community.

Public-use data files: These files, in which direct and indirect identifiers have been removed to minimize disclosure risk, may be accessed directly through the ICPSR Web site. After agreeing to Terms of Use, users with an ICPSR MyData account and an authorized IP address from a member institution may download the data, and non-members may purchase the files.

Restricted-use data files: These files are distributed in those cases when removing potentially identifying information would significantly impair the analytic potential of the data. Users (and their institutions) must apply for these files, create data security plans, and agree to other access controls.

Timeliness: The research data from this project will be supplied to ICPSR before the end of the project so that any issues surrounding the usability of the data can be resolved. Delayed dissemination may be possible. The Delayed Dissemination Policy allows for data to be deposited but not disseminated for an agreed-upon period of time (typically one year).”

Page 23: Data Management Planning - 02/21/13

Policies and Provisions for Re-use

Who do you expect will want to or can reuse your data?

Should there be restrictions on who or how your data can be reused?

How should others indicate that they have used your data?

How long will your data be available to others for reuse?

Does your institution have rules about data?

Page 24: Data Management Planning - 02/21/13

Recommendations

Imagine the broadest possible audience for your data.

Place as few restrictions on your data as you can.

Link your published articles to the data underlying those articles.

Use a repository that can make your data available far into the future.

Funding Agency: Suggested Length of Time for Private Data Retention

NIH: No later than the acceptance for publication of main findings from final data set

NOAA: 2 years after data collection

NSF-Engineering Directorate: 3 years after the end of the project or public release, whichever comes first

NSF-Earth Sciences Division: 2 years after data collection

NSF-Ocean Sciences Division: 2 years after data collection

Page 25: Data Management Planning - 02/21/13

Example (from USC)

“USC’s policy is to encourage, wherever appropriate, research data to be shared with the general public through internet access. This public access will be regulated by the university in order to protect privacy and confidentiality concerns, as well as to respect any proprietary or intellectual property rights. Administrators will consult with the university’s legal office to address any concerns on a case-by-case basis, if necessary. Terms of use will include requirements of attribution along with disclaimers of liability in connection with any use or distribution of the research data, which may be conditioned under some circumstances.”

Page 26: Data Management Planning - 02/21/13

Archiving and Preservation

What formats for your data will you use? Are they preservation friendly?

What repository or data archive can take your data when you are finished?

How do they preserve/share your data?

What are their access policies?

Is any extra work needed to prepare data for the repository?

Who will be responsible for final preservation?

Page 27: Data Management Planning - 02/21/13

Recommendations

Appraise your data, selecting those with long-term value, and document your choices.

Use preservation friendly digital formats.

Non-proprietary, commonly used

You may need to transform data into new format.

Find a repository that will take your data, and plan to comply with their policies early on.

Look into using SMARTech!

P.I.’s should ultimately be responsible for dealing with the final disposition of the data.

Page 28: Data Management Planning - 02/21/13

Example (from DataONE)

Short Term:

The data product will be updated monthly reflecting updates to the record, revisions due to recalibration of standard gases, and identification and flagging of any errors. The date of the update will be included in the data file and will be part of the data file name. Versions of the data product that have been revised due to errors/updates (other than new data) will be retained in an archive system. A revision history document will describe the revisions made. Daily and monthly backups of the data files will be retained at the Keeling Group Lab (http://scrippsco2.ucsd.edu, accessed 05/2011), at the Scripps Institution of Oceanography Computer Center, and at the Woods Hole Oceanographic Institution’s Computer Center.

Long Term:

Our intent is that the long term high quality final data product generated by this project will be available for use by the research and policy communities in perpetuity. The raw supporting data will be available in perpetuity as well, for use by researchers to confirm the quality of the Mauna Loa Record. The investigators have made arrangements for long term stewardship and curation at the Carbon Dioxide Information and Analysis Center (CDIAC), Oak Ridge National Laboratory (see letter of support). The standardized metadata record for the Mauna Loa CO2 data will be added to the metadata record database at CDIAC, so that interested users can discover the Mauna Loa CO2 record along with other related Earth science data. CDIAC has a standardized data product citation including DOI, that indicates the version of the Mauna Loa Data Product and how to obtain a copy of that product.
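
The short-term practice described in this example (putting the update date in the file name and keeping a revision history) might be sketched as follows. The base name, the tab-separated history layout, and the function names are hypothetical; the plan quoted above does not specify them.

```python
from datetime import date


def versioned_filename(base_name: str, update_date: date, extension: str) -> str:
    """Build a file name that carries the date of the update."""
    return f"{base_name}_{update_date.isoformat()}.{extension}"


def revision_entry(update_date: date, filename: str, description: str) -> str:
    """Format one tab-separated line for a revision history document."""
    return f"{update_date.isoformat()}\t{filename}\t{description}"


# Hypothetical monthly update.
update = date(2011, 5, 1)
filename = versioned_filename("mauna_loa_co2", update, "csv")
history_line = revision_entry(update, filename,
                              "Recalibrated standard gases; flagged two errors")
print(filename)
print(history_line)
```

Because ISO dates sort alphabetically in chronological order, a directory listing of such files doubles as a version timeline.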

Page 29: Data Management Planning - 02/21/13

Never Fear!

Page 30: Data Management Planning - 02/21/13

DMPTool

Developed by a number of universities in response to funding agency mandates

https://dmp.cdlib.org/

Page 31: Data Management Planning - 02/21/13

Step 1: Sign In

Choose Georgia Tech

Page 32: Data Management Planning - 02/21/13

Shibboleth…

Page 33: Data Management Planning - 02/21/13

Step 2: Create a Plan

Select a Funding Agency.

Email is sent to Georgia Tech Library.

Page 34: Data Management Planning - 02/21/13

Creating and Naming your Plan

Strongly Recommend Naming Plan “[Insert Proposal Title Here] Data Management Plan”.

Page 35: Data Management Planning - 02/21/13

Step 3: One Section at a Time

Sections are different depending on funding source.

Georgia Tech and DataONE have resources available for every section.

Enter your answers here.

Page 36: Data Management Planning - 02/21/13

Some Sections Have Extra Advice

Georgia Tech specific help text

Page 37: Data Management Planning - 02/21/13

Almost There

You should save after every section, but definitely save at the very end.

You’re so close to the end!

Page 38: Data Management Planning - 02/21/13

Step 4: Export

Now that you have the content, you can export your plan.

Page 39: Data Management Planning - 02/21/13

Step 5: Share plan

Send your plan to the Research Data Librarian (Me!) to look over your plan.

Have your colleagues look at your plan.

Do you know your grant officer?

Page 40: Data Management Planning - 02/21/13

Step 6: Finish and Start Research!

Add plan to proposal or distribute among research team

Begin your newly funded research!