Data Stewardship for Researchers
Tips, Tools, & Guidance
Carly Strasser, PhD
California Digital Library
@carlystrasser [email protected]
UC Riverside, April 2013
From Calisphere, Courtesy of UC Riverside, California Museum of Photography
From Calisphere, Courtesy of Thousand Oaks Library
1. Planning
2. Data collection & organization
3. Quality control & assurance
4. Metadata
5. Workflows
6. Data stewardship & reuse
Best Practices for Data Management
[Flow chart boxes: Temperature data; Salinity data; Data import into R; Data in R format; Quality control & data cleaning; "Clean" T & S data; Analysis: mean, SD; Summary statistics; Graph production]
5. Workflows
Workflow: how you get from the raw data to the final products of your research
Simple workflows: flow charts
• R, SAS, MATLAB
• Well-documented code is easier to review, easier to share, and easier to rerun when repeating an analysis
Simple workflows: commented scripts
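A minimal sketch of what a commented script for the temperature and salinity workflow might look like. The slides name R, SAS, and MATLAB; Python is used here only for illustration. The column names, the inline sample data, and the -999 missing-value code are assumptions, not part of the original talk.

```python
import csv
import io
import statistics

# Hypothetical raw data; in practice this would be read from the raw data file.
RAW_CSV = """station,temperature_c,salinity_psu
A,12.1,33.5
B,12.4,33.7
C,-999,33.6
D,11.9,
E,12.6,33.9
"""

MISSING_CODE = -999.0  # assumed sentinel for a missing measurement


def import_data(text):
    """Step 1, data import: read the raw CSV into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(rows, column):
    """Step 2, quality control: drop blank cells and the missing-value code."""
    values = []
    for row in rows:
        cell = row[column].strip()
        if cell == "":
            continue
        value = float(cell)
        if value == MISSING_CODE:
            continue
        values.append(value)
    return values


def summarize(values):
    """Step 3, analysis: count, mean, and sample standard deviation."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),
    }


rows = import_data(RAW_CSV)
summary = {
    column: summarize(clean(rows, column))
    for column in ("temperature_c", "salinity_psu")
}

# Step 4, summary statistics: report the result for each variable.
for column, stats in summary.items():
    print(f"{column}: n={stats['n']} mean={stats['mean']:.2f} sd={stats['sd']:.2f}")
```

Each function corresponds to one box in the flow chart, so the script doubles as documentation of the workflow.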
Fancy Schmancy workflows: Kepler
[Figure: Kepler workflow and resulting output]
5. Workflows
https://kepler-project.org
Workflows enable
• Reproducibility: can someone independently validate findings?
• Transparency: others can understand how you arrived at your results
• Executability: others can re-run or re-use your analysis
5. Workflows
From Flickr by merlinprincesse
• Use stable formats: csv, txt, tiff
• Create back-up copies: original, near, far
• Periodically test ability to restore information
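One way to test a restore is to compare checksums of the back-up copies against the original. The sketch below is an illustration under assumptions: the "near" and "far" locations are stand-in folders inside a temporary directory, and the file name and contents are hypothetical. Real back-ups would live on separate devices or sites.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(original, backup_dirs):
    """Copy the original file into each back-up directory."""
    copies = []
    for directory in backup_dirs:
        directory.mkdir(parents=True, exist_ok=True)
        target = directory / original.name
        shutil.copy2(original, target)
        copies.append(target)
    return copies


def verify_restore(original, copies):
    """Check that every copy is byte-identical to the original."""
    expected = sha256_of(original)
    return {str(copy): sha256_of(copy) == expected for copy in copies}


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Hypothetical data file standing in for the original copy.
    original = root / "original" / "temperature.csv"
    original.parent.mkdir()
    original.write_text("station,temperature_c\nA,12.1\n")

    # Stand-ins for the near and far back-up locations.
    copies = back_up(original, [root / "near", root / "far"])
    results = verify_restore(original, copies)
    all_ok = all(results.values())
    print("all copies restorable:", all_ok)
```

Running a check like this on a schedule would catch a silently corrupted copy before it is needed.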
6. Data stewardship & reuse
Modified from R. Cook
Store your data in a repository
Institutional archive
Discipline/specialty archive
6. Data stewardship & reuse
From Flickr by torkildr
Allows readers to find data products
Get credit for data and publications
Promotes reproducibility
Better measure of research impact
Modified from R. Cook
6. Data stewardship & reuse
Practice Data Citation
Example: Sidlauskas, B. 2007. Data from: Testing for unequal rates of morphological diversification in the absence of a detailed phylogeny: a case study from characiform fishes. Dryad Digital Repository. doi:10.5061/dryad.20
Learn more at www.datacite.org
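A data citation can be assembled from a few fields. The sketch below rebuilds the Sidlauskas example from its parts; the class, its field names, and the field order are assumptions made for illustration, not a DataCite or Dryad specification.

```python
from dataclasses import dataclass


@dataclass
class DataCitation:
    """Hypothetical container for the parts of a data citation."""
    author: str
    year: int
    title: str
    repository: str
    doi: str

    def render(self):
        """Join the parts in the order used by the example on the slide."""
        return (
            f"{self.author} {self.year}. {self.title}. "
            f"{self.repository}. doi:{self.doi}"
        )


citation = DataCitation(
    author="Sidlauskas, B.",
    year=2007,
    title=(
        "Data from: Testing for unequal rates of morphological "
        "diversification in the absence of a detailed phylogeny: "
        "a case study from characiform fishes"
    ),
    repository="Dryad Digital Repository",
    doi="10.5061/dryad.20",
)
print(citation.render())
```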
From Flickr by Global X
Planning
A document that describes what you will do with your data during your research and after you complete the project
What is a data management plan?
From Flickr by Gavinzac
• Saves time
• Increases efficiency
• Easier to use data
• Others can understand & use data
• Credit for data products
• Funders require it
Why bother?
DMP supplement may include:
1. the types of data, samples, physical collections, software, curriculum materials, and other materials to be produced in the course of the project
2. the standards to be used for data and metadata format and content (where existing standards are absent or deemed inadequate, this should be documented along with any proposed solutions or remedies)
3. policies for access and sharing including provisions for appropriate protection of privacy, confidentiality, security, intellectual property, or other rights or requirements
4. policies and provisions for re-use, re-distribution, and the production of derivatives
5. plans for archiving data, samples, and other research products, and for preservation of access to them
NSF DMP Requirements
From Grant Proposal Guidelines:
• Types of data
• Existing data
• How/when/where created?
• How processed?
• Quality control
• Security
• Who is responsible
1. Types of data & other information
biology.kenyon.edu
C. Strasser
From Flickr by Lazurite
Wired.com
• Metadata needed
• How captured
• Standards
2. Data & metadata standards
• Obligation to share
• How/when/where available
• Getting access
• Copyright / IP
• Permission restrictions
• Embargo periods
• Ethics/privacy
• How cited
3. Policies for access & sharing
4. Policies for re-use & re-distribution
• What & where
• Metadata
• Who’s responsible
5. Plans for archiving & preservation
From Flickr by theManWhoSurfedTooMuch
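The five parts of the DMP supplement can be treated as a checklist while drafting. The sketch below is a hypothetical helper, not part of any NSF tool: the section labels paraphrase the slide titles, and the draft text is invented for illustration.

```python
# Section labels paraphrased from the five slide titles above.
NSF_DMP_SECTIONS = [
    "Types of data & other information",
    "Data & metadata standards",
    "Policies for access & sharing",
    "Policies for re-use & re-distribution",
    "Plans for archiving & preservation",
]


def missing_sections(draft):
    """Return the sections that are absent or empty in a draft plan."""
    return [
        section
        for section in NSF_DMP_SECTIONS
        if not draft.get(section, "").strip()
    ]


# Hypothetical draft with two sections still unwritten.
draft = {
    "Types of data & other information": "Temperature and salinity time series.",
    "Data & metadata standards": "CSV files with a documented metadata record.",
    "Policies for access & sharing": "   ",
    "Plans for archiving & preservation": "Deposit in an institutional archive.",
}

missing = missing_sections(draft)
for section in missing:
    print("Still to write:", section)
```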
Don’t forget the budget
dorrvs.com
NSF’s Vision*
DMPs and their evaluation will grow & change over time
Peer review will determine next steps
Community-driven guidelines
Evaluation will vary with directorate, division, & program officer
*Unofficially
From Flickr by dipster1
Toolbox
E-notebooks & online science
• NoteBook
• ORNL eNote
• Evernote
• Google Docs
• Blogs
• wikis
• TheLabNotebook.com
• NoteBookMaker
TheLabNotebook.com
Step-by-step wizard for generating DMP
Create | edit | re-use | share | save | generate
Open to community
dmptool.org
List of repositories: databib.org
Where should I put my data?
NSF-funded DataNet project (Office of Cyberinfrastructure)
www.dataone.org
• Data Education Tutorials
• Database of best practices & software tools
• Primer on data management
• Investigator Toolkit
Intercept researchers where they already work
dataup.cdlib.org
Open Source Tool
Add-in & Web Application
csv & xlsx
Free
Features
• Best practices check
• Generate metadata
• Generate citation
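To give a sense of what a best practices check on a csv file could involve, here is a minimal sketch. It is not DataUp's implementation: the function name, the three rules, and the sample file are assumptions chosen for illustration.

```python
import csv
import io


def check_csv(text):
    """Flag a few common spreadsheet problems in CSV text (illustrative rules)."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ["file is empty"]

    header, body = rows[0], rows[1:]
    problems = []

    # Rule 1: every column needs a name.
    for index, name in enumerate(header, start=1):
        if not name.strip():
            problems.append(f"column {index}: empty header")

    # Rule 2: column names should be unique.
    if len(set(header)) != len(header):
        problems.append("duplicate column names")

    # Rule 3: every row should be complete, with no blank cells.
    for line_no, row in enumerate(body, start=2):
        if len(row) != len(header):
            problems.append(
                f"line {line_no}: expected {len(header)} fields, found {len(row)}"
            )
        elif any(not cell.strip() for cell in row):
            problems.append(f"line {line_no}: blank cell")

    return problems


# Hypothetical file with an unnamed column, a blank cell, and a short row.
SAMPLE = "station,temperature_c,\nA,12.1,ok\nB,,ok\nC,12.4\n"
problems = check_csv(SAMPLE)
for problem in problems:
    print(problem)
```

A fuller check would also cover mixed data types within a column and special characters, which this sketch leaves out.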