SHARING YOUR DATA Kathleen Fear October 2, 2014
Page 1: How and Why to Share Your Data

SHARING YOUR DATA

Kathleen Fear

October 2, 2014

Page 2: How and Why to Share Your Data

What is the Data Services Center?
• Numeric and statistical data services
  • Finding and providing access to datasets
  • Planned: statistical consulting
• Spatial data services
  • Creating and acquiring GIS data
• Research data services

Page 3: How and Why to Share Your Data

WHY?

Page 4: How and Why to Share Your Data

Why?

Funders and publishers say you have to…

…and it’s a good thing to do for science…

…and for you.

Page 5: How and Why to Share Your Data

Open Data citation advantage
• Papers that make data available are cited 9–69% more (Dorch, 2012; Sears, 2011; Henneken and Accomazzi, 2011; Pienta et al., 2010; Piwowar et al., 2007)
• Why? (Piwowar and Vision, 2013)
  • Data reuse
  • Credibility signaling
  • Increased visibility
  • Early view
  • Selection bias

Page 6: How and Why to Share Your Data

WHAT?

Page 7: How and Why to Share Your Data

You don’t have to share all your data with anyone who wants it

“at no more than incremental cost and within a reasonable time” (NSF)

“indicate the criteria for deciding who can receive your data” (NIH)

“All data necessary to understand, assess, and extend the conclusions of the manuscript must be available to any reader.” (Science)

Page 8: How and Why to Share Your Data

Consider granularity:
• What would someone need to reproduce your results?
  • Data: raw, processed, final
  • Scripts, code libraries, etc.
  • Metadata

Page 9: How and Why to Share Your Data

Consider timing:
• Before publication? At the time of publication?
• Consider restrictions, embargo, etc. for data that can’t be immediately shared freely
  • Check with UR Ventures if you have concerns about protecting patent interests
• Staggered release: metadata, then data later

Page 10: How and Why to Share Your Data

Consider usability:
• Could someone with comparable expertise look at your data and understand how to use it?
  • Is it clear how different files relate to each other?
  • Are your variable names meaningful? File names descriptive?
  • Include README.txt file or codebook in top level of directory
• Are special tools or software needed to use your data?
  • Are your files in a proprietary format? Will future users be able to open them?
  • Include the necessary tools, or make the data available in open formats
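
The usability checklist above can be partly automated. The following is a minimal sketch, not a prescribed tool: it walks a dataset directory, drafts a README.txt skeleton listing every file, and flags files whose extension is not in a small set assumed here to be open formats. The names `OPEN_EXTENSIONS`, `build_readme`, and `write_readme` are hypothetical, and the extension list is an illustrative assumption, not an authoritative registry of open formats.

```python
from pathlib import Path

# Assumption: an illustrative, incomplete set of extensions treated as open formats.
OPEN_EXTENSIONS = {".csv", ".tsv", ".txt", ".json", ".xml", ".md"}


def build_readme(data_dir, title):
    """Return README text that lists every file under data_dir."""
    root = Path(data_dir)
    lines = [title, "=" * len(title), "", "Files", "-----"]
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        if path.name == "README.txt":
            continue  # do not list the README inside itself
        rel = path.relative_to(root).as_posix()
        size = path.stat().st_size
        # Flag anything outside the assumed open-format set for a manual check.
        note = ""
        if path.suffix.lower() not in OPEN_EXTENSIONS:
            note = "  [check: may need special software]"
        lines.append(f"{rel} ({size} bytes): TODO describe contents{note}")
    lines += [
        "",
        "Variables",
        "---------",
        "TODO: list each variable name, its meaning, and its units",
        "",
    ]
    return "\n".join(lines)


def write_readme(data_dir, title):
    """Write the skeleton to README.txt at the top level of data_dir."""
    out = Path(data_dir) / "README.txt"
    out.write_text(build_readme(data_dir, title), encoding="utf-8")
    return out
```

Calling `write_readme("my_dataset", "Field survey 2014")` (both arguments hypothetical) would leave a README.txt at the top level of the directory; the TODO lines still have to be filled in by someone who knows the data.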

Page 11: How and Why to Share Your Data

HOW?

Page 12: How and Why to Share Your Data

Why can’t I keep it on my computer?
• Poor success rates for data sharing requests (Vines et al., 2013; Savage and Vickers, 2009; Wicherts et al., 2006)
• The older the article, the harder to get the data (Vines et al., 2014):
  • Odds of a dataset being reported as extant decline by 17% per year
  • Odds of finding a working email for first, last, or corresponding author decline by 7% a year
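
To see how quickly a yearly decline in odds compounds, here is a small sketch. It assumes the 17% and 7% figures act as constant multiplicative declines in the odds (odds times 0.83 or 0.93 each year), which is one reading of the slide rather than a restatement of the paper's model. The function names and any starting probability passed in are hypothetical.

```python
def odds_multiplier(annual_decline, years):
    """Factor by which the odds shrink after `years` of compounding decline."""
    return (1.0 - annual_decline) ** years


def odds_to_probability(odds):
    """Convert odds (p / (1 - p)) back to a probability."""
    return odds / (1.0 + odds)


def probability_after(p0, annual_decline, years):
    """Probability after `years`, given a starting probability p0 (0 < p0 < 1).

    p0 is an illustrative input; the slide gives only the yearly decline.
    """
    odds0 = p0 / (1.0 - p0)
    return odds_to_probability(odds0 * odds_multiplier(annual_decline, years))


# Example: fraction of the original odds left after 10 years at 17% per year.
remaining = odds_multiplier(0.17, 10)
```

Because the decline applies to odds rather than to probabilities, a starting probability is needed before the result can be read as "chance the data still exist"; `probability_after` makes that dependence explicit.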

Page 13: How and Why to Share Your Data

Why can’t I keep it on my computer?

“Sure I will send you those data, but it's like seven computers ago, and so please allow me some time to hunt them down” (Wicherts and Bakker, 2012)

• Most refusals are not to protect ongoing work, but because (Vines et al., 2014):
  • The data are on a computer that got stolen…
  • The data are in my parents’ attic…
  • The data are definitely on one of these zip disks…
• …and it will take hours for me to get them, if I can get them at all.

Page 14: How and Why to Share Your Data

Set it and forget it: put your data in a repository

• Long-term commitment to data preservation

• Reuse tracking and usage statistics

• Permanent URL / DOI enables data citation

Page 15: How and Why to Share Your Data

Set it and forget it: put your data in a repository

1. Find a disciplinary repository or database
  • Repository directories: re3data.org; biosharing.org
  • Typically managed by specialists in the field

Page 16: How and Why to Share Your Data

Set it and forget it: put your data in a repository

1. Find a disciplinary repository or database
  • Repository directories: re3data.org; biosharing.org
  • Typically managed by specialists in the field

2. Use a general-purpose repository
  • UR Research: https://urresearch.rochester.edu/home.action

Page 17: How and Why to Share Your Data

• Library-hosted
• 2GB soft limit
• Backed up, secure
• Free!

Page 18: How and Why to Share Your Data

Set it and forget it: put your data in a repository

1. Find a disciplinary repository or database
  • Repository directories: re3data.org; biosharing.org
  • Typically managed by specialists in the field

2. Use a general-purpose repository
  • UR Research: https://urresearch.rochester.edu/home.action
  • Dryad: http://datadryad.org

Page 19: How and Why to Share Your Data

What is Dryad?

Page 20: How and Why to Share Your Data

• Integration with journal submission processes (http://datadryad.org/pages/integratedJournals)

• Not free: $80/submission. But we provide vouchers!

Page 21: How and Why to Share Your Data

How to get a voucher
• Proposal should include:
  • A description of the project to which the data is related;
  • A description of the data to be archived, including the format(s) and approximate total size. The RCL will fully fund datasets up to 10GB, with larger data considered on a case-by-case basis.

• Send proposal to [email protected]

Page 22: How and Why to Share Your Data

But my data’s bigger than that…
• An upcoming option: REACTUR (Research data Archiving and Curation at the University of Rochester)

• River Campus Libraries + CIRC = easy data sharing for large datasets

• $200 / TB / year

• Piloting now, hope to be available for all in Spring 2015
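
For budgeting, the quoted rate turns into a one-line calculation. This sketch assumes the $200 / TB / year rate scales linearly with size and time, with no minimum charge or rounding to whole terabytes; the slide does not say how billing actually works, and the function name is hypothetical.

```python
REACTUR_USD_PER_TB_YEAR = 200.0  # rate quoted on the slide


def reactur_cost(size_tb, years):
    """Estimated cost in USD, assuming simple linear proration (an assumption)."""
    if size_tb < 0 or years < 0:
        raise ValueError("size_tb and years must be non-negative")
    return REACTUR_USD_PER_TB_YEAR * size_tb * years


# Example: 2.5 TB kept for 3 years.
estimate = reactur_cost(2.5, 3)
```

An estimate like this can go straight into the budget section of a data management plan, with the billing assumptions confirmed once the service is generally available.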

Page 23: How and Why to Share Your Data

Set it and forget it: put your data in a repository

1. Find a disciplinary repository or database
  • Repository directories: re3data.org; biosharing.org
  • Typically managed by specialists in the field

2. Use a general-purpose repository
  • UR Research: https://urresearch.rochester.edu/home.action
  • Dryad: http://datadryad.org
  • REACTUR

Page 24: How and Why to Share Your Data

Repository   Amount of data accepted              Cost                           Ability to restrict data?  Publisher integration?
UR Research  Up to 2GB                            Free                           Yes, highly customizable   No
FigShare     Up to 1GB private, unlimited public  Free                           Yes                        Yes
Dryad        Up to 10GB                           $80 per submission up to 10GB  No                         Yes
REACTUR      Unlimited                            $200 / TB / year               Yes                        Not yet
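
The comparison above can also be read as a simple filter. The sketch below encodes the four rows as data and returns the repositories that fit a dataset's size and requirements. It makes several simplifying assumptions: FigShare's "private" limit is treated as the limit for restricted data, UR Research's 2GB soft limit is treated as a hard cutoff, and "Not yet" counts as no publisher integration. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Repository:
    name: str
    max_gb_public: Optional[float]      # None means no stated limit
    max_gb_restricted: Optional[float]  # only consulted when can_restrict is True
    can_restrict: bool
    publisher_integration: bool
    cost: str


# Values transcribed from the comparison table; see the assumptions above.
REPOSITORIES = [
    Repository("UR Research", 2, 2, True, False, "Free"),
    Repository("FigShare", None, 1, True, True, "Free"),
    Repository("Dryad", 10, None, False, True, "$80 per submission"),
    Repository("REACTUR", None, None, True, False, "$200 / TB / year"),
]


def candidates(size_gb, need_restriction=False, need_publisher_integration=False):
    """Return names of repositories that satisfy the given requirements."""
    matches = []
    for repo in REPOSITORIES:
        if need_restriction and not repo.can_restrict:
            continue
        if need_publisher_integration and not repo.publisher_integration:
            continue
        limit = repo.max_gb_restricted if need_restriction else repo.max_gb_public
        if limit is not None and size_gb > limit:
            continue
        matches.append(repo.name)
    return matches
```

For example, `candidates(5, need_restriction=True)` narrows the list to the options that accept 5GB of access-restricted data. A disciplinary repository, where one exists, would still be the first place to look.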

Page 25: How and Why to Share Your Data

A little help:
• Call me! (Or email, or drop by.)

5-6882

Carlson 313E

[email protected]

• At URMC, contact:

Donna Berryman

5-6877

[email protected]

Linda Hasman

5-3399

[email protected]

Page 26: How and Why to Share Your Data

Data Workshops
• 1st and 3rd Thursdays @ noon, Carlson Library Rm. 310

Fall 2014
September  Writing a successful data management plan; Intro to GIS I
October    Sharing your data; Intro to GIS II
November   Finding and using data from ICPSR; Intro to GIS III
December   Data visualization; ---

Spring 2015
January    R 101; Intro to R Spatial
February   Using the DMPTool; Georeferencing maps
March      Basic database design; Web mapping: Google Refine, Open Layers
April      Tools for qualitative research; Mapping real-world data

Page 27: How and Why to Share Your Data

References
• Dorch, B. (2012). On the Citation Advantage of linking to data. Retrieved from http://hprints.org/hprints-00714715
• Henneken, E. A., & Accomazzi, A. (2011). Linking to Data - Effect on Citation Rates in Astronomy. arXiv:1111.3618 [astro-Ph]. Retrieved from http://arxiv.org/abs/1111.3618
• Pienta, A. M., Alter, G. C., & Lyle, J. A. (2010). The Enduring Value of Social Science Research: The Use and Reuse of Primary Research Data. Retrieved from http://deepblue.lib.umich.edu/handle/2027.42/78307
• Piwowar, H. A., Day, R. S., & Fridsma, D. B. (2007). Sharing Detailed Research Data Is Associated with Increased Citation Rate. PLoS ONE, 2(3). doi:10.1371/journal.pone.0000308
• Piwowar, H. A., & Vision, T. J. (2013). Data reuse and the open data citation advantage. PeerJ, 1. doi:10.7717/peerj.175
• Sears, J. R. (2011). Data Sharing Effect on Article Citation Rate in Paleoceanography. AGU Fall Meeting Abstracts, 53, 1628.
• Savage, C. J., & Vickers, A. J. (2009). Empirical Study of Data Sharing by Authors Publishing in PLoS Journals. PLoS ONE, 4(9), e7078. doi:10.1371/journal.pone.0007078
• Vines, T. H., Albert, A. Y. K., Andrew, R. L., Debarre, F., Bock, D. G., Franklin, M. T., … Rennison, D. J. (2014). The Availability of Research Data Declines Rapidly with Article Age. Current Biology, 24(1), 94–97. doi:10.1016/j.cub.2013.11.014
• Vines, T. H., Andrew, R. L., Bock, D. G., Franklin, M. T., Gilbert, K. J., Kane, N. C., … Yeaman, S. (2013). Mandated data archiving greatly improves access to research data. The FASEB Journal, 27(4), 1304–1308. doi:10.1096/fj.12-218164
• Wicherts, J. M., & Bakker, M. (2012). Publish (your data) or (let the data) perish! Why not publish your data too? Intelligence, 40(2), 73–76. doi:10.1016/j.intell.2012.01.004
• Wicherts, J. M., Borsboom, D., Kats, J., & Molenaar, D. (2006). The poor availability of psychological research data for reanalysis. The American Psychologist, 61(7), 726–728. doi:10.1037/0003-066X.61.7.726