SHARING YOUR DATA
Kathleen Fear
October 2, 2014
Aug 19, 2015
What is the Data Services Center?
• Numeric and statistical data services
  • Finding and providing access to datasets
  • Planned: statistical consulting
• Spatial data services
  • Creating and acquiring GIS data
• Research data services
Why?
Funders and publishers say you have to…
…and it’s a good thing to do for science…
…and for you.
Open data citation advantage
• Papers that make their data available are cited 9–69% more often (Dorch, 2012; Sears, 2011; Henneken and Accomazzi, 2011; Pienta et al., 2010; Piwowar et al., 2007)
• Why? Possible explanations (Piwowar and Vision, 2013):
  • Data reuse
  • Credibility signaling
  • Increased visibility
  • Early view
  • Selection bias
You don’t have to share all your data with anyone who wants it
“at no more than incremental cost and within a reasonable time” (NSF)
“indicate the criteria for deciding who can receive your data” (NIH)
“All data necessary to understand, assess, and extend the conclusions of the manuscript must be available to any reader.” (Science)
Consider granularity:
• What would someone need to reproduce your results?
  • Data: raw, processed, final
  • Scripts, code libraries, etc.
  • Metadata
Consider timing:
• Before publication? At the time of publication?
• Consider restrictions, embargo, etc. for data that can’t be immediately shared freely
• Check with UR Ventures if you have concerns about protecting patent interests
• Staggered release: metadata first, then data later
Consider usability:
• Could someone with comparable expertise look at your data and understand how to use it?
  • Is it clear how different files relate to each other?
  • Are your variable names meaningful? Are your file names descriptive?
  • Include a README.txt file or codebook in the top level of the directory
• Are special tools or software needed to use your data?
  • Are your files in a proprietary format? Will future users be able to open them?
  • Include the necessary tools, or make the data available in open formats
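A few of the usability questions above can be checked mechanically before you deposit a dataset. The following is a minimal sketch under stated assumptions, not a complete check. The function name `check_usability` is hypothetical. The extension list is an illustrative, non-exhaustive assumption about which formats count as proprietary and what open alternative you might also provide. The sketch only looks for a top-level README.txt or codebook and flags proprietary file extensions. It cannot tell you whether your variable names are meaningful.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical, non-exhaustive mapping from proprietary file extensions
# to an open format you might also provide; adjust for your own field.
PROPRIETARY_TO_OPEN = {
    ".xls": ".csv",
    ".xlsx": ".csv",
    ".sav": ".csv",   # SPSS data file
    ".dta": ".csv",   # Stata data file
    ".doc": ".txt",
    ".docx": ".txt",
}


def check_usability(data_dir: str) -> list[str]:
    """Return warnings about a dataset directory before it is shared."""
    root = Path(data_dir)
    warnings = []
    # A README.txt or codebook should sit in the top level of the directory.
    has_readme = (root / "README.txt").is_file()
    has_codebook = any(p.is_file() for p in root.glob("codebook*"))
    if not (has_readme or has_codebook):
        warnings.append("No README.txt or codebook in the top-level directory")
    # Flag files, at any depth, whose format future users may not be able to open.
    for path in sorted(root.rglob("*")):
        open_format = PROPRIETARY_TO_OPEN.get(path.suffix.lower())
        if path.is_file() and open_format:
            warnings.append(
                f"{path.relative_to(root)}: proprietary format; "
                f"consider also providing {open_format}"
            )
    return warnings
```

Consider a folder that contains only `survey.sav`. Running `check_usability` on it returns two warnings: one for the missing README and one suggesting a CSV copy of the SPSS file.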
Why can’t I keep it on my computer?
• Poor success rates for data sharing requests (Vines et al., 2013; Savage and Vickers, 2009; Wicherts et al., 2006)
• The older the article, the harder it is to get the data (Vines et al., 2014):
  • Odds of a dataset being reported as extant decline by 17% per year
  • Odds of finding a working email address for the first, last, or corresponding author decline by 7% per year
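The yearly percentages compound over time. This worked example assumes that each decline acts as a constant yearly odds ratio, 0.83 for data and 0.93 for email. That is the usual reading of such a result, but it is our assumption here. Over ten years the arithmetic looks like this:

```latex
\begin{aligned}
\text{odds}_{\text{extant}}(t) &= \text{odds}_{\text{extant}}(0)\times(1-0.17)^{t} = \text{odds}_{\text{extant}}(0)\times 0.83^{t}\\
\text{odds}_{\text{extant}}(10) &\approx \text{odds}_{\text{extant}}(0)\times 0.155\\
\text{odds}_{\text{email}}(10) &= \text{odds}_{\text{email}}(0)\times 0.93^{10} \approx \text{odds}_{\text{email}}(0)\times 0.484
\end{aligned}
```

After a decade, the odds that a dataset still exists have fallen to about a sixth of their starting value. The odds of reaching an author by email have roughly halved. These figures are odds, not probabilities, so "a sixth of the odds" does not mean "a sixth as likely."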
Why can’t I keep it on my computer?
“Sure I will send you those data, but it's like seven computers ago, and so please allow me some time to hunt them down” (Wicherts and Bakker, 2012)
• Most refusals are not to protect ongoing work, but because (Vines et al., 2014):
  • The data are on a computer that got stolen…
  • The data are in my parents’ attic…
  • The data are definitely on one of these zip disks…
• …and it will take hours for me to get them, if I can get them at all.
Set it and forget it: put your data in a repository
• Long-term commitment to data preservation
• Reuse tracking and usage statistics
• Permanent URL / DOI enables data citation
1. Find a disciplinary repository or database
   • Repository directories: re3data.org; biosharing.org
   • Typically managed by specialists in the field
2. Use a general-purpose repository
   • UR Research: https://urresearch.rochester.edu/home.action
   • Dryad: http://datadryad.org
     • Integration with journal submission processes (http://datadryad.org/pages/integratedJournals)
     • Not free: $80/submission. But we provide vouchers!
How to get a voucher
• Proposal should include:
  • A description of the project to which the data are related;
  • A description of the data to be archived, including the format(s) and approximate total size. The RCL will fully fund datasets up to 10GB, with larger data considered on a case-by-case basis.
• Send proposal to [email protected]
But my data’s bigger than that…
• An upcoming option: REACTUR (Research data Archiving and Curation at the University of Rochester)
  • River Campus Libraries + CIRC = easy data sharing for large datasets
  • $200 / TB / year
  • Piloting now; we hope to make it available to everyone in Spring 2015
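Repository  | Amount of data accepted             | Cost                            | Ability to restrict data? | Publisher integration?
------------|-------------------------------------|---------------------------------|---------------------------|-----------------------
UR Research | Up to 2GB                           | Free                            | Yes, highly customizable  | No
FigShare    | Up to 1GB private; unlimited public | Free                            | Yes                       | Yes
Dryad       | Up to 10GB                          | $80 per submission (up to 10GB) | No                        | Yes
REACTUR     | Unlimited                           | $200 / TB / year                | Yes                       | Not yet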
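The comparison table above can also be read as a simple filter: drop any repository that cannot take your data, then compare costs for the rest. This is a minimal sketch in Python, and it rests on several assumptions:
• The fall 2014 figures in the table still apply.
• REACTUR's fee is prorated by size, with 1 TB = 1,000 GB. The actual billing granularity is not stated.
• FigShare's 1GB limit applies whenever access is restricted.
The names `Repository`, `fits`, `estimated_cost`, and `shortlist` are hypothetical. The sketch ignores the RCL vouchers that can cover Dryad's fee.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Repository:
    name: str
    max_gb_public: Optional[float]      # size limit for open data; None = no stated limit
    max_gb_restricted: Optional[float]  # size limit for restricted data; None = no stated limit
    can_restrict: bool
    flat_fee_usd: float = 0.0           # one-time fee per submission
    usd_per_tb_year: float = 0.0        # recurring storage fee


# Figures transcribed from the comparison table (fall 2014); check current terms.
REPOSITORIES = [
    Repository("UR Research", 2, 2, can_restrict=True),
    Repository("FigShare", None, 1, can_restrict=True),
    Repository("Dryad", 10, None, can_restrict=False, flat_fee_usd=80),
    Repository("REACTUR", None, None, can_restrict=True, usd_per_tb_year=200),
]


def fits(repo: Repository, size_gb: float, restricted: bool) -> bool:
    """True if the repository accepts a dataset of this size and access level."""
    if restricted and not repo.can_restrict:
        return False
    limit = repo.max_gb_restricted if restricted else repo.max_gb_public
    return limit is None or size_gb <= limit


def estimated_cost(repo: Repository, size_gb: float, years: float) -> float:
    """Flat fee plus storage fee; assumes per-TB fees are prorated (1 TB = 1000 GB)."""
    return repo.flat_fee_usd + repo.usd_per_tb_year * (size_gb / 1000) * years


def shortlist(size_gb: float, restricted: bool, years: float = 1) -> list[tuple[str, float]]:
    """(name, estimated USD cost) for every repository that would accept the data."""
    return [
        (repo.name, estimated_cost(repo, size_gb, years))
        for repo in REPOSITORIES
        if fits(repo, size_gb, restricted)
    ]
```

Two examples show how the filter behaves:
• An open 5GB dataset: `shortlist(5, restricted=False)` keeps FigShare, Dryad, and REACTUR, at roughly $0, $80, and $1.
• A restricted 500GB dataset stored for three years: `shortlist(500, restricted=True, years=3)` leaves only REACTUR, at $300.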
A little help:
• Call me! (Or email, or drop by.) 5-6882, Carlson 313E
• At URMC, contact:
  • Donna Berryman, 5-6877
  • Linda Hasman, 5-3399
Data Workshops
• 1st and 3rd Thursdays @ noon, Carlson Library Rm. 310

Fall 2014
• September: Writing a successful data management plan; Intro to GIS I
• October: Sharing your data; Intro to GIS II
• November: Finding and using data from ICPSR; Intro to GIS III
• December: Data visualization

Spring 2015
• January: R 101; Intro to R Spatial
• February: Using the DMPTool; Georeferencing maps
• March: Basic database design; Web mapping: Google Refine, Open Layers
• April: Tools for qualitative research; Mapping real-world data
References
• Dorch, B. (2012). On the Citation Advantage of linking to data. Retrieved from http://hprints.org/hprints-00714715
• Henneken, E. A., & Accomazzi, A. (2011). Linking to Data - Effect on Citation Rates in Astronomy. arXiv:1111.3618 [astro-ph]. Retrieved from http://arxiv.org/abs/1111.3618
• Pienta, A. M., Alter, G. C., & Lyle, J. A. (2010). The Enduring Value of Social Science Research: The Use and Reuse of Primary Research Data. Retrieved from http://deepblue.lib.umich.edu/handle/2027.42/78307
• Piwowar, H. A., Day, R. S., & Fridsma, D. B. (2007). Sharing Detailed Research Data Is Associated with Increased Citation Rate. PLoS ONE, 2(3). doi:10.1371/journal.pone.0000308
• Piwowar, H. A., & Vision, T. J. (2013). Data reuse and the open data citation advantage. PeerJ, 1. doi:10.7717/peerj.175
• Savage, C. J., & Vickers, A. J. (2009). Empirical Study of Data Sharing by Authors Publishing in PLoS Journals. PLoS ONE, 4(9), e7078. doi:10.1371/journal.pone.0007078
• Sears, J. R. (2011). Data Sharing Effect on Article Citation Rate in Paleoceanography. AGU Fall Meeting Abstracts, 53, 1628.
• Vines, T. H., Albert, A. Y. K., Andrew, R. L., Debarre, F., Bock, D. G., Franklin, M. T., … Rennison, D. J. (2014). The Availability of Research Data Declines Rapidly with Article Age. Current Biology, 24(1), 94–97. doi:10.1016/j.cub.2013.11.014
• Vines, T. H., Andrew, R. L., Bock, D. G., Franklin, M. T., Gilbert, K. J., Kane, N. C., … Yeaman, S. (2013). Mandated data archiving greatly improves access to research data. The FASEB Journal, 27(4), 1304–1308. doi:10.1096/fj.12-218164
• Wicherts, J. M., & Bakker, M. (2012). Publish (your data) or (let the data) perish! Why not publish your data too? Intelligence, 40(2), 73–76. doi:10.1016/j.intell.2012.01.004
• Wicherts, J. M., Borsboom, D., Kats, J., & Molenaar, D. (2006). The poor availability of psychological research data for reanalysis. The American Psychologist, 61(7), 726–728. doi:10.1037/0003-066X.61.7.726